Workflow bulk operations best practices

Recommendation

When a workflow performs bulk operations, design it for the number of assets it is expected to process.

Impact

Long-running workflows that operate in bulk can degrade the performance of other processes and user activity in Collibra.

  • Can lead to high CPU consumption, which slows page-load times for end users.

  • Can cause resource starvation for newly-initiated processes.

  • Can cause network latency issues for cloud customers because of the heavy traffic that bulk operations generate.

  • Can reduce workflow efficiency and enterprise performance.

Recommended action

  1. Within workflows, use Java APIs that execute in a job.

  2. Use the Java APIs that are designed for bulk activity and processing: the Import API and the OutputModuleAPI.

  3. Execute bulk processing workflows outside of business hours.

  4. Use the "Asynchronous" option on workflow tasks that contain bulk processing logic.

  5. Use scripted batching logic so that you neither overwhelm an individual API nor process an entire data set all at once.

  6. Do not execute multiple bulk workflow processes at once; stagger them outside of business hours or across the day.

  7. Do not perform bulk operations in a workflow that is intended to be state/lifecycle oriented.
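
The batching recommendation in step 5 can be sketched as follows. This is a minimal illustration in plain Java, not Collibra's own implementation: the class `BatchingSketch`, its methods, and the batch size of 1000 are hypothetical names and values chosen for the example, and the handler stands in for whatever bulk API call the workflow actually makes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical helper: splits a large list of items into fixed-size batches. */
final class BatchingSketch {

    /** Returns consecutive sublists that each hold at most batchSize elements. */
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // Copy the slice so each batch is independent of the source list.
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }

    /** Hands each batch to the handler in turn and returns the number of batches processed. */
    static <T> int processInBatches(List<T> items, int batchSize, Consumer<List<T>> handler) {
        int processed = 0;
        for (List<T> batch : partition(items, batchSize)) {
            handler.accept(batch); // placeholder for the real bulk API call
            processed++;
        }
        return processed;
    }

    public static void main(String[] args) {
        // Assumption: integers stand in for asset identifiers.
        List<Integer> assetIds = new ArrayList<>();
        for (int i = 1; i <= 2500; i++) {
            assetIds.add(i);
        }
        int batches = processInBatches(assetIds, 1000,
                batch -> System.out.println("Processing batch of " + batch.size()));
        System.out.println("Batches processed: " + batches);
    }
}
```

A suitable batch size depends on the environment and on the API being called, so treat the value above as a starting point to tune rather than a recommendation.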

Validation Criteria

  • Review the workflow process definition.

  • Add info-level log statements that record the asset count during bulk operations.
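
One way to record the asset count is sketched below. It uses the standard `java.util.logging` package as a stand-in for whatever logger is available in your workflow scripts. The class `AssetCountLoggingSketch`, the workflow name, and the threshold of 1000 are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Logger;

/** Hypothetical helper: logs how many assets a bulk operation is about to process. */
final class AssetCountLoggingSketch {

    private static final Logger LOG =
            Logger.getLogger(AssetCountLoggingSketch.class.getName());

    /** Builds the log message; kept separate so it can be checked without a logger. */
    static String countMessage(String workflowName, int assetCount, int expectedMax) {
        String message = "Workflow '" + workflowName + "' is about to process "
                + assetCount + " assets";
        if (assetCount > expectedMax) {
            message += " (exceeds expected maximum of " + expectedMax + ")";
        }
        return message;
    }

    /** Logs at INFO level normally, and at WARNING level when the count exceeds the expected maximum. */
    static void logAssetCount(String workflowName, List<?> assets, int expectedMax) {
        String message = countMessage(workflowName, assets.size(), expectedMax);
        if (assets.size() > expectedMax) {
            LOG.warning(message);
        } else {
            LOG.info(message);
        }
    }

    public static void main(String[] args) {
        // Assumption: integers stand in for the assets selected by the workflow.
        List<Integer> assets = new ArrayList<>();
        for (int i = 0; i < 2500; i++) {
            assets.add(i);
        }
        logAssetCount("Bulk status update", assets, 1000);
    }
}
```

Comparing the logged count with the count the workflow was designed for shows whether the design assumption from the recommendation still holds.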

Additional information

For more information, go to the following resources (requires Collibra login):