Data export best practices
Recommendation
The following methods are the main ways to export data from Collibra Data Intelligence Platform:
Core REST API.
Output Module.
Reporting Insights API.
Exporting functionality in the Collibra UI.
Choose the method that best supports your use case to ensure the best performance and user experience.
Impact
Ensures that your exports provide the right data for your purpose with the appropriate performance characteristics, avoiding timeouts and interference with other running processes.
Best practice recommendations
The Core REST API is best for instantaneous information requests for small amounts of data. It provides access to a wide range of data elements, so it is important to determine in advance which specific endpoints return the data you want to export.
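As an illustration of a small, targeted request, the sketch below only builds a GET request for a handful of assets by name and does not send it. The instance URL is hypothetical, and the /rest/2.0/assets path with its name, limit and offset parameters is an assumption that should be checked against the API documentation of your own instance.

```python
from urllib.parse import urlencode, urljoin
from urllib.request import Request

# Hypothetical instance URL; replace with your own environment.
BASE_URL = "https://your-instance.example.com"


def build_asset_lookup(base_url: str, name: str, limit: int = 10) -> Request:
    """Build (but do not send) a small GET request for assets by name.

    Assumption: the endpoint path and the name/limit/offset parameters.
    """
    query = urlencode({"name": name, "limit": limit, "offset": 0})
    url = urljoin(base_url.rstrip("/") + "/", "rest/2.0/assets") + "?" + query
    return Request(url, headers={"Accept": "application/json"}, method="GET")


request = build_asset_lookup(BASE_URL, "Customer ID")
print(request.full_url)
```

Keeping the limit low and the filter specific is what makes this kind of call suitable for quick lookups rather than bulk extraction.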
The Output Module is best for extracting data sets of more than 50 and up to 10,000 records per page, particularly when the same call is repeated multiple times a day. It is a lightweight graph query engine exposed through the public API and supports different output formats, such as JSON, XML, Excel and CSV. The Output Module uses SQL-like filtering capabilities to query most Collibra entities, such as assets, communities, domains and types, through a single API. Be mindful of the total number of records and assets requested when you create the Table View Configuration, to reduce the risk of a performance bottleneck.
The default number of extracted records is 50, but this can be increased.
Use a value of -1 to extract all records for a given filter or View ID, but keep in mind that queries that run on complicated or large amounts of data may be slower than expected.
The best approach is to paginate the results.
You can implement a timeout to interrupt the execution if the complexity or amount of data is unknown.
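The pagination and timeout advice above can be sketched as follows. This is a minimal illustration, not the Output Module client itself: fetch_page stands in for whatever HTTP call you use, and the displayStart and displayLength field names in the request body are assumptions to verify against the Output Module documentation. A real Table View Configuration also needs its resource and column definitions, which are omitted here.

```python
import time
from typing import Callable, Iterator, List, Optional

Page = List[dict]


def table_view_config(start: int, length: int) -> dict:
    """Hypothetical minimal request body for one page (field names assumed)."""
    return {"TableViewConfig": {"displayStart": start, "displayLength": length}}


def paginate(fetch_page: Callable[[dict], Page], page_size: int = 1000,
             max_seconds: Optional[float] = None) -> Iterator[dict]:
    """Yield records page by page until a short page is returned.

    Raises TimeoutError once the overall time budget is exhausted.
    """
    deadline = None if max_seconds is None else time.monotonic() + max_seconds
    start = 0
    while True:
        if deadline is not None and time.monotonic() > deadline:
            raise TimeoutError(f"export stopped after {start} records")
        page = fetch_page(table_view_config(start, page_size))
        yield from page
        if len(page) < page_size:
            return  # last (possibly empty) page reached
        start += len(page)


# Stand-in for the real HTTP call, so the sketch runs offline.
FAKE_ROWS = [{"id": i} for i in range(2500)]


def fake_fetch(body: dict) -> Page:
    cfg = body["TableViewConfig"]
    begin = cfg["displayStart"]
    return FAKE_ROWS[begin:begin + cfg["displayLength"]]


rows = list(paginate(fake_fetch, page_size=1000, max_seconds=30))
print(len(rows))  # 2500
```

The time budget is checked between pages, so it bounds the whole export rather than a single request; a per-request timeout on the HTTP client would complement it.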
Insights Data Access, available to cloud customers with an Insights Data Access license, is best for strategic reporting. It relies on a large data set made up of 8 key components that can be used for analysis, reporting and use cases such as adoption monitoring. The Reporting Data Layer can retrieve vast amounts of data, representing a snapshot in time, without jeopardizing Collibra front-end performance. It yields 8 data files of the underlying metadata from the Collibra model, which can then be used to develop a logical model for any custom analytic or BI solution, including SQL. You can use the Insights widget to show Tableau reports, or any report that can be shown as an iframe, on your Collibra dashboard.
The exporting functionality in the UI is best for exporting assets, characteristics and relations in a readable format like Excel or CSV. This data can then be updated and imported back into Collibra. You should first create a global or community/domain level view and include all of the relevant characteristics and relations. This view should be saved and shared with everyone who is using the export/import method for data entry. The exporting functionality only works on the selected view, so fields that exist on the asset but are not in the view are not included in the export file. Hierarchy views cannot be selected for export.
Excel is the preferred format if the purpose of the export operation is to re-import the file with new and updated data.
CSV is the preferred format if the purpose of the export operation is to pass the file to another storage system.
Authentication
The APIs used for the export operations above require authentication. Available methods include BasicAuth and JWT. JWT is recommended as the more secure method of accessing the API.
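To make the difference between the two methods concrete, the sketch below builds the Authorization header for each. The helper names are hypothetical, and sending the JWT as a standard bearer token is an assumption; how the token is issued and passed depends on your environment's configuration.

```python
import base64


def basic_auth_header(username: str, password: str) -> dict:
    """Standard HTTP Basic header: base64 of 'username:password'."""
    raw = f"{username}:{password}".encode("utf-8")
    return {"Authorization": "Basic " + base64.b64encode(raw).decode("ascii")}


def jwt_auth_header(jwt: str) -> dict:
    """Assumption: the JWT is sent as a standard bearer token."""
    return {"Authorization": f"Bearer {jwt}"}


print(basic_auth_header("user", "pass"))
print(jwt_auth_header("<token issued by your identity provider>"))
```

Basic authentication sends the credentials, only base64-encoded, with every request, whereas a JWT is typically short-lived and does not expose the password, which is one reason it is the more secure option.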
Validation Criteria
If you encounter export errors, check the logs in Collibra Console and confirm that you are using the appropriate export method as described above. Common errors are also described in the documentation listed below.
Additional Information
For more information, see the following resources:
For the Output Module, visit the developer portal or download the Hitchhiker's Guide to the Output Module via the product resources page.
See the Insights Data Access diagram.
The Data Maturity Report template may also be useful for exploring and using the 8 output files described above for Insights Data Access.
For advanced users and advanced use cases there are Spring Boot integrations in the Marketplace.
Learn more about Insights Data Access on AWS or GCP. You can find additional resources on the Marketplace.
For more information on the exporting functionality in the DGC UI, see the product documentation.