21 Messages
DBT to Collibra Integration for data lineage
Hello,
Has anyone implemented DBT to Collibra marketplace spring boot application to create data lineage?
Would like to get the feedback.
We have started configuring the spring boot application and are successfully able to create the compiled SQL and just after this process fails. It seems some Collibra parameter issue but actually not able to figure it out.
Any help appreciated!!
ludovicjanssens
31 Messages
2 years ago
Hi @SumitTiwari,
I think you should consider an integration based on OpenLineage which supports DBT (see https://openlineage.io/integration/dbt/) . OpenLineage is meant as a harvester and has a fairly simple API and metamodel that should be easily integratable in Collibra.
Best regards,
Ludovic
0
0
sumittiwari
21 Messages
2 years ago
Hi @ludovic.janssens,
Thanks for your response.
Does openLineage provide column level lineage?
Can we integrate the lineage to show in Collibra?
0
0
paulotaylor
38 Messages
2 years ago
@spring-team.collibra.com please review this issue from Sumit here, thanks.
0
0
springboot_team
368 Messages
2 years ago
Hi @SumitTiwari,
We want to ask for more details to investigate the error you are facing. Firstly, what error are you facing, and where is it showing up, from the lineage harvester or the spring boot application?
Could you send us a private message with a copy of the logs? However, make sure to hide any sensitive information, such as credentials.
Thanks
1
springboot_team
368 Messages
2 years ago
Hi @SumitTiwari,
Thanks for the information provided.
Can you please confirm the following:
mvn org.apache.maven.plugins:maven-install-plugin:3.0.0-M1:install-file -Dfile=<path-to-jar-file>
Thanks
0
0
sumittiwari
21 Messages
2 years ago
Hi Team,
Please see information below.
Which version of the DBT integration is being used?
– dbt api v2
Where any modifications made to the DBT integration code?
– no change to the integration code
Which version of the Collibra Integration Library being used?
– 1.1.2
Was the Collibra Integration Library installed using the following Maven command?
mvn org.apache.maven.plugins:maven-install-plugin:3.0.0-M1:install-file -Dfile=
– Using Eclipse, the maven install command was run and missing dependencies were added to the the pom.xml,
currently the spring application run correctly but after the extraction of sqls from the dbt api, it fails once it tries to find assets in collibra.
Also, would like to clarify on what information to pass under below parameters in application.properties file.
dbt.collibra-database-domain
dbt.collibra-system-asset-name
0
0
springboot_team
368 Messages
2 years ago
Hi @SumitTiwari,
Thanks for the update.
Can you please try the following:
Install the Integration Library as explained in section “Installing the Library JAR File” of the documentation downloaded from point 2.
Since there were some fixes in the latest version of the DBT to Collibra Spring Boot integration, download v1.1.3 and its documentation from here:
Regarding the following two domains, they should be set to the IDs of the Collibra domains where the assets created by this integration would be imported.
Thanks
0
0
sumittiwari
21 Messages
2 years ago
Thanks a lot!!
The new version was very quick to configure and it finished successfully with other issue though.
The package ran for around an hour and created one system under domain(as per property file). It failed with below error message :
ERROR com.collibra.dbt.component.DBTLineageExtraction - DBT project doesn’t provide solid metadata Account id:3456 Project id:4545 Detailed Error:
{“message”:“An unexpected error occurred during execution.”,“details”:“java.io.IOException: Error writing to server”,“cause”:“Error writing to server”}
The error message is not very clear to debug.
Is there a way to enable detailed log to see exact error?
Would like to have quick call to get this resolved. Please let me know if we can.
Thanks
Sumit
0
0
springboot_team
368 Messages
2 years ago
Hi @SumitTiwari,
Thanks for the update.
The issue appears to be that no metadata is being returned which might be due to the current configuration of lack of configuration on DBT. Therefore, it would be ideal to check the following from the DBT Cloud instance:
Additionally, regarding the exception, can you please confirm that the account being used has read and write access to the directory specified as the value of property
dbt.lineage-path
and its parent?For the more information from DBT’s side, you can refer to DBT job logs. While for more information from the DBT to Collibra Spring Boot integration side, you can enable
DEBUG
orTRACE
logs from the src/main/resources/log4j2.xml file .Thanks
0
0
sumittiwari
21 Messages
2 years ago
Hi @spring-team.collibra.com
Thanks for the information!!
We have triggered the process by limiting it to run for only one project. It finished successfully, as you can see the log below.
INFO com.collibra.dbt.controller.EntryPointController - Extract SQL for provided projects*
INFO com.collibra.dbt.component.DBTLineageExtraction - Processing Account id:xxxx Project id:xxxx*
INFO com.collibra.dbt.component.DBTLineageExtraction - ETL operations successfully has been completed*
When checking the Collibra’s domain for assets, I can only see the System asstes created as per dbt.collibra-system-asset-name parameter. Remaining assets(Database, Schema, Table, Column) did not land in Collibra.
Also, compiled sql files got created on local machine but lineage.json did not create(provided local machine’s path for dbt.lineage-path parameter).
Please let me know if there is anything we are missing.
Thanks
Sumit
0
0
springboot_team
368 Messages
2 years ago
Hi @SumitTiwari,
The integration uses the /graphql DBT endpoint to fetch the databases/tables/columns. Therefore, if no errors are marked in the integration logs, we recommend manually sending a request to this endpoint to view whether this information is being retrieved. These steps will help us better understand why the assets aren’t being created.
The following script is a request to the /graphql request using CURL. However, feel free to import this into tools such as postman.
Please replace the BEARER_TOKEN with your personal access token, containing the following format ‘Bearer X’ and replace the COLLIBRA_JOB_ID with your DBT project ID.
You can get the job ID using the following steps:
Thanks
0
0
sumittiwari
21 Messages
2 years ago
Hi @spring-team.collibra.com,
We executed the /graphql request and getting the desired output (Database, table, column in JSON format).
Can you please let me know if there is a way we can connect to resolve this issue?
Thanks
Sumit
0
0
springboot_team
368 Messages
2 years ago
Hi @SumitTiwari,
Thanks for checking.
Regarding the above questions:
dbt.lineage-path
should be an existing local path that is used by the DBT Spring Boot integration to write the SQL files and for the Collibra Lineage Harvester to read the SQL files and ingest them. They would not be JSON files.dbt.collibra-database-domain
should be aPhysical Data Dictionary
domain.Can you please also add the following property to the application.properties files or if it exists, ensure that its value is set to
true
? Thanks0
0
sumittiwari
21 Messages
2 years ago
Hi @spring-team.collibra.com,
Thanks for the reply.
Yes, we have dbt.create-database-assets-enabled as true.
Here is the description I see in your documentation for dbt.lineage-path parameter.
“The directory where the generated lineage.json file should be written to and
used by the Collibra Lineage Harvester.”
Thanks
Sumit
0
0
springboot_team
368 Messages
2 years ago
Hi @SumitTiwari,
Thanks for pointing this out.
We will update the documentation accordingly.
The DBT to Collibra Spring Boot integration creates SQL files and the Collibra Lineage Harvester ingests these files to generate the technical lineage. In case you still encountering an issue with the ingestion of the database/schema/table/column assets, would it be possible to provide us a sample graphql call response as a private message?
This will help us investigate what the issue might be. Thanks
0
0