S

Tuesday, November 8th, 2022 9:38 PM

DBT to Collibra Integration for data lineage

Hello,

Has anyone implemented DBT to Collibra marketplace spring boot application to create data lineage?

Would like to get the feedback.
We have started configuring the spring boot application and are successfully able to create the compiled SQL and just after this process fails. It seems some Collibra parameter issue but actually not able to figure it out.

Any help appreciated!!

31 Messages

2 years ago

Hi @SumitTiwari,

I think you should consider an integration based on OpenLineage which supports DBT (see https://openlineage.io/integration/dbt/) . OpenLineage is meant as a harvester and has a fairly simple API and metamodel that should be easily integratable in Collibra.

Best regards,
Ludovic

21 Messages

2 years ago

Hi @ludovic.janssens,

Thanks for your response.
Does openLineage provide column level lineage?
Can we integrate the lineage to show in Collibra?

38 Messages

2 years ago

@spring-team.collibra.com please review this issue from Sumit here, thanks.

368 Messages

2 years ago

Hi @SumitTiwari,

We want to ask for more details to investigate the error you are facing. Firstly, what error are you facing, and where is it showing up, from the lineage harvester or the spring boot application?
Could you send us a private message with a copy of the logs? However, make sure to hide any sensitive information, such as credentials.

Thanks

21 Messages

@spring-team.collibra.com, I sent a private message. can someone please address that?

368 Messages

2 years ago

Hi @SumitTiwari,

Thanks for the information provided.

Can you please confirm the following:

  • Which version of the DBT integration is being used?
  • Where any modifications made to the DBT integration code?
  • Which version of the Collibra Integration Library being used?
  • Was the Collibra Integration Library installed using the following Maven command?
    mvn org.apache.maven.plugins:maven-install-plugin:3.0.0-M1:install-file -Dfile=<path-to-jar-file>

Thanks

21 Messages

2 years ago

Hi Team,

Please see information below.

Which version of the DBT integration is being used?
– dbt api v2
Where any modifications made to the DBT integration code?
– no change to the integration code
Which version of the Collibra Integration Library being used?
– 1.1.2
Was the Collibra Integration Library installed using the following Maven command?
mvn org.apache.maven.plugins:maven-install-plugin:3.0.0-M1:install-file -Dfile=

– Using Eclipse, the maven install command was run and missing dependencies were added to the the pom.xml,
currently the spring application run correctly but after the extraction of sqls from the dbt api, it fails once it tries to find assets in collibra.

Also, would like to clarify on what information to pass under below parameters in application.properties file.

dbt.collibra-database-domain
dbt.collibra-system-asset-name

368 Messages

2 years ago

Hi @SumitTiwari,

Thanks for the update.

Can you please try the following:

  1. Ensure that Maven is installed and configured on your machine
  2. Download the latest version of the Collibra Integration Library and documentation from here – v1.1.10:
  1. Install the Integration Library as explained in section “Installing the Library JAR File” of the documentation downloaded from point 2.

  2. Since there were some fixes in the latest version of the DBT to Collibra Spring Boot integration, download v1.1.3 and its documentation from here:

  1. Configure the DBT integration properties according to the “Configuration” section.
  2. Try to test the integration.

Regarding the following two domains, they should be set to the IDs of the Collibra domains where the assets created by this integration would be imported.

  • dbt.collibra-database-domain
  • dbt.collibra-system-asset-name

Thanks

21 Messages

2 years ago

Thanks a lot!!
The new version was very quick to configure and it finished successfully with other issue though.

The package ran for around an hour and created one system under domain(as per property file). It failed with below error message :

ERROR com.collibra.dbt.component.DBTLineageExtraction - DBT project doesn’t provide solid metadata Account id:3456 Project id:4545 Detailed Error:
{“message”:“An unexpected error occurred during execution.”,“details”:“java.io.IOException: Error writing to server”,“cause”:“Error writing to server”}

The error message is not very clear to debug.
Is there a way to enable detailed log to see exact error?

Would like to have quick call to get this resolved. Please let me know if we can.

Thanks
Sumit

368 Messages

2 years ago

Hi @SumitTiwari,

Thanks for the update.

The issue appears to be that no metadata is being returned which might be due to the current configuration of lack of configuration on DBT. Therefore, it would be ideal to check the following from the DBT Cloud instance:

  1. From the “Deploy” tab, in the “Environments” section they should be configured for both deployment and development. If these are not set, the Develop IDE would not be reachable.
  2. From the “Develop” tab, can you please confirm that the project being used is loading correctly. Also, it might be ideal to try running a command to confirm that everything is working fine.

Additionally, regarding the exception, can you please confirm that the account being used has read and write access to the directory specified as the value of property dbt.lineage-path and its parent?

For the more information from DBT’s side, you can refer to DBT job logs. While for more information from the DBT to Collibra Spring Boot integration side, you can enable DEBUG or TRACE logs from the src/main/resources/log4j2.xml file .

Thanks

21 Messages

2 years ago

Hi @spring-team.collibra.com

Thanks for the information!!

We have triggered the process by limiting it to run for only one project. It finished successfully, as you can see the log below.

INFO com.collibra.dbt.controller.EntryPointController - Extract SQL for provided projects*
INFO com.collibra.dbt.component.DBTLineageExtraction - Processing Account id:xxxx Project id:xxxx*
INFO com.collibra.dbt.component.DBTLineageExtraction - ETL operations successfully has been completed*

When checking the Collibra’s domain for assets, I can only see the System asstes created as per dbt.collibra-system-asset-name parameter. Remaining assets(Database, Schema, Table, Column) did not land in Collibra.

Also, compiled sql files got created on local machine but lineage.json did not create(provided local machine’s path for dbt.lineage-path parameter).

Please let me know if there is anything we are missing.

Thanks
Sumit

368 Messages

2 years ago

Hi @SumitTiwari,

The integration uses the /graphql DBT endpoint to fetch the databases/tables/columns. Therefore, if no errors are marked in the integration logs, we recommend manually sending a request to this endpoint to view whether this information is being retrieved. These steps will help us better understand why the assets aren’t being created.

The following script is a request to the /graphql request using CURL. However, feel free to import this into tools such as postman.

 curl --location --request POST 'https://metadata.cloud.getdbt.com/graphql' \
--header 'Authorization: BEARER_TOKEN' \
--header 'Content-Type: application/json' \
--data-raw '{"query":"{\n  models(jobId: COLLIBRA_JOB_ID) {\n    compiledSql\n    uniqueId\n    name\n    description\n    schema\n    database\n    projectId\n    meta\n    tests{\n        uniqueId\n        name\n        description\n        columnName\n        rawCode\n        dependsOn\n    }\n    columns{\n      name\n      type\n      description\n      meta\n      \n    }\n  }\n}","variables":{}}

Please replace the BEARER_TOKEN with your personal access token, containing the following format ‘Bearer X’ and replace the COLLIBRA_JOB_ID with your DBT project ID.

You can get the job ID using the following steps:

  • Access the Deploy> Jobs page on DBT Cloud
  • Select your job
  • View the URL on your browser and it should end with /jobs/{ID}
  • Copy the ID

Thanks

21 Messages

2 years ago

Hi @spring-team.collibra.com,

We executed the /graphql request and getting the desired output (Database, table, column in JSON format).

  • Does this request create any file physically?
  • What value should we pass to dbt.lineage-path parameter? Is this path going to be local? We do not see any lineage.json file.
  • what type of Domain should be provided for dbt.collibra-database-domain parameter? we have created domain type as Physical Data Dictionary for this parameter and passed it’s ID.

Can you please let me know if there is a way we can connect to resolve this issue?

Thanks
Sumit

368 Messages

2 years ago

Hi @SumitTiwari,

Thanks for checking.

Regarding the above questions:

  • Point 1 -> The DBT integration creates SQL files that would be ingested by the Collibra Lineage Harvester.
  • Point 2 -> dbt.lineage-path should be an existing local path that is used by the DBT Spring Boot integration to write the SQL files and for the Collibra Lineage Harvester to read the SQL files and ingest them. They would not be JSON files.
  • Point 3 -> Correct, yes, the dbt.collibra-database-domain should be a Physical Data Dictionary domain.

Can you please also add the following property to the application.properties files or if it exists, ensure that its value is set to true ? Thanks

dbt.create-database-assets-enabled=true

21 Messages

2 years ago

Hi @spring-team.collibra.com,

Thanks for the reply.

Yes, we have dbt.create-database-assets-enabled as true.

Here is the description I see in your documentation for dbt.lineage-path parameter.

“The directory where the generated lineage.json file should be written to and
used by the Collibra Lineage Harvester.”

Thanks
Sumit

368 Messages

2 years ago

Hi @SumitTiwari,

Thanks for pointing this out.

We will update the documentation accordingly.

The DBT to Collibra Spring Boot integration creates SQL files and the Collibra Lineage Harvester ingests these files to generate the technical lineage. In case you still encountering an issue with the ingestion of the database/schema/table/column assets, would it be possible to provide us a sample graphql call response as a private message?

This will help us investigate what the issue might be. Thanks

Loading...