Google Cloud Dataflow to Collibra Integration
Hi All,
I am trying to set up the Google Cloud Dataflow to Collibra integration application on my laptop.
I am able to start the Spring Boot application successfully, but when I hit the service I get the below error:
{
  "message": "Internal error during execution.",
  "exceptionMessage": "302 Found from GET https://console.cloud.google.com/dataflow/jobs; nested exception is org.springframework.web.reactive.function.UnsupportedMediaTypeException: Content type 'application/binary' not supported for bodyType=com.collibra.marketplace.gcp.dataflow.pojo.ListJobsResponse"
}
Could you please let me know what values should be set for the below properties?
google.cloud.platform.base-url=
google.cloud.platform.scopes[0]=
Thanks & Regards,
Naveen
Community_Alex
701 Messages • 18.5K Points
2 years ago
@spring-team.collibra.com, is this something you could help @naveen.jayaram with or no?
springboot_team
368 Messages
2 years ago
Hi @naveen.jayaram,
Thanks for trying the GCP Dataflow to Collibra integration.
The specified properties need to be set as follows:
google.cloud.platform.base-url=https://dataflow.googleapis.com/v1b3/projects/<myproject>
google.cloud.platform.scopes[0]=https://www.googleapis.com/auth/cloud-platform
More information is available in the GCP Dataflow to Collibra integration documentation.
Thanks
springboot_team
368 Messages
2 years ago
Hi @naveen.jayaram,
Thanks for the information provided.
It seems that this error is related to the GCP Dataflow endpoint name not being resolved to its IP address (DNS issue). Can you please try again or check whether there might be any network-related issues? Thanks
naveenjayaram
12 Messages
2 years ago
Thanks,
After adding the location it worked fine:
https://dataflow.googleapis.com/v1b3/projects//locations/
Is there an example Dataflow job for this integration? We have deployed a sample Dataflow Hello World job, but it looks like we are getting an error because the integration expects dataset details.
It would be handy if you have some Dataflow examples for which this microservice works.
// Retrieves the table name from Display Data
DisplayData datasetNameDisplayData = jobDetails.getPipelineDescription().getDisplayDataByKey("datasetName");
if (datasetNameDisplayData == null) {
    LOGGER.info("No lineage will be added for Job having name '{}'", jobDetails.getName());
    return false;
}
final String tableName = datasetNameDisplayData.getStrValue();
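For context, the lookup above can be sketched in plain Java (with hypothetical stand-in classes, not the integration's real POJOs): a job's Display Data behaves like a key-to-value map, and a missing key is what triggers the "No lineage" branch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the Display Data lookup pattern used above.
// The real integration wraps this in DisplayData/PipelineDescription POJOs;
// here a plain Map stands in for illustration only.
public class DisplayDataSketch {

    // Mirrors getDisplayDataByKey(...): returns null when the job
    // never published the key, which is what makes the caller skip lineage.
    public static String getByKey(Map<String, String> displayData, String key) {
        return displayData.get(key);
    }

    public static void main(String[] args) {
        Map<String, String> displayData = new HashMap<>();
        displayData.put("datasetName", "my_dataset");

        System.out.println(getByKey(displayData, "datasetName"));   // present
        System.out.println(getByKey(displayData, "inputTableSpec")); // absent -> null
    }
}
```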
Thanks & Regards,
Naveen
springboot_team
368 Messages
2 years ago
Hi @naveen.jayaram,
Thanks for the update.
Regarding GCP Dataflow, the job type would be a Streaming one, and under the job's Pipeline Options the dataset name should be specified. Unfortunately we do not have a sample job available. Thanks
Type: Streaming
Dataset Name: (the dataset name set in the job's Pipeline Options)
naveenjayaram
12 Messages
2 years ago
Thanks @spring-team.collibra.com.
I was able to export the Dataflow metadata to Collibra.
But I don't see any transformation details exported as part of the GCP Dataflow steps.
Can the lineage show the transformation logic as well?
Does it support column-level lineage?
Thanks & Regards,
Naveen
springboot_team
368 Messages
2 years ago
Hi @naveen.jayaram,
The GCP Dataflow integration imports both normal assets and relations, and technical lineage through the use of the Collibra Lineage Harvester.
For the normal assets and relations part, a metamodel (Diagram view) similar to the one depicted in section “Sample Relationship Diagram” of the documentation should be made available.
Additionally, if lineage information is present in the job data, technical lineage will be imported. To view this type of lineage, open a Column asset in the Collibra instance and select the "Technical Lineage" tab from the left-hand side menu.
To be able to confirm whether there was lineage that was processed by this integration, can you please forward the integration logs as a private message?
Also, you can check whether there is a log stating "No lineage will be added for Job having name", which would indicate that no technical lineage will be imported. Thanks
springboot_team
368 Messages
2 years ago
Hi @naveen.jayaram,
Thanks for the information provided.
Can you please provide a fairly detailed explanation (ideally including examples) of what is expected, so that we can open an internal request?
Since this does not have an ETA and the Java source code is provided, you can also do the code modifications as required. Thanks
naveenjayaram
12 Messages
2 years ago
Hi @spring-team.collibra.com
Thanks for the information.
As part of our PoC we took the Dataflow example provided here: https://github.com/GoogleCloudPlatform/professional-services/tree/main/examples/dataflow-bigquery-transpose
and updated the code to use that example's parameters, like below:
DisplayData datasetNameDisplayData = jobDetails.getPipelineDescription().getDisplayDataByKey("inputTableSpec");
instead of "datasetName".
After running the Dataflow integration library we got the lineage JSON, which I have uploaded.
I was expecting the library to parse each transformation step in the Dataflow job and surface it in the source code of the lineage, but it looks like it just extracts the Dataflow step name and adds that to the source code.
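If different pipelines expose the table under different Display Data keys (as with "datasetName" versus "inputTableSpec" above), one option is to try a list of candidate keys instead of hard-coding a single one. This is a hedged sketch with illustrative names only, not part of the integration's actual API:

```java
import java.util.Map;

// Hypothetical generalization of the one-line change described above:
// try several candidate Display Data keys and use the first one the
// job actually published, so each new pipeline type does not require
// editing the integration's source again.
public class CandidateKeySketch {

    public static String resolveTableName(Map<String, String> displayData, String... candidateKeys) {
        for (String key : candidateKeys) {
            String value = displayData.get(key);
            if (value != null) {
                return value;
            }
        }
        return null; // no known key present -> no lineage for this job
    }

    public static void main(String[] args) {
        Map<String, String> displayData = Map.of("inputTableSpec", "myproject:mydataset.mytable");
        // Falls through "datasetName" (absent) to "inputTableSpec" (present).
        System.out.println(resolveTableName(displayData, "datasetName", "inputTableSpec"));
    }
}
```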
Thanks & Regards,
Naveen
springboot_team
368 Messages
2 years ago
Hi @naveen.jayaram,
Thanks for the information provided.