
Tuesday, June 8th, 2021 2:18 PM

Azure Data Lake template

I'd like to find out whether anyone here is using the Azure Data Lake template from the Marketplace.

Our Azure Data Lake has many sub-directories under its top-level directory, and these contain multiple .csv files, so the data files have to be discovered recursively before processing and ingestion. The template needs to go through every level of the sub-directory structure, find the .csv files, and act on them.
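To make the recursive discovery requirement concrete, here is a minimal Python sketch. It is not how the Marketplace template works; it assumes the lake (or a copy of it) is reachable as an ordinary directory tree, for example through a mount, and the function name `find_csv_files` is made up for illustration.

```python
import os
from typing import Iterator


def find_csv_files(root: str) -> Iterator[str]:
    """Yield the path of every .csv file under root, at any depth.

    Assumption: root is a normal directory path (e.g. a mounted lake).
    """
    # os.walk visits root and every nested sub-directory in turn.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            # Match the extension case-insensitively (.csv, .CSV, ...).
            if name.lower().endswith(".csv"):
                yield os.path.join(dirpath, name)
```

Calling `list(find_csv_files("/path/to/lake"))` on a placeholder path would return the full list of .csv files, which could then be handed to whatever step does the processing or ingestion.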

I'd appreciate hearing from anyone who has used this template before, as I would like to understand the challenges you are facing. Maybe we can help each other.

Thanks & Regards,
Andy Vaidya

1.2K Messages

4 years ago

I don’t know if you guys are working together, but a similar question was posted 24 hours apart: Connecting Collibra Cloud to Azure Blob storage - Use Cases - The Data Citizens Community

Anyway, the same answer applies:

  1. Collibra has not developed anything similar to their “Amazon S3” system integration, and the foundation on which it relies (AWS Glue) does not exist on Azure.
  2. Collibra is not a good system for documenting millions of small CSV files, but Azure Purview is: Azure Purview for Unified Data Governance | Microsoft Azure

262 Messages

Is it recommended to have multiple reference systems for metadata lookup? In this case, Azure Purview and Collibra Catalog?

40 Messages

4 years ago

Hi Anand,

Most of the drivers for CSV, Parquet, AVRO, … work with Azure data lake.

So the choice depends on the file types you store, rather than on Azure Data Lake as a single thing.

I hope this makes sense.

@arthur.burkhardt I can imagine you could use Azure data fabric to look into the Azure data lake and then do an API call to Collibra to create the jobserver jobs based on the file type and address. I have never tried this for Azure :no_mouth:

262 Messages

3 years ago

Hi Anand,

I couldn’t find a template with the name Azure Data Lake.

Could you please share the link to it?

262 Messages

3 years ago

Hi Stijn,

So, is it recommended to use a Parquet driver to document Parquet files on ADLS Gen2?

36 Messages

3 years ago

Hi everyone,

It seems that there is now an Azure Purview integration available in the Marketplace, which is based on Spring Boot: https://marketplace.collibra.com/listings/azure-purview-to-collibra-integration/

It seems to be possible to import the ADLS metadata from Azure with this integration, but I haven’t taken a closer look at it yet.

262 Messages

Can I get more info on the “Spring Boot” integrations? What are they, actually? A code package that needs to be installed on a machine (and in the end acts like a Job Server)?

Has anyone already used these in their production environments?

3 years ago

@noor.shaik,
Hi Noor,

Did you use the Marketplace Spring Boot integration for Azure Purview to Collibra? Did it work for you?

Could you please help? I am getting the error below:

server.ssl.enabled=false

2022-05-04 10:53:12,728 [RMI TCP Connection(3)-127.0.0.1] INFO org.springframework.web.servlet.DispatcherServlet - Completed initialization in 1 ms
2022-05-04 10:53:16,347 [http-nio-8081-exec-1] WARN org.apache.catalina.util.SessionIdGeneratorBase - Creation of SecureRandom instance for session ID generation using [SHA1PRNG] took [225] milliseconds.
2022-05-04 10:53:16,372 [http-nio-8081-exec-1] INFO com.collibra.marketplace.azure.purview.controller.EntryPointController - Integration triggered via API request for full sync
2022-05-04 10:53:16,373 [http-nio-8081-exec-1] INFO com.collibra.marketplace.azure.purview.FullSyncProcessor - Started Purview full sync
2022-05-04 10:53:16,373 [http-nio-8081-exec-1] INFO com.collibra.marketplace.azure.purview.component.CollibraCoreAPIComponent - Removing all mappings with external system id azure_purview …
org.springframework.web.client.ResourceAccessException: I/O error on POST request for : Connection timed out: connect; nested exception is java.net.ConnectException: Connection timed out: connect
at org.springframework.web.client.RestTemplate.doExecute(RestTemplate.java:785)
at org.springframework.web.client.RestTemplate.execute(RestTemplate.java:711)
at org.springframework.web.client.RestTemplate.exchange(RestTemplate.java:602)
at com.collibra.marketplace.azure.purview.component.CollibraCoreAPIComponent.removeMappings(CollibraCoreAPIComponent.java:171)
at com.collibra.marketplace.azure.purview.FullSyncProcessor.start(FullSyncProcessor.java:461)

3 years ago

Hi Team,

I posted about the issue above and have since found a solution, so I am sharing it here in case it helps others in the community:

This is resolved. As you said, it was to do with the firewall, so I used a proxy, and that let the Spring Boot integration reach the Collibra environment.
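For anyone hitting the same "Connection timed out: connect" error, a quick way to check whether the integration host can reach Collibra at all is a plain TCP connection test. The sketch below is only a diagnostic under assumptions, not part of the integration and not the proxy setup itself (how the proxy was configured is not covered here); the function name `can_connect` and the host in the usage note are hypothetical.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within timeout seconds."""
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and DNS lookup failures.
        return False
```

Run from the machine that hosts the integration, something like `can_connect("your-instance.example.com", 443)` (placeholder host) returning False would be consistent with a firewall blocking direct outbound traffic, in which case routing through a proxy is one option.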

Let me know if anyone needs more info on this.

Warm Regards,
Rohit
