Wednesday, May 4th, 2022 5:43 AM

Azure Purview to Collibra Spring Boot Integration Build error

Hi,

I am trying to use the Azure Purview to Collibra Spring Boot integration. I have set it up in Eclipse with the Spring Boot framework and am triggering the integration through Postman with the URL below:

POST : http://localhost:8081/sync

However, it throws the error shown below. I am stuck and not sure why this is happening.

Any suggestions or leads would be appreciated.
I have set the following property in application.properties:
server.ssl.enabled=false

ERROR MESSAGE:

2022-05-04 10:53:12,728 [RMI TCP Connection(3)-127.0.0.1] INFO org.springframework.web.servlet.DispatcherServlet - Completed initialization in 1 ms
2022-05-04 10:53:16,347 [http-nio-8081-exec-1] WARN org.apache.catalina.util.SessionIdGeneratorBase - Creation of SecureRandom instance for session ID generation using [SHA1PRNG] took [225] milliseconds.
2022-05-04 10:53:16,372 [http-nio-8081-exec-1] INFO com.collibra.marketplace.azure.purview.controller.EntryPointController - Integration triggered via API request for full sync
2022-05-04 10:53:16,373 [http-nio-8081-exec-1] INFO com.collibra.marketplace.azure.purview.FullSyncProcessor - Started Purview full sync
2022-05-04 10:53:16,373 [http-nio-8081-exec-1] INFO com.collibra.marketplace.azure.purview.component.CollibraCoreAPIComponent - Removing all mappings with external system id azure_purview …
org.springframework.web.client.ResourceAccessException: I/O error on POST request for : Connection timed out: connect; nested exception is java.net.ConnectException: Connection timed out: connect
at org.springframework.web.client.RestTemplate.doExecute(RestTemplate.java:785)
at org.springframework.web.client.RestTemplate.execute(RestTemplate.java:711)
at org.springframework.web.client.RestTemplate.exchange(RestTemplate.java:602)
at com.collibra.marketplace.azure.purview.component.CollibraCoreAPIComponent.removeMappings(CollibraCoreAPIComponent.java:171)
at com.collibra.marketplace.azure.purview.FullSyncProcessor.start(FullSyncProcessor.java:461)

3 years ago

@spring-team.collibra.com Could you please help with the above issue?

368 Messages

3 years ago

Hi Rohit,

The first API request to Collibra is timing out. This may be happening due to:

  • An incorrect Collibra URL in the application.properties file (if you’re using Collibra Cloud, the URL must be in this format: https://your-domain.collibra.com)
  • A firewall blocking requests from the server hosting the integration to Collibra
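
Both causes can be checked quickly outside the integration. The following is only a minimal sketch, not part of the integration itself: the function names are made up for illustration, and the "no path" rule is an assumption based on the URL format given above.

```python
import socket
from urllib.parse import urlparse


def collibra_url_problems(url: str) -> list:
    """Return a list of likely problems with a Collibra Cloud base URL."""
    problems = []
    parsed = urlparse(url)
    if parsed.scheme != "https":
        problems.append("URL should start with https://")
    if not (parsed.hostname or "").endswith(".collibra.com"):
        problems.append("host should look like your-domain.collibra.com")
    # Assumption: the base URL carries no path, as in the format shown above.
    if parsed.path not in ("", "/"):
        problems.append("URL should not contain a path")
    return problems


def can_connect(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Try a plain TCP connection; False suggests a firewall or a wrong host."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Running can_connect("your-domain.collibra.com") on the machine that hosts the integration gives a rough indication of whether outbound traffic on port 443 is allowed. If a proxy is required, a direct TCP check like this is expected to fail.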

27 Messages

I am not getting any error, but I am receiving a 401 Unauthorized status, even though I have provided all the details for authorization.

3 years ago

Hi Team @spring-team.collibra.com

This is resolved. As you said, it was caused by the firewall, so I used a proxy, which allowed the Spring Boot integration to access the Collibra environment.

I have a follow-up question from the Azure Purview perspective. Any lead will be much appreciated:

purview.rest.api.host=
purview.rest.api.port=
purview.rest.api.base.path=/api/atlas/v2

Where can I get the values for these properties?

Secondly, do we need to create these relations? If so, which asset type should be the Head?
collibra.relation.adf.pipeline.activity= Must be provided if ADF entities are to be synced.

The same question applies to these as well:

collibra.relation.adf.pipeline.activity=
collibra.relation.adf.activity.operation=
collibra.relation.adf.input.dataset.activity=
collibra.relation.adf.output.dataset.activity=
collibra.relation.adf.input.dataset.operation=
collibra.relation.adf.output.dataset.operation=

This would be very helpful for me.

368 Messages

Hi Rohit,

purview.rest.api.host needs to be set to: {your-domain}.catalog.purview.azure.com (do not include “https://”).
purview.rest.api.port needs to be set to 443
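
As a rough illustration of how these properties combine into the request URL, here is a small sketch. The function name is hypothetical and "contoso" is only a placeholder account name; the integration's actual code may assemble the URL differently.

```python
def purview_base_url(host: str, port: int = 443,
                     base_path: str = "/api/atlas/v2") -> str:
    """Combine the three purview.rest.api.* values into a base URL."""
    if "://" in host:
        # The host property must be a bare host name, without "https://".
        raise ValueError("purview.rest.api.host must not include the scheme")
    host = host.strip().rstrip("/")
    return f"https://{host}:{port}{base_path}"


# Placeholder account name, for illustration only.
print(purview_base_url("contoso.catalog.purview.azure.com"))
```

Including "https://" in the host would otherwise produce a malformed URL with the scheme repeated, which is why the sketch rejects it early.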


Regarding ADF relation types, you need to first set the following asset types so that the relation types match these:

  • collibra.asset.adf.pipeline
  • collibra.asset.adf.activity
  • collibra.asset.adf.operation

So, for example, the relation type collibra.relation.adf.pipeline.activity must be applicable on the asset types you set in collibra.asset.adf.pipeline and collibra.asset.adf.activity.

Also, you need to add either :SOURCE or :TARGET to the relation type ID. In this example, add :SOURCE if ADF Pipeline is the Head asset type, otherwise add :TARGET if ADF Pipeline is the Tail asset type. So, the general rule is that for every relation property collibra.relation.adf.X.Y, if the asset type X is the Head asset type add :SOURCE, otherwise if X is the Tail add :TARGET.

Here’s a full example:
You create custom asset types for ADF Pipeline and ADF Activity and set their IDs in the asset properties:

  • collibra.asset.adf.pipeline : 80f1b131-435c-4c96-b8c9-0176d8716521
  • collibra.asset.adf.activity : e2c77db7-6b64-44b1-9b4c-e2c81fbd0ca3

You also create a custom relation type “ADF Pipeline contains/is contained in ADF Activity” and set its ID in the relation property. You add :SOURCE because ADF Pipeline is in the Head of the relation:

  • collibra.relation.adf.pipeline.activity: 71f1ec59-17f1-440e-a9f9-763db6b71cc8:SOURCE
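
The :SOURCE / :TARGET rule can be written as a tiny helper. This is only a sketch to make the rule concrete: the helper name is made up, the IDs are the example IDs from this post, and it assumes the second asset ID belongs to collibra.asset.adf.activity.

```python
def relation_value(relation_type_id: str, x_is_head: bool) -> str:
    """Append :SOURCE when asset type X is the Head, otherwise :TARGET."""
    return f"{relation_type_id}:{'SOURCE' if x_is_head else 'TARGET'}"


# Example IDs from the post above; replace them with your own type IDs.
properties = {
    "collibra.asset.adf.pipeline": "80f1b131-435c-4c96-b8c9-0176d8716521",
    "collibra.asset.adf.activity": "e2c77db7-6b64-44b1-9b4c-e2c81fbd0ca3",
    # X = pipeline, and ADF Pipeline is the Head of the relation.
    "collibra.relation.adf.pipeline.activity": relation_value(
        "71f1ec59-17f1-440e-a9f9-763db6b71cc8", x_is_head=True
    ),
}

# Print the entries in application.properties style.
for key, value in properties.items():
    print(f"{key}={value}")
```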

Re. these 4 relation types that include an input or output dataset:

  • collibra.relation.adf.input.dataset.activity
  • collibra.relation.adf.output.dataset.activity
  • collibra.relation.adf.input.dataset.operation
  • collibra.relation.adf.output.dataset.operation

Note that there are no collibra.asset.adf.input.dataset and collibra.asset.adf.output.dataset properties for you to set these asset types. This is because the input and output dataset assets can represent any type of asset imported by this integration. Therefore, we suggest that for the 4 relation types above, you use the generic asset type Asset to represent the input and output dataset. So, for example, you can create a custom relation “Asset is input to/has input ADF Activity” for collibra.relation.adf.input.dataset.activity.

3 years ago

@spring-team.collibra.com
Hi Team ,

Thanks for the details. I am still not able to connect to Azure using the Spring Boot application.

Domain: I am getting the details for my login from this link: “https://portal.azure.com/#settings/directory”. Hopefully I am getting the correct domain.
I am passing the same value in the “purview.rest.api.host” property in application.properties.

After doing all this and triggering the application, I am getting the error below, which occurs in “AzurePurviewSearchComponent.java” when making the POST and GET requests to the Azure REST API.

One of the URLs will be: https://{purview.rest.api.host}:443/api/atlas/v2/entity/bulk?

INFO com.collibra.marketplace.azure.purview.component.AzurePurviewSearchComponent - Preparing and posting to Purview advanced search for AtlasGlossaryTerm starting at 0 and limit to 10 with:
{filter={and=[{entityType=AtlasGlossaryTerm, includeSubTypes=false}]}, keywords=, offset=0, limit=10}
2022-05-05 14:08:47,759 [http-nio-8080-exec-1] ERROR com.collibra.marketplace.azure.purview.component.AzurePurviewSearchComponent - 503 Service Unavailable: “<!doctype html>”
2022-05-05 14:08:48,763 [http-nio-8080-exec-1] INFO com.collibra.marketplace.azure.purview.component.AzurePurviewSearchComponent - Preparing and posting to Purview advanced search for AtlasGlossaryTerm starting at 0 and limit to 10 with:
{filter={and=[{entityType=AtlasGlossaryTerm, includeSubTypes=false}]}, keywords=, offset=0, limit=10}

Any suggestion as to why this is happening? Do I need to make any changes on the Azure side, or is this a firewall issue? I highly doubt that it is a firewall issue.

Any lead on this would be a great help.

Regards,
Rohit

368 Messages

Hi Rohit,

First of all, are you able to access Azure Purview from a browser on the same machine where the integration is deployed? If not, then it is a firewall issue.

Secondly, is the value of purview.rest.api.host exactly in this format: {your-domain}.catalog.purview.azure.com? If not, then the issue is with the host name.

Finally, can you try replicating the same API request manually (via Postman)? Is the same response being returned? Note that you need to add an Authorization header with a Bearer token obtained via a login request to https://login.microsoftonline.com.
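
If you prefer a script to Postman, the two requests could be sketched as follows. Please treat this as an assumption-laden example rather than what the integration does internally: the token endpoint path (/oauth2/token), the client-credentials grant and the resource value https://purview.azure.net are assumptions about a typical service principal setup, and tenant_id, client_id and client_secret are placeholders. The search body mirrors the one printed in your log.

```python
import json
import urllib.parse
import urllib.request


def build_token_request(tenant_id: str, client_id: str,
                        client_secret: str) -> urllib.request.Request:
    """Login request for a Bearer token (assumed client-credentials flow)."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/token"
    form = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "resource": "https://purview.azure.net",  # assumed resource value
    }).encode("ascii")
    return urllib.request.Request(url, data=form, method="POST")


def build_search_request(host: str, token: str, entity_type: str,
                         offset: int = 0,
                         limit: int = 10) -> urllib.request.Request:
    """Advanced search request with the same body shape as in the log."""
    body = {
        "filter": {"and": [{"entityType": entity_type,
                            "includeSubTypes": False}]},
        "keywords": "",
        "offset": offset,
        "limit": limit,
    }
    return urllib.request.Request(
        f"https://{host}:443/api/atlas/v2/search/advanced",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )
```

Neither function sends anything. To execute a request, pass it to urllib.request.urlopen(request, timeout=30) and parse the JSON response; under the assumptions above, the login response would carry the token in its access_token field.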

3 years ago

Hi @spring-team.collibra.com

Yes, Azure Purview is accessible from the local browser.
Yes, I have used the same format for the host, but I am still getting the error:

Format 1

purview.rest.api.host:.*.com.catalog.purview.azure.com
Error 1: 503 (same error)

Format 2

purview.rest.api.host=..com.catalog.purview.azure.com
Error 2: 503 (same error)

Format 3

purview.rest.api.host:{..com}.catalog.purview.azure.com
Error 3: The variable is not having enough value to expand.

Format 4

purview.rest.api.host={..com}.catalog.purview.azure.com
Error 4: The variable is not having enough value to expand.

I have also tried the individual API in Postman and I am getting the same error.

I have tried to capture a screenshot so that it is easier to convey.

I am looking forward to your support.


Regards,
Rohit



368 Messages

Hi Rohit,

I was assuming that your Azure Purview host name ended with “catalog.purview.azure.com”, but it looks like this is not the case.

So, can you try not including “catalog.purview.azure.com” in the host name, and instead set the host name up until “.com”?

3 years ago

@spring-team.collibra.com

Hi Team ,

I think the correct hostname for me will be: *****.azure.com

I am trying through Postman directly and getting a MissingApiVersion error. When I pass the API version as a parameter, I then get “MissingSubscription”.

Have we seen these before? It looks like something is missing.

Could you please suggest what to do?

Regards,
Rohit

3 years ago

@spring-team.collibra.com

Hi Team,

Any help here? Has anyone seen this error before?

Regards,
Rohit

368 Messages

Hi Rohit,

The management.azure.com host is used for Azure Resource Manager APIs, not for Purview REST API calls.

For Purview REST API calls, you need to use the host name of your Purview instance, which should end with purview.azure.com or catalog.purview.azure.com, as shown in the description of the Endpoint parameter here: “The catalog endpoint of your Purview account. Example: https://{accountName}.purview.azure.com”
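
A quick sanity check on the host value could look like the sketch below. The function name is made up and the rules only encode what is stated above, so a valid host in an unusual setup may still be flagged.

```python
from typing import Optional


def purview_host_problem(host: str) -> Optional[str]:
    """Return a description of the likely problem, or None if the host looks fine."""
    h = host.strip().lower()
    if h == "management.azure.com":
        return ("management.azure.com is the Azure Resource Manager host, "
                "not a Purview host")
    # Covers both {account}.purview.azure.com and
    # {account}.catalog.purview.azure.com.
    if not h.endswith(".purview.azure.com"):
        return "host should end with purview.azure.com or catalog.purview.azure.com"
    return None
```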

3 years ago

Hi @spring-team.collibra.com ,

This was really a great help.

Is there any document that describes the changes needed on the Azure side for the application (client ID), specifically the API permissions required for this Spring Boot integration? The available documentation does not seem to cover the Azure-side changes for API permissions.

Regards,
Rohit Chandra

368 Messages

Hi Rohit,

These are the only two types of Purview API operations performed by the Integration:

  1. POST /api/atlas/v2/search/advanced
  2. GET /api/atlas/v2/entity/bulk?guid=&guid=&…

The first operation searches for Purview entities satisfying the provided query in the request body. The second operation retrieves details of entities identified by their GUIDs in query parameters. So, both operations only require READ access on Purview entities.

Therefore, a Data Reader role is sufficient for the integration.
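
For reference, the second operation repeats the guid query parameter once per entity. A minimal sketch of how such a URL is formed (hypothetical function name and placeholder host and GUID values, not the integration's actual code):

```python
from urllib.parse import urlencode


def bulk_entity_url(host: str, guids: list,
                    base_path: str = "/api/atlas/v2") -> str:
    """Build the GET URL for /entity/bulk with one guid parameter per entity."""
    query = urlencode([("guid", g) for g in guids])
    return f"https://{host}:443{base_path}/entity/bulk?{query}"


# Placeholder host and GUIDs, for illustration only.
print(bulk_entity_url("contoso.purview.azure.com", ["guid-1", "guid-2"]))
```

Since the request is a plain GET carrying only identifiers, it reads entity details and modifies nothing, which is consistent with read-only access being enough.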

2 years ago

@spring-team.collibra.com

Hi Team ,

I am getting the error below when I make the GET call for a particular entity type:

GET Call
https://:443/catalog/api/atlas/v2/entity/bulk?guid=50e4261b-7a24-

Entity Type : azure_synapse_dedicated_sql_db

Error Message : {
"requestId": "7b44565d-8b86-4a98-bc03-204be6470d5c",
"errorCode": "RequestTimeout",
"errorMessage": "Request timed out. This could be a transient issue and you may re-run the operation. If it fails again continuously, contact customer support."
}

Could you help me understand why this is happening and how it can be corrected?

2 years ago

Please note: I am getting the same error through the integration code as well as through Postman.

368 Messages

2 years ago

Hi @rohit.chandra.1,

The issue you are encountering appears to be a known issue with the Purview to Collibra integration which should be resolved in the latest release – v1.2.2.

Therefore, can you download and try using the latest version to check whether this issue is resolved please? Thanks

2 years ago

Hi @spring-team.collibra.com ,

Thanks for your reply. I have already done a lot of customization and am currently using version 1.2.1 of the code. I think this issue is in the “AzurePurviewSearchComponent.java” file, so could you point me to the line numbers of the code that fixes the issue?

That would be a great help indeed.

Regards,
Rohit Chandra

Hi @rohit.chandra.1,

All the changes to fix this issue are in AzurePurviewSearchComponent.java. One other small change is the addition of a new application property azure.token.refresh.seconds.

Please find attached the source code of AzurePurviewSearchComponent.java for v1.2.1 and v1.2.2. You can use a text compare tool to view all the changes.

AzurePurviewSearchComponent_v1.2.1.txt (12.6 KB)

AzurePurviewSearchComponent_v1.2.2.txt (15.6 KB)

2 years ago

@spring-team.collibra.com

Hi Team ,

I have made the changes, but it is still failing in the GET call.

Any help on this would be appreciated.

Regards,
Rohit Chandra

2 years ago

Hi @spring-team.collibra.com & @james.scicluna ,

I have made the necessary changes, but the code is still failing at the line below.
Code:



Eclipse Local Snaps :
2022-11-04 14:57:26,634 [http-nio-8443-exec-1] INFO com.collibra.marketplace.azure.purview.component.AzurePurviewSearchComponent - entityResults 3.2 1 :
2022-11-04 14:57:26,693 [http-nio-8443-exec-1] INFO com.collibra.marketplace.azure.purview.component.AzurePurviewSearchComponent - entityResults 6
2022-11-04 14:59:02,010 [http-nio-8443-exec-1] ERROR com.collibra.marketplace.azure.purview.component.AzurePurviewSearchComponent - Request https://.com:443/catalog/api/atlas/v2/entity/bulk?guid=&minExtInfo=false failed with a 408 Request Timeout exception
2022-11-04 15:00:38,454 [http-nio-8443-exec-1] ERROR com.collibra.marketplace.azure.purview.component.AzurePurviewSearchComponent - Request https://.com:443/catalog/api/atlas/v2/entity/bulk?guid=&minExtInfo=false failed with a 408 Request Timeout exception
2022-11-04 15:02:14,755 [http-nio-8443-exec-1] ERROR com.collibra.marketplace.azure.purview.component.AzurePurviewSearchComponent - Request https://.com:443/catalog/api/atlas/v2/entity/bulk?guid=&minExtInfo=false failed with a 408 Request Timeout exception
2022-11-04 15:03:51,251 [http-nio-8443-exec-1] ERROR com.collibra.marketplace.azure.purview.component.AzurePurviewSearchComponent - Request https://.com:443/catalog/api/atlas/v2/entity/bulk?guid=&minExtInfo=false failed with a 408 Request Timeout exception
2022-11-04 15:05:27,610 [http-nio-8443-exec-1] ERROR com.collibra.marketplace.azure.purview.component.AzurePurviewSearchComponent - Request https://.com:443/catalog/api/atlas/v2/entity/bulk?guid=&minExtInfo=false failed with a 408 Request Timeout exception
2022-11-04 15:07:03,926 [http-nio-8443-exec-1] ERROR com.collibra.marketplace.azure.purview.component.AzurePurviewSearchComponent - Request https://.com:443/catalog/api/atlas/v2/entity/bulk?guid=&minExtInfo=false failed with a 408 Request Timeout exception

Please note: I have removed the host and GUID from the URL in this reply.

2 years ago

Hi @james.scicluna @spring-team.collibra.com ,

Any lead on this? The code is still failing with the 408 timeout error shared in the last post.

Regards,
Rohit Chandra
