
Wednesday, July 14th, 2021 1:38 PM

REVIEWED-FINAL: Technical: DQ Connections / Connectors

Top Links

Supported DQ Connectors: [Click Here]

Connectors / Support

Q: Is there a list of all supported DQ connectors?
A: https://docs.owl-analytics.com/connecting-to-dbs-in-owl-web/owl-db-connection/supported-drivers
A: Please note that some drivers marked as ‘Some Testing’ (e.g., MongoDB) are in Tech Preview, i.e., they may work, but we cannot guarantee compatibility because full support has not yet been developed. #mongodb

Q: If not on the list above, what is general guidance for supported data sources for DQ?
A: Generally, JDBC and Spark file connectors are supported. If a source is on the Collibra CData list, support is likely. For anything else, confirm before giving a definitive answer. #jdbc #spark

Q: What authentication methods are supported?
A: Password vaults, Kerberos, and anything that can be passed as a JDBC connection property.
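Since arbitrary JDBC connection properties can be passed through, authentication settings typically ride along in the connection string itself. A minimal sketch of how that composition works (the helper is hypothetical; the base URL and property names echo the BigQuery example later in this thread):

```python
def build_jdbc_url(base, props):
    """Append key=value JDBC connection properties to a base URL,
    using the semicolon-separated style the Simba drivers accept."""
    return base + "".join(f";{k}={v}" for k, v in props.items())

url = build_jdbc_url(
    "jdbc:bigquery://https://www.googleapis.com/bigquery/v2:443",
    {"OAuthType": "0", "Timeout": "86400"},
)
```

Driver-specific property names (Kerberos settings, vault lookups, etc.) vary by vendor, so check the driver's own documentation for the exact keys.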

Q: Can you connect to GCP, Talend, Git, Salesforce, or Teradata?
A: We have native connectors for GCP, Teradata, and 30 other solutions. For Salesforce, you may be able to plug into Salesforce Postgres. Talend and Git are not normally sources customers would scan.

Connections / GTM

Q: Will we charge for DQ Connectors?
A: Likely in the future, but the plan will not be formalized until the connectors are re-certified.

Connectors / Compatibility

Q: Will we re-certify connectors for DQ?
A: Planned, but not yet on the roadmap.

Connections / How-To

Q: Do customers need to know what access they have to their data sources?
A: Yes, they need read access.

3 years ago

Collibra DQ connection to GCP BigQuery using the JSON key, worked through with the platform team. Please refer to the following steps (per Laurent/Vadim, July 2021):

  1. We would use this Simba driver: com.simba.googlebigquery.jdbc42.Driver

  2. We would make an owl-gcp.json (your org auth key in JSON format)

  3. We would create a JDBC connection (for example only; do not use this JDBC URL verbatim):
    jdbc:bigquery://https://www.googleapis.com/bigquery/v2:443;ProjectId=;OAuthType=0;OAuthServiceAcctEmail=<1234567890>[email protected];OAuthPvtKeyPath=/opt/ext/owl-gcp.json;Timeout=86400

  4. Vadim shared, “Regarding Big Query, the Simba driver provided by Google for Big Query just takes a path to a JSON file that contains the service account for authorization. That same file is provided to the Spark session to make a direct to storage connection for maximum parallelism once Core fires up.”

Brian Mearns tested the above and explained that there are a number of other steps which must be performed to achieve success:

  1. The password for the BigQuery Connector form in Collibra DQ must be a base64-encoded string created from the JSON key file (the owl-gcp.json from the steps above) and entered as the password. For example:
    base64 owl-gcp.json
    or
    cat owl-gcp.json | base64
    (see screenshot below; not included in this export)
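The base64 step above can also be reproduced in Python if the CLI tools aren't handy (the file name is the example from the steps above; `encode_key` is an illustrative helper, not part of Collibra DQ):

```python
import base64

def encode_key(path):
    """Base64-encode the contents of a service-account JSON key file,
    producing the string that goes into the connection's password field."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("ascii")

# Usage: encode_key("/opt/ext/owl-gcp.json")
```

Note that `base64 file` and `cat file | base64` produce the same encoding; only whether the tool wraps long lines may differ between platforms.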

  2. Check that this JAR exists and is on the path of the Collibra DQ Web UI server (e.g., <INSTALL_PATH>/owl/drivers/bigquery/core). Your driver directory location should contain this BigQuery JAR: spark-bigquery_2.12-0.18.1.jar

  3. Make sure all the needed JARs are present in <INSTALL_PATH>/owl/drivers/bigquery/:
    animal-sniffer-annotations-1.19.jar
    google-api-services-bigquery-v2-rev20201030-1.30.10.jar
    grpc-google-cloud-bigquerystorage-v1beta1-0.106.4.jar
    listenablefuture-9999.0-empty-to-avoid-conflict-with-guava.jar
    annotations-4.1.1.4.jar
    google-auth-library-credentials-0.22.0.jar
    grpc-google-cloud-bigquerystorage-v1beta2-0.106.4.jar
    opencensus-api-0.24.0.jar
    api-common-1.10.1.jar
    google-auth-library-oauth2-http-0.22.0.jar
    grpc-grpclb-1.33.1.jar
    opencensus-contrib-http-util-0.24.0.jar
    auto-value-annotations-1.7.4.jar
    GoogleBigQueryJDBC42.jar
    grpc-netty-shaded-1.33.1.jar
    perfmark-api-0.19.0.jar
    avro-1.10.0.jar
    google-cloud-bigquery-1.125.0.jar
    grpc-protobuf-1.33.1.jar
    protobuf-java-3.13.0.jar
    checker-compat-qual-2.5.5.jar
    google-cloud-bigquerystorage-1.6.4.jar
    grpc-protobuf-lite-1.33.1.jar
    protobuf-java-util-3.13.0.jar
    commons-codec-1.11.jar
    google-cloud-core-1.93.10.jar
    grpc-stub-1.33.1.jar
    proto-google-cloud-bigquerystorage-v1-1.6.4.jar
    commons-compress-1.20.jar
    google-cloud-core-http-1.93.10.jar
    gson-2.8.6.jar
    proto-google-cloud-bigquerystorage-v1alpha2-0.106.4.jar
    commons-lang3-3.5.jar
    google-http-client-1.38.0.jar
    guava-23.0.jar
    proto-google-cloud-bigquerystorage-v1beta1-0.106.4.jar
    commons-logging-1.2.jar
    google-http-client-apache-v2-1.38.0.jar
    httpclient-4.5.13.jar
    proto-google-cloud-bigquerystorage-v1beta2-0.106.4.jar
    conscrypt-openjdk-uber-2.5.1.jar
    google-http-client-appengine-1.38.0.jar
    httpcore-4.4.13.jar
    proto-google-common-protos-2.0.1.jar
    core
    google-http-client-jackson2-1.38.0.jar
    j2objc-annotations-1.3.jar
    proto-google-iam-v1-1.0.3.jar
    error_prone_annotations-2.4.0.jar
    google-oauth-client-1.31.1.jar
    jackson-annotations-2.11.0.jar
    grpc-alts-1.33.1.jar
    jackson-core-2.11.3.jar
    slf4j-api-1.7.30.jar
    failureaccess-1.0.1.jar
    grpc-api-1.33.1.jar
    jackson-databind-2.11.0.jar
    gax-1.60.0.jar
    grpc-auth-1.33.1.jar
    javax.annotation-api-1.3.2.jar
    threetenbp-1.5.0.jar
    gax-grpc-1.60.0.jar
    grpc-context-1.33.1.jar
    joda-time-2.10.1.jar
    gax-httpjson-0.77.0.jar
    grpc-core-1.33.1.jar
    json-20200518.jar
    google-api-client-1.31.1.jar
    grpc-google-cloud-bigquerystorage-v1-1.6.4.jar
    jsr305-3.0.2.jar
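A quick way to verify the directory contains everything in the list above is a small check script. This is an illustrative sketch, not part of Collibra DQ: only a few JAR names from the list are shown, so extend `REQUIRED_JARS` with the full list for a real check.

```python
from pathlib import Path

# A few entries from the required list above; extend with the rest.
REQUIRED_JARS = {
    "GoogleBigQueryJDBC42.jar",
    "google-cloud-bigquery-1.125.0.jar",
    "spark-bigquery_2.12-0.18.1.jar",
}

def missing_jars(driver_dir, required=frozenset(REQUIRED_JARS)):
    """Return the required JARs not present in driver_dir (non-recursive)."""
    present = {p.name for p in Path(driver_dir).glob("*.jar")}
    return sorted(required - present)

# Usage: missing_jars("/opt/owl/drivers/bigquery") -> [] means all present.
```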

  4. You may get a CLASSPATH conflict between these JAR files and others already on the server.

  5. Make sure the BigQuery connector Scala version matches your Spark Scala version.
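Step 5's check can be scripted: the connector JAR encodes its Scala version in the filename (e.g., spark-bigquery_2.12-0.18.1.jar is built for Scala 2.12), and that must match the Scala version of your Spark build. A small illustrative helper:

```python
import re

def connector_scala_version(jar_name):
    """Extract the Scala version from a spark-bigquery connector JAR name,
    e.g. 'spark-bigquery_2.12-0.18.1.jar' -> '2.12'."""
    m = re.match(r"spark-bigquery_(\d+\.\d+)-", jar_name)
    return m.group(1) if m else None
```

Compare the result against the Scala version your Spark installation reports (e.g., on the first line of `spark-shell` output).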


RESULTS: see the screenshot above (not included in this export) and the Collibra DQ Doc for the BigQuery Connector.


3 years ago

Question about compatibility with Parquet files, particularly on ADLS Gen 2?

Have heard this request a few times recently. Thanks for any information.


3 years ago

@adam.blalock Collibra DQ supports both Azure Data Lake and Parquet files. Any issues during setup will most likely be around permissioning to the files, which can come down to cloud configuration; the DQ software supports both as long as the end user has access to the files and can use the connection tool to connect to them. Out of the box there are templated connections for HDFS, Amazon S3, and Google Storage; I do not believe there is a default template for Azure Blob Storage.


3 years ago

What’s the DQ story for SAP other than HANA? (Question from a prospect.)

3 years ago

CData Driver/Marketplace Questions:

  1. If a customer is having an issue with a pre-packaged JDBC driver in DQ, can I download and provide the corresponding CData driver from Marketplace?
    Looking for information on:

    • Supportability (i.e., “If it works, great! If not, oh well.”)
    • Billing/Sales
  2. Is there already an ongoing product/engineering effort to align the CData drivers already offered to our DGC customers in Marketplace with the pre-packaged DQ drivers?

Thanks!
