
Wednesday, March 23rd, 2022 5:05 PM

What is DQ on Cloud (SaaS) exactly?

What exactly is Collibra Data Quality on Cloud? [1]

  1. DQ Web App hosted in Cloud.
    This DQ Web App will be able to connect to the DQ Connector data sources for testing the connection, and for the DQ Explorer to do a data preview. Collibra will provision this web instance.

  2. Customer Provides a Postgres DQ Metastore:
    There will now be two DQ Metastores:
    A. For the DQ Cloud there will be a metastore in the Cloud:

  • No client DQ “Preview Data” will be stored in the cloud.

  • No DQ Connector credentials will be stored in the cloud either.

    B. There will be a DQ Metastore on the Collibra Edge Server on-prem.

  • This DQ Metastore on Edge will hold client preview data and Connector credentials.

  • This means the DQ Web App will have to query the metastore on Edge for the preview data.

  • These two DQ Metastores (A and B above) will also need to be synchronized.

  3. The DQ Agent will be installed on the Edge server.
    The K8s cluster that Collibra deploys with the Agent installation can be used for Apache Spark compute.

  4. Customer Provides Apache Spark: “The DQ Job (Spark) compute will take place locally on Edge K3s. Increase the size of your VM to vertically scale for more resources (e.g. 32 cores, RAM, etc.). This is the preferred option in beta. Hadoop compute is supported if customer chooses that path and uses their Dataproc or EMR cluster.”[2]

  • Apache Spark will need to be made available to the Agent.
    The customer decides which Spark cluster to use:

  1. Customers can use a K8s cluster, such as the one Collibra deploys with the Agent.

  2. Customers can use a YARN-based Apache Spark cluster.
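The data placement and compute choices above can be modeled in a few lines. This is a hypothetical sketch, not a Collibra API: `Metastore`, `metastore_for`, `sync_payload`, `spark_master`, and the record-kind names are illustrative, and it assumes that any record kind other than preview data and connector credentials may be stored in (and synchronized to) the cloud metastore. The `--master` strings follow Apache Spark's own conventions (`k8s://https://host:port` for Kubernetes, `yarn` for YARN); the K3s API address is a placeholder.

```python
from enum import Enum


class Metastore(Enum):
    """Where a record lives (hypothetical model of the two DQ Metastores)."""
    CLOUD = "cloud"  # A: DQ Metastore in the Collibra cloud
    EDGE = "edge"    # B: DQ Metastore on the on-prem Edge server


# Record kinds that, per the post, are never stored in the cloud.
EDGE_ONLY = frozenset({"preview_data", "connector_credentials"})


def metastore_for(kind: str) -> Metastore:
    """Return the metastore that holds a record of the given kind.

    Assumption: every kind not in EDGE_ONLY may be stored in the cloud.
    """
    return Metastore.EDGE if kind in EDGE_ONLY else Metastore.CLOUD


def sync_payload(edge_record: dict) -> dict:
    """Return the fields of an Edge record that may be synchronized to the cloud.

    Assumption: synchronization drops the Edge-only fields and copies the rest.
    """
    return {key: value for key, value in edge_record.items() if key not in EDGE_ONLY}


def spark_master(choice: str, k8s_api: str = "https://127.0.0.1:6443") -> str:
    """Build a spark-submit --master value for the customer's compute choice.

    "k8s"  -> the Kubernetes (e.g. K3s) cluster on Edge; k8s_api is a placeholder.
    "yarn" -> a YARN-based cluster such as Dataproc or EMR; Spark finds the
              ResourceManager from the Hadoop configuration, not from this string.
    """
    if choice == "k8s":
        return f"k8s://{k8s_api}"
    if choice == "yarn":
        return "yarn"
    raise ValueError(f"unknown compute choice: {choice!r}")


if __name__ == "__main__":
    print(metastore_for("preview_data").value)  # edge
    print(sync_payload({"job_name": "orders_dq", "preview_data": [[1, "a"]]}))  # {'job_name': 'orders_dq'}
    print(spark_master("k8s"))  # k8s://https://127.0.0.1:6443
```

Keeping `EDGE_ONLY` as a single set means the placement rule and the sync filter in this sketch cannot drift apart, which is the property the two-metastore design relies on.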

Notes:

  1. See also: https://dq-docs.collibra.com/installation/cloud#3.-install-edge
  2. K3s is “lightweight Kubernetes”: https://k3s.io/


Thanks @laurent.weichberger.collibra.com. Talking with @pascal.vlaemynck, we had an open question about what is technically stored in the DQ Web App…

I see “This means the DQ Web App will have to query the metastore on Edge for the preview data.”

When the web app shows a data preview after the query, does this mean the web app is holding the data in some cache, implying that the data leaves the on-prem environment?

Hi Eric, this is a great question. The DQ Web App makes requests to any DQ Metastore through a REST API call, and the HTTP response to that call must be rendered in the web browser. If that web browser is running on a laptop in, say, New York City, and the DQ Metastore is on a server in Switzerland, then the “Preview Data” is necessarily on the laptop in New York City, held temporarily in the laptop's memory while it is rendered. Now, if the user must go through a VPN to reach the browser, and the browser software is not actually running on the laptop but on the other side of the VPN connection, that is a different story.
Does this help?
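The reply's point can be reproduced locally with the Python standard library: whatever an endpoint returns, the client that renders it must hold in memory. In this minimal sketch, `http.server` stands in for the Edge-side metastore API and `urllib` for the browser; the `/preview` path, `PreviewHandler`, `fetch_preview`, and the sample rows are hypothetical and do not describe Collibra's actual REST API.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen

# Made-up sample rows standing in for preview data held on the Edge server.
PREVIEW_ROWS = [{"id": 1, "city": "Zurich"}, {"id": 2, "city": "Geneva"}]


class PreviewHandler(BaseHTTPRequestHandler):
    """Hypothetical Edge-side endpoint that answers every GET with the preview rows."""

    def do_GET(self):
        body = json.dumps(PREVIEW_ROWS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging for the demo


def fetch_preview(url: str) -> list:
    """Client side (the 'browser'): once the body is read, the rows are in this process's memory."""
    with urlopen(url) as resp:
        return json.loads(resp.read().decode("utf-8"))


def demo() -> list:
    # Port 0 asks the OS for any free port on the loopback interface.
    server = HTTPServer(("127.0.0.1", 0), PreviewHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        return fetch_preview(f"http://127.0.0.1:{server.server_port}/preview")
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    rows = demo()
    print(rows == PREVIEW_ROWS)  # True
```

The client never writes the rows to disk, yet they still cross the network and sit in the client's memory, which is why the location of the browser process matters.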
