What is DQ on Cloud (SaaS) exactly?
What exactly is Collibra Data Quality on Cloud? [1]
- DQ Web App hosted in the cloud. Collibra will provision this web instance. The DQ Web App will be able to connect to the DQ Connector data sources to test the connection, and so that DQ Explorer can show a data preview.
- Customer provides a Postgres DQ Metastore. There will now be two DQ Metastores:
  A. A DQ Metastore in the cloud:
    - No client DQ “Preview Data” will be stored in the cloud.
    - No DQ Connector credentials will be stored in the cloud either.
  B. A DQ Metastore on the Collibra Edge server on-prem:
    - This DQ Metastore on Edge will hold client preview data and Connector credentials.
    - This means the DQ Web App will have to query the metastore on Edge for the preview data.
  These two DQ Metastores (A and B above) will need to be synchronized.
- DQ Agent will be installed on the Edge server.
- The K8s cluster that Collibra deploys with the Agent installation can be used for Apache Spark compute.
- Customer provides Apache Spark: “The DQ Job (Spark) compute will take place locally on Edge K3s. Increase the size of your VM to vertically scale for more resources (e.g. 32 cores, RAM, etc.). This is the preferred option in beta. Hadoop compute is supported if customer chooses that path and uses their Dataproc or EMR cluster.” [2]
  - Apache Spark will need to be made available to the Agent.
  - The customer will decide which Spark cluster to use. If the customer has a Spark cluster:
    - Customers can use a K8s cluster.
    - Customers can use a YARN-based Apache Spark cluster.
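To make the split between the two metastores concrete, here is a minimal Python sketch under stated assumptions. `Metastore`, `sync_to_cloud`, `preview`, and the record kinds are hypothetical names, not Collibra's schema or API. The post only says the two metastores “will need to be synchronized”; the one-way, Edge-to-cloud copy of non-sensitive records below is an assumption made purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Tuple

# Hypothetical record kinds that must stay on-prem (per the post: preview
# data and connector credentials are never stored in the cloud metastore).
SENSITIVE_KINDS = {"preview_data", "connector_credentials"}


@dataclass
class Metastore:
    name: str
    allow_sensitive: bool  # True for the Edge metastore, False for the cloud one
    records: Dict[str, Tuple[str, Any]] = field(default_factory=dict)

    def put(self, key: str, kind: str, value: Any) -> None:
        # Refuse sensitive records in a metastore that must not hold them.
        if kind in SENSITIVE_KINDS and not self.allow_sensitive:
            raise PermissionError(f"{kind} may not be stored in {self.name}")
        self.records[key] = (kind, value)

    def get(self, key: str) -> Any:
        return self.records[key][1]


def sync_to_cloud(edge: Metastore, cloud: Metastore) -> int:
    """Assumed sync: copy only non-sensitive records from Edge to cloud."""
    copied = 0
    for key, (kind, value) in edge.records.items():
        if kind not in SENSITIVE_KINDS:
            cloud.put(key, kind, value)
            copied += 1
    return copied


def preview(edge: Metastore, dataset: str) -> Any:
    # The web app fetches preview rows from the Edge metastore on demand.
    return edge.get(f"preview:{dataset}")


if __name__ == "__main__":
    edge = Metastore("edge", allow_sensitive=True)
    cloud = Metastore("cloud", allow_sensitive=False)
    edge.put("job:orders", "job_definition", {"dataset": "orders"})
    edge.put("preview:orders", "preview_data", [{"id": 1}])
    edge.put("conn:pg", "connector_credentials", {"user": "dq"})
    print(sync_to_cloud(edge, cloud), sorted(cloud.records))
```

In this model only the job definition reaches the cloud metastore, while preview rows and credentials are read from the Edge side whenever the web app needs them. Whether the real web app caches those rows is exactly the open question raised in the reply below; the sketch does not answer it.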
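For the customer-provided Spark option, the practical difference between a K8s (including K3s) cluster and a YARN cluster is mostly the Spark master URL. The sketch below builds a generic `spark-submit` command line for either case. `--master`, `--deploy-mode`, the `k8s://` master prefix, and `spark.kubernetes.container.image` are standard Apache Spark options, but the function name `spark_submit_args`, the example app path and image, and the idea that the DQ Agent launches jobs this way are assumptions, not documented Collibra behavior.

```python
from typing import List, Optional


def spark_submit_args(cluster: str, app: str,
                      k8s_api: Optional[str] = None,
                      image: Optional[str] = None) -> List[str]:
    """Build a generic spark-submit command for a YARN or Kubernetes cluster."""
    if cluster == "yarn":
        # YARN locates the ResourceManager from the Hadoop config on the host.
        master, extra = "yarn", []
    elif cluster == "k8s":
        # Kubernetes (K8s or K3s) needs the API server URL and a Spark image.
        if not k8s_api or not image:
            raise ValueError("k8s requires k8s_api and image")
        master = f"k8s://{k8s_api}"
        extra = ["--conf", f"spark.kubernetes.container.image={image}"]
    else:
        raise ValueError(f"unsupported cluster type: {cluster}")
    return ["spark-submit", "--master", master,
            "--deploy-mode", "cluster", *extra, app]


if __name__ == "__main__":
    # Hypothetical app path and K3s API endpoint, for illustration only.
    print(" ".join(spark_submit_args("yarn", "dq_job.py")))
    print(" ".join(spark_submit_args(
        "k8s", "local:///opt/dq_job.py",
        k8s_api="https://k3s.example:6443", image="example/spark:3.5")))
```

Classic Dataproc and EMR clusters typically run Spark on YARN, so they would fall under the `yarn` branch in this sketch.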
Notes:
ericgerstner
2 years ago
Thanks @laurent.weichberger.collibra.com. Talking with @pascal.vlaemynck, we had an open question on what is technically stored in the DQ Web App…
I see “This means the DQ Web App will have to query the metastore on Edge for the preview data.”
When the web app shows a data preview after the query, does this mean the web app is holding the data in some cache, implying that the data leaves the on-prem environment?