
Monday, January 16th, 2023 11:42 PM

Limitations of Spark-Submit jobs for CollibraDQ

Hello Fellow Data Citizens,

I hope you are well!

I have read that Spark-Submit is an available option for executing CollibraDQ jobs, but that it has some limitations. Could I please request clarification on the following points? Thank you.

  • The Spark-Submit path is not supported in production by the Collibra support team.
  • The CollibraDQ team does not provide bug fixes for it.
  • Each Spark-Submit job requires a new Spark cluster.


2 years ago

@muhammad.salahuddin.asic.gov.au - The requirement of a new Spark cluster for each spark-submit job submitted from the command line is really a limitation on the Databricks side. Databricks does not allow a direct spark-submit on its clusters; it is not supported.

As an alternative, you can still submit DQ jobs from the Databricks UI or APIs, as illustrated in the CollibraDQ docs:
https://productresources.collibra.com/docs/collibra/latest/Content/DataQuality/DQApis/DQ-Databricks%20Submit.htm
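For illustration, here is a minimal Python sketch of what an API submission could look like, assuming the Databricks Jobs API one-time-run endpoint (POST /api/2.1/jobs/runs/submit) with a spark_submit_task on a new cluster. The JAR path, main class, Spark version, node type, and dataset values below are placeholders I made up, not values from the CollibraDQ docs; the linked page has the actual payload template. The sketch only builds the JSON body and does not send it.

```python
import json


def build_runs_submit_payload(dq_args, jar_path, main_class,
                              spark_version, node_type_id, num_workers=2):
    """Build a JSON body for a one-time Databricks run using spark-submit.

    The task carries its own `new_cluster` block, which matches the
    limitation discussed above: each spark-submit job gets a new cluster.
    """
    return {
        "run_name": "collibra-dq-check",  # hypothetical run name
        "tasks": [
            {
                "task_key": "dq_check",  # hypothetical task key
                "new_cluster": {
                    "spark_version": spark_version,
                    "node_type_id": node_type_id,
                    "num_workers": num_workers,
                },
                "spark_submit_task": {
                    # spark-submit style arguments: class, JAR, then DQ options
                    "parameters": ["--class", main_class, jar_path, *dq_args],
                },
            }
        ],
    }


# All values below are placeholders for illustration only.
payload = build_runs_submit_payload(
    dq_args=["-ds", "my_dataset", "-rd", "2023-01-16"],
    jar_path="<path-to-dq-core-jar>",
    main_class="<dq-main-class>",
    spark_version="<databricks-runtime-version>",
    node_type_id="<node-type-id>",
)
print(json.dumps(payload, indent=2))
```

The resulting JSON would be posted with a workspace token by whatever HTTP client the pipeline already uses.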

Thank you @ashish.sharma for your continued help. I am evaluating the API option as well. Our goal is to build an automated data quality framework that can be plugged into data pipelines for quality checks.

I think the API is the best option among Spark-Submit, the Databricks UI, and the API.

2 years ago

Hi @ashish.sharma

Could you please share where to find the DQ Web Run command, so that I can get the complete list of parameters for the JSON payload template for a Spark-Submit job? I have checked the CollibraDQ documentation, but it lists the parameters for the JSON payload template only partially. Please refer to the following screenshot. Thank you again.


Hi @muhammad.salahuddin.asic.gov.au - Here, DQ's web Run command refers to the DQ command line that you would have seen in the UI while submitting a DQ job. You will see some default parameters there (e.g. -rd, -ds, -cxn), and then more when you enable or disable any layer or configuration.

The full list of command-line options is available within the DQ UI at this navigation path:

or at the following path:
<dq_home>/a/options
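For plugging this into a pipeline, the default parameters mentioned above could be assembled programmatically. This is only a sketch: -ds, -rd, and -cxn are the flags named in this thread, and reading them as dataset, run date, and connection is my assumption, so check them against the options list. The helper name and all values are hypothetical.

```python
def build_dq_args(dataset, run_date, connection, extra=None):
    """Assemble a DQ command-line argument list.

    `extra` is an optional mapping of flag -> value for options added when
    a layer or configuration is enabled; use None as the value for a flag
    that takes no argument.
    """
    # Assumed meanings: -ds dataset, -rd run date, -cxn connection
    args = ["-ds", dataset, "-rd", run_date, "-cxn", connection]
    for flag, value in (extra or {}).items():
        args.append(flag)
        if value is not None:
            args.append(str(value))
    return args


# Placeholder values for illustration only.
print(build_dq_args("my_dataset", "2023-01-16", "my_connection"))
```

Keeping the arguments as a list avoids shell-quoting problems and lets the same list be dropped into a JSON payload or passed to a process launcher.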

Thank you @ashish.sharma. I got the list of command-line options. Your continued help is much appreciated, mate.
