Wednesday, November 10th, 2021 4:34 PM

Optimizing data discovery & DQ rule enforcement with semantics & data concepts management

@joshua.zatulove asks -
Is there any synchronization between data concepts in DQ and Data Concepts that may have been defined in Data Governance?

“It was smart enough to figure out the EIN was invalid” - are we saying the manual rules and respective regex are going to be automatically created?

@jeffery.edwards.collibra.com asks -
When can we expect this additional DQ capability (version) to be rolled out to our DQ demo and assessment environments?

@vasiliki.nikolopoulou.collibra.com asks - When would the terminology or nomenclature (such as data concepts, semantics, data classes, data categories) be aligned with the descriptions in DG?

@adam.blalock asks - What is the positioning of this functionality alongside Collibra Catalog Auto Classification? Is there overlap, or are these different enough from each other that it makes sense to use both? @kirk.haslbeck.collibra.com Do you want to respond to Adam’s question?

My understanding is that, on the messaging/positioning side, we can make this clearer by distinguishing use cases. Catalog focuses on the data discovery and policy enforcement (whenever ready) use case from a data steward's perspective. DQ focuses on data discovery and DQ rule enforcement from a data quality analyst's or engineer's perspective. But once Collibra DQ is integrated within CDIC, we will need to deprecate any duplicate capabilities.

Thanks for the great questions!

Synchronization between DQ and DG-defined data concepts: That is longer-term on our roadmap as we streamline the operating model. We will definitely seek Pre-Sales and Professional Services input on how best to implement it.

Yes, the manual rules will be created automatically from their respective RegEx patterns. We will conform these rules so they can be loaded onto the DQ Connector. ETA: early 2022.
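To make the idea concrete, here is a minimal sketch of what generating a rule from a known pattern could look like. It is not Collibra DQ's actual implementation or API: the `KNOWN_PATTERNS` lookup, `generate_rule`, `find_invalid`, and the `employer_id` column are all hypothetical names for illustration. The only fixed fact used is the standard EIN format (two digits, a hyphen, seven digits).

```python
import re

# Hypothetical lookup of data classes to known RegEx patterns.
# Only the EIN format (NN-NNNNNNN) is a real convention; the structure is assumed.
KNOWN_PATTERNS = {
    "EIN": r"\d{2}-\d{7}",
}


def generate_rule(column, data_class):
    """Build a rule definition for a column tagged with a known data class."""
    return {
        "name": f"{column}_{data_class}_format",
        "column": column,
        "pattern": KNOWN_PATTERNS[data_class],
    }


def find_invalid(rule, values):
    """Return the values that do not fully match the rule's pattern."""
    compiled = re.compile(rule["pattern"])
    return [v for v in values if compiled.fullmatch(v) is None]


# Usage: a column detected as EIN gets a format rule without anyone writing it by hand.
rule = generate_rule("employer_id", "EIN")
invalid = find_invalid(rule, ["12-3456789", "123456789", "12-34567"])
```

In this sketch the patterns are looked up from a fixed set of known data classes rather than derived from the data on the fly.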

Deployment to DQ Demo / Assessment: Upgraded as of 11/11/21 to both x.x.107.21 and x.x.51.20

Alignment of terminology with Collibra: ETA 2021.12 or 2022.01. We will rename Data Concepts -> Data Categories and Semantics -> Data Classes

Positioning functionality alongside Catalog Auto Classification: Currently the use cases are different enough: Catalog offers lightweight but broad-based profiling, guided stewardship and classification, whereas DQ offers data discovery, class detection, and rule generation on targeted tables / data sources. That said, there will be overlap, and we will consider rationalization and optimization in 2H 2022, since DQ on Edge (Cloud DQaaS) is currently the highest priority.

@matthew.tsui.collibra.com In the video, the RegEx is described as being automatically figured out. I doubt that’s realistic. Are you saying the manual rule will be automatically added based on known regex patterns associated with known semantics/data classes? Or are you saying RegEx patterns will be derived on the fly? B/c how will we realistically know what is/isn’t a valid pattern? Thanks.
