
Thursday, October 27th, 2022 6:17 PM

TO BE REVIEWED: What type of AI techniques (e.g. Random Forest) does Collibra DQ use?

Q: What type of AI techniques (e.g. Random Forest) does Collibra DQ use?
Answer:
Collibra DQ has many components, and they do not all use the same algorithms. Here are some examples:

  1. Outlier detection: we use the Interquartile Range (“IQR”), which I wrote about in a LinkedIn blog post on IQR.

  2. Fuzzy matching for duplicate detection: we use Levenshtein Distance.
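
To make the two techniques above concrete, here is a minimal Python sketch. It is not Collibra DQ's implementation: the function names, the conventional 1.5 multiplier on the IQR, and the normalized similarity score are all assumptions chosen for illustration.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR].

    k=1.5 is the common textbook fence, assumed here for illustration.
    """
    # quantiles() sorts the data and returns the three quartile cut points.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn string a into string b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Hypothetical helper: scale the edit distance to a 0..1 score."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest
```

For instance, `iqr_outliers([10, 12, 11, 13, 12, 11, 95])` flags only `95`, and `levenshtein("kitten", "sitting")` is `3`. A duplicate check could then treat two records as likely duplicates when `similarity` exceeds some chosen threshold; that threshold is a tuning decision, not something stated here.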

We also have a custom Collibra DQ “Analyze” class that uses the Spark ML libraries to do its work. It draws on several Spark ML packages, each of which contains classes.
For example, in the package org.apache.spark.ml we use Pipeline and PipelineStage.

In the package org.apache.spark.ml.classification we use RandomForestClassifier.

Lastly, in the package org.apache.spark.ml.fpm we use FPGrowth.
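
For readers who want to see what these classes look like in use, here is a rough sketch using PySpark, whose `pyspark.ml` modules mirror the `org.apache.spark.ml` packages named above. It does not show how the “Analyze” class is built: the column names, tree count, and support and confidence thresholds are placeholder assumptions, and the helper function names are hypothetical.

```python
def build_rf_pipeline(feature_cols, label_col="label"):
    """Assemble feature columns and feed them to a random forest.

    Imports are deferred so this file can be loaded without Spark installed;
    calling the function requires pyspark and an active SparkSession.
    """
    from pyspark.ml import Pipeline
    from pyspark.ml.classification import RandomForestClassifier
    from pyspark.ml.feature import VectorAssembler

    # Each stage is a PipelineStage; the Pipeline runs them in order.
    assembler = VectorAssembler(inputCols=feature_cols, outputCol="features")
    forest = RandomForestClassifier(
        featuresCol="features",
        labelCol=label_col,
        numTrees=20,  # assumed value, for illustration only
    )
    return Pipeline(stages=[assembler, forest])


def build_fpgrowth(items_col="items"):
    """Configure frequent-pattern mining over an array-typed column."""
    from pyspark.ml.fpm import FPGrowth

    # Thresholds are assumed values, for illustration only.
    return FPGrowth(itemsCol=items_col, minSupport=0.5, minConfidence=0.6)
```

With a running Spark session, `build_rf_pipeline(["a", "b"]).fit(df)` would train the forest on a DataFrame `df` that has numeric columns `a` and `b` plus a `label` column, and `build_fpgrowth().fit(df)` would mine frequent itemsets from an `items` array column.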

Unless you are a machine learning developer, a list of Apache Spark machine learning packages and classes may not help much, but it is a fact that we use these Spark AI technologies. Perhaps a meeting with the individuals concerned would help?
