TO BE REVIEWED: What type of AI techniques (e.g., Random Forest) does Collibra DQ use?
Q: What type of AI techniques (e.g., Random Forest) does Collibra DQ use?
Answer:
It is important to realize that Collibra DQ has many components, and they don’t all use the same algorithms. Here are some examples:
- Outlier detection: We use the interquartile range (“IQR”), which I wrote about in a LinkedIn blog post on IQR.
- Fuzzy matching for duplicate detection: We use Levenshtein distance.
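For readers who want to see what these two techniques look like, here is a minimal sketch of the textbook versions. It is not Collibra DQ’s implementation: the function names `iqr_outliers` and `levenshtein` are hypothetical, and the fence multiplier `k = 1.5` (Tukey’s fences) is an assumed default, not a documented Collibra DQ setting.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences).

    k=1.5 is the conventional multiplier, assumed here for illustration.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


print(iqr_outliers([10, 12, 11, 13, 12, 11, 95]))  # [95]
print(levenshtein("Jon Smith", "John Smith"))       # 1
```

A small edit distance between two records (here, one inserted character) is what flags them as likely fuzzy duplicates; how large a distance counts as a match would be a tuning choice.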
We have a custom Collibra DQ “Analyze” class that uses the Spark ML libraries to do its work. It relies on several Spark ML packages, and we use specific classes from each:
- In the package org.apache.spark.ml, we use Pipeline and PipelineStage.
- In the package org.apache.spark.ml.classification, we use RandomForestClassifier.
- In the package org.apache.spark.ml.fpm, we use FPGrowth.
Unless someone is a machine learning developer, simply listing the Apache Spark machine learning packages and classes won’t help much. But it is a fact that we use these Spark AI technologies. Perhaps a meeting with the people involved would help?