OWL Check on Timeseries Event Sequence
I have a use case where the customer wants Owl to detect a change in message order for an event ID. The messages (A, B, C) normally arrive in the pattern A --> B --> C (80% of the time), at different times of the day. If one day they arrive as B --> A --> C, Owl should detect this.
Is this possible? The event ID will be the same.
Just to add: A, B, and C are in a single column.
BigDataBear
101 Messages
4 years ago
Hi Bineesh, is this a Kafka Streaming question, or how are the “messages” coming through to DQ?
bineeshbabu
35 Messages
4 years ago
This comes via some ETL process into a database.
kirkhaslbeck
41 Messages
4 years ago
If A,B,C are in a single column and the column looks like this
event_id
a,b,c
a,b,c
d,e,f
a,b,c
b,c,a -> alert
then all you have to do is click OUTLIER -> Categorical and select event_id. The b,c,a value would be flagged as a rare categorical (low-frequency) outlier.
bineeshbabu
35 Messages
4 years ago
Thanks @kirk.haslbeck.collibra.com. Messages A, B, and C are not in a single cell. They are in different rows, something like below.
kirkhaslbeck
41 Messages
4 years ago
In the ML world this would be similar to a Markov chain, where you predict the sequence of events. While Collibra DQ does support time series outliers, it does not support time series event sequences, at least not as an out-of-the-box, one-click feature. One thing to consider for this unique case is to write a DQ rule that transposes by event ID and pivots on Message, similar to a GROUP BY on event ID, and then write a simple SQL conditional check like below.
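A minimal sketch of that kind of rule follows. It is not the SQL from the original post: it assumes a hypothetical table `events` with columns `event_id`, `message`, and `event_ts` (none of these names come from the thread), and it assumes each event ID carries one A, one B, and one C. It pivots each event's messages with conditional aggregation and returns the event IDs whose order is not A, then B, then C. SQLite is used here only to keep the example self-contained; the query sticks to plain GROUP BY and CASE, so it should carry over to Spark SQL with little change.

```python
import sqlite3

# Hypothetical data: one row per message, with the time it arrived.
ROWS = [
    (1, "A", "2021-01-01 08:00"), (1, "B", "2021-01-01 09:00"), (1, "C", "2021-01-01 10:00"),
    (2, "B", "2021-01-02 08:00"), (2, "A", "2021-01-02 09:00"), (2, "C", "2021-01-02 10:00"),
]

# Pivot on message per event_id, then keep events that break A -> B -> C.
# Events missing a message yield NULL in the condition and are not returned.
OUT_OF_ORDER_SQL = """
SELECT event_id
FROM (
    SELECT event_id,
           MIN(CASE WHEN message = 'A' THEN event_ts END) AS a_ts,
           MIN(CASE WHEN message = 'B' THEN event_ts END) AS b_ts,
           MIN(CASE WHEN message = 'C' THEN event_ts END) AS c_ts
    FROM events
    GROUP BY event_id
) AS pivoted
WHERE NOT (a_ts < b_ts AND b_ts < c_ts)
ORDER BY event_id
"""

def find_out_of_order(rows):
    """Load rows into an in-memory table and return out-of-order event IDs."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (event_id INTEGER, message TEXT, event_ts TEXT)")
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
    result = [row[0] for row in conn.execute(OUT_OF_ORDER_SQL)]
    conn.close()
    return result

out_of_order = find_out_of_order(ROWS)
print(out_of_order)  # [2]
```

In a rule, any row returned by the query would count as a break. The timestamps compare correctly as strings only because they are in ISO-like format; with a real timestamp column the comparison works directly.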
bineeshbabu
35 Messages
4 years ago
Thanks @kirk.haslbeck.collibra.com
Can we write such Spark functions in the Owl rule tab, or is the idea to create a view and then write SQL?
kirkhaslbeck
41 Messages
4 years ago
If you use the DataFrame API you can write whatever Spark code you like. If you want to use the Rule Builder you could write SQL like the above.
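For the "80% of the time" part of the question, the logic is small enough to sketch in plain Python before porting it to Spark. This is an illustration only, not Collibra DQ functionality: the function name `rare_sequences`, the `(event_id, message, timestamp)` tuple layout, and the 20% threshold are all assumptions. It builds each event's message order, counts how often each order occurs, and reports the events whose order is rarer than the threshold.

```python
from collections import Counter, defaultdict

def rare_sequences(rows, max_share=0.2):
    """Return {event_id: sequence} for events whose message order is rare.

    rows: iterable of (event_id, message, timestamp) tuples (assumed layout).
    max_share: an order seen in fewer than this share of events is reported.
    """
    by_event = defaultdict(list)
    for event_id, message, ts in rows:
        by_event[event_id].append((ts, message))

    # Sort each event's messages by timestamp to get its sequence.
    sequences = {
        event_id: tuple(msg for _, msg in sorted(items))
        for event_id, items in by_event.items()
    }

    counts = Counter(sequences.values())
    total = len(sequences)
    return {
        event_id: seq
        for event_id, seq in sequences.items()
        if counts[seq] / total < max_share
    }

# Nine events in the usual order and one that starts with B.
rows = []
for event_id in range(1, 10):
    rows += [(event_id, "A", 1), (event_id, "B", 2), (event_id, "C", 3)]
rows += [(10, "B", 1), (10, "A", 2), (10, "C", 3)]

flagged = rare_sequences(rows)
print(flagged)  # {10: ('B', 'A', 'C')}
```

Unlike a fixed A, B, C check, this approach learns the usual order from the data, so it also flags unexpected orders nobody listed in advance.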
bineeshbabu
35 Messages
4 years ago
Thanks @kirk.haslbeck.collibra.com
This was really helpful