
Thursday, July 29th, 2021 5:57 PM

Apache Spark in Collibra DQ with Py4J (needs Spark v3.0.1 and the correct Scala version)

Python and Py4J code: owl.owlCheck()

fails with an exception:

File "/opt/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1305, in __call__
File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 128, in deco
File "/opt/spark/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o36.owlCheck.
: java.lang.NoSuchMethodError: scala.Predef$.refArrayOps([Ljava/lang/Object;)Lscala/collection/mutable/ArrayOps;
at com.owl.core.activity2.Schema.findMalformedColumns(Schema.scala:77)
at com.owl.core.activity2.Load.preProcess(Load.scala:1761)
at com.owl.core.activity2.Load.loadNotebookDF(Load.scala:200)






This is a version mismatch between Collibra DQ and Spark. The container's Spark and the DQ build both need to be aligned on Spark v3.0.1. The NoSuchMethodError on scala.Predef$.refArrayOps is a classic symptom of a Scala mismatch (2.11 instead of 2.12).
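A quick way to confirm the mismatch from the Python side is to compare the versions the running session reports against what the DQ build expects. The sketch below is only an illustration: the helper names (scala_binary_version, check_alignment, runtime_versions) are made up for this post and are not part of Collibra DQ or PySpark, and the expected values (Spark 3.0.1, Scala 2.12) are taken from the diagnosis above. runtime_versions assumes a live SparkSession and reads the Scala version through the Py4J gateway via scala.util.Properties.

```python
def scala_binary_version(full_version: str) -> str:
    """Reduce a full Scala version such as '2.12.10' to its binary version '2.12'."""
    return ".".join(full_version.strip().split(".")[:2])


def check_alignment(spark_version: str, scala_version: str,
                    expected_spark: str = "3.0.1",
                    expected_scala: str = "2.12") -> list:
    """Return a list of human-readable problems; an empty list means aligned.

    The defaults are assumptions taken from this thread, not values
    published by Collibra.
    """
    problems = []
    if spark_version.strip() != expected_spark:
        problems.append(
            f"Spark is {spark_version.strip()}, expected {expected_spark}")
    if scala_binary_version(scala_version) != expected_scala:
        problems.append(
            f"Scala is {scala_version.strip()}, expected {expected_scala}.x")
    return problems


def runtime_versions(spark):
    """Read (spark_version, scala_version) from a live SparkSession.

    Not called in this snippet because it needs PySpark and a running JVM.
    """
    jvm = spark.sparkContext._jvm
    return spark.version, jvm.scala.util.Properties.versionNumberString()


# Example with hard-coded values standing in for runtime_versions(spark):
for problem in check_alignment("3.0.1", "2.11.12"):
    print(problem)
```

In a notebook you would pass the result of runtime_versions(spark) into check_alignment before calling owl.owlCheck(). Note that this only inspects the Spark runtime. Which Scala version the DQ jar itself was compiled against still has to be checked against the DQ distribution you downloaded.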


3 years ago

Yes. While there are versions of DQ that work with Spark 3+, the default version uses Spark 2.3, so it will not be compatible unless you use a version of the application that was compiled against Spark 3 or provide containers.
