
Tuesday, April 6th, 2021 6:48 PM

job.catalog.JdbcIngestionJob error on Databricks JDBC

Hello, all

I tried to connect to Azure Databricks using the Collibra JDBC Driver. I set up the connection via the Catalog successfully with the driver downloaded from https://marketplace.collibra.com/listings/jdbc-driver-for-databricks/

But I got an error. I guess it might be because of the single quotes around null for the schema (tablesList=null, schema='null') in the JdbcIngestionJob ingestion params, but I have no idea how to fix it.

Below is the log. I'd very much appreciate it if anyone could help with this.

2021-04-06 14:00:03.328 INFO [SparkContext-akka.actor.default-dispatcher-2] spark.jobserver.JobStatusActor - Starting actor spark.jobserver.JobStatusActor
2021-04-06 14:00:03.328 INFO [SparkContext-akka.actor.default-dispatcher-5] spark.jobserver.JobResultActor - Starting actor spark.jobserver.JobResultActor
2021-04-06 14:00:03.335 INFO [SparkContext-akka.actor.default-dispatcher-3] spark.jobserver.JobManagerActor - Starting actor spark.jobserver.JobManagerActor
2021-04-06 14:00:03.806 WARN [SparkContext-akka.actor.default-dispatcher-3] hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform… using builtin-java classes where applicable
2021-04-06 14:00:03.885 WARN [SparkContext-akka.actor.default-dispatcher-3] apache.spark.SparkConf - In Spark 1.0 and later spark.local.dir will be overridden by the value set by the cluster manager (via SPARK_LOCAL_DIRS in mesos/standalone and LOCAL_DIRS in YARN).
2021-04-06 14:00:04.393 INFO [SparkContext-akka.actor.default-dispatcher-3] jetty.util.log - Logging initialized @2694ms
2021-04-06 14:00:04.462 INFO [SparkContext-akka.actor.default-dispatcher-3] jetty.server.Server - jetty-9.3.z-SNAPSHOT
2021-04-06 14:00:04.477 INFO [SparkContext-akka.actor.default-dispatcher-3] jetty.server.Server - Started @2778ms
2021-04-06 14:00:04.504 INFO [SparkContext-akka.actor.default-dispatcher-3] jetty.server.AbstractConnector - Started ServerConnector@6dbb469a{HTTP/1.1,[http/1.1]}{0.0.0.0:35917}
2021-04-06 14:00:04.526 INFO [SparkContext-akka.actor.default-dispatcher-3] server.handler.ContextHandler - Started o.s.j.s.ServletContextHandler@247d6d79{/jobs,null,AVAILABLE,@Spark}
2021-04-06 14:00:04.526 INFO [SparkContext-akka.actor.default-dispatcher-3] server.handler.ContextHandler - Started o.s.j.s.ServletContextHandler@46da06e8{/jobs/json,null,AVAILABLE,@Spark}
2021-04-06 14:00:04.527 INFO [SparkContext-akka.actor.default-dispatcher-3] server.handler.ContextHandler - Started o.s.j.s.ServletContextHandler@46c41459{/jobs/job,null,AVAILABLE,@Spark}
2021-04-06 14:00:04.528 INFO [SparkContext-akka.actor.default-dispatcher-3] server.handler.ContextHandler - Started o.s.j.s.ServletContextHandler@5bbb5026{/jobs/job/json,null,AVAILABLE,@Spark}
2021-04-06 14:00:04.529 INFO [SparkContext-akka.actor.default-dispatcher-3] server.handler.ContextHandler - Started o.s.j.s.ServletContextHandler@361c798c{/stages,null,AVAILABLE,@Spark}
2021-04-06 14:00:04.529 INFO [SparkContext-akka.actor.default-dispatcher-3] server.handler.ContextHandler - Started o.s.j.s.ServletContextHandler@77d4575f{/stages/json,null,AVAILABLE,@Spark}
2021-04-06 14:00:04.530 INFO [SparkContext-akka.actor.default-dispatcher-3] server.handler.ContextHandler - Started o.s.j.s.ServletContextHandler@40866605{/stages/stage,null,AVAILABLE,@Spark}
2021-04-06 14:00:04.532 INFO [SparkContext-akka.actor.default-dispatcher-3] server.handler.ContextHandler - Started o.s.j.s.ServletContextHandler@60e54a07{/stages/stage/json,null,AVAILABLE,@Spark}
2021-04-06 14:00:04.533 INFO [SparkContext-akka.actor.default-dispatcher-3] server.handler.ContextHandler - Started o.s.j.s.ServletContextHandler@6b1410eb{/stages/pool,null,AVAILABLE,@Spark}
2021-04-06 14:00:04.533 INFO [SparkContext-akka.actor.default-dispatcher-3] server.handler.ContextHandler - Started o.s.j.s.ServletContextHandler@375c8410{/stages/pool/json,null,AVAILABLE,@Spark}
2021-04-06 14:00:04.534 INFO [SparkContext-akka.actor.default-dispatcher-3] server.handler.ContextHandler - Started o.s.j.s.ServletContextHandler@7c5ed921{/storage,null,AVAILABLE,@Spark}
2021-04-06 14:00:04.535 INFO [SparkContext-akka.actor.default-dispatcher-3] server.handler.ContextHandler - Started o.s.j.s.ServletContextHandler@4853ac54{/storage/json,null,AVAILABLE,@Spark}
2021-04-06 14:00:04.536 INFO [SparkContext-akka.actor.default-dispatcher-3] server.handler.ContextHandler - Started o.s.j.s.ServletContextHandler@45ecdb36{/storage/rdd,null,AVAILABLE,@Spark}
2021-04-06 14:00:04.537 INFO [SparkContext-akka.actor.default-dispatcher-3] server.handler.ContextHandler - Started o.s.j.s.ServletContextHandler@6db01359{/storage/rdd/json,null,AVAILABLE,@Spark}
2021-04-06 14:00:04.537 INFO [SparkContext-akka.actor.default-dispatcher-3] server.handler.ContextHandler - Started o.s.j.s.ServletContextHandler@d35d18a{/environment,null,AVAILABLE,@Spark}
2021-04-06 14:00:04.538 INFO [SparkContext-akka.actor.default-dispatcher-3] server.handler.ContextHandler - Started o.s.j.s.ServletContextHandler@7977ded2{/environment/json,null,AVAILABLE,@Spark}
2021-04-06 14:00:04.538 INFO [SparkContext-akka.actor.default-dispatcher-3] server.handler.ContextHandler - Started o.s.j.s.ServletContextHandler@68ac0100{/executors,null,AVAILABLE,@Spark}
2021-04-06 14:00:04.539 INFO [SparkContext-akka.actor.default-dispatcher-3] server.handler.ContextHandler - Started o.s.j.s.ServletContextHandler@3a211f6e{/executors/json,null,AVAILABLE,@Spark}
2021-04-06 14:00:04.540 INFO [SparkContext-akka.actor.default-dispatcher-3] server.handler.ContextHandler - Started o.s.j.s.ServletContextHandler@4842c883{/executors/threadDump,null,AVAILABLE,@Spark}
2021-04-06 14:00:04.541 INFO [SparkContext-akka.actor.default-dispatcher-3] server.handler.ContextHandler - Started o.s.j.s.ServletContextHandler@39744798{/executors/threadDump/json,null,AVAILABLE,@Spark}
2021-04-06 14:00:04.549 INFO [SparkContext-akka.actor.default-dispatcher-3] server.handler.ContextHandler - Started o.s.j.s.ServletContextHandler@54ddc8b7{/static,null,AVAILABLE,@Spark}
2021-04-06 14:00:04.550 INFO [SparkContext-akka.actor.default-dispatcher-3] server.handler.ContextHandler - Started o.s.j.s.ServletContextHandler@61b71a3a{/,null,AVAILABLE,@Spark}
2021-04-06 14:00:04.551 INFO [SparkContext-akka.actor.default-dispatcher-3] server.handler.ContextHandler - Started o.s.j.s.ServletContextHandler@13854a30{/api,null,AVAILABLE,@Spark}
2021-04-06 14:00:04.552 INFO [SparkContext-akka.actor.default-dispatcher-3] server.handler.ContextHandler - Started o.s.j.s.ServletContextHandler@7a3550bc{/jobs/job/kill,null,AVAILABLE,@Spark}
2021-04-06 14:00:04.552 INFO [SparkContext-akka.actor.default-dispatcher-3] server.handler.ContextHandler - Started o.s.j.s.ServletContextHandler@2b198325{/stages/stage/kill,null,AVAILABLE,@Spark}
2021-04-06 14:00:04.821 INFO [SparkContext-akka.actor.default-dispatcher-3] server.handler.ContextHandler - Started o.s.j.s.ServletContextHandler@5aba9ee2{/metrics/json,null,AVAILABLE,@Spark}
2021-04-06 14:00:08.168 DEBUG [pool-3-thread-1] job.serialize.InternalRequestDataDeserializer - Using Package com.collibra.jobserver.dto.catalog.ingestion, version 14.0.3
2021-04-06 14:00:08.168 INFO [SparkContext-akka.actor.default-dispatcher-3] spark.jobserver.JobStatusActor - Job status update: 'com.collibra.jobserver.job.catalog.JdbcIngestionJob' '54b993b6-89f6-4585-b96f-b0d2909e6dc1', status 'Started'
2021-04-06 14:00:08.176 INFO [pool-3-thread-1] job.catalog.JdbcIngestionJob - Ingestion params: JdbcIngestionParameters{jdbcConnection=JdbcConnection{url='jdbc:spark://***.azuredatabricks.net:443/default;transportMode=http;ssl=1;httpPath=***;AuthMech=3', user='***', password='***', driverClassName='com.simba.spark.jdbc.Driver', tablesList=null, schema='null', targetCsvSize=10485760000, properties={password=***, user=***}}, tableTypesToSkip=[SEQUENCE, INDEX], tablesToSkip=[]} AbstractIngestionParameters{dataSourceType='COLLIBRA_DRIVER', pageNumber=0, pageSize=10000}
2021-04-06 14:00:08.178 DEBUG [pool-3-thread-1] job.catalog.JdbcHelper$ - found job code location in entry sqldao.rootdir with value /opt/collibra/collibra_data/spark-jobserver/data/sqlDao
2021-04-06 14:00:08.178 DEBUG [pool-3-thread-1] job.catalog.JdbcHelper$ - driversPath = /opt/collibra/collibra_data/spark-jobserver/data/sqlDao
2021-04-06 14:00:08.179 DEBUG [pool-3-thread-1] job.catalog.JdbcHelper$ - sparkUrl = spark://10.41.11.64:37099/jars/jdbc-driver-24ebfd25-c546-45ef-b930-89729b8d835e-20210401_124523_364.jar
2021-04-06 14:00:08.179 DEBUG [pool-3-thread-1] job.catalog.JdbcHelper$ - filePath = /opt/collibra/collibra_data/spark-jobserver/data/sqlDao/jdbc-driver-24ebfd25-c546-45ef-b930-89729b8d835e-20210401_124523_364.jar
2021-04-06 14:00:08.180 DEBUG [pool-3-thread-1] job.catalog.JdbcHelper$ - sparkUrl = spark://10.41.11.64:37099/jars/jobserver-job-mockjar-1.0.0.jar
2021-04-06 14:00:08.180 DEBUG [pool-3-thread-1] job.catalog.JdbcHelper$ - filePath = /opt/collibra/collibra_data/spark-jobserver/data/sqlDao/jobserver-job-mockjar-1.0.0.jar
2021-04-06 14:00:08.180 DEBUG [pool-3-thread-1] job.catalog.JdbcHelper$ - Can't find jar /opt/collibra/collibra_data/spark-jobserver/data/sqlDao/jobserver-job-mockjar-1.0.0.jar locally. This may not be an issue in multi-jvm or cluster mode.
2021-04-06 14:00:08.230 DEBUG [pool-3-thread-1] job.catalog.JdbcHelper$ - driverClass = class com.simba.spark.jdbc.Driver
2021-04-06 14:00:08.230 DEBUG [pool-3-thread-1] job.catalog.JdbcHelper$ - driver : com.simba.spark.jdbc.Driver@67403d78
2021-04-06 14:00:08.230 DEBUG [pool-3-thread-1] job.catalog.JdbcHelper$ - registering wrapper: com.collibra.jobserver.CollibraDriverWrapper@58709b48
2021-04-06 14:00:08.230 INFO [pool-3-thread-1] job.catalog.JdbcIngestionJob - Establishing connection
2021-04-06 14:00:08.230 DEBUG [pool-3-thread-1] job.catalog.JdbcHelper$ - Using driver com.collibra.jobserver.CollibraDriverWrapper@58709b48
2021-04-06 14:00:08.234 DEBUG [pool-3-thread-1] jdbc.connection.ConnectionProviderChainImpl - Skipping provider class com.collibra.jdbc.connection.kerberos.KerberosCompatibilityConnectionProvider, as it does not support the current request
2021-04-06 14:00:08.235 DEBUG [pool-3-thread-1] jdbc.connection.ConnectionProviderChainImpl - Skipping provider class com.collibra.jdbc.connection.kerberos.KerberosConnectionProvider, as it does not support the current request
2021-04-06 14:00:08.236 DEBUG [pool-3-thread-1] jdbc.connection.ConnectionProviderChainImpl - Skipping provider class com.collibra.jdbc.connection.cdata.CDataConnectionProvider, as it does not support the current request
2021-04-06 14:00:08.239 DEBUG [pool-3-thread-1] jdbc.connection.ConnectionProviderChainImpl - Skipping provider class com.collibra.jdbc.connection.cyberark.CyberArkConnectionProvider, as it does not support the current request
2021-04-06 14:00:08.239 DEBUG [pool-3-thread-1] jdbc.connection.ConnectionProviderChainImpl - Configured connection providerClasses: [com.collibra.jdbc.connection.pathresolver.PathResolverConnectionProvider@66130ea2, com.collibra.jdbc.connection.batched.BatchedConnectionProvider@66c0f6b5, com.collibra.jdbc.connection.DefaultConnectionProvider@49cf1d00]
2021-04-06 14:00:08.239 DEBUG [pool-3-thread-1] jdbc.connection.ConnectionProviderChainImpl - Configured connection providerClasses: [com.collibra.jdbc.connection.pathresolver.PathResolverConnectionProvider@66130ea2, com.collibra.jdbc.connection.batched.BatchedConnectionProvider@66c0f6b5, com.collibra.jdbc.connection.DefaultConnectionProvider@49cf1d00]
2021-04-06 14:00:08.239 DEBUG [pool-3-thread-1] jdbc.connection.ConnectionProviderChainImpl - Calling provider: class com.collibra.jdbc.connection.pathresolver.PathResolverConnectionProvider
2021-04-06 14:00:08.249 DEBUG [pool-3-thread-1] jdbc.connection.ConnectionProviderChainImpl - Calling provider: class com.collibra.jdbc.connection.batched.BatchedConnectionProvider
2021-04-06 14:00:08.249 DEBUG [pool-3-thread-1] jdbc.connection.ConnectionProviderChainImpl - Calling provider: class com.collibra.jdbc.connection.DefaultConnectionProvider
2021-04-06 14:00:08.249 DEBUG [pool-3-thread-1] jdbc.connection.DefaultConnectionProvider - Connecting using url: jdbc:spark://***.azuredatabricks.net:443/default;transportMode=http;ssl=1;httpPath=***;AuthMech=3
2021-04-06 14:00:09.392 DEBUG [pool-3-thread-1] connection.batched.BatchedConnectionDecoratorFactory - Setting connection fetch size to: 1000
2021-04-06 14:00:09.393 DEBUG [pool-3-thread-1] connection.batched.BatchedConnectionDecoratorFactory - Setting connection fetch size to: 1000
2021-04-06 14:00:09.410 INFO [pool-3-thread-1] job.catalog.JdbcHelper$ - databaseMajorMinor=3.1
2021-04-06 14:00:09.410 INFO [pool-3-thread-1] job.catalog.JdbcHelper$ - databaseProductVersion=3.1.0
2021-04-06 14:00:09.410 INFO [pool-3-thread-1] job.catalog.JdbcHelper$ - driverMajorMinor=2.6
2021-04-06 14:00:09.410 INFO [pool-3-thread-1] job.catalog.JdbcHelper$ - driverName=SparkJDBC
2021-04-06 14:00:09.410 INFO [pool-3-thread-1] job.catalog.JdbcHelper$ - driverVersion=02.06.17.1021
2021-04-06 14:00:09.410 INFO [pool-3-thread-1] job.catalog.JdbcIngestionJob - Connection established
2021-04-06 14:00:09.585 INFO [pool-3-thread-1] job.catalog.JdbcIngestionJob - available schemas: List(default, global_temp, testdata)
2021-04-06 14:00:09.586 INFO [pool-3-thread-1] job.catalog.JdbcIngestionJob - requested schema: null
2021-04-06 14:00:09.586 DEBUG [pool-3-thread-1] job.catalog.JdbcIngestionJob - schema was blank
2021-04-06 14:00:09.586 ERROR [pool-3-thread-1] job.catalog.JdbcIngestionJob - jdbc schema not specified, schemas.size=3

2021-04-06 14:00:09.645 DEBUG [pool-3-thread-1] job.catalog.IngestedSchemaResultPageWriter - Writing results file /opt/collibra/collibra_data/spark-jobserver/temp-files/context-JS-f26a007e-d7e7-4658-b78f-d52c624c4ae0/ingestion-results/page-0
2021-04-06 14:00:09.673 DEBUG [pool-3-thread-1] job.catalog.IngestedSchemaResultsPageReader - Serving page /opt/collibra/collibra_data/spark-jobserver/temp-files/context-JS-f26a007e-d7e7-4658-b78f-d52c624c4ae0/ingestion-results/page-0
2021-04-06 14:00:09.687 DEBUG [pool-3-thread-1] job.catalog.IngestedSchemaResultsPageReader - Returning 0 tables
2021-04-06 14:00:09.700 INFO [SparkContext-akka.actor.default-dispatcher-3] spark.jobserver.JobStatusActor - Job status update: 'com.collibra.jobserver.job.catalog.JdbcIngestionJob' '54b993b6-89f6-4585-b96f-b0d2909e6dc1', status 'Completed'
2021-04-06 14:00:10.326 INFO [SparkContext-akka.actor.default-dispatcher-3] spark.jobserver.JobManagerActor - Received context termination request
2021-04-06 14:00:10.333 WARN [SparkContext-akka.actor.default-dispatcher-15] spark.jobserver.JobResultActor - Shutting down spark.jobserver.JobResultActor
2021-04-06 14:00:10.334 INFO [SparkListenerBus] spark.jobserver.JobManagerActor - Spark is shutting down. Terminating current actor
2021-04-06 14:00:10.339 INFO [SparkContext-akka.actor.default-dispatcher-3] jetty.server.AbstractConnector - Stopped Spark@6dbb469a{HTTP/1.1,[http/1.1]}{0.0.0.0:0}


4 years ago

Did you specify the schema in the connection properties?

You have three schemas available: 'default', 'global_temp', and 'testdata'. You have to choose one of these as the value for 'Schema' in the connection properties. The connection properties would look like this:
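(The original screenshot of the connection properties did not survive. Below is a hypothetical sketch of what such Catalog connection properties might look like; the workspace host, httpPath, and credentials are placeholders taken from the masked log, and 'default' is just one of the three schemas listed there.)

```ini
; Hypothetical Collibra Catalog connection properties - placeholder values only
Driver class name = com.simba.spark.jdbc.Driver
URL               = jdbc:spark://<workspace>.azuredatabricks.net:443/default;transportMode=http;ssl=1;httpPath=<cluster-http-path>;AuthMech=3
Username          = ***
Password          = ***
Schema            = default
```

The key difference from the failing setup is that 'Schema' is filled in, so the ingestion job no longer logs schema='null'.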


Hi Arvind,

through this, can we also load the metadata of the underlying Azure Blob Storage files, in case it is Azure-hosted Databricks?

Br,
Noor.


4 years ago

That worked once the schema was passed in. Thank you very much, Arvind.
