R

Friday, December 17th, 2021 12:27 PM

Ingest specific SQL tables from a schema

Hello,

I am trying to ingest some tables from a schema from an SQL server. I only want to ingest a few tables and ignore the rest. I am trying to use the connection string given in the documentation - “Tables=TableA, TableB, TableC”. However, I only see the schema ingested and none of the mentioned tables. I can’t verify if I am doing something wrong. Has anyone does this? I would really appreciate if you could help me with this.

Thank You
Rahul Nair

2 Messages

3 years ago

Hi @rahul.nair.external.toyota-europe.com ,
Are you using Data catalog to load tables for a Schema . Ideally the catalog gives you options to EXCLUDE tables from getting in Collibra (and not the other way around).

8 Messages

Hi Nikhil,

Thank you for pointing that out. I am aware of this option. However, in this specific case, I only know the name of tables that I want to ingest and not the others that exist in this particular schema. For eg: I know the name of the 6 tables that I want to ingest and there are a total of 200 tables in the schema. So essentially, I don’t know the name of the other 194 tables that I want to exclude from ingestion. Hence in this scenario, the “exclude tables” option can become very time consuming, which is why I was exploring if there’s any other way around to this.

Thanks for your response!

51 Messages

 • 

200 Points

3 years ago

We have been waiting for the INCLUDE option for a couple o f years now. We finally implemented a manual process to bring in tables one at a time. We couldn’t wait indefinitely and we are not willing to being in 25,000+ tables using Catalog. Not sure when/if Collibra will ever figure this out.
Be glad to share how we did it, but it might not work for you.
Regards,
Jeff Evoy

8 Messages

Thank you for your response Jeff. It is indeed strange that Collibra has not included this feature because this seems to be one of the most important and basic ones. I would be glad if you could share how you did it. It might point me into the right direction.

Thank You
Rahul Nair

51 Messages

 • 

200 Points

Rahul:
We use Epic’s Electronic Health Record. It’s database has over 25,000 tables. I’ve developed a desktop application that screen-scrapes a table’s data dictionary in Epic and creates the Collibra upload worksheets. We have ingested 300+ tables out of the 25,000+ this way,. However, we have to manually re-ingest a table when it changes. I’m waiting to hear from Alvin if his solution scales. That would be best, but I’m not sure how we could load 300 table names into a connection string.

8 Messages

Hi Jeffrey,

This is indeed an interesting solution and it might be very helpful for me in the near future. Thank you for sharing your approach!

Collibra expects us to suggest this in their ideation platform to take up for enhancements based on # upvotes received, in case you have not created one already.

I agree, I could not wait and found workarounds. I created an ideation ticket for selectively profiling from the ingested schema’s and seems like it is planned now. In case this is something you want you may want to vote for it.

FYI: Ideation Platform – Collibra | Product Resource Center

157 Messages

3 years ago

I have some good news for all parties on this post :slight_smile:

So it is possible to restrict the connection to include only a pre-set list of tables. The trick is to pass the fully qualified names of your tables and not the friendly name.

In my case, restricting to the Department table consists of passing the full name: HumanResources.Department

Shout if any questions.

I’ve attached an example using the AdventureWorks2019 Database and the Human Resources schema that consists of more than 1 table:

10 Messages

Hi Alvin,

This looks interesting. Could you please elaborate a bit? What tool are you using that has the “Connection details” screen you are displaying?

Thank you!

157 Messages

3 years ago

Gladly :slight_smile:

So the screen is the Collibra UI to register a data source, I just chose to zoom into the bits that are relevant here (attached a zoomed out pic).

I’ve also attached below, an image of the documentation for the Tables property. In order to use the Tables property, you need to specify the fully qualified table name (I admittedly thought it would be Database.Schema.Table but Schema.Table looks to work well).



8 Messages

Thanks Alvin! This looks interesting. I see that you are passing the arguments in the “Tables” connection string slightly differently than I was. I believe this should work for me as well. Let me try this out and report back the results. Thanks again!

That’s interesting ! Even though support suggested to use fully qualified names when i created the support ticket, it still did not work. Finally it was concluded that selective ingestion/profiling is not available for ingestion via Job Server.

By any chance are you ingesting via EDGE?

10 Messages

3 years ago

Thank you very much!

51 Messages

 • 

200 Points

3 years ago

Alvin:
Great job on your solution. Does it scale? We have 300-400 tables to refresh.

157 Messages

Good question :slight_smile:

Each property in the connection string can hold a lot of information so the short answer is yes, the solution does scale.

Here’s a quick test I just did for 200 dummy tables I created in that same HumanResources Schema :

Here’s what I’ve put in the Tables string (I’ve selected tables 0-198):

Here’s the Catalog config UI:

And here are the results which only include the dummy tables I created (and none of the original tables in HumanResources):

51 Messages

 • 

200 Points

3 years ago

Terrific! I’ll test this with our existing table list. This would be the best of both worlds. My screen scraper brings in a lot of metadata that doesn’t come across in the JDBC connection. This would provide automatic updates to the basic schema, which would be super. I’ll keep you posted.
Thanks again…

51 Messages

 • 

200 Points

3 years ago

Unfortunately this might work for a MS SQL Server driver but our db is Oracle, and although I gave it two table names it tried to bring in all 25,000+ tables and all columns. Had to abort it all. I’ll check with Oracle further.

157 Messages

Ah you’re using Oracle. So every JDBC driver is different and many have their own properties you can add to the connection string so of course, a SQL solution probably won’t work for Oracle.

Your best bet would be to tackle this problem through permissions management on the Oracle side.

The recommended route in your case would be to assign the relevant permissions to the Oracle DB user (that you’re using to connect from the Catalog to Oracle) to just the 300 tables and not the 25,000 - if possible.

The catalog will only ingest tables it has permissions to select, so if the Oracle DB user can only select 300 tables, only 300 tables will be ingested.

I appreciate that adding permissions to 300 or so tables isn’t particularly ideal, but it would be a one time activity.

This holds true for all connections, I’d say, if you can, it’s best practise to use a Service Account, dedicated for connections to Data Sources via Collibra, and for that account to only have access to data it needs access to.

If you’d like more help on Oracle specifically, I’d be happy to pick up on this separately, equally, give support a nudge and they’ll steer you in the right direction.

Cheers!

Alvin

51 Messages

 • 

200 Points

Yep - we use a dedicated Collibra service account for all database accesses. The table permissions idea might work but I’d like to see if we can keep this off the dba’s table. I’ll check the Oracle Tables property documentation (unless you’d like to be involved as well). This is a huge issue in healthcare. If we can crack this nut it will be a BIG win for a lot of orgs.

157 Messages

3 years ago

Having said that, I’ll also check whether the tables property is usable for oracle just incase :+1:

51 Messages

 • 

200 Points

Alvin:
Wondering if you’ve had any luck with this? I’d be glad to work with Oracle directly if you want. This would be a HUGE win for healthcare clients.
Regards,
Jeff

3 years ago

Hi @alvin.useree, thanks for the info.
What version of the SQL driver are you using? I have:

  • mssql-jdbc-9.2.1.jre8.jar (Own driver, Microsoft)
  • sqljdbc42.jar (Own driver, came with Collibra)
  • sql-server-7662.jar (Collibra provided, paid)
    But I dont see the Tables field in your image.
    I only see ‘Tables excluded from registration’ at the bottom of the connection parameters window. Thanks again!

157 Messages

3 years ago

Good to see activity on this one!

@jeffrey.evoy I’ve had no luck with it unfortunately, I’m quite sure it isn’t possible for the Oracle JDBC yet but would be keen to hear where you get to with your conversation.

@lzuloaga.health.ucsd.edu I’m using the Collibra provided one - you’ll need to configure the driver itself to have the property when you add the JDBC driver. Let me know if that doesn’t make sense and I’ll send some documentation or a list of instructions.

@nramesh.levi.com I am indeed using Job Server. The Tables property I referenced is specific to SQL connections - are you using SQL? If so, I can put 2 and 2 together for you quickly.

Loading...