Wednesday, April 3rd, 2024 4:43 PM

Quick start for data engineers: practical REST API call examples

Hi, I'm a very practical senior data architect/engineer. I understand Collibra concepts, but I would like to understand how data engineers could proactively create or update database schemas, tables, views, etc. by interacting with Collibra.

I think there are 2 ways of using Collibra:

Way 1 (not good enough, Collibra can get out of sync): developers execute SQL scripts and Collibra syncs the metadata (only AFTER the database has already changed):

  1. Data engineers change the database directly with SQL scripts.
  2. Collibra somehow connects, pulls the metadata, and detects changes.
  3. Collibra notifies users of the changes and their impact.
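
To make step 2 of way 1 concrete, here is a minimal sketch of what "detect changes" could mean: comparing the catalog's last known snapshot with the current database metadata. The snapshot shape (`{table: {column: type}}`) and the function name `diff_schemas` are my own assumptions for illustration, not anything Collibra exposes.

```python
from typing import Dict, List

# Hypothetical snapshot shape: {table_name: {column_name: data_type}}
Schema = Dict[str, Dict[str, str]]


def diff_schemas(catalog: Schema, database: Schema) -> List[str]:
    """Describe how the live database differs from the catalog snapshot."""
    changes: List[str] = []
    for table in sorted(set(catalog) | set(database)):
        if table not in catalog:
            changes.append(f"table added in database: {table}")
            continue
        if table not in database:
            changes.append(f"table dropped in database: {table}")
            continue
        old_cols, new_cols = catalog[table], database[table]
        for col in sorted(set(old_cols) | set(new_cols)):
            if col not in old_cols:
                changes.append(f"column added: {table}.{col} {new_cols[col]}")
            elif col not in new_cols:
                changes.append(f"column dropped: {table}.{col}")
            elif old_cols[col] != new_cols[col]:
                changes.append(
                    f"column type changed: {table}.{col} "
                    f"{old_cols[col]} -> {new_cols[col]}"
                )
    return changes
```

The weakness of way 1 shows up here: the diff can only run after the database has already changed.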

Way 2 (best, proactive mode): developers always use Collibra to create, update, or get the schema of database objects, so there is no more executing DDL SQL scripts first and then waiting for Collibra to become aware of the changes afterwards:

Way 2, scenario 1: developers initiate the schema change:

  1. Data engineers call the API to create or modify an asset (table, view).
  2. Collibra returns a DDL script to the data engineers, which they execute to physically modify the database.
  3. The data engineer informs Collibra that the DDL operation succeeded (maybe to update a flag showing the asset is no longer pending DDL execution).
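
I don't know whether Collibra can return DDL in step 2. If it only returns a non-DDL description of the table, the script would have to be generated on the developer side. A minimal sketch of that fallback, assuming a hypothetical `Column` structure and a helper `render_create_table` that I made up for illustration:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Column:
    """Hypothetical column description, e.g. built from catalog metadata."""
    name: str
    data_type: str
    nullable: bool = True


def render_create_table(schema: str, table: str, columns: List[Column]) -> str:
    """Render a plain CREATE TABLE statement from column metadata.

    Identifiers are inserted as-is (no quoting or validation), so this is
    only suitable for trusted input.
    """
    if not columns:
        raise ValueError("a table needs at least one column")
    lines = []
    for col in columns:
        null_sql = "" if col.nullable else " NOT NULL"
        lines.append(f"  {col.name} {col.data_type}{null_sql}")
    body = ",\n".join(lines)
    return f"CREATE TABLE {schema}.{table} (\n{body}\n);"
```

For example, `render_create_table("SALES", "CUSTOMER", [Column("ID", "NUMBER", False)])` produces a statement that could then be executed against the database before reporting success back to the catalog.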

Way 2, scenario 2: Collibra users modify the schema, so developers get notified and/or programmatically request the list of changes, request the DDL commands from Collibra, execute them, and inform Collibra that the DDL was applied successfully.
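
The loop I have in mind for scenario 2 looks roughly like the sketch below. Everything in it is an assumption: the status name `"Pending DDL"`, the `id`/`status`/`ddl` fields, and the two callbacks (one that runs SQL on the database, one that reports back to Collibra). It only shows the control flow, not real Collibra calls.

```python
from typing import Callable, Dict, List

PENDING = "Pending DDL"  # hypothetical status name, not a built-in Collibra status


def pending_changes(assets: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep only the assets still waiting for physical execution."""
    return [asset for asset in assets if asset["status"] == PENDING]


def apply_pending(
    assets: List[Dict[str, str]],
    execute_ddl: Callable[[str], None],
    report_success: Callable[[str], None],
) -> List[str]:
    """Execute the DDL of each pending asset, then report it as applied.

    report_success is only called after execute_ddl returned without raising,
    so a failed statement leaves the asset pending in the catalog.
    """
    applied: List[str] = []
    for asset in pending_changes(assets):
        execute_ddl(asset["ddl"])      # e.g. a database cursor's execute
        report_success(asset["id"])    # e.g. a REST call updating the asset
        applied.append(asset["id"])
    return applied
```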

My questions:

  1. Are the two modes of operation above possible?
  2. Can you share an example (quickstart) showing the management of a single table: a Collibra user creates the table and column assets (is there any flag showing whether an asset is physically present in the database or pending change execution?) --> a REST API HTTP request gets the list of changes (one new table should come back) --> a REST API HTTP request gets the schema of the new table (can Collibra return DDL SQL, or must developers generate it manually from a non-DDL schema response?) --> finally the developer executes the DDL script that creates the new table in the database --> and the developer informs Collibra that the asset was physically created/updated (so Collibra does not get out of sync with the physical database).
  3. Programmatic workflow: the developer initiates changes in Collibra by creating a new table asset plus the respective column assets using the HTTP REST API (example please) --> the developer gets feedback (OK, pending approval, denied and the reason, etc.).
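
For question 3, this is the kind of call I imagine, written as a sketch that only builds the HTTP request without sending it. The `/rest/2.0/assets` path and the `name`/`domainId`/`typeId` fields reflect my understanding of the Collibra Core REST API and should be checked against the API documentation of your own instance. The base URL, credentials, and IDs are placeholders, and basic authentication is an assumption.

```python
import base64
import json
from urllib.request import Request


def build_create_asset_request(
    base_url: str,
    user: str,
    password: str,
    name: str,
    domain_id: str,
    type_id: str,
) -> Request:
    """Build (but do not send) a POST request that would create one asset."""
    # Field names are assumed from the Core REST API; verify before use.
    payload = {"name": name, "domainId": domain_id, "typeId": type_id}
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return Request(
        url=f"{base_url.rstrip('/')}/rest/2.0/assets",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )
```

Sending it would be `urllib.request.urlopen(request)`. The same builder would be called once for the table asset and once per column asset, with a different `type_id` each time. How the response reports "pending approval" or "denied" is exactly what I am asking about.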

Maybe I was unlucky, but I feel a bit lost among all the documentation that nicely explains to data stewards how to be "Collibra users". As a data engineer who is not a Collibra user, I'm finding it hard to understand how to interact programmatically with the Collibra metadata store in order to proactively ask Collibra about changes first. In 99% of real-world data engineering, engineers just execute DDL SQL scripts directly against databases and use governance platforms merely as a directory of users to notify when differences between Collibra and the database are detected. Related to that:

1. A quickstart/steps on how to set up and manage the Collibra process that pulls metadata and detects changes would also be nice.

2. Where can I find which types of pushdown operations Collibra can execute on the physical target database (pull only vs. also CREATE TABLE / ALTER TABLE ADD/MODIFY COLUMN), up to UDF support for implementing data masking?

Thanks in advance; I hope the answers already exist.

Last but not least, would a free trial be possible, so I could test the above against my free Snowflake account? Just a simple table creation in both ways.

Cheers ~

Emanuel O.
