We’re excited to share the improved Collibra Community! Based on feedback from our members, we’ve made big improvements to enhance your experience and make it easier to collaborate, find information, and stay connected.

What’s new?
- A modern, updated design for more straightforward navigation
- A better forum experience with improved discussion filtering
- A more robust customer user group experience for deeper engagement
- Improved email communications and notifications to keep you informed
- A new Welcome Center to help members get started

Note: As we update the back end, your points and badges will not be available on the platform until mid-to-late 2025. Once complete, your gamification information will be restored.

What’s stayed the same?
- The Community URL, email ([email protected]), and login process remain unchanged.
- All discussions, knowledge base articles, and resources are still available.

If you have any questions, please reach out to [email protected]. We’re here to help! Please see this post on how to start a discussion. We can’t wait for you to explore the new Collibra Community!

Best,
The Collibra Community Team

Dear Collibra Community,

I am currently trying to test the HTTP Task within a Collibra workflow. Overall, the HTTP Task documentation is well written, but I am not receiving a successful response. The Edge connection in the Edge site should be set up correctly, since testing the connection gives Succeeded as feedback. Hence, I wanted to review the request and response further by analysing the details in a subsequent Script Task, which contains the following script:

    loggerApi.info("***********************************")
    loggerApi.info("***HTTP Response: ${response_json}***")
    loggerApi.info("***HTTP Request: ${RequestUrl}***")

The response_json variable is defined in the "Response variable name" attribute of the HTTP Task, and I assumed that the RequestUrl variable can be used as stated in the HTTP Task documentation (named under "Save request variables"). However, the response is logged with the value null, and RequestUrl does not seem to exist and results in an error. You can find the log here:

    INFO c.c.d.c.api.component.LoggerApiImpl - *********************************** [authenticated_id=00000000-0000-0000-0000-000000900003, trace_id=724a6e103e9d571f5c1f9c4cd6c1403e, trace_flags=01, span_id=6fead334445ce819]
    INFO c.c.d.c.api.component.LoggerApiImpl - ***HTTP Response: null*** [authenticated_id=00000000-0000-0000-0000-000000900003, trace_id=724a6e103e9d571f5c1f9c4cd6c1403e, trace_flags=01, span_id=769359a9ced1046f]
    WARN c.c.d.w.s.g.b.SecureGroovyTaskActivityBehavior - Exception while executing scriptTask1 : groovy script evaluation failed: 'javax.script.ScriptException: groovy.lang.MissingPropertyException: No such property: RequestUrl for class: Script21' Trace: scopeType=bpmn, scopeDefinitionKey=hTTPTask, scopeDefinitionId=hTTPTask:17:0196430b-a0df-7198-ae9a-ef8b4f9b4cd2, subScopeDefinitionKey=scriptTask1, tenantId=<empty>, type=scriptTask [authenticated_id=00000000-0000-0000-0000-000000900003, trace_id=724a6e103e9d571f5c1f9c4cd6c1403e, trace_flags=01, span_id=ec8775d825dd69b5]

It would be very helpful to log the request and response details, but I do not understand why the response is null, or how to log the request. Does anybody have experience with this or any idea how to resolve this problem?

Best regards,
Felix
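One way to narrow this down is to enumerate the process variables that actually exist before referencing them. Below is a minimal diagnostic sketch, assuming the Flowable execution object is exposed to Collibra Script Tasks the way it is in standard Flowable Groovy script tasks; referencing an unset variable by name, as in ${RequestUrl}, is exactly what raises the MissingPropertyException seen in the log:

    // Diagnostic sketch (assumes the Flowable "execution" object is in scope).
    // List every process variable visible to this Script Task.
    def vars = execution.getVariables()
    loggerApi.info("Variables in scope: ${vars.keySet()}")

    // Guard each lookup instead of referencing the name directly; a direct
    // reference to a variable that was never set throws MissingPropertyException.
    if (execution.hasVariable("response_json")) {
        loggerApi.info("HTTP Response: ${execution.getVariable('response_json')}")
    } else {
        loggerApi.info("response_json was never set by the HTTP Task")
    }
    if (execution.hasVariable("RequestUrl")) {
        loggerApi.info("HTTP Request: ${execution.getVariable('RequestUrl')}")
    } else {
        loggerApi.info("RequestUrl was never set - check the 'Save request variables' setting")
    }

If response_json appears in the variable list but holds null, the HTTP Task ran and stored an empty response; if it is absent entirely, the task never wrote it, which points at the task configuration rather than the script.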
We’re excited to invite you to an insightful Ask Me Anything session with Databricks and Collibra. Use this opportunity to connect with subject matter experts from both organizations, where you can ask questions about how your organization can tap into their combined power and better understand:

- Why your organization needs Collibra alongside Databricks Unity Catalog
- How you can scale AI initiatives and enable your people to build on each other’s successes
- Real-world use cases that showcase the Collibra and Databricks advantage

This Q&A promises invaluable insights and an open forum for all your questions. You can watch the video [here] and read the live questions from the session below.

_________________________________________________________________________________________________________

Question: Both Collibra and Databricks talk about access governance. What are the differences between their approaches, and what specific capabilities does each platform offer in this area?

Answer: Collibra gives users visibility and context around data access, including who should have access and why. It helps users discover and understand the data available in Databricks and any other source, including the business context around who owns it, what it's used for, and the quality of the data, to determine if it's the right data for the use case. Once the right data is identified, the user can request access directly in Collibra and, where required, Collibra will trigger the business workflows to secure the appropriate approvals. Once access is approved, Collibra pushes the request to Databricks Unity Catalog, which provides the policy enforcement layer and the technical capabilities to ensure that policies are executed efficiently.

Question: Can you talk more about how Collibra can help streamline access management and requests on my Databricks Unity Catalog?

Answer: There are a couple of different routes. The most obvious one is that most organizations have an agreed-upon process for provisioning data in source systems, including Databricks. Collibra allows you to search for the right data, understand the context, and request access to it. It's much like shopping on a popular shopping site: you find the product you want, add it to your basket, and check out. Collibra can then integrate with your internal ticketing system to put the request through the approval process, providing an audit history of the approval. Further, we can extend these capabilities with Collibra Protect by pushing the policy down to Databricks Unity Catalog, translating it from natural English language into a row-filtering or column-masking policy that Databricks Unity Catalog can enforce. In other words, if you want to give marketing access to sensitive data such as customers' first names, last names, and email addresses, you can set up a policy in natural language that hashes these values out. The policy is pushed down to Databricks Unity Catalog, which then does the heavy lifting, as illustrated in the sketch below.
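To make that push-down concrete, here is a minimal sketch of the kind of policy Unity Catalog ends up enforcing, expressed as Databricks SQL sent over JDBC from a Groovy script. The workspace, warehouse, catalog, table, group, and function names are illustrative assumptions, and this is not the exact DDL Collibra Protect generates:

    // Illustrative only: replace every <placeholder>; all names below are made up.
    import groovy.sql.Sql

    def db = Sql.newInstance(
        "jdbc:databricks://<workspace-host>:443;httpPath=<sql-warehouse-http-path>;AuthMech=3",
        "token", "<personal-access-token>", "com.databricks.client.jdbc.Driver")

    // 1. A masking function: members of the marketing group see a SHA-256 hash
    //    of the value, everyone else sees the raw email address.
    db.execute('''
        CREATE OR REPLACE FUNCTION demo.governance.mask_email(email STRING)
        RETURNS STRING
        RETURN CASE WHEN is_account_group_member('marketing')
                    THEN sha2(email, 256) ELSE email END
    ''')

    // 2. Attach the mask to the column; Unity Catalog enforces it on every query.
    db.execute("ALTER TABLE demo.sales.customers ALTER COLUMN email SET MASK demo.governance.mask_email")

    db.close()

Row-filtering policies bind the same way, via ALTER TABLE ... SET ROW FILTER <function> ON (<column>).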
Question: Are there any plans to get source tagging in Collibra from the JDBC connection to Databricks? Also, are there any plans to allow profiling and sampling with Collibra's Unity Catalog integration?

Answer: Collibra has worked with some of our customers to successfully push additional tagging and context from Collibra to Databricks through a custom integration, so if you have a near-term need, we have an accelerator that can help. Currently, metadata exchange between Collibra and Databricks is one-way, with Collibra ingesting metadata from Unity Catalog. Bidirectional metadata exchange between Databricks Unity Catalog and Collibra is on our roadmap for this year. The exception is Collibra Protect, available today, which allows policies defined in Collibra to be enforced within the Databricks environment.

Question: Will the "bidirectional metadata transfer" include the ability for a change made in Collibra to push updates into Databricks Unity Catalog? If so, can we tell Collibra which pieces of metadata should not be altered in this way (for example, table schema metadata)?

Answer: As part of the integration and synchronization between Databricks Unity Catalog and Collibra, not only would metadata and lineage information come from Databricks Unity Catalog into Collibra, but Collibra metadata would also be able to be ingested into Databricks Unity Catalog. We certainly understand that customers have a lot of flexibility in capturing and managing metadata within Collibra, so we would have to provide functionality to specify which pieces of metadata you want to push back to Databricks Unity Catalog. We're developing that capability now, so I can't get too far into the details.

Question: Are there any plans to integrate Collibra business glossary terms to auto-assign certified glossary terms to Delta Lake columns, instead of using Databricks Assistant AI to generate the column descriptions?

Answer: From a Collibra perspective, you should be able to leverage the bidirectional metadata exchange on our roadmap to push metadata, including business glossary terms, back into Databricks. While we can't speak to all the specifics of the solution, we have planned the capability to push tags from Collibra into Databricks as a way to curate different objects in Databricks Unity Catalog, and we think that will help share business context between the two, for example for auto-categorization and for column names. It's a bit of an open question as to where you will want to do that curation; having much of it done in your enterprise business glossary makes sense. It is our understanding that a future release will give you the ability to get SQL-based lineage, and later releases will add incremental updates that support wider lineage capabilities, such as capturing Python transformations, volumes, and notebooks. With these future capabilities, you will be able to push metadata, including business glossary terms, back into Databricks. This capability may require further enhancement if it is not part of our initial integration launch.

Question: We currently have to auto-stitch Databricks metadata back to legacy source systems. Is there anything on Collibra's roadmap to support this automation?

Answer: Besides the metadata ingestion from Databricks Unity Catalog, Collibra integrates with Databricks Unity Catalog to bring in the technical lineage that Databricks captures. You can stitch it together with your Collibra lineage, whether from Power BI, Tableau, or ETL sources. On Collibra's roadmap, we plan to further enhance our technical lineage integrations with Databricks; for example, volumes, notebooks, and SQL transformations happening within Databricks are some of the other items on our roadmap for technical lineage integration between Collibra and Databricks. If your organization wants to do more in this area, Collibra would love to have a follow-up conversation to better understand your situation.

Question: Will the Collibra Databricks connection support lineage for indirect dependencies in the lineage?
Answer: Collibra leverages the lineage information available within Databricks and its system tables, and if you are experiencing a gap, both organizations would be interested in understanding your situation in more detail. Please reach out to your Databricks account team. Once we have a better understanding, Databricks can work with Collibra to see how that gap could be filled, as you can never be too ambitious about what you incorporate into lineage.

Question: We are struggling to visualize how Collibra and Databricks can work together as part of a data engineer's natural process/journey to seamlessly allow an engineer to explore the data catalog and then transition to using those data assets in their pipeline.

Answer: It comes down to the personas that will be using that Databricks data. Collibra's Data Marketplace lets you answer any persona's questions up front, allowing them to channel into Databricks correctly. For example, maybe a data engineer is looking for data that's curated or looked after by another line of business and wants to understand how they are calculating KPIs in that particular data. Collibra has a variety of business and operational contexts, providing important details not available in Unity Catalog.

Question: Is there any plan to add support for Microsoft Entra ID to connect to Unity Catalog, instead of using only Databricks service principals?

Answer: From Databricks: This is like single sign-on. We're always looking at additional ways to enhance our partnerships, including with our hyperscaler partners, to ensure we're working the right way with their product lines. I don't have information about this specific item, but if this is of interest to our joint customers with Microsoft, please reach out to your Databricks account team.

Question: Can Collibra pull DLT lineage, and traditional notebook plus Delta table transformation lineage, from Databricks?

Answer: When it comes to DLT, certain capabilities are supported today with Collibra, and some are on the roadmap; this is based on what's available to Collibra from Databricks. For notebooks, that is on the Collibra roadmap for the second half of this year, after volumes. In Databricks, I believe lineage is captured today if you are using materialized views or streaming tables that use DLT under the hood, but there may be a gap if you're directly creating DLT pipelines. Don't hesitate to get in touch with your Databricks account team and let them know the specific item you're looking for; we'll check with the lineage team on timelines if there is a gap. If it's available in Databricks, Collibra will certainly pull that information.

Question: We're currently migrating old workloads from our on-premises enterprise data warehouse. Can Collibra help accelerate this?

Answer: We've seen a number of customers leverage Collibra to support their journey to the cloud or migration from one data store to another. Migration is an interesting challenge, and you can use Collibra to accelerate your journey in a couple of ways. Data quality is top of mind. For example, suppose you're moving from a more legacy technology, such as operational SQL Server, Oracle, or PostgreSQL databases, to Databricks. In that case, you can use Collibra Data Quality to help measure the underlying quality of the source system and understand quality issues before the data is moved. Another example is assigning ownership for review. At Collibra, we have a very flexible operating model that enables you to assign ownership and responsibility to the right individuals.
This is important as you prepare your data to move, and for ongoing management and quality monitoring after the move. From a metadata and lineage perspective, it's absolutely critical to understand data before it's migrated, to see how it flows and transforms between systems. This will allow you to address any potential dependencies or challenges as you move critical workloads.

Question: How does Collibra help with governance in AI applications like AI/BI Genie?

Answer: The great thing about working with Databricks and Collibra together is that Databricks allows you to have your data platform, your data warehouse, and your AI governance layer all in one place, and Collibra lets you expose those technical capabilities to a much wider user group. A lot of compliance and legal folks sitting in these organizations know that they have to get involved in AI projects early, but they may not know how to do it or which platform to use to approach their data science team. Collibra's AI governance capability allows more parts of the organization to be involved in the AI story, so that their input on requirements and expected outcomes is documented and considered at the beginning of an organization's AI journey rather than toward the end.

Question: How does richer metadata elevate capabilities like Genie?

Answer: Genie, an AI assistant capability, relies heavily on Unity Catalog metadata. The richer the metadata, the richer the responses you will get from the Genie product from Databricks. That's where Collibra can help, because Collibra has richer metadata from across the enterprise that can be used as part of the response when you ask a question.

Question: Can we connect lineage from tools like Power BI to Databricks, and how does that work?

Answer: The Power BI connector definitely works back into Databricks Unity Catalog, so we can stitch together Databricks lineage with Power BI lineage. This is great because when people are fetching data from the warehouse into a Power BI data model, or just directly querying into reports, we not only fetch those reports into the lineage but also interpret the DAX expressions using AI. This gives you full visibility into which data is being used by which mission-critical reports.

Question: We have metadata and lineage in Unity Catalog, and then we have them in Collibra. What's the benefit of having both?

Answer: It's not that metadata and lineage in Collibra and metadata and lineage in Unity Catalog are two separate entities doing two separate things. The idea is that metadata and lineage form the linchpin that holds the systems together. Collibra pulls in the metadata and the lineage (and hopefully soon will also have that full bidirectional sync). In addition, data quality rules get created in Collibra Data Quality & Observability, and the processing gets pushed down to Databricks. So there's already a lot of traversing between the two systems, and their shared value is, of course, the metadata. The lineage is a product of the fantastic things you're doing with Databricks, including delivering analytics and AI at scale and speed. What you're doing in Collibra is taking that metadata and then enriching it. It's kind of like an inverted V funnel where everything has a center of gravity around metadata and lineage, which is where the two systems stick together.
But the value of Collibra is that we're extending it with all of the metadata that you don't have in Unity Catalog: the business processes, the terminology, the KPI definitions, the use cases, the DQ metrics. All of these things come together to make it one holistic journey: find the data product and work your way down, understanding everything as you go, to ultimately drive fully informed, compliant access to data. So it's not one or the other; the two together produce something more significant than the sum of the parts.

Question: How does AI governance work with Databricks MLflow?

Answer: From a Collibra perspective, our partnership with Databricks is trying to make it easier for organizations to ingest the metadata and lineage information so they don't have to build and maintain an API integration. So, today, we support AI governance with Unity Catalog models. We are working on supporting MLflow as well, but today it would be the customer's responsibility to extract or leverage the metadata in MLflow and push it into Collibra in support of AI governance (a minimal sketch of that interim approach appears at the end of this post).

From Databricks: This actually ties really nicely to the previous question about where you keep your metadata, or how you have multiple copies of metadata. The model metadata and the table metadata in Unity Catalog are the foundations of your technical data governance solution. Then you can use that information in multiple places, including your business data catalog, like Collibra. This is just another example of that: instead of syncing table metadata to be curated in Collibra, you're syncing model metadata. AI use cases in Collibra are another way to collaborate and bring what you have in Databricks to a broader user group.

______________________________________________________________________________________________________________

Thank you for participating in the AMA | Why Databricks and Collibra are better together. If you have more questions you would like answered, comment below and we will put you in touch with one of our experts.
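For reference, here is a minimal sketch of the interim MLflow approach mentioned in the AI governance answer above: pull registered-model metadata from the MLflow REST API and create matching assets through the Collibra REST API. The hosts, credentials, and the domain and asset-type UUIDs are illustrative assumptions, not values from the session:

    // Sketch only: the endpoints are the public MLflow and Collibra REST APIs,
    // but every <placeholder> below must be replaced with your own values.
    import groovy.json.JsonSlurper
    import groovy.json.JsonOutput

    def mlflowHost   = "https://<databricks-workspace-host>"
    def collibraHost = "https://<collibra-host>"

    // 1. List registered models from the MLflow Model Registry.
    def conn = new URL("${mlflowHost}/api/2.0/mlflow/registered-models/search").openConnection()
    conn.setRequestProperty("Authorization", "Bearer <databricks-token>")
    def models = new JsonSlurper().parse(conn.inputStream).registered_models ?: []

    // 2. Create one Collibra asset per model (domain and type IDs are placeholders).
    models.each { model ->
        def payload = JsonOutput.toJson([
            name    : model.name,
            domainId: "<collibra-domain-uuid>",
            typeId  : "<ai-model-asset-type-uuid>"
        ])
        def post = new URL("${collibraHost}/rest/2.0/assets").openConnection()
        post.requestMethod = "POST"
        post.doOutput = true
        post.setRequestProperty("Authorization", "Basic <base64-credentials>")
        post.setRequestProperty("Content-Type", "application/json")
        post.outputStream.withWriter { it << payload }
        assert post.responseCode in [200, 201] : "Failed to create asset for ${model.name}"
    }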