Community Resources
Version update: March 31, 2025

What is the Collibra Community?
The Collibra Community is your dedicated space to learn best practices, share insights and collaborate with 11,000+ data citizens worldwide. You can use the Community to access valuable content, tutorials, best practices, product documentation and more.

How do I create an account and join the Community?
Visit community.collibra.com and click the person icon in the top left corner to register for the Community.

How do I log in to the platform?
Log in with this login link and use the SSO method, which lets you use the same credentials as your other Collibra sites without creating new logins and passwords.

How do I update my profile?
Once you are logged in to the Community, you can view your profile in the navigation menu. Click the person icon to the right of the bell icon, click "View profile" and edit using the green "Edit" button. We encourage you to complete your profile with a headshot and an About description for a more engaged Community experience.

How do I post a discussion?
1. Visit community.collibra.com/discussion/
2. Click your preferred category, or view all categories from the left side panel.
3. Click "Start a discussion" to create a post.
Visit this discussion post for more information.

Can I edit my notification settings?
Yes. You can customize your preferences for receiving updates, alerts, and notifications related to discussions, mentions, and activity within the Community. Click your profile icon > "Profile Settings" > "Updates" to explore your options.

How do I search for specific topics or discussions?
Use our search bar to search for specific topics, or visit the discussion page to view all of our active forums.

How do I report bugs or issues on the new platform?
If you encounter any bugs or issues on the new platform, please report them to [email protected]. Your feedback is invaluable in helping us improve the platform for everyone.
What should I do if I encounter inappropriate behavior or content?
If you encounter inappropriate behavior or content in the Collibra Community, please let us know by clicking on the discussion post and selecting "Mark as Spam" or "Flag as Inappropriate." You can also email us directly at [email protected].

Why can't I find my previous gamification points and badges in the Community?
We recognize the importance of reputation points as a measure of community engagement. We are improving our gamification features to enhance your Community experience, and we plan to bring back your gamification information soon. Check back for more updates!

How can I provide feedback on the Community?
We strive to provide our community members with the best experience possible, and we would love your feedback to help identify areas for improvement and gather suggestions for new features and enhancements. We send a bi-annual Community member survey in July and December of each year. You can also email us directly at [email protected].

What are certified Data Citizens® User Groups?
Data Citizens® User Groups are interactive virtual groups designed to bring together fellow Collibra data enthusiasts to share insights, collaborate on best practices and drive innovation. Certified groups gain exclusive access to resources, support, beta testing, product roadmaps and expert guidance, with Collibra providing the tools for success.

How do I join a user group?
Browse our user group list and select the group you are interested in joining. Sign up via the dedicated landing page, and we'll send you next steps from there. Once you join, you can participate in discussions, attend events and connect with other members.

How do I stay up to date with Community news?
The Collibra Community sends our members a quarterly newsletter with the latest community news, events and more. We also post weekly announcements on the Community.
We're excited to invite you to an insightful Ask Me Anything session with Databricks and Collibra. Use this opportunity to connect with subject matter experts from both organizations, where you can ask questions about how your organization can tap into their combined power and better understand:
- Why your organization needs Collibra alongside Databricks Unity Catalog
- How you can scale AI initiatives and enable your people to build on each other's successes
- Real-world use cases that showcase the Collibra and Databricks advantage
This Q&A promises invaluable insights and an open forum for all your questions. You can watch the video [here] and read the live questions from the session below.

_________________________________________________________________________________________________________

Question: Both Collibra and Databricks talk about access governance. What are the differences between their approaches, and what specific capabilities does each platform offer in this area?
Answer: Collibra gives users visibility and context around data access, including who should have access and why. It helps users discover and understand the data available in Databricks and any other source, including the business context around who owns it, what it's used for, and the quality of the data, to determine if it's the right data for the use case. Once the right data is identified, the user can request access directly in Collibra, and where required, Collibra will trigger the business workflows to secure the appropriate approvals. Once access is approved, Collibra pushes the request to Databricks Unity Catalog, which provides the policy enforcement layer and the technical capabilities to ensure that policies are executed efficiently.

Question: Can you talk more about how Collibra can help streamline access management and requests on my Databricks Unity Catalog?
Answer: There are a couple of different routes.
The most obvious one is that most organizations have an agreed-upon process for provisioning data in source systems, including Databricks. Collibra allows you to search for the right data, understand the context, and request access to it. It's much like shopping on a popular shopping site: you find the product you want, add it to your shopping basket, and check out. Collibra can then integrate with your internal ticketing system to put the request through the approval process, providing an audit history of the approval. Further, we can extend these capabilities with Collibra Protect by pushing the policy down to Databricks Unity Catalog, translating it from natural language into a row-filtering or column-masking policy that Databricks Unity Catalog can use. In other words, if you want to give marketing access to sensitive data like customers' first names, last names, and email addresses, you can define a policy in natural language that hashes those values out. The policy is pushed down to Databricks Unity Catalog, which then does the heavy lifting.

Question: Are there any plans to get source tagging in Collibra from the JDBC connection to Databricks? Also, are there any plans for allowing profiling and sampling with Collibra's Unity Catalog integration?
Answer: Collibra has worked with some of our customers to successfully push additional tagging and context from Collibra to Databricks through a custom integration, so if you have a near-term need, we have an accelerator that can help. Currently, metadata exchange between Collibra and Databricks is one-way, with Collibra ingesting metadata from Unity Catalog. Bidirectional metadata exchange between Databricks Unity Catalog and Collibra is on our roadmap for this year. The exception is Collibra Protect, available today, which allows policies defined in Collibra to be enforced within the Databricks environment.
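The effect of the hash-based column masking described above can be sketched in Python. This is an illustrative assumption only: Collibra Protect expresses the policy in natural language and Databricks Unity Catalog enforces it natively; the column names, the `pii_approved` group, and the SHA-256 choice are all hypothetical.

```python
import hashlib

# Assumed sensitive columns for illustration; a real policy would be defined
# in Collibra Protect and enforced by Databricks Unity Catalog, not in Python.
SENSITIVE_COLUMNS = {"first_name", "last_name", "email"}

def mask_value(value: str) -> str:
    """Replace a sensitive value with a deterministic SHA-256 digest."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()

def apply_masking(row: dict, user_groups: set) -> dict:
    """Hash sensitive columns unless the user belongs to an approved group."""
    if "pii_approved" in user_groups:  # hypothetical approved group name
        return dict(row)
    return {
        col: (mask_value(val) if col in SENSITIVE_COLUMNS else val)
        for col, val in row.items()
    }

row = {"first_name": "Ada", "email": "ada@example.com", "region": "EMEA"}
masked = apply_masking(row, {"marketing"})
print(masked["region"])  # non-sensitive column passes through unchanged
```

The point of the sketch is that the masking is deterministic and group-aware: marketing still sees non-sensitive columns and can join on the hashed values, without ever seeing the raw PII.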
Question: Will the "bidirectional metadata transfer" include the ability for a change made in Collibra to push updates into Databricks Unity Catalog? If so, can we tell Collibra which pieces of metadata should not be altered in this way (for example, table schema metadata)?
Answer: As part of the integration and synchronization between Databricks Unity Catalog and Collibra, not only would metadata and lineage information come from Databricks Unity Catalog into Collibra, but Collibra metadata would also be able to be ingested into Databricks Unity Catalog. We certainly understand that customers have a lot of flexibility in capturing and managing metadata within Collibra, so we would have to provide functionality to specify which pieces of metadata you want to push back to Databricks Unity Catalog. We're developing that capability now, so I can't get too far into the details.

Question: Are there any plans to integrate Collibra business glossary terms so that certified glossary terms can be auto-assigned to Delta Lake columns, instead of using the Databricks Assistant AI to generate the column descriptions?
Answer: From a Collibra perspective, you should be able to leverage the bidirectional metadata exchange on our roadmap to push metadata, including business glossary terms, back into Databricks. While we can't speak to all the specifics of the solution, we have planned the capability to push tags from Collibra into Databricks as a way to curate different objects in Databricks Unity Catalog, and I think that will help share some business context between the two, for example for auto-categorization and for column names. It's a bit of an open question as to where you're going to want to do that curation, but having much of it done in your enterprise business glossary makes sense. It is our understanding that a future release will give you the ability to get SQL-based lineage.
In later releases, you will see incremental updates that support wider lineage capabilities, such as capturing Python transformations, volumes, notebooks, and so on. With these future capabilities, you will be able to push metadata, including business glossary terms, back into Databricks. This capability may require further enhancement if it is not part of our initial integration launch.

Question: We currently have to auto-stitch Databricks metadata back to legacy source systems. Is there anything on Collibra's roadmap to support this automation?
Answer: Besides the metadata ingestion from Databricks Unity Catalog, Collibra integrates with Databricks Unity Catalog to bring in the technical lineage that Databricks captures. You can stitch it together with your Collibra lineage, whether with Power BI, Tableau, or ETL sources. On Collibra's roadmap, we plan to further enhance our technical lineage integrations with Databricks. For example, volumes, notebooks, and SQL transformations happening within Databricks are among the items on our roadmap for technical lineage integration between Collibra and Databricks. If your organization wants to do more in this area, Collibra would love to have a follow-up conversation to better understand your situation.

Question: Will the Collibra Databricks connection support lineage for indirect dependencies in the lineage?
Answer: Collibra leverages the lineage information available within Databricks and its system tables, and if you are experiencing a gap, both organizations would be interested in understanding your situation in more detail. Please reach out to your Databricks account team. Once we have a better understanding, Databricks can work with Collibra to see how that gap could be filled, as you can never be too ambitious about what you incorporate in lineage.
Question: We are struggling to visualize how Collibra and Databricks can work together as part of a data engineer's natural process/journey to seamlessly allow an engineer to explore the data catalog and then transition to using those data assets in their pipeline.
Answer: It comes down to the personas that will be using that Databricks data. Collibra's Data Marketplace lets you answer any persona's questions up front, allowing them to channel into Databricks correctly. For example, maybe a data engineer is looking for data that's curated or looked after by another line of business and wants to understand how they are calculating KPIs in that particular data. Collibra provides a variety of business and operational context, offering important details not available in Unity Catalog.

Question: Is there any plan to add support for Microsoft Entra ID to connect to Unity Catalog instead of using only Databricks service principals?
Answer: From Databricks: This is like a single sign-on. We're always looking at additional ways to enhance our partnerships, including our hyperscaler partners, to ensure we're working the right way with their product lines. I don't have information about this specific one, but if this is of interest to our joint customers with Microsoft, please reach out to your Databricks account team.

Question: Can Collibra pull DLT and traditional notebook plus Delta table transformation lineage from Databricks?
Answer: When it comes to DLT, certain capabilities are supported today with Collibra, and some are on the roadmap. This is based on what's available to Collibra from Databricks. For notebooks, that is on the Collibra roadmap for the second half of this year, after volumes. In Databricks, I believe that lineage is captured today if you are using materialized views or streaming tables that use DLT under the hood, but there may be a gap if you're directly creating DLT pipelines.
Don't hesitate to get in touch with your Databricks account team and let them know the specific item you're looking for. We'll check with the lineage team on timelines if there is a gap. If it's available in Databricks, Collibra will certainly pull that information.

Question: We're currently migrating old workloads from our on-prem enterprise data warehouse. Can Collibra help accelerate this?
Answer: We've seen a number of customers leverage Collibra to support their journey to the cloud or migration from one data store to another. Migration is an interesting challenge, and you can use Collibra to accelerate your journey in a couple of ways. Data quality is top of mind. For example, suppose you're moving from a legacy technology, such as operational SQL Server, Oracle, or PostgreSQL databases, to Databricks. In that case, you can use Collibra Data Quality to help measure the underlying quality of the system and understand quality issues before the data is moved. Another example is assigning ownership for review. At Collibra, we have a very flexible operating model that enables you to assign ownership and responsibility to the right individuals. This is important as you prepare your data to move, and for ongoing management and quality monitoring after the move. From a metadata and lineage perspective, it's absolutely critical to understand data before it's migrated, to see how it flows and transforms between systems. This will allow you to address any potential dependencies or challenges as you move critical workloads.

Question: How does Collibra help with governance in AI applications like AI/BI Genie?
Answer: The great thing about working with Databricks and Collibra together is that Databricks allows you to have your data platform, your data warehouse, and your AI governance layer all in one place. The great thing about working with Collibra is that you can expose those technical capabilities to a much wider user group.
I think a lot of compliance and legal folks in these organizations know they have to get involved in AI projects early, but they may not know how to do it or which platform to use to approach their data science team. Collibra's AI governance capability allows more parts of the organization to be involved in the AI story, so their input on requirements and expected outcomes is documented and considered at the beginning of an organization's AI journey rather than toward the end.

Question: How can Collibra elevate experimental capabilities like Genie?
Answer: Genie, an AI assistant capability, relies heavily on Unity Catalog metadata. The richer the metadata, the richer the responses you will get from the Genie product from Databricks. That's where Collibra can help, because Collibra has richer metadata across the enterprise that can be used as part of the response when you ask a question.

Question: Can we connect lineage from tools like Power BI to Databricks, and how does that work?
Answer: The Power BI connector definitely works back into Databricks Unity Catalog, so we can stitch together Databricks lineage with Power BI lineage. This is great because when people are fetching data from the warehouse into a Power BI data model, or just directly querying into reports, we not only fetch those reports in the lineage but also interpret the DAX expressions using AI. This gives you full visibility into what data is being used for which mission-critical reports.

Question: We have metadata and lineage in Unity Catalog, and then we have it in Collibra. What's the benefit of having both?
Answer: It's not that metadata and lineage in Collibra and metadata and lineage in Unity Catalog are two separate entities doing two separate things. The idea is that metadata and lineage form the linchpin that holds the systems together.
Collibra pulls in the metadata and the lineage (and hopefully soon will also have that full bidirectional sync). In addition, data quality rules get created in Collibra Data Quality & Observability, and the processing gets pushed down to Databricks. So there's already a lot of traversing between the two systems that relies on metadata. The value is, of course, the metadata, and the lineage is a product of the fantastic things you're doing with Databricks, including delivering analytics and AI at scale and speed. What you're doing in Collibra is taking that metadata and then enriching it. It's kind of like an inverted V funnel, where everything has a center of gravity around metadata and lineage, where the two systems stick together. But the value of Collibra is that we're extending with all of the metadata that you don't have in Unity Catalog: the business processes, the terminology, the KPI definitions, the use cases, the DQ metrics. All of these things come together to make one holistic journey: find the data product and work your way down, understanding everything as you go, to ultimately drive fully informed, compliant access to data. So it's not one or the other. The two together produce something more significant than the sum of the parts.

Question: How does AI governance work with Databricks MLflow?
Answer: From a Collibra perspective, our partnership with Databricks is about making it easier for organizations to ingest the metadata and lineage information so they don't have to build and maintain an API integration. Today, we support AI governance and Unity Catalog models. We are working on supporting MLflow as well, but today it would be the customer's responsibility to extract the metadata in MLflow and push it into Collibra in support of AI governance. From Databricks: This actually ties really nicely to the previous question about where you keep your metadata, or how you have multiple copies of metadata.
The model metadata and the table metadata in Unity Catalog are the foundations of your technical data governance solution. You can then use that information in multiple places, including your business data catalog, like Collibra. This is just another example of that: instead of syncing table metadata to be curated in Collibra, you're syncing model metadata. AI use cases in Collibra are another way to collaborate and bring what you have in Databricks to a broader user group.

______________________________________________________________________________________________________________

Thank you for participating in the AMA | Why Databricks and Collibra are better together. If you have more questions you would like answered, comment below, and we will get you in touch with one of our experts.

Domain structure is a critical aspect of a well-designed operating model. The recommendations here will help you design your domain structure to achieve these objectives.

Impact
Following these best practices will support platform adoption by improving:
- Platform usability.
- Clarity of governance responsibilities.
- Ease of system maintenance.

Best practice recommendations
Domains are collections of assets with similar characteristics, attributes, responsibilities or roles. When deciding how to group assets into domains, think about which assets will be acted on similarly within the system, for example, assets that have the same governance process or are assigned similar responsibilities. Since an asset can only be a member of a single domain, you may run into situations where it is hard to decide between two or more possible domain assignments, for example, an asset that might have stewards in both a geographically defined domain, such as the North America Group, and a business-function-defined domain, like Finance. In these cases, you should choose the domain of the ultimate "owner" of the asset.
These decisions will sometimes drive the need to create new communities or even shared communities. For more information, go to the Community structure best practice article.

Only create domains when you have a set of assets that will belong to them. Empty domains can cause clutter and confuse your users.

Domain names must be unique within a single community but can be duplicated across communities. However, it is best to name each domain uniquely. For instance, instead of having a Glossary domain within multiple communities, give them specific names such as Finance Glossary and Marketing Glossary.

Add a description to your domain. It is a best practice to provide a clear and useful description for users who may not otherwise be familiar with the content.

While all domains are visible to everyone by default, it is possible to hide domains from view based on users or groups. However, it is a best practice to use this feature sparingly and only where it is clearly required. For example, a domain of sensitive reference data, like salary scales, should have restricted visibility.

Enabling automatic hyperlinking is not recommended. If you do use it, you must enable automatic hyperlinking not only at the system level but also specifically for the domains whose assets you want to be hyperlinked. Because the links are dynamically maintained, apply this judiciously to smaller, business asset domains rather than broadly, to avoid performance issues.

Validation criteria
The Operating Model Diagnostic workflow, which is available from your Customer Success representative, will help identify empty domains and domains without descriptions, as well as domains without stewards and domains where automatic hyperlinking is enabled. The Operating Model Reverse Engineering (OMRE) workflow, available in the Marketplace, can make it easier to find duplicate-named domains across communities.
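The duplicate-name check described above can be sketched as a short script over an exported list of (community, domain) pairs. The pair format is an assumed export shape for illustration, not the OMRE's actual input or output.

```python
from collections import defaultdict

def duplicate_domain_names(pairs):
    """Return {domain_name: sorted communities} for names used in 2+ communities.

    pairs: iterable of (community, domain) tuples, e.g. from an export of the
    operating model (assumed shape for this sketch).
    """
    by_name = defaultdict(set)
    for community, domain in pairs:
        by_name[domain].add(community)
    return {
        name: sorted(communities)
        for name, communities in by_name.items()
        if len(communities) > 1
    }

pairs = [
    ("Finance", "Glossary"),
    ("Marketing", "Glossary"),        # same generic name in two communities
    ("Finance", "Finance Glossary"),  # uniquely named, as recommended
]
print(duplicate_domain_names(pairs))  # flags "Glossary" only
```

A report like this makes it easy to see which generic names (such as "Glossary") should be renamed to community-specific ones like Finance Glossary and Marketing Glossary.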
Additional information
For more information, go to the following resources:
- See the Collibra Documentation Center for more information on any of the above elements.
- You can run the OMRE to help identify duplicate domain names across communities.
- Domain structure and community structure work together, so our Community structure best practice is another useful resource.

Limit the number of custom asset types and statuses to optimize both operating model maintenance and user experience.

Impact
Use out-of-the-box (OOTB) asset types as much as possible to ensure maximum compatibility with future product features. Custom asset types should be used to meet specific business requirements, but it is best to avoid using too many. If there are a lot of custom asset types, review them for duplicates, as well as overlapping or unused asset types, and confirm that the custom asset types cannot be replaced by OOTB asset types.

Ensure that you have reviewed the available asset types before creating a custom asset type. This avoids creating unused custom asset types, which can complicate the governance of the operating model. Introduce a Data Office governance process for the creation of custom asset types. This process should provide guidance on when a custom asset type is absolutely necessary and reduce the risk of unused asset types. You should also review the number of users with permission to create custom asset types, as too many users can result in unused, duplicated or unnecessary custom asset types.

Custom statuses are encouraged to support the asset life cycle. However, keep the number of possible statuses for a given asset type as small as possible to avoid confusing users. If there are more than 30, consider consolidating to fewer statuses.
Topic area
- Operating Model → Metamodel → Asset Model → Asset types
- Operating Model → Execution and Monitoring Concepts → Status types

Monitoring this practice
For customers with established production models:
- Run the Operating Model Reverse Engineering (OMRE) on a regular basis to identify the elements in this article.
- Contact your Collibra representative to run the Operating Model Diagnostic.

Additional information
For more information, go to the following resources:
- Asset types
- Overview of packaged asset types
- Create an asset type

Collibra maintains a 30-day archive of log files by default, but in special circumstances Collibra Support can temporarily extend this period.

Impact
- Comply with industry-specific log retention requirements.
- Keep your environment performant, to avoid potential risks such as issues with restores and issues with import jobs.

Recommendations
When setting up JDBC logs, set parameters to follow the same folder structure as Jobserver logs, to ensure they are written to easy-to-find locations with the appropriate permissions.

In general operation, the default logging level is usually sufficient. You should only set logging levels to a higher level of detail when you are troubleshooting issues with Collibra Support or Engineering teams, and you should only maintain this change until the issue has been duplicated and captured in the logs. Then return the logging levels to their defaults, because higher levels of logging can exhaust the available disk space faster than monitoring tools can detect.

By default, the system retains 30 days of logs in the archive. If there is a business or regulatory requirement to retain log files for longer than 30 days, we recommend external scripting that calls the Collibra Support REST API (<base console url>/docs/rest/index.html#/support) to move archived files to a non-Collibra storage space, such as an S3 bucket, every 2 weeks.
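The external scripting recommendation above can be sketched as follows. The 30-day window is the documented default, but the file listing shape is an assumption, and the actual download and upload steps are deliberately left out: consult the Collibra Support REST API documentation for the real endpoints, and use your own tooling (for example, the AWS CLI or SDK) for the S3 side.

```python
import datetime

RETENTION_DAYS = 30  # Collibra's default log archive window

def files_to_offload(archive_listing, today, retention_days=RETENTION_DAYS):
    """Select archived log files older than the retention window.

    archive_listing: iterable of (filename, archive_date) pairs. This shape is
    an assumption for the sketch; the real listing comes from the Collibra
    Support REST API, whose response format is defined in its documentation.
    """
    cutoff = today - datetime.timedelta(days=retention_days)
    return [name for name, archived_on in archive_listing if archived_on < cutoff]

# A scheduled job (e.g. every 2 weeks, per the recommendation) would download
# each selected file via the Support REST API and upload it to external storage
# such as an S3 bucket. Both steps are omitted here because the exact endpoints
# and storage tooling are deployment-specific.
today = datetime.date(2025, 3, 31)
listing = [
    ("dgc-2025-02-10.log.gz", datetime.date(2025, 2, 10)),
    ("dgc-2025-03-20.log.gz", datetime.date(2025, 3, 20)),
]
print(files_to_offload(listing, today))
```

Running the selection logic separately from the transfer steps also makes it easy to dry-run the job and verify which files would be moved before anything leaves the Collibra environment.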
Validation criteria
Use Collibra Console → Settings → Logs to monitor these recommendations.

Additional information
For more information, go to the following resources:
- Logging
- Environment log settings for DGC services
- Environment log settings for Repository services

By setting and observing benchmark maximums for key model elements, you can improve your user experience in areas such as navigation and readability, as well as avoid overtaxing your system resources.

Impact
Follow these recommendations to maximize the scalability and adoption of your implementation by improving performance and reducing the size of backup/restore files and the duration of import and export queries. Equally, user interface and experience are improved through greater readability and navigation, and by supporting governance processes that are more practical and easier to sustain.

Best practices
Domain-level recommendations
- Keep the number of domains within a single community below 1,000 to aid navigability for users. Use ownership, stewardship, or governance councils as a basis for dividing communities with more domains into multiple communities.
- Try to keep the total number of domains in your model below 10,000, as any more may make the model difficult to manage and navigate. Use any business dimension, such as line of business, geographic region or data domain, as a logical basis to consolidate domains. For example, consolidate all customer schemas or product schemas into a single domain.

Asset-level recommendations
- The number of attributes per asset should not exceed 500. Beyond this limit, it becomes extremely difficult for users to read or navigate what distinguishes one asset from another. As with the benchmarks above, consider a logical or business dimension that will allow you to consolidate attributes.
- Automatic hyperlinking of assets is turned off by default, but it can be turned on.
However, if you allow the number of automatically hyperlinked assets to exceed one million, it can slow performance and negatively impact user experience and adoption.
- If the number of responsibilities per asset, direct or inherited, exceeds 100, governance of the asset becomes difficult to sustain and navigate, and the risk of inadvertent conflicts among the asset's responsibilities increases. A process with tens of roles involved often represents over-engineering of the governance process.

User recommendations
- Exceeding 20,000 users per user group can lead to performance degradation; therefore, use any business dimension, such as line of business, geographic region or data domain, as a logical basis to split large user groups into smaller groups.

Validation criteria
Review the elements above periodically to ensure you are not exceeding the suggested maximums. You can develop a custom workflow to capture volumes of the above and/or use Insights reporting.

Additional information
Go to the diagnostics section in Collibra's Documentation Center for more information on any of the above elements.

Start by understanding the permissions model: responsibilities are used to assign a resource role to one or more users and/or user groups. Based on their responsibilities, users can act on the permissions conveyed to them via the resource role.

Impact
Changes in permissions on global roles can affect users' access to the designated product features, as well as impact your consumption of Standard licenses. Follow these best practices to:
- Clarify your users' experience.
- Reduce confusion and operating model complexity.

Recommendations
Global roles
Global roles grant permissions on product capabilities globally, rather than just on specific resources. Therefore, global roles as defined out-of-the-box (OOTB) should meet most needs and only be changed in special circumstances. Generally, you should use resource roles to develop particular use cases; these are described below.
Resource roles
When creating resource roles, it's good practice to start with a list describing all of the roles you envision, outlining their responsibilities and permissions. These definitions should be public within your organization and shared with all users.

Specific resource role names are better than generic ones. For example, "Steward" doesn't necessarily distinguish between data stewards, business stewards and privacy stewards. Each of these more detailed steward role definitions should then carry a differing set of responsibilities and permissions. The names of roles should be self-explanatory and unique, to avoid multiple roles with the same name. However, do not create too many roles with minor distinctions between them, as this can lead to confusion. It is best to retain the OOTB resource role names, as they are recognized by the workflows that call upon them.

Responsibilities should be assigned at the highest possible level, such as the domain or community level rather than the asset level, to make them easier to maintain and assign. All domains, communities and assets should have some responsibility assigned to them, whether it is ownership, stewardship or SME. There should always be someone responsible for each asset. This is particularly important where workflows are involved, as they cannot complete if the called-upon responsibilities have not been assigned. A governance best practice is to maintain a hierarchy of assigned roles that describes your escalation process.

Validation criteria
Review Read-only vs Standard licenses in the User area of Settings to match against global roles. You can also run the Operating Model Diagnostic Report to see the types of roles and the number of people assigned to them. This workflow is available from your Customer Success representative.

Additional information
For more information, go to the following resources:
- Resource roles
- Users and/or user groups
- Permissions