Hi, We were delighted to host the webinar “Data Products – Because Agents Need Context”. We hope you found the discussion insightful and the content valuable. You can access the on-demand recording at any time, and we encourage you to share it with your colleagues. The presentation deck is available here. In addition, we're pleased to share the Research Briefing from MIT CISR: “Shifting to a Product Mindset for Data”. If you have any questions or would like to continue the conversation, please don’t hesitate to reach out to a Collibra representative. Questions/Answers: Q(1): What factors should help identify the best Data Owner for a shared product like Master Data? A(1): The best data owner for data products such as master data typically has: - accountability for the business domain that "most originates" or has the most at stake with the data - decision-making authority to resolve definitional conflicts across teams - cross-functional visibility and influence - the bandwidth to actually engage For master data specifically (e.g., Customer, Product, Location, etc.), look for a senior business leader whose domain is the primary "system of record" sponsor. Critically, ownership is a business accountability, rather than a technical one. If multiple domains share equal stake, a governance committee with a tie-breaking Chair can work, but clear escalation paths are essential. Q(3): Fully agree that data products are a great building block to enable AI / agentic use of data, but you need to provide the agent with a lot of context in order for it to be able to understand and query the data product and get consistent / accurate results. Emerging semantic standards (e.g., OSI) are trying to address this. Can you speak more about how firms are creating this semantic layer between the AI and the product and how Collibra enables this? A(3): This is one of the most critical questions in the space right now, and Collibra has a specific framework for it. 
The key insight is that semantics alone, making data queryable, is only a fraction of what AI agents need. Collibra's approach is organized around four pillars of context: 1) Semantics: governed metrics, entities, attributes and physical table joins defined in the registry (the technical "what") 2) Business context: Use cases, stewardship metadata, data sharing agreements (DSAs), and intent-driven access permissions (the "who" and "why") 3) Business ontology: the business glossary and relationships between business concepts that answer the "why" questions (e.g., why did revenue drop?) 4) Assurance: DQ rules, SLAs and data sharing agreements as core promises This is where many emerging semantic standards (including OSI and platform-native approaches like Snowflake Horizon, Databricks Unity Catalog, etc.) are highly valuable, but they are often focused primarily on the first pillar: making data machine-readable and queryable. That’s necessary, but it’s not sufficient for enterprise-grade AI outcomes. What firms are increasingly doing is using a broader governance platform as the system of record for this context, then exposing it to AI systems through open interfaces. That’s where Collibra plays a critical role. We help organizations centralize business definitions, policies, lineage, ownership, quality signals and semantic metadata, then make that context consumable by agents through APIs, MCP tooling and standards-based exchanges such as YAML and emerging semantic protocols. We’re also applying AI to accelerate the hardest part of this work: building and maintaining the semantic layer itself. For example, Collibra can use existing glossary terms, metadata and physical schemas to generate semantic models, recommend mappings between columns and business concepts, and keep those relationships current with human oversight. So in practice, Collibra becomes the trusted context plane between AI and data products. 
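To make the four pillars concrete, here is a minimal sketch of a context package an agent might receive, with a check for pillars that are still empty. The class name, field names and sample values are all hypothetical illustrations, not Collibra's schema or API, and JSON is used here only as a simple stand-in for an exchange format.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class AgentContext:
    """Hypothetical container for the four pillars; not a Collibra schema."""
    semantics: dict = field(default_factory=dict)          # metrics, entities, joins
    business_context: dict = field(default_factory=dict)   # use cases, stewardship, intents
    business_ontology: dict = field(default_factory=dict)  # glossary and concept relationships
    assurance: dict = field(default_factory=dict)          # DQ rules, SLAs, agreements

    def missing_pillars(self) -> list:
        """Names of pillars that carry no content yet."""
        return [name for name, value in asdict(self).items() if not value]


# Sample values are invented for illustration only.
context = AgentContext(
    semantics={"metric": "net_revenue", "entity": "Customer"},
    business_context={"use_case": "churn analysis", "allowed_intents": ["analytics"]},
    business_ontology={"net_revenue": {"definition": "Gross revenue minus refunds"}},
)

print(context.missing_pillars())           # the assurance pillar has not been filled in
print(json.dumps(asdict(context), indent=2))
```

A package with only the semantics pillar populated would be queryable but would report three missing pillars, which mirrors the point that semantics alone is a fraction of what an agent needs.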
Instead of an agent guessing what “customer value” or “active policy” means, it can retrieve governed definitions, understand lineage and quality, verify permissions and then act with much higher confidence. That’s ultimately how firms move from AI that can query data to AI that can understand and use data responsibly at scale. Q(4): Can you speak to the need or importance of deconstructing the very complicated, specific "Data Products" of old in order to have more modularized and reusable building block kind of data products for the future? A(4): Many of the data products built over the last several years were really large, highly customized outputs designed for a single dashboard, reporting process or business team. They delivered value, but they were often tightly coupled, difficult to reuse, expensive to maintain and not well suited for AI-driven consumption. To unlock the next generation of value, firms need to rethink those legacy data products as composable building blocks rather than finished end-state assets. Instead of creating one monolithic product called “Customer 360” or “Finance Reporting Mart”, organizations could break those into reusable components such as trusted customer identity, product hierarchy, transaction history, pricing metrics, risk signals, consent status or revenue definitions. These smaller products can then be assembled dynamically for many use cases. That matters for three reasons: First, reuse and speed. Modular data products let teams assemble new use cases much faster instead of rebuilding logic every time. The same trusted customer identity product might power marketing analytics, service operations, fraud detection and an AI agent simultaneously. Second, adaptability. Business needs change quickly. Large bespoke products are hard to evolve, while modular products can be swapped, upgraded or recombined without redesigning everything upstream and downstream. Third, AI readiness. 
Agents and AI systems work best when they can discover small, clearly defined, well-governed components with explicit meaning, quality signals and access rules. They struggle when confronted with sprawling, undocumented, purpose-built data structures. Where Collibra helps is enabling organizations to manage that ecosystem with governance at scale. As data products become more modular, you need a way to define ownership, lineage, policies, quality expectations, relationships and discoverability across hundreds or thousands of components. Collibra provides the control plane to manage those reusable assets as data products, not just data sets. So yes, in many cases the future is not building more giant data products. It’s intelligently decomposing yesterday’s complex data products into smaller, trusted, reusable capabilities that can be assembled for whatever the business or an AI agent needs next. Q(5): How do you define the scope of data product so it will have high liquidity but also not endless? I think there is intrinsic conflict b/w analytic cases (which are fluid and prone to random combinatorics) and data architecture which is more frozen in time. A(5): The key principle is to scope data products around stable business concepts, not around use cases. Customers, products and transactions are durable; the analytical use cases built on top of them are fluid. If you design to the use case, the product becomes brittle. If you design to the concept, the product stays stable while the analytic layer composes freely. But "stable concept" should be understood at the right level of granularity. A "Customer" domain, for instance, should not become one monolithic Customer 360 product, as that just recreates the problem under a different name. Instead, it decomposes into distinct, well-governed components: trusted customer identity, consent status, transaction history and risk signals. 
Each is small enough to be clearly defined and well-governed, but durable enough to power many use cases simultaneously, from marketing analytics to fraud detection to AI agents. Collibra recommends that each component still be anchored to a clear, documented business case. Not to constrain its scope, but to justify its existence, establish success criteria and ensure it delivers measurable value. The business case articulates who needs the product and why; the domain-oriented design ensures it stays reusable long after that initial use case evolves. Q(6): Related to Stijn's comment about the importance of speed, how do you make these products known and then "provision" these data products to end users at the 'speed of business'? e.g. self-service or help desk requests or ? A(6): Speed requires removing friction at discovery, access and consumption. That means: 1) A governed internal data marketplace where data products are searchable with quality, ownership and contract metadata visible 2) Automated access request workflows with defined SLAs, replacing manual approval chains 3) Data contracts that pre-define what consumers can expect, reducing integration negotiation 4) Self-service provisioning tied to role-based entitlements with automated access controls, masking and filtering applied Collibra's three-phase "Deliver ROI with data products" journey maps this progression: Phase 1 (Discover) quickly establishes the data marketplace "front door"; Phase 2 (Trust) embeds quality and contracts to further build confidence; Phase 3 (Scale) automates access so it happens in seconds rather than days, critical for both human consumers and AI agents that need data access programmatically. Join us for our webinar ("The data scavenger hunt") on May 12 to learn more. Q(8): How do you quantify ROI of data governance and data products? 
A(8): ROI typically falls into four buckets: 1) Cost avoidance: Regulatory fines avoided, reduction in rework and duplicate pipeline builds, elimination of "dark data" storage costs 2) Revenue enablement: Faster time-to-market for data-driven products, AI and analytics use cases unlocked by trusted, context-rich data 3) Efficiency gains: Measurable reduction in time data consumers spend finding, understanding and validating data 4) Risk reduction: Breach prevention, audit readiness, compliance cost reduction Each data product can carry financial parameters tied to adoption metrics, turning "data governance investment" into "data product portfolio performance". Establish baselines before you start; track metrics like self-service adoption rate, mean time to data access and volume of trusted assets being reused across AI and analytics initiatives. Q(9): How do users know that a discovered dataset is actually trustworthy? A(9): In Collibra's framework, trust is formally codified in the data contract: a clear, enforceable agreement between data producers and owners that is exposed to data consumers. The data contract specifies the data schema, quality rules, SLAs, access terms, and ownership, not as aspirations, but as promises. Every data product port can have an associated data contract defining what is produced and what the consumer can rely on. Layered on top of the contract are policies: the organizational rules that govern how data can and cannot be used. Where the data contract tells a consumer what the product delivers, policies tell them what they're permitted to do with it: privacy restrictions, purpose limitations, regulatory constraints, and data sharing agreements. Together, contracts and policies give consumers both a quality guarantee and a clear picture of appropriate use, which matters especially as AI agents begin consuming data products autonomously. 
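As an illustration of how contract promises and policies could be checked together before a consumer or agent proceeds, here is a minimal sketch. The class names, fields, thresholds and purposes are hypothetical assumptions chosen for the example; they are not Collibra's data contract format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataContract:
    """Promises a producer makes for one data product port (hypothetical fields)."""
    owner: str
    min_dq_score: float        # promised data quality score, 0.0 to 1.0
    max_staleness_hours: int   # freshness SLA


@dataclass(frozen=True)
class UsagePolicy:
    """Organizational rule on permitted use (hypothetical fields)."""
    allowed_purposes: frozenset


def reasons_to_stop(contract, policy, observed_dq_score, hours_since_refresh, purpose):
    """Return unmet promises or policy violations; an empty list means proceed."""
    reasons = []
    if observed_dq_score < contract.min_dq_score:
        reasons.append("quality below contract")
    if hours_since_refresh > contract.max_staleness_hours:
        reasons.append("freshness SLA missed")
    if purpose not in policy.allowed_purposes:
        reasons.append("purpose not permitted")
    return reasons


# Invented sample values for illustration.
contract = DataContract(owner="customer-domain", min_dq_score=0.95, max_staleness_hours=24)
policy = UsagePolicy(allowed_purposes=frozenset({"analytics", "fraud_detection"}))

print(reasons_to_stop(contract, policy, 0.97, 6, "analytics"))   # []
print(reasons_to_stop(contract, policy, 0.90, 30, "marketing"))  # all three reasons
```

Keeping the contract and the policy as separate objects reflects the distinction drawn above: the contract says what the product delivers, while the policy says what a consumer may do with it.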
Beyond contracts and policies, trust signals visible in the Data Marketplace include live DQ scores, certification status, end-to-end lineage showing data origin and transformations, business glossary linkage confirming curated definitions, usage statistics and ratings and comments from other users. The combination of formal promises, enforced policies and visible trust signals means a consumer never has to chase down a data engineer to answer the question "can I trust this and can I use it?" Q(10): How seamless is integration with existing ecosystems, especially legacy systems and custom pipelines? A(10): Collibra integrates via a comprehensive REST API, CLIs, MCP tooling and pre-built connectors to major platforms. (See a list of integrations here.) For legacy systems without native connectors, custom connectors can be built via the API or through partner solutions. The typical approach is to start with metadata harvesting (cataloging what exists via Collibra's integrations) before deeper operational integration. Importantly, Collibra acts as a platform-agnostic governance layer above the execution environment: policies defined in Collibra apply consistently whether data lives in Snowflake, Databricks or on-prem systems, without requiring data movement, providing a unified control layer that doesn't force platform consolidation. Q(11): What about the security of the data while using AI, and how will confidential data be handled? A(11): This is where governance and AI intersect most critically, and it requires governance to evolve from static PDF policies to active, machine-readable metadata. In Collibra, data products can carry data categories (e.g., PII tags), purpose limitations, data sharing agreements (DSAs) and intent-driven access permissions as part of their governed context. AI agents reading this context know what they are (and are not) permitted to do with the data before they ever query it. Data contracts formalize these constraints as legal promises. 
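A minimal sketch of that idea follows: an agent reads machine-readable tags and purpose limitations, then works out which columns it may touch before issuing any query. The dictionary layout, column names and purposes are hypothetical and only illustrate the pattern; they do not represent Collibra's metadata model or API.

```python
# Hypothetical governed context for one data product (illustrative values only).
PRODUCT_CONTEXT = {
    "columns": {
        "customer_id": [],
        "email": ["PII"],
        "lifetime_value": [],
    },
    # For each declared purpose, whether data carrying a given tag may be used.
    "purpose_limitations": {
        "analytics": {"PII": False},
        "customer_support": {"PII": True},
    },
}


def permitted_columns(context, purpose):
    """Columns an agent may use for a declared purpose; unknown purposes get nothing."""
    rules = context["purpose_limitations"].get(purpose)
    if rules is None:
        return []
    return [
        column
        for column, tags in context["columns"].items()
        # A tagged column is allowed only if every tag is explicitly permitted.
        if all(rules.get(tag, False) for tag in tags)
    ]


print(permitted_columns(PRODUCT_CONTEXT, "analytics"))         # no PII columns
print(permitted_columns(PRODUCT_CONTEXT, "customer_support"))  # PII allowed
print(permitted_columns(PRODUCT_CONTEXT, "marketing"))         # undeclared purpose
```

The sketch denies by default: a purpose that is not declared, or a tag that is not explicitly permitted, yields no access.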
Collibra also supports automated access controls with masking and filtering applied at access time, so sensitive data isn't exposed beyond its authorized scope. On the strategic level, governance evolves from "visit Collibra to check policy" to "agent calls Collibra API to verify permissions before every action". Q(12): Is it fair to say that we have moved the use case product into the solution product leveraging re-usable asset products? A(12): Yes, and that's a sign of maturity in the data product model. The evolution is: governed asset products (stable, reusable building blocks) to solution products (purpose-built compositions for specific use cases). This mirrors what software engineering did in moving from monoliths to services. The compounding benefit is that each new solution product is faster and cheaper to build because the asset layer already exists, is already governed, and is already trusted. Q(13): Where do data domain owners fit in? A(13): Domain owners are the business accountability layer for data products within their domain. We typically see that the data product owner (domain-side) owns the internal definition of a data product along with publishing guidelines, ensures the data product addresses a business issue and maintains compliance with governance rules. The data product build team (federated data steward + engineer) designs, develops and packages data products that meet defined business and technical requirements. The data office provides centralized guardrails. This federated model (centralized standards, domain-led ownership) is what Collibra's flexible operating model is specifically designed to support, and it's what distinguishes Collibra from platforms with rigid, one-size-fits-all governance structures. Q(14): what is the bottleneck or longest work task that makes the data product creation so long? 
is it definition, getting access to sources, change management or is it the data engineering teams which are very overloaded with existing projects...sounds like hurry up and wait for engineering A(14): We typically find that data product definition takes longer than expected. Getting business and technical stakeholders to agree on scope, semantics, business rules and what "done" looks like is consistently underestimated. Debates about what a "customer" means or which revenue definition to use can consume weeks before a single line of pipeline code is written. Source data access is the next common stall. Even once requirements are clear, getting entitlements to upstream systems, especially across domains or in heavily regulated environments, can sit in approval queues for weeks. And once access is granted, teams often discover the source data quality is worse than assumed, triggering a remediation cycle that wasn't in the plan. Then comes engineering capacity. This is where the "hurry up and wait" feeling is most acute. Data engineering teams are typically carrying significant existing project load, and new data product builds have to compete for prioritization. Even well-defined, well-scoped products can sit in a backlog for months waiting for bandwidth. Change management tends to run in parallel throughout and is the most underestimated drag of all: getting domain teams to accept ownership, getting consumers to adopt new products over familiar workarounds and getting leadership to sustain focus past the initial mandate. 
Collibra's approach helps address this at multiple points throughout the lifecycle: For business teams: clearer definitions, ownership, prioritization, and measurable outcomes For governance teams: automated policy enforcement, access workflows, lineage, and trust controls For engineers: the ability to work in the systems they already use, such as pipelines, transformation tools, cloud platforms, and code repositories, with capabilities like data contracts that translate business requirements into executable specifications For domain owners: lifecycle management and accountability for products For consumers and AI agents: easier discovery and access to trusted, reusable data products Q(15): What's the typical timeframe to build out a liquid data asset that can be reused, from definition to deployment? A(15): A well-scoped, straightforward asset with accessible, clean source data and a clear owner typically takes around 4-8 weeks. A complex, multi-source, cross-domain asset requiring significant quality remediation and stakeholder alignment typically takes 3-6 months. Organizations with Collibra's prescriptive data product lifecycle management and automation (data contract syncing, semantic layer mapping, AI-assisted metadata generation) run significantly faster because governance is built in at each step rather than bolted on afterward. The definition and scoping phase is almost always longer than expected; investing time there pays dividends in execution speed. Q(18): Do you find more success in upskilling SMEs into data stewards or finding stewards who are skilled in data governance to work in Collibra on behalf of those SMEs? I am at a roll-up of healthcare companies and bouncing from the former to the latter. Thank you for the time. Taking lots away from this. A(18): A hybrid model tends to work best and is what Collibra's AI-automated and collaborative stewardship model is designed to enable. 
Dedicated governance stewards bring consistency, process discipline and deep platform fluency, essential for establishing foundations across newly acquired entities with varying data maturity. SME networks bring the domain knowledge, clinical/operational context and quality judgment. The practical approach: dedicated stewards for the "how" (governance processes, workflows, catalog curation) and an engaged SME network for the "what" (definitions, quality judgment, fitness-for-purpose validation). Collibra's various stewardship capabilities help shift the steward's job from author to editor: AI proposes at scale and humans curate and certify, which makes the SME engagement model more sustainable. Q(19): What is the difference between "data product" and "data asset product"? A(19): Data asset product: This is the data itself, organized as a cohesive set and ready for reuse. It is often a "master" data object, such as a customer master list or a fleet list. The primary goal of a data asset product is reuse across different areas of the organization. It should have a product owner responsible for evolving the data over time to meet the needs of various users and solutions. Data solution product: This refers to a specific application or consumption of data that drives direct value. It is defined by its purpose and its ability to lead to value realization, such as a positive impact on an income statement. An example provided during the webinar was an equipment management app that utilizes data to provide specific services to customers. Unlike data asset products, these are managed with a focus on their financial return to the bottom line. Q(20): @BarbW As data becomes more ‘liquid’ internally, how do you see the end delivery interface evolving—are we moving toward programmatic access layers like APIs or MCP-style servers as the endpoint for data products for internal, partner, and external data consumers? 
A(20): Yes, the trajectory is clearly toward programmatic, API-first and agent-ready delivery. Collibra's exposure channels include: graphical user experience (for human consumers), as well as fully documented REST APIs, CLIs and MCP tooling, all of which take the governed asset model into account. Platform-native sharing (Snowflake, Databricks Delta Sharing) handles the compute-side delivery; Collibra governs the context and access layer above it. Organizations building data products with clean API and MCP interfaces today are well-positioned for AI consumption at scale tomorrow. Q(21): What are the main blockers to data liquidity? A(21): Some common blockers include: 1) Lack of a product mindset: without clear ownership, accountability and product strategy, data stays siloed and ungoverned 2) Manual governance processes: significant manual effort to create and maintain governed data products exceeds the capacity of most organizations 3) Poor access experience: selecting the wrong delivery channel or leaving access as a manual request process kills adoption 4) Undefined success metrics: without measurable KPIs, there's no feedback loop to improve liquidity over time 5) Lack of management buy-in: organizations struggle to convince leadership that teams should invest time in building data products Technically, fragmented platforms, vendor lock-in, and insufficient metadata compound the problem. Collibra's unified control layer and data marketplace directly target the governance and access experience blockers, and our data product operating model guidance addresses the mindset and accountability gaps. Q(23): Do you think dimensional data model concepts are still useful in building a data product? A(23): Yes, and Collibra's semantic blueprint reinforces why. Dimensional modeling concepts (e.g., facts, dimensions, hierarchies, etc.) encode business understanding that maps directly to Collibra's governed metrics and entities framework. 
A governed measure in Collibra (e.g., sum of customer identifier for a specific metric) is, in essence, a dimensionally-aware definition. The delivery format evolves (lakehouse table formats, dbt MetricFlow, warehouse-native semantic layers), but the underlying principle of organizing data around business events and the dimensions that describe them is precisely what makes data products understandable and queryable by business consumers. Data product thinking reinforces dimensional concepts by making business semantics explicit, governed and linked to the physical tables that implement them. Q(24): Thanks for the sharing. My question relates mainly to the semantics layer: should it be more deterministic to govern and guardrail the AI agent tool behavior? Or should it give more flexibility (like RAG) to give AI agents more options and figure out which one to use? A(24): Both, applied contextually, and Collibra's framework describes exactly this hybrid. For high-stakes, autonomous AI actions (e.g., issuing refunds, updating financial records, reporting to regulators), data quality rules and governed metrics can serve as a deterministic "kill-switch": the agent cannot proceed if the data contract's assurance conditions aren't met. For exploratory, discovery-oriented use cases, such as an analyst chatbot or a research agent, more RAG-style flexibility allows the agent to reason across a broader information space. Collibra's semantic layer provides the deterministic foundation (certified metrics, governed entities, data contracts) within which flexible inference can operate safely. The architectural principle: hard guardrails for consequential actions, soft inference within governed bounds for exploration. Determinism and flexibility are not opposites; they operate at different layers. Q(26): Will we be able to integrate an agent or chatbot with Collibra via API? Does Collibra allow API-based integration with agents and chatbots? 
A(26): Yes, Collibra supports integration via REST APIs, CLIs and MCP tooling, all of which take the governed asset model into account. AI agents and chatbots can call Collibra to query the metrics catalog, retrieve data product context and contracts, check DQ scores and SLA adherence and validate access permissions before querying data. For instance, Collibra's MCP tools define a standardized interface for agents to navigate from a business question to the right governed data product. Your Collibra account or implementation team can share API documentation and the MCP tool specifications relevant to your use case. Q(27): What are the essential topics we must know about AI when it comes to data or data governance? A(27): Some of the most important topics for data governance professionals entering the AI era: 1) LLMs and how they consume data: RAG, fine-tuning and in-context learning, and why data quality and context quality directly determine output quality 2) AI agents and agentic architectures: How agents discover and query data autonomously, and what governance they require 3) Data lineage for AI: Tracking what data trained a model and how that affects its outputs 4) Responsible AI and bias: How data quality and representation issues propagate into model outputs 5) Machine-readable governance and data contracts: Policies as active metadata, not static documents 6) The context required for successful AI agents: semantics, business context, business ontology and assurance, the framework for what AI agents actually need from governed data We can direct you to resources to learn more. [e.g., Collibra University if a current customer] Q(28): Where does query execution happen in the data product architecture? A(28): Query execution happens at the data platform layer (e.g., Snowflake, Databricks, BigQuery, Redshift, etc.), not in Collibra. 
Collibra is the governance and intelligence layer: it describes what a data product is, where it lives, who owns it, what it guarantees and how to access it. When a consumer or AI agent queries a data product, the actual compute runs against the underlying platform. This separation is intentional and is one of Collibra's architectural differentiators: a platform-agnostic governance layer that applies consistently whether data lives in Snowflake, Databricks or on-prem, without requiring data movement or platform consolidation. Q(29): How are your data products actually delivered to users and applications? A(29): See the answer to Q(28) above. Q(30): Also, can they be consumed in real time or do they require data movement or replication? A(30): Platform-native sharing (Snowflake, Databricks Delta Sharing) enables real-time in-place query without data movement, a key advantage of Collibra's platform-agnostic governance model, which doesn't require data to move to be governed. Q(31): How do taxonomies and taxonomy management systems complement what Collibra can do in this AI agent context and control context? A(31): Collibra acts as the "governance gateway", turning the sophisticated business logic defined in a taxonomy management system (TMS) into actionable guardrails for AI agents. While the TMS maps the complex relationships and ontologies of your business, Collibra registers these definitions and enforces them. This helps ensure that when an agent discovers a data product via APIs or MCP tooling, it receives a package that includes both the rich semantic context (the "map") and strict, policy-driven boundaries (the "rules of the road"). By bridging the gap between high-level taxonomy and technical execution, Collibra makes sure that masking and filtering are applied at the platform layer (e.g., Snowflake, Databricks) based on the agent's intent. 
This architectural synergy prevents "policy hallucination", making self-service safe by ensuring governance is a structural component of the data product rather than an afterthought. Q(32): How does Collibra position itself relative to other leading governance providers such as Alation, Atlan, Informatica and Microsoft Purview in the context of AI governance and AI agents, and what would you say is Collibra’s main differentiator? A(32): Where Collibra differentiates comes down to three things that matter most: 1. Completeness of context: Most governance platforms focus on making data queryable, the technical "what", but Collibra's framework addresses all four pillars of what AI agents actually need: 1) Semantics (governed metrics, entities, attributes, and physical table joins) 2) Business Context (use cases, stewardship metadata, data sharing agreements, and intent-driven access permissions) 3) Business ontology (the glossary and relationships between concepts that answer the "why" questions, e.g., why did revenue drop?) 4) Assurance (DQ rules, SLAs and data sharing agreements as enforceable promises) Addressing only Pillar 1 is what causes AI agents to produce technically correct queries that return strategically wrong answers. 2. Enterprise governance depth with federated flexibility. Collibra's operating model supports centralized standards with domain-led ownership, the structure large, complex organizations actually need. Rather than imposing a one-size-fits-all governance layer, it scales across cloud, analytics, and AI platforms through a single unified control layer, enabling domains to move at their own pace without creating new silos or sacrificing global standards. 3. Headless, API-first governance for the agentic era. As AI agents become the primary data consumers in most organizations, governance can no longer live only in a UI that humans navigate. 
Collibra is built so that agents call its APIs and MCP tooling to verify context and permissions before every action, with governance as a service layer that operates continuously in the background, not just when a person opens a catalog. Q(33): Where do you suggest starting to establish data products? The example of master data management is a tough place to start. A(33): MDM can be a difficult starting point because it requires cross-domain consensus and organizational alignment before building can begin. Collibra's three-phase data product framework recommends starting in Phase 1 (Discover), focusing on quickly establishing a Data Marketplace housing a few simpler but key data products. Identify a domain with a motivated, senior owner who already feels the pain and wants to solve it; find data with clear, waiting consumers (reducing adoption risk); and attach the data product effort to a strategic initiative (e.g., AI readiness, self-service analytics, or platform optimization) that already has executive attention and budget. Start with data that is relatively clean and well-understood to build early trust and demonstrate what good looks like. Once you have one or two successful domain-owned products in your internal data marketplace (e.g., Collibra Data Marketplace), the case for tackling cross-domain challenges like MDM becomes concrete rather than theoretical. Join us for our webinar (The data scavenger hunt) on May 12 to learn more about how to get started. Q(34): My boss thinks that a dashboard or report can constitute a "data product". I'm trying to emphasize that a data set is really the central consumable asset of a data product. Do you have advice on how to frame this in an industry where so much ambiguous language is thrown around? A(34): The way we frame it at Collibra: a data product is the governed asset plus its context: what the data represents, who owns it, how it can be used, what rules and quality checks apply, and how it's accessed. 
The port is how that data product gets consumed, and we deliberately keep it flexible: a table, a view, an API, an event stream, even a file export. One data product can have multiple ports. Tables and views are the cleanest representation because they're queryable, governed, and reusable as building blocks. A dashboard sits further downstream: it's a rendered view of data, optimized for a specific question or audience. It can absolutely be served by a data product, but on its own it usually lacks the context layer (semantics, ownership, contracts, access policies) that makes a data product reusable. The framing I'd offer: the data product is where the context lives. The port is one way of consuming it. A dashboard is one possible port, but if you call the dashboard itself the data product, you've collapsed the asset into a single rendering, and the next team that wants to build an AI model, a different dashboard, or an operational integration is starting from scratch with no governed source underneath. This distinction shows up consistently in Zhamak Dehghani's Data Mesh writing and in analyst frameworks (Gartner, Forrester). Pointing at external sources can help depersonalize what might otherwise feel like an internal terminology debate.
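To illustrate the framing in A(34), here is a minimal sketch of a data product as context plus several ports, using the port kinds listed in the answer. The class names, fields, locations and sample values are hypothetical and chosen only for the example; this is not Collibra's data model.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Port:
    """One way of consuming a data product (hypothetical fields)."""
    kind: str       # e.g. "table", "view", "api", "event_stream", "file_export"
    location: str   # illustrative identifier, not a real endpoint


@dataclass
class DataProduct:
    """The governed asset plus its context; ports hang off the product."""
    name: str
    owner: str
    definition: str
    ports: list = field(default_factory=list)

    def ports_of_kind(self, kind):
        """All ports of one kind, so each consumer can pick what suits it."""
        return [port for port in self.ports if port.kind == kind]


# Invented sample product for illustration.
customer_identity = DataProduct(
    name="trusted_customer_identity",
    owner="customer-domain",
    definition="One resolved record per customer",
    ports=[
        Port(kind="table", location="analytics.customer_identity"),
        Port(kind="api", location="/customers/{id}"),
    ],
)

# A dashboard and an AI agent can draw on the same governed product through different ports.
print([port.location for port in customer_identity.ports_of_kind("table")])
print([port.location for port in customer_identity.ports_of_kind("api")])
```

Because ownership and definition live on the product rather than on any single port, adding a new way to consume the data does not create a second, ungoverned copy of the context.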