Data and Information Inventory and Attributes - Data and Information Governance Context
Why Data and Information in a Single Inventory
The most common assumption practitioners bring to a data inventory is that it governs only structured data: database records, CSV files, JSON payloads. The Data and Information Inventory is deliberately broader: it governs both data (structured, schema-bound, machine-readable content) and information (unstructured or semi-structured content such as PDF documents, images, audio files, contracts, and regulatory submissions). These two forms are governed together under a single Noun Type because the enterprise produces, moves, and protects both, and must comply with regulations governing both.
The distinction between data and information is philosophically contested — particularly in the context of AI systems, which treat a database record, a PDF document, and an audio transcript as equally machine-consumable inputs. Rather than imposing a philosophical boundary that breaks down under AI workloads, this inventory uses the Structure attribute as the governing classification: Structured, Semi-Structured, or Unstructured. This is an objective, durable, and AI-friendly distinction that does not depend on whether advanced processing systems can read the content. The governance obligations that apply to a Customer Profile — ownership, sensitivity, retention, disposal, regulatory compliance — apply regardless of whether that profile is stored as a relational database record (Structured) or a scanned paper form converted to PDF (Unstructured).
Enterprises that govern only structured data leave an unquantified portion of their information landscape ungoverned. Contract documents, regulatory filings, insurance claims, engineering drawings, and audio recordings are frequently the most sensitive and most heavily regulated content the enterprise handles — and they are almost always unstructured. A data governance program that does not reach them is incomplete by design.
What a Noun Instance Is in This Inventory
Every record in the Data and Information Inventory is a Noun Instance of the Data and Information Noun Type: a named, governed type of content, not a specific physical record, file, or database row. “Customer Profile” is a Noun Instance. A specific customer record in the Salesforce CRM database is a physical instance of the “Customer Profile” type and is not itself a record in this inventory. This distinction between the governed type and the physical instance is the defining architectural principle of this inventory.
Type-level governance is what makes data governance scalable. Governing the “Customer Profile” type once — who owns it, how sensitive it is, what the authoritative source is, how long it must be retained, what encryption is required — applies that governance universally to every physical instance of the type, wherever it appears, in whatever system, in whatever format. Without type-level governance, the same decisions must be made separately for every table, file, and API that handles customer data — producing inconsistent governance and leaving gaps between systems.
The practical implication: when building this inventory, think in terms of named business concepts, not database tables or file names. “Invoice” is a type. “INVOICE_HDR table in SAP” is a physical asset that stores instances of the “Invoice” type — it belongs in the Data Stores Inventory, not here. The relationship between the type and its physical housing is captured in the Relationship Attributes category of this inventory.
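The separation between a governed type and its physical housings can be sketched as follows. This is a minimal illustration, not the inventory's schema: the class names `DataInformationType` and `PhysicalAsset`, the `housed_in` field, and the owner and sensitivity values are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class PhysicalAsset:
    """A physical housing of a type; governed in the Data Stores Inventory."""
    system: str
    name: str


@dataclass
class DataInformationType:
    """A governed type: a named business concept, not a table or file."""
    name: str
    owner: str          # illustrative value, not from the source
    sensitivity: str    # illustrative value, not from the source
    housed_in: List[PhysicalAsset] = field(default_factory=list)

    def governance_for(self, asset: PhysicalAsset) -> dict:
        """Return the type-level governance that a housing asset inherits."""
        if asset not in self.housed_in:
            raise ValueError(f"{asset.name} does not house {self.name}")
        return {"type": self.name, "owner": self.owner,
                "sensitivity": self.sensitivity}


invoice = DataInformationType("Invoice", owner="Finance", sensitivity="Confidential")
sap_table = PhysicalAsset(system="SAP", name="INVOICE_HDR")
pdf_archive = PhysicalAsset(system="Document archive", name="scanned-invoices")
invoice.housed_in.extend([sap_table, pdf_archive])

# Governance is decided once on the type and inherited by every housing.
for asset in invoice.housed_in:
    print(asset.system, asset.name, invoice.governance_for(asset))
```

Both housings report the same owner and sensitivity because those values are stored once, on the type, rather than once per table or file.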
The Logical Layer vs. the Physical Layer
The Data and Information Inventory operates at the logical and governance layer of the enterprise data architecture. It answers governance questions: what types of data and information exist, who owns them, how sensitive are they, what are the rules for handling them? It is system-agnostic, format-agnostic, and environment-agnostic — a Data and Information type exists as a governed concept independent of any physical implementation.
The physical layer is governed by two complementary instruments: the Data Stores Inventory (which governs the physical containers, such as databases, object stores, and file systems, that hold instances of data types) and the enterprise Data Catalog (which provides automated technical metadata about those physical assets). Together with this inventory, that makes three instruments operating at different levels of abstraction and serving different governance purposes. They are complementary, not competing, and together they provide complete data governance coverage from the conceptual layer through the physical asset layer.
Relationship to the Enterprise Data Catalog
The Data and Information Inventory is not a Data Catalog — but it is the governance layer that a Data Catalog should be built on. Understanding the distinction is essential for practitioners deciding how to implement both.
A Data Catalog is a technical metadata management tool. It scans, ingests, and organizes metadata from actual data assets: databases, data warehouses, data lakes, S3 buckets, BI reports, and APIs. Tools such as Collibra, Alation, Atlan, AWS Glue Data Catalog, Microsoft Purview, and Apache Atlas are Data Catalogs. They answer questions like: what tables exist in this database, what columns do they have, what are the data types, when was each table last updated, and who has queried it recently? Their primary audience is data engineers, analysts, and data scientists who need to find and understand specific physical data assets.
The table below summarizes the key differences between the two instruments:
| Dimension | Data Catalog | Data and Information Inventory |
| --- | --- | --- |
| Layer | Physical / Technical | Logical / Governance |
| Unit of governance | A specific table, file, column, or asset | A named, governed type of data or information |
| Population method | Automated scanning and ingestion from source systems | Manual governance decisions and deliberate recognition |
| Primary question answered | What data assets exist and what do they contain technically? | What data types does the enterprise recognize, own, and govern? |
| Primary audience | Data engineers, analysts, data scientists | Data owners, governance practitioners, enterprise architects, AI designers |
| Relationship to systems | Bound to specific systems and assets | System-agnostic — a type exists independently of which system holds it |
| Sensitivity governance | Often tagged post-hoc, inconsistently | A Crawl-level mandatory attribute on every record |
| Cross-inventory relationships | Typically isolated to data lineage within the catalog | Explicit typed relationships to Capabilities, Integrations, Applications, Data Stores, and the Enterprise Model |
The Data and Information Inventory should be the governing vocabulary that a Data Catalog uses to classify and tag its assets. When a Data Catalog scans a database and discovers a table called CUST_MASTER, the catalog can tag it with the governed Data and Information type “Customer Profile” — complete with its Semantic ID, sensitivity classification, authoritative source designation, and retention rules from this inventory. The catalog provides physical discovery; this inventory provides logical governance context.
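The tagging step described above can be sketched roughly as follows. Everything here is assumed for illustration: the `GovernedType` fields, the Semantic ID `DI-0001`, the sensitivity and retention values, and the steward-maintained `ASSET_TO_TYPE` mapping. No real catalog product's API is used.

```python
from dataclasses import asdict, dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class GovernedType:
    """Governance context held by the inventory for one type (hypothetical fields)."""
    semantic_id: str
    name: str
    sensitivity: str
    authoritative_source: str
    retention: str


# Hypothetical inventory content; every value below is illustrative only.
INVENTORY: Dict[str, GovernedType] = {
    "Customer Profile": GovernedType(
        semantic_id="DI-0001",
        name="Customer Profile",
        sensitivity="PII",
        authoritative_source="Salesforce",
        retention="7 years",
    ),
}

# Assumed steward-maintained mapping from physical asset name to governed type.
ASSET_TO_TYPE: Dict[str, str] = {"CUST_MASTER": "Customer Profile"}


def tag_asset(asset_name: str) -> Optional[Dict[str, str]]:
    """Return the governance tags for a scanned asset, or None if unmapped."""
    type_name = ASSET_TO_TYPE.get(asset_name)
    if type_name is None:
        return None
    return asdict(INVENTORY[type_name])


tags = tag_asset("CUST_MASTER")
print(tags)
```

The catalog contributes only the asset name; every governance value in the returned tags comes from the inventory record.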
Without this inventory, a Data Catalog accumulates technical metadata with no consistent logical classification. Three catalog entries at the same enterprise may label the same concept as “Customer Data,” “Client Records,” and “Account Master”: three labels for the same governed type. This inventory resolves that ambiguity by providing the canonical governed vocabulary that the catalog applies consistently. At Run maturity, the relationship becomes bidirectional: the catalog’s automated scans surface new physical assets that do not yet have a corresponding Data and Information Inventory record, triggering a governance decision about whether to recognize and govern that type. The catalog becomes a discovery instrument that feeds the inventory rather than just consuming its vocabulary.
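One way to picture both directions of this relationship is a small reconciliation pass over catalog labels. The sketch below assumes a hypothetical synonym table (`CANONICAL_LABELS`) maintained by governance staff, and uses "Shipment Manifest" as an invented example of a label with no inventory record.

```python
from typing import Dict, Iterable, List, Tuple

# Assumed synonym table: catalog label (lower-cased) -> governed type name.
CANONICAL_LABELS: Dict[str, str] = {
    "customer data": "Customer Profile",
    "client records": "Customer Profile",
    "account master": "Customer Profile",
}


def reconcile(catalog_labels: Iterable[str]) -> Tuple[Dict[str, str], List[str]]:
    """Split labels into those resolved to a governed type and those that are not.

    Unresolved labels are candidates for a governance decision, not errors.
    """
    resolved: Dict[str, str] = {}
    unrecognized: List[str] = []
    for label in catalog_labels:
        canonical = CANONICAL_LABELS.get(label.strip().lower())
        if canonical is None:
            unrecognized.append(label)
        else:
            resolved[label] = canonical
    return resolved, unrecognized


resolved, unrecognized = reconcile(
    ["Customer Data", "Client Records", "Account Master", "Shipment Manifest"]
)
print(resolved)       # three labels, one governed type
print(unrecognized)   # labels needing a recognize-or-ignore decision
```

The resolved labels show the inventory supplying vocabulary to the catalog; the unrecognized list shows the catalog feeding candidates back to the inventory.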
The Data Catalog Reference attribute in the Technical Attributes category of this inventory carries the identifier or link to the corresponding catalog entry for each Data and Information type, providing the navigational bridge between the logical governance layer and the physical asset discovery layer.
Authoritative Source vs. Source of Truth Store
Two attributes in this inventory answer questions that are closely related but fundamentally different, and conflating them is one of the most common and costly data governance mistakes enterprises make.
Authoritative Source is an organizational designation: it identifies the system or unit that is accountable for this Data and Information type — who sets the rules, who resolves disputes about what the correct value is, and whose copy is canonical by governance agreement. It answers the question: who owns this data?
Source of Truth Store is a physical designation: it identifies the specific data store — the database, object store, or file system — where the master copy of this type physically lives. It answers the question: where does the canonical version actually reside?
These two answers are not always the same system. An enterprise may designate Salesforce as the Authoritative Source for the Customer Profile type — Salesforce is the system of record, the system whose values win in any conflict. But the actual master copy of customer data may physically live in a PostgreSQL database that Salesforce writes to, or in a data warehouse that aggregates from Salesforce and two other systems, or in Salesforce’s own cloud-managed storage. Each of these is a different physical location with different access controls, backup policies, and encryption implementations.
Without both attributes, governance is incomplete. Knowing the Authoritative Source without knowing the Source of Truth Store means you know who owns the data but not where the master copy lives — making incident response, data migration planning, and cross-border compliance analysis guesswork. Knowing the Source of Truth Store without knowing the Authoritative Source means you know where the data is but not who is accountable for its accuracy — making data quality remediation impossible to escalate. Both attributes are required for complete data governance, and both are captured as separate, explicitly named attributes in this inventory.
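A completeness check over these two attributes might look like the sketch below. The `TypeRecord` class, its field names, and the store name `crm-postgres` are hypothetical; the gap messages paraphrase the consequences described above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TypeRecord:
    """One inventory record, reduced to the two attributes discussed here."""
    name: str
    authoritative_source: Optional[str] = None    # who is accountable
    source_of_truth_store: Optional[str] = None   # where the master copy lives


def governance_gaps(record: TypeRecord) -> List[str]:
    """List what cannot be done reliably while an attribute is missing."""
    gaps: List[str] = []
    if not record.authoritative_source:
        gaps.append("No Authoritative Source: data quality issues cannot be escalated")
    if not record.source_of_truth_store:
        gaps.append("No Source of Truth Store: master copy location is unknown")
    return gaps


complete = TypeRecord("Customer Profile", "Salesforce", "crm-postgres")
partial = TypeRecord("Invoice", authoritative_source="SAP")

print(governance_gaps(complete))
print(governance_gaps(partial))
```

Keeping the two attributes as separate fields is what lets the check report each gap independently.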
The Data Layer of the Enterprise Model
The Data and Information Inventory is the data layer of the Enterprise Model graph. Every Data and Information type is a node in the graph, carrying governance metadata as node attributes and connected to other inventory nodes through typed relationships. These relationships are not informal descriptions — they are explicit, Semantic-ID-carrying edges in the Enterprise Model that AI agents can traverse.
The following typed relationships connect Data and Information types to the rest of the Enterprise Model. A Data and Information type is produced by one or more Capabilities (connecting to the Capabilities Inventory), consumed by one or more Capabilities (connecting to the Capabilities Inventory), carried by one or more Integrations as their Payload (connecting to the Integrations Inventory), owned by one Application as its system of record (connecting to the Applications Inventory), physically housed in one or more Data Stores (connecting to the Data Stores Inventory), classified by one or more Data Sensitivity Types (connecting to the Data Sensitivity Types Inventory), and moved across environments by one or more Systems Deployment Pipelines (connecting to the Systems Deployment Pipelines Inventory). These seven relationship types make the Data and Information type one of the most connected Noun Types in the Enterprise Model: the hub through which data flows, data ownership, and data sensitivity can be traced from any starting point in the graph.
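The typed relationships listed above can be sketched as labeled edges in a small in-memory graph. This is a simplification under stated assumptions: nodes are plain names rather than Semantic IDs, the relationship identifiers (`produced_by`, `housed_in`, and so on) are invented spellings, and the store names are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, List, Tuple

# Assumed identifiers for the relationship types named in the text.
RELATIONSHIPS = {
    "produced_by", "consumed_by", "carried_by", "owned_by",
    "housed_in", "classified_by", "moved_by",
}


class EnterpriseModel:
    """A toy graph: source node -> list of (relationship, target node)."""

    def __init__(self) -> None:
        self._edges: DefaultDict[str, List[Tuple[str, str]]] = defaultdict(list)

    def relate(self, source: str, relationship: str, target: str) -> None:
        if relationship not in RELATIONSHIPS:
            raise ValueError(f"unknown relationship: {relationship}")
        self._edges[source].append((relationship, target))

    def targets(self, source: str, relationship: str) -> List[str]:
        """Follow edges of one relationship type outward from a node."""
        return [t for r, t in self._edges.get(source, []) if r == relationship]

    def sources(self, relationship: str, target: str) -> List[str]:
        """Find nodes that point at a target through one relationship type."""
        return [s for s, edges in self._edges.items()
                if (relationship, target) in edges]


model = EnterpriseModel()
model.relate("Customer Profile", "owned_by", "Salesforce")
model.relate("Customer Profile", "housed_in", "crm-postgres")
model.relate("Customer Profile", "classified_by", "PII")
model.relate("Invoice", "housed_in", "erp-db")
model.relate("Invoice", "classified_by", "Confidential")

# Trace: which Data Stores house any type classified as PII?
pii_stores = sorted({
    store
    for data_type in model.sources("classified_by", "PII")
    for store in model.targets(data_type, "housed_in")
})
print(pii_stores)
```

The trace at the end starts from a sensitivity classification and arrives at physical stores in two hops, which is the kind of traversal the typed edges are meant to support.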
Why This Inventory Is Foundational to AI Governance
As enterprises deploy AI agents that read, classify, generate, summarize, and act on data and information, the boundary between structured data and unstructured information collapses entirely for the purposes of AI consumption. An AI agent processing a database record, a PDF contract, and a voice transcript treats all three as equally consumable inputs. The governance question is no longer “is this machine-readable?” — it is “who governs this content, how sensitive is it, what are the rules for using it, and who is accountable if those rules are violated?”
The Data and Information Inventory is the governance instrument that answers these questions at the type level. Without it, AI governance is built on informal vocabulary that cannot be enforced, audited, or traversed by AI agents themselves. An AI agent that has access to a governed Data and Information Inventory can determine in a single lookup: is this content type classified as PII? What is its retention period? Which application is the authoritative source? What regulatory obligations apply? These are the governance decisions that determine whether an AI agent is operating within appropriate boundaries — and they cannot be made reliably without a governed data type vocabulary.
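The single lookup described above might be sketched like this. The record layout, the field names, and the sample values (including the "GDPR" obligation and "Meeting Notes" as an ungoverned type) are assumptions for illustration, and treating an unknown type as ungoverned is one possible policy rather than a rule stated by the inventory.

```python
from typing import Any, Dict

# Hypothetical inventory content; all values are illustrative only.
INVENTORY: Dict[str, Dict[str, Any]] = {
    "Customer Profile": {
        "sensitivity": "PII",
        "retention_period": "7 years",
        "authoritative_source": "Salesforce",
        "regulatory_obligations": ["GDPR"],
    },
}


def governance_context(type_name: str) -> Dict[str, Any]:
    """Answer an agent's governance questions for one content type."""
    record = INVENTORY.get(type_name)
    if record is None:
        # Assumed policy: an unknown type is reported as ungoverned,
        # leaving the agent to stop or escalate rather than guess.
        return {"governed": False}
    return {
        "governed": True,
        "is_pii": record["sensitivity"] == "PII",
        "retention_period": record["retention_period"],
        "authoritative_source": record["authoritative_source"],
        "regulatory_obligations": list(record["regulatory_obligations"]),
    }


ctx = governance_context("Customer Profile")
print(ctx)
print(governance_context("Meeting Notes"))
```

An agent would call `governance_context` before reading or generating content of a given type and branch on the `governed` and `is_pii` values.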
AI agents are also first-class contributors to this inventory at Run maturity. An AI agent with access to the Data Catalog, the Integrations Inventory, and the Capabilities Inventory can automatically identify candidate Data and Information types from Payload values, Input/Output attribute values, and catalog asset labels — generating draft records for human review. An AI agent with access to regulatory frameworks can suggest Sensitivity Classifications and Regulatory Obligations for each type. The Data and Information Inventory is not just governed by humans for AI consumption — it is populated, maintained, and quality-checked through AI-human collaboration.
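The candidate-identification step can be approximated with a deterministic stand-in for the agent. The sketch below only compares names; a real agent would do considerably more. The function name, the draft record fields, and the sample type names are hypothetical.

```python
from typing import Dict, Iterable, List


def draft_candidates(
    integration_payloads: Iterable[str],
    capability_io: Iterable[str],
    catalog_labels: Iterable[str],
    governed_names: Iterable[str],
) -> List[Dict[str, str]]:
    """Propose draft records for names that are not yet governed types."""
    seen = {name.strip().lower() for name in governed_names}
    drafts: List[Dict[str, str]] = []
    sources = (
        ("Integrations Inventory", integration_payloads),
        ("Capabilities Inventory", capability_io),
        ("Data Catalog", catalog_labels),
    )
    for source, names in sources:
        for name in names:
            key = name.strip().lower()
            if key and key not in seen:
                seen.add(key)  # avoid proposing the same name twice
                drafts.append({
                    "name": name.strip(),
                    "found_in": source,
                    "status": "Draft, pending human review",
                })
    return drafts


drafts = draft_candidates(
    integration_payloads=["Customer Profile", "Shipment Manifest"],
    capability_io=["Invoice", "Claim Form"],
    catalog_labels=["shipment manifest", "Claim Form "],
    governed_names=["Customer Profile", "Invoice"],
)
for draft in drafts:
    print(draft)
```

Every proposal carries a draft status, so nothing enters the inventory as a governed type without a human decision.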
Copyright for the International Foundation for Information Technology (IF4IT): 2008 - Present