Data and Information Inventory and Attributes - Data and Information Governance Context

Data and Information Inventory and Attributes

Chapter 8. Data and Information Governance Context

Authored and Published By: The International Foundation for Information Technology (IF4IT), LLC

Executive Summary: Chapter Overview

The Bottom Line

This chapter explains the governance logic behind combining data and information in one inventory and managing both as governed content types. It highlights why structured, semi-structured, and unstructured content all require ownership, classification, retention, disposal, lineage, and regulatory controls in an AI-era enterprise.

Core Concepts

Concept	Definition & Strategic Role
Single Inventory Model	Structured data and unstructured information are governed together because both carry enterprise value and regulatory exposure. A single inventory avoids leaving contracts, filings, images, recordings, and other information assets outside governance.
Type vs. Instance	The chapter reinforces that Customer Profile is a governed type, while a specific customer record is an instance. This separation keeps governance reusable across systems and physical implementations.
Structure-Based Classification	Structure provides an objective distinction among Structured, Semi-Structured, and Unstructured content. It is more durable than philosophical data-versus-information boundaries and more useful for AI-enabled governance.

Quick Q&A

Question: Why does the inventory govern both data and information instead of only structured data?

Answer: Enterprises must protect and comply with obligations for unstructured and semi-structured content as well as database records. Contracts, regulatory filings, scanned forms, images, and transcripts may be highly sensitive, so excluding them would leave a major portion of the information landscape ungoverned.

Why Data and Information in a Single Inventory

The most common assumption practitioners bring to a data inventory is that it governs structured data — database records, CSV files, JSON payloads. The Data and Information Inventory is deliberately broader: it governs both data (structured, schema-bound, machine-readable content) and information (unstructured or semi-structured content such as PDF documents, images, audio files, contracts, and regulatory submissions). These two forms are governed together under a single Noun Type because the enterprise produces, moves, protects, and must comply with regulations governing both.

The distinction between data and information is philosophically contested — particularly in the context of AI systems, which treat a database record, a PDF document, and an audio transcript as equally machine-consumable inputs. Rather than imposing a philosophical boundary that breaks down under AI workloads, this inventory uses the Structure attribute as the governing classification: Structured, Semi-Structured, or Unstructured. This is an objective, durable, and AI-friendly distinction that does not depend on whether advanced processing systems can read the content. The governance obligations that apply to a Customer Profile — ownership, sensitivity, retention, disposal, regulatory compliance — apply regardless of whether that profile is stored as a relational database record (Structured) or a scanned paper form converted to PDF (Unstructured).

Enterprises that govern only structured data leave an unquantified portion of their information landscape ungoverned. Contract documents, regulatory filings, insurance claims, engineering drawings, and audio recordings are frequently the most sensitive and most heavily regulated content the enterprise handles — and they are almost always unstructured. A data governance program that does not reach them is incomplete by design.

What a Noun Instance Is in This Inventory

Every record in the Data and Information Inventory is a Noun Instance of the Data and Information Noun Type: a named, governed type of content, not a specific physical record, file, or database row. “Customer Profile” is a Noun Instance. A specific customer record in the Salesforce CRM database is a physical instance of the “Customer Profile” type and is not itself a record in this inventory. This distinction between the governed type and the physical instance is the defining architectural principle of this inventory.

Type-level governance is what makes data governance scalable. Governing the “Customer Profile” type once — who owns it, how sensitive it is, what the authoritative source is, how long it must be retained, what encryption is required — applies that governance universally to every physical instance of the type, wherever it appears, in whatever system, in whatever format. Without type-level governance, the same decisions must be made separately for every table, file, and API that handles customer data — producing inconsistent governance and leaving gaps between systems.

The practical implication: when building this inventory, think in terms of named business concepts, not database tables or file names. “Invoice” is a type. “INVOICE_HDR table in SAP” is a physical asset that stores instances of the “Invoice” type — it belongs in the Data Stores Inventory, not here. The relationship between the type and its physical housing is captured in the Relationship Attributes category of this inventory.
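The type-versus-physical-asset separation described above can be sketched as a small data model. This is a minimal illustration, not the inventory's actual schema: the class names, field names, the "Document Archive" system, and all attribute values (owner, sensitivity, retention period) are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Structure(Enum):
    STRUCTURED = "Structured"
    SEMI_STRUCTURED = "Semi-Structured"
    UNSTRUCTURED = "Unstructured"


@dataclass(frozen=True)
class DataInfoType:
    """A governed type: one record in the Data and Information Inventory."""
    name: str
    owner: str
    sensitivity: str
    retention_years: int


@dataclass(frozen=True)
class PhysicalAsset:
    """A physical housing of instances; it belongs in the Data Stores Inventory."""
    system: str
    asset_name: str
    structure: Structure
    governed_type: DataInfoType


def effective_retention(asset: PhysicalAsset) -> int:
    # Retention is decided once on the type and inherited by every asset.
    return asset.governed_type.retention_years


# Hypothetical values, for illustration only.
invoice = DataInfoType("Invoice", owner="Finance",
                       sensitivity="Confidential", retention_years=7)
sap_table = PhysicalAsset("SAP", "INVOICE_HDR", Structure.STRUCTURED, invoice)
scanned = PhysicalAsset("Document Archive", "invoice_scans",
                        Structure.UNSTRUCTURED, invoice)
```

Both assets resolve to the same retention rule because the rule lives on the type, regardless of system or structure.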

The Logical Layer vs. the Physical Layer

The Data and Information Inventory operates at the logical and governance layer of the enterprise data architecture. It answers governance questions: what types of data and information exist, who owns them, how sensitive are they, what are the rules for handling them? It is system-agnostic, format-agnostic, and environment-agnostic — a Data and Information type exists as a governed concept independent of any physical implementation.

The physical layer is governed by two complementary instruments: the Data Stores Inventory, which governs the physical containers (databases, object stores, file systems) that hold instances of data types, and the enterprise Data Catalog, which provides automated technical metadata about those physical assets. Together with the Data and Information Inventory, they form three instruments that operate at different levels of abstraction and serve different governance purposes. They are complementary, not competing, and together they provide complete data governance coverage from the conceptual layer through the physical asset layer.

Relationship to the Enterprise Data Catalog

The Data and Information Inventory is not a Data Catalog — but it is the governance layer that a Data Catalog should be built on. Understanding the distinction is essential for practitioners deciding how to implement both.

A Data Catalog is a technical metadata management tool. It scans, ingests, and organizes metadata from actual data assets: databases, data warehouses, data lakes, S3 buckets, BI reports, and APIs. Tools such as Collibra, Alation, Atlan, AWS Glue Data Catalog, Microsoft Purview, and Apache Atlas are Data Catalogs. They answer questions like: what tables exist in this database, what columns do they have, what are the data types, when was it last updated, who has queried it recently? Their primary audience is data engineers, analysts, and data scientists who need to find and understand specific physical data assets.

The table below summarizes the key differences between the two instruments:


Dimension	Data Catalog	Data and Information Inventory
Layer	Physical / Technical	Logical / Governance
Unit of governance	A specific table, file, column, or asset	A named, governed type of data or information
Population method	Automated scanning and ingestion from source systems	Manual governance decisions and deliberate recognition
Primary question answered	What data assets exist and what do they contain technically?	What data types does the enterprise recognize, own, and govern?
Primary audience	Data engineers, analysts, data scientists	Data owners, governance practitioners, enterprise architects, AI designers
Relationship to systems	Bound to specific systems and assets	System-agnostic — a type exists independently of which system holds it
Sensitivity governance	Often tagged post-hoc, inconsistently	A Crawl-level mandatory attribute on every record
Cross-inventory relationships	Typically isolated to data lineage within the catalog	Explicit typed relationships to Capabilities, Integrations, Applications, Data Stores, and the Enterprise Model

The Data and Information Inventory should be the governing vocabulary that a Data Catalog uses to classify and tag its assets. When a Data Catalog scans a database and discovers a table called CUST_MASTER, the catalog can tag it with the governed Data and Information type “Customer Profile” — complete with its Semantic ID, sensitivity classification, authoritative source designation, and retention rules from this inventory. The catalog provides physical discovery; this inventory provides logical governance context.

Without this inventory, a Data Catalog accumulates technical metadata with no consistent logical classification. Three catalog entries at the same enterprise may label the same concept as “Customer Data,” “Client Records,” and “Account Master”: three labels for one governed type. This inventory resolves that ambiguity by providing the canonical governed vocabulary that the catalog applies consistently. At Run maturity, the relationship becomes bidirectional: the catalog’s automated scans surface new physical assets that don’t yet have a corresponding Data and Information Inventory record, triggering a governance decision about whether to recognize and govern that type. The catalog becomes a discovery instrument that feeds the inventory rather than just consuming its vocabulary.
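A rough sketch of how a catalog might apply the governed vocabulary, and surface unmatched assets as candidates, is shown here. The synonym map, the function names, and the SHIP_LOG label are assumptions made for illustration; no real catalog product's API is used or implied.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical synonym map maintained alongside the inventory:
# lower-cased catalog labels -> canonical governed type name.
CANONICAL_TYPES: Dict[str, str] = {
    "customer data": "Customer Profile",
    "client records": "Customer Profile",
    "account master": "Customer Profile",
    "cust_master": "Customer Profile",
}


def resolve_type(catalog_label: str) -> Optional[str]:
    """Map a catalog label to its governed type, or None if unrecognized."""
    return CANONICAL_TYPES.get(catalog_label.strip().lower())


def triage(catalog_labels: List[str]) -> Tuple[Dict[str, str], List[str]]:
    """Split labels into tagged assets and candidates needing a governance decision."""
    tagged: Dict[str, str] = {}
    candidates: List[str] = []
    for label in catalog_labels:
        governed = resolve_type(label)
        if governed is None:
            candidates.append(label)  # feeds the inventory for human review
        else:
            tagged[label] = governed
    return tagged, candidates
```

Calling `triage(["CUST_MASTER", "Client Records", "SHIP_LOG"])` tags the first two labels as "Customer Profile" and returns "SHIP_LOG" as a candidate, which mirrors the bidirectional flow described for Run maturity.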

The Data Catalog Reference attribute in the Technical Attributes category of this inventory carries the identifier or link to the corresponding catalog entry for each Data and Information type, providing the navigational bridge between the logical governance layer and the physical asset discovery layer.

Authoritative Source vs. Source of Truth Store

Two attributes in this inventory answer questions that are closely related but fundamentally different, and conflating them is one of the most common and costly data governance mistakes enterprises make.

Authoritative Source is an organizational designation: it identifies the system or unit that is accountable for this Data and Information type — who sets the rules, who resolves disputes about what the correct value is, and whose copy is canonical by governance agreement. It answers the question: who owns this data?

Source of Truth Store is a physical designation: it identifies the specific data store — the database, object store, or file system — where the master copy of this type physically lives. It answers the question: where does the canonical version actually reside?

These two answers are not always the same system. An enterprise may designate Salesforce as the Authoritative Source for the Customer Profile type — Salesforce is the system of record, the system whose values win in any conflict. But the actual master copy of customer data may physically live in a PostgreSQL database that Salesforce writes to, or in a data warehouse that aggregates from Salesforce and two other systems, or in Salesforce’s own cloud-managed storage. Each of these is a different physical location with different access controls, backup policies, and encryption implementations.

Without both attributes, governance is incomplete. Knowing the Authoritative Source without knowing the Source of Truth Store means you know who owns the data but not where the master copy lives — making incident response, data migration planning, and cross-border compliance analysis guesswork. Knowing the Source of Truth Store without knowing the Authoritative Source means you know where the data is but not who is accountable for its accuracy — making data quality remediation impossible to escalate. Both attributes are required for complete data governance, and both are captured as separate, explicitly named attributes in this inventory.
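The completeness argument above lends itself to a simple validation check. The following is a minimal sketch; the record class, its field names, the gap messages, and the example store name are illustrative assumptions rather than the inventory's defined attribute format.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class InventoryRecord:
    name: str
    authoritative_source: Optional[str] = None   # who is accountable
    source_of_truth_store: Optional[str] = None  # where the master copy lives


def governance_gaps(record: InventoryRecord) -> List[str]:
    """List which of the two designations is missing from a record."""
    gaps: List[str] = []
    if not record.authoritative_source:
        gaps.append("No Authoritative Source: accountability for accuracy is unknown")
    if not record.source_of_truth_store:
        gaps.append("No Source of Truth Store: location of the master copy is unknown")
    return gaps


# Hypothetical records: the first is complete, the second names only the owner.
complete = InventoryRecord("Customer Profile", "Salesforce", "PostgreSQL customer database")
partial = InventoryRecord("Invoice", authoritative_source="SAP")
```

Keeping the two fields separate means a record can name Salesforce as accountable while pointing at a different physical store, which is the case the text warns against conflating.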

The Data Layer of the Enterprise Model

The Data and Information Inventory is the data layer of the Enterprise Model graph. Every Data and Information type is a node in the graph, carrying governance metadata as node attributes and connected to other inventory nodes through typed relationships. These relationships are not informal descriptions — they are explicit, Semantic-ID-carrying edges in the Enterprise Model that AI agents can traverse.

Typed relationships connect Data and Information types to the rest of the Enterprise Model. A Data and Information type is produced by one or more Capabilities (connecting to the Capabilities Inventory), consumed by one or more Capabilities (connecting to the Capabilities Inventory), carried by one or more Integrations as their Payload (connecting to the Integrations Inventory), owned by one Application as its system of record (connecting to the Applications Inventory), physically housed in one or more Data Stores (connecting to the Data Stores Inventory), classified by one or more Data Sensitivity Types (connecting to the Data Sensitivity Types Inventory), and moved across environments by one or more Systems Deployment Pipelines (connecting to the Systems Deployment Pipelines Inventory). These seven relationship types make the Data and Information type one of the most connected Noun Types in the Enterprise Model: the hub through which data flows, data ownership, and data sensitivity can be traced from any starting point in the graph.
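One way to picture the typed relationships listed above is as labeled edges in a small graph. This sketch uses plain names in place of Semantic IDs, and the class, the relationship labels, and the example nodes are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class EnterpriseModel:
    """A toy graph: nodes are inventory records, edges carry a relationship type."""

    def __init__(self) -> None:
        # source node -> list of (relationship type, target node)
        self._edges: Dict[str, List[Tuple[str, str]]] = defaultdict(list)

    def relate(self, source: str, relationship: str, target: str) -> None:
        self._edges[source].append((relationship, target))

    def targets(self, source: str, relationship: str) -> List[str]:
        """Follow one relationship type outward from a node."""
        return [t for r, t in self._edges.get(source, []) if r == relationship]


model = EnterpriseModel()
# Hypothetical edges for the "Customer Profile" type.
model.relate("Customer Profile", "produced_by", "Customer Onboarding")
model.relate("Customer Profile", "consumed_by", "Billing")
model.relate("Customer Profile", "carried_by", "CRM-to-Warehouse Sync")
model.relate("Customer Profile", "owned_by", "Salesforce")
model.relate("Customer Profile", "housed_in", "Customer Warehouse")
model.relate("Customer Profile", "classified_by", "PII")
model.relate("Customer Profile", "moved_by", "Warehouse Release Pipeline")
```

Because each edge carries its relationship type, a question such as "where is this type housed?" becomes a single call: `model.targets("Customer Profile", "housed_in")`.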

Why This Inventory Is Foundational to AI Governance

As enterprises deploy AI agents that read, classify, generate, summarize, and act on data and information, the boundary between structured data and unstructured information collapses entirely for the purposes of AI consumption. An AI agent processing a database record, a PDF contract, and a voice transcript treats all three as equally consumable inputs. The governance question is no longer “is this machine-readable?” — it is “who governs this content, how sensitive is it, what are the rules for using it, and who is accountable if those rules are violated?”

The Data and Information Inventory is the governance instrument that answers these questions at the type level. Without it, AI governance is built on informal vocabulary that cannot be enforced, audited, or traversed by AI agents themselves. An AI agent that has access to a governed Data and Information Inventory can determine in a single lookup: is this content type classified as PII? What is its retention period? Which application is the authoritative source? What regulatory obligations apply? These are the governance decisions that determine whether an AI agent is operating within appropriate boundaries — and they cannot be made reliably without a governed data type vocabulary.
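The single-lookup idea can be sketched as a guard an AI agent consults before using content of a given type. Everything here is an assumption for illustration: the in-memory inventory, the field names, the sensitivity and retention values, the clearance model, and the choice to deny ungoverned types by default.

```python
from typing import Dict, Set, Tuple

# Hypothetical inventory extract keyed by governed type name.
INVENTORY: Dict[str, dict] = {
    "Customer Profile": {
        "sensitivity": "PII",
        "retention_years": 7,
        "authoritative_source": "Salesforce",
        "regulatory_obligations": ["GDPR"],
    },
}


def may_use(type_name: str, agent_clearances: Set[str]) -> Tuple[bool, str]:
    """Decide whether an agent may use a content type, with a reason."""
    record = INVENTORY.get(type_name)
    if record is None:
        # Assumed policy: content with no governed type is denied by default.
        return False, f"'{type_name}' is not a governed type"
    if record["sensitivity"] not in agent_clearances:
        return False, f"agent is not cleared for {record['sensitivity']} content"
    return True, "permitted"
```

For example, `may_use("Customer Profile", {"Public"})` is refused because the agent lacks PII clearance, while an unrecognized type such as "Meeting Notes" is refused because no governance record exists for it.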

AI agents are also first-class contributors to this inventory at Run maturity. An AI agent with access to the Data Catalog, the Integrations Inventory, and the Capabilities Inventory can automatically identify candidate Data and Information types from Payload values, Input/Output attribute values, and catalog asset labels — generating draft records for human review. An AI agent with access to regulatory frameworks can suggest Sensitivity Classifications and Regulatory Obligations for each type. The Data and Information Inventory is not just governed by humans for AI consumption — it is populated, maintained, and quality-checked through AI-human collaboration.

How to cite this page

When referencing this page in academic work, internal standards, or external publications, include the page title, IF4IT as author and publisher (The International Foundation for Information Technology (IF4IT), LLC), the URL, and your access date.

Example (informal web citation):

The International Foundation for Information Technology (IF4IT), LLC. Data and Information Governance Context | Data and Information Inventory and Attributes. https://if4it.org/best-practices/data-and-information-inventory-and-attributes/data-and-information-governance-context/ (accessed 2026-07-20).

See About Us for content governance and site-wide citation guidance.
