Data and Information Inventory and Attributes - Build, own, and govern the Data and Information Inventory
Section A — Sourcing and Harvesting
Before building the Data and Information Inventory from scratch, assess whether data type definitions already exist in any form in the enterprise. Common sources include: the enterprise Data Catalog, where physical asset metadata often contains type-level labels and descriptions that can be promoted to governed inventory records; the Integrations Inventory, where the distinct Payload values across all integration records are a direct discovery list for data types currently moving through the enterprise; the Capabilities Inventory, where the Key Input and Key Output Data and Information attributes name types that capabilities consume and produce; data dictionaries and business glossaries maintained by data governance or enterprise architecture teams; and regulatory compliance documentation, where data types subject to specific regulations are often explicitly named.
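As a minimal sketch of the harvesting step, the snippet below merges candidate type names from several sources and records where each was found. The function name, the sample values, and the normalization rule (collapse whitespace, title-case) are assumptions for illustration only; a real harvest would read from the actual Data Catalog, Integrations Inventory, and Capabilities Inventory exports.

```python
from collections import defaultdict


def harvest_candidates(sources):
    """Merge candidate data type names from several sources.

    `sources` maps a source label to an iterable of raw names.
    Returns a dict of normalized name -> sorted list of source labels.
    """
    seen = defaultdict(set)
    for label, names in sources.items():
        for raw in names:
            # Assumed normalization: collapse whitespace and title-case the name.
            name = " ".join(raw.split()).title()
            if name:
                seen[name].add(label)
    return {name: sorted(labels) for name, labels in sorted(seen.items())}


# Hypothetical sample values, not taken from any real inventory.
candidates = harvest_candidates({
    "Integrations Inventory": ["Customer Profile", "supplier  invoice", "Customer Profile"],
    "Capabilities Inventory": ["Supplier Invoice", "Employee Record"],
})
```

A name reported by more than one source (here, Supplier Invoice) is a reasonable early candidate for a governed record, since it is already referenced in several places.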
AI agents are effective tools for bootstrapping the Data and Information Inventory, particularly for generating initial Descriptions, suggesting Sensitivity Classifications, and identifying regulatory obligations for well-known data types. An AI agent can be prompted to generate initial records for standard types (Customer Profile, Supplier Invoice, Employee Record, etc.) that practitioners validate and extend. AI-generated records must be treated as starting points requiring human validation, not as authoritative records. The Provenance and Audit Attributes category documents the generation method and validation status of each AI-generated record.
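The validation rule for AI-generated records could be tracked along these lines. This is only a sketch: the field names mirror the generation method and validation status mentioned above, but the class, the allowed values ("ai_agent", "unvalidated", "validated"), and the `validated_by` field are hypothetical and not part of the inventory schema.

```python
from dataclasses import dataclass


@dataclass
class DraftRecord:
    display_name: str
    description: str
    generation_method: str                  # hypothetical values: "ai_agent", "manual"
    validation_status: str = "unvalidated"  # hypothetical values: "unvalidated", "validated"
    validated_by: str = ""                  # hypothetical field naming the human reviewer


def needs_human_validation(record):
    """True while an AI-generated record has not yet been validated by a person."""
    return (
        record.generation_method == "ai_agent"
        and record.validation_status != "validated"
    )


def validate(record, reviewer):
    """Mark a record as validated and note who reviewed it."""
    record.validation_status = "validated"
    record.validated_by = reviewer
    return record
```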
Where no definitions exist, build the Data and Information Inventory through structured discovery sessions with data owners, business domain leads, and integration architects. A session focused on one business domain (Finance, Customer, Product, etc.) with three to six participants who understand both what data that domain produces and what it consumes typically yields a workable set of Crawl-level records for that domain in two to four hours. Prioritize high-sensitivity and high-strategic-importance types first.
Section B — Ownership and Accountability
Every inventory must have a named owner accountable for the accuracy, completeness, and governance of the inventory as a whole. For the Data and Information Inventory, the Chief Data Officer, Head of Data Governance, or equivalent function is the natural organizational owner. In organizations without a formal data governance function, Enterprise Architecture or a data domain council is an appropriate alternative. Individual data type records each have their own Owner and Steward, who are accountable for that record's content; the inventory owner, by contrast, is accountable for the schema, the governance process, and the overall health of the inventory as a governance artifact.
Section C — Lifecycle and Review Cadence
The Data and Information Inventory is a living governance artifact. New data types are introduced constantly as the business evolves, new systems are deployed, and new regulatory requirements emerge. The reconciliation cadence depends on maturity: at Crawl maturity, quarterly at minimum; at Walk maturity, monthly or event-driven, whenever a new integration, application, or capability is added to the portfolio; at Run maturity, continuous or near-continuous through automated feeds from the Data Catalog and integration platform. Every new integration record added to the Integrations Inventory should trigger a check: does the Payload value correspond to a governed Data and Information type? If not, a new record is required.
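The Payload check triggered by a new integration record might look like the sketch below. The function name, the case-insensitive matching rule, and the shape of the returned request are assumptions; the cadence table simply restates the values given above.

```python
# Reconciliation cadence per maturity level, as stated in this section.
CADENCE_BY_MATURITY = {
    "Crawl": "quarterly minimum",
    "Walk": "monthly or event-driven",
    "Run": "continuous or near-continuous",
}


def check_payload(payload, governed_types):
    """Return None if the Payload maps to a governed type, else a new-record request.

    Matching is case-insensitive and ignores surrounding whitespace
    (an assumed rule; a real inventory would more likely match on Semantic ID).
    """
    key = payload.strip().casefold()
    governed = {name.strip().casefold() for name in governed_types}
    if key in governed:
        return None
    return {"display_name": payload.strip(), "action": "create_record"}
```

For example, `check_payload("Loyalty Points Balance", ["Customer Profile"])` returns a request to create a record, while a Payload of `"customer profile"` returns `None` because a governed type already covers it.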
Section D — Data Quality and Starting Approach
Recommended approach:
(1) Identify all known Data and Information types from the sources described in Section A and create a stub record for each, containing only Semantic ID, Display Name, Description, Structure, Data Category, Data Domain, and Sensitivity Classification.
(2) Populate all remaining Crawl attributes before any Walk attributes are added.
(3) Validate Crawl completeness (100% of known types with 100% of Crawl attributes populated) before advancing.
(4) Populate Walk attributes systematically, prioritizing high-sensitivity and high-strategic-importance types.
(5) Introduce Run attributes only when cross-inventory relationships are mature enough for those attributes to be derived automatically.
The most common failure mode is building a long list of data type names without descriptions, owners, or sensitivity classifications: a list with no governance value.
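The completeness gate in this approach can be sketched as a simple check over the records. The seven stub attributes come from the text above; the full Crawl attribute list is not given in this section, so the sketch takes the required list as a parameter and uses the stub attributes as a stand-in. The function names and the record shape (a plain dict keyed by attribute name) are assumptions.

```python
# Stub attributes named in this section; pass the full Crawl list as `required`
# when validating Crawl completeness.
STUB_ATTRIBUTES = [
    "Semantic ID", "Display Name", "Description", "Structure",
    "Data Category", "Data Domain", "Sensitivity Classification",
]


def missing_attributes(record, required=STUB_ATTRIBUTES):
    """List the required attributes that are absent or blank in one record."""
    return [a for a in required if not str(record.get(a) or "").strip()]


def completeness(records, required=STUB_ATTRIBUTES):
    """Return (is_complete, gaps); gaps maps a record label to its missing attributes."""
    gaps = {}
    for index, record in enumerate(records):
        missing = missing_attributes(record, required)
        if missing:
            label = record.get("Display Name") or f"record #{index}"
            gaps[label] = missing
    return (not gaps, gaps)
```

A record holding only a Semantic ID and a Display Name fails the gate, which is exactly the "list of names" failure mode described above.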
Section E — Access Control
The Data and Information Inventory contains governance-sensitive information, including sensitivity classifications, authoritative sources, and regulatory obligations. Read access should be broadly available to data governance, enterprise architecture, APM, TPM, compliance, and security teams. Write access should be restricted to the inventory steward, designated data owners, and authorized automated feeds. Schema change access should be reserved for the inventory owner and governing body.
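One way to express these three access tiers is a small role-to-permission lookup. The role identifiers and action names below are hypothetical labels for the groups named above, and the sketch assumes that anyone with write or schema change access can also read.

```python
# Hypothetical role identifiers for the groups named in this section.
READ_ROLES = {"data_governance", "enterprise_architecture", "apm", "tpm",
              "compliance", "security"}
WRITE_ROLES = {"inventory_steward", "data_owner", "automated_feed"}
SCHEMA_ROLES = {"inventory_owner", "governing_body"}


def allowed(role, action):
    """Return True if `role` may perform `action` on the inventory."""
    if action == "read":
        # Assumption: write and schema roles implicitly have read access.
        return role in READ_ROLES | WRITE_ROLES | SCHEMA_ROLES
    if action == "write":
        return role in WRITE_ROLES
    if action == "change_schema":
        return role in SCHEMA_ROLES
    raise ValueError(f"unknown action: {action}")
```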
Section F — Change Management
Changes to a Data and Information type definition — particularly changes to its Description, Sensitivity Classification, Authoritative Source, or Retention Period — have downstream implications for every system, integration, and capability that references it. Schema changes to this inventory and definition changes to individual records follow the same five-step process: Propose → Review → Approve → Implement → Communicate. Impact assessment of affected integrations, applications, and capabilities is a required step before any definition change is approved.
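The five-step process, with the impact assessment acting as a gate before approval, can be sketched as a small ordered workflow. The class and method names are hypothetical, and for simplicity the sketch applies the impact assessment gate to every change request, whereas the text requires it specifically for definition changes.

```python
STEPS = ["Propose", "Review", "Approve", "Implement", "Communicate"]


class ChangeRequest:
    def __init__(self, target, summary):
        self.target = target            # the type or schema element being changed
        self.summary = summary
        self.completed = []             # steps finished so far, in order
        self.impact_assessment = None

    def record_impact(self, integrations, applications, capabilities):
        """Attach the lists of affected integrations, applications, and capabilities."""
        self.impact_assessment = {
            "integrations": list(integrations),
            "applications": list(applications),
            "capabilities": list(capabilities),
        }

    def advance(self):
        """Complete the next step, enforcing order and the impact assessment gate."""
        if len(self.completed) == len(STEPS):
            raise RuntimeError("change request already completed")
        step = STEPS[len(self.completed)]
        if step == "Approve" and self.impact_assessment is None:
            raise RuntimeError("impact assessment required before approval")
        self.completed.append(step)
        return step
```

Because steps can only be completed in order, a request cannot reach Implement without passing through Approve, and cannot pass Approve without a recorded impact assessment.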
Section G — Archival and Retention
When a Data and Information type is retired, its record is not deleted. Update the Lifecycle Status to Retired, retain the record for one full reconciliation cycle in the active inventory, then archive it. The archived record remains queryable for historical lineage analysis. Retain indefinitely any record for a type involved in a significant compliance finding, regulatory audit, or litigation hold. For all others, define a retention period consistent with applicable regulatory requirements.
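The retirement and retention rules above can be sketched as two small decision functions. Only Lifecycle Status and its Retired value come from the text; the flag names (`compliance_finding`, `regulatory_audit`, `litigation_hold`), the `retention_period` key, and the returned labels are hypothetical.

```python
def next_action(record, cycles_since_retired):
    """Decide what to do with a record at reconciliation time.

    `cycles_since_retired` is the number of full reconciliation cycles
    completed since Lifecycle Status was set to Retired.
    """
    if record.get("Lifecycle Status") != "Retired":
        return "keep_active"
    if cycles_since_retired < 1:
        return "keep_in_active_inventory"   # retired, but one full cycle not yet elapsed
    return "archive"                        # never delete; archived records stay queryable


def retention(record):
    """Return 'indefinite' for records tied to a finding, audit, or hold."""
    flags = ("compliance_finding", "regulatory_audit", "litigation_hold")
    if any(record.get(flag) for flag in flags):
        return "indefinite"
    # Otherwise use the period defined from applicable regulatory requirements.
    return record.get("retention_period", "undefined")
```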
Copyright for the International Foundation for Information Technology (IF4IT): 2008 - Present