Data quality is best defined as fitness for use and must be expressed as measurable requirements, not a vague idea of “clean data.” Using common dimensions—accuracy, completeness, consistency, timeliness, validity, and uniqueness—organizations can implement governance, controls, and monitoring that make data reliable for reporting, operations, and analytics.
Data quality is the degree to which data is fit for its intended use. In DAMA-DMBOK terms, data quality management is a core data management discipline that defines, measures, monitors, and improves data to meet business expectations. Poor-quality data typically shows up as failed reconciliations, conflicting numbers across dashboards, duplicate records, missed delivery deadlines, and manual rework downstream.
Many organizations use a set of commonly accepted dimensions to express requirements and design controls. The six dimensions below are widely used in governance and data quality practices and map well to how rules and metrics are implemented in real systems.
Accuracy. Definition: data correctly represents the real-world entity or event it describes. How it fails: wrong amounts, wrong customer attributes, incorrect timestamps, incorrect mappings. How to measure: compare to an authoritative source (system of record, external validation, reconciliation); calculate error rate and impact. Common controls: reconciliations, reference data validation, controlled vocabularies, master data management (MDM) where appropriate.
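The reconciliation-based measurement described above can be sketched in a few lines. This is a minimal illustration, not a production implementation; the record shape, key, and field names are assumptions for the example.

```python
def accuracy_error_rate(records, system_of_record, key, field):
    """Fraction of checked records whose `field` disagrees with the system of record."""
    checked = mismatched = 0
    for rec in records:
        truth = system_of_record.get(rec[key])
        if truth is None:
            continue  # absent from the system of record: a completeness issue, not accuracy
        checked += 1
        if rec[field] != truth[field]:
            mismatched += 1
    return mismatched / checked if checked else 0.0
```

Records missing from the authoritative source are deliberately excluded here, so the metric isolates accuracy from completeness.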
Completeness. Definition: required data is present at the right level of granularity for the use case. How it fails: nulls in required fields, missing records, partial history after a pipeline outage. How to measure: null rate for required fields; record counts vs. expected; completeness by segment/time window. Common controls: required-field checks, ingestion expectations (e.g., “daily file must contain all regions”), backfills with auditable lineage.
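The two measurements named above (null rate and record counts vs. expected) can be expressed as simple functions. The row shape is illustrative.

```python
def null_rate(rows, field):
    """Share of rows where a required field is absent or None."""
    total = len(rows)
    missing = sum(1 for r in rows if r.get(field) is None)
    return missing / total if total else 0.0

def completeness_vs_expected(actual_count, expected_count):
    """Record-count completeness against an expected volume."""
    return actual_count / expected_count if expected_count else 0.0
```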
Consistency. Definition: data does not contradict itself across datasets, systems, or time. How it fails: customer status differs between CRM and billing; metric definitions differ between dashboards; different currencies without conversion. How to measure: cross-system reconciliation; referential integrity checks; “same business concept, same definition” checks. Common controls: canonical definitions in a semantic layer/metrics layer; conformed dimensions (Kimball); standardized transformation logic.
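A cross-system reconciliation of the kind described above (e.g., comparing customer status between two systems) can be sketched as follows. The system names and field are assumptions for the example.

```python
def cross_system_mismatches(system_a, system_b, field):
    """Keys present in both systems whose `field` values disagree."""
    shared_keys = system_a.keys() & system_b.keys()
    return sorted(k for k in shared_keys if system_a[k][field] != system_b[k][field])
```

Keys present in only one system are a completeness or referential-integrity finding and would be reported separately.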
Timeliness. Definition: data is available when needed and reflects the required recency for the use case. How it fails: late-arriving feeds; pipelines succeed but deliver after reporting deadlines; operational actions happen on stale data. How to measure: freshness/latency (event time → availability time); SLA/SLO compliance. Common controls: pipeline SLAs, alerting on freshness, late data handling patterns (watermarks, reprocessing windows).
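A freshness check against an agreed maximum lag, as described above, reduces to a timestamp comparison. The SLA value here is illustrative.

```python
from datetime import datetime, timedelta, timezone

def freshness_breach(last_event_time, now, max_lag):
    """True if the newest available event is older than the agreed SLA lag."""
    return (now - last_event_time) > max_lag
```

In practice the breach signal would feed an alerting system rather than be evaluated ad hoc.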
Validity. Definition: data conforms to defined formats, types, ranges, and business rules. How it fails: invalid dates, negative quantities where prohibited, invalid country codes, malformed emails. How to measure: rule pass/fail rates; distribution checks (e.g., allowed values, ranges). Common controls: schema enforcement, domain constraints, business-rule tests in transformation pipelines, reference data and code sets.
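Rule pass/fail rates, as described above, can be computed by applying a list of named predicates to each row. The rules and row fields below are illustrative, not a fixed rule set.

```python
def validity_pass_rate(rows, rules):
    """Fraction of rows that pass every rule; rules are (name, predicate) pairs."""
    if not rows:
        return 1.0
    passed = sum(1 for r in rows if all(pred(r) for _, pred in rules))
    return passed / len(rows)

# Illustrative rules: a non-negative quantity and a country code from a reference set.
EXAMPLE_RULES = [
    ("non_negative_qty", lambda r: r["qty"] >= 0),
    ("valid_country", lambda r: r["country"] in {"US", "DE", "JP"}),
]
```

Keeping rules as named pairs also allows per-rule failure rates to be reported, which is usually more actionable than a single aggregate.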
Uniqueness. Definition: each real-world entity/event is represented once where uniqueness is required. How it fails: duplicate customers, repeated transactions due to retries, double-counted events. How to measure: duplicate rate by business key; collision checks; idempotency validation. Common controls: primary keys, deduplication logic, idempotent ingestion, survivorship rules (often connected to MDM).
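The duplicate rate by business key described above counts surplus copies beyond the first occurrence of each key. The business-key field name is an assumption for the example.

```python
from collections import Counter

def duplicate_rate(rows, business_key):
    """Share of rows that are surplus copies of an already-seen business key."""
    if not rows:
        return 0.0
    counts = Counter(r[business_key] for r in rows)
    surplus = sum(c - 1 for c in counts.values())
    return surplus / len(rows)
```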
Data quality is not “maximum on all dimensions.” The required thresholds depend on risk, decision impact, and tolerance for delay: a regulatory filing may demand near-perfect accuracy and completeness, while a marketing dashboard can tolerate small errors as long as the data is fresh.
Data quality improves sustainably when it is treated as a governance and operating-model concern, not only a technical problem.
A practical, scalable approach is to treat data quality as part of the delivery lifecycle (similar to software quality).
For key datasets (tables, views, metrics), document a contract that specifies: the schema and field types; the business meaning of each field and metric; expected freshness and delivery SLAs; quality rules and thresholds for the relevant dimensions; and a named owner responsible for the dataset.
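A dataset contract of this kind can be captured as plain, machine-readable data. The sketch below is a hypothetical example; the dataset name, owner, thresholds, and field list are all illustrative, and real teams often express the same information in YAML or a contract-management tool.

```python
# Hypothetical contract for one dataset; structure and values are illustrative.
orders_contract = {
    "dataset": "analytics.orders_daily",
    "owner": "data-platform-team",
    "schema": {"order_id": "string", "amount": "decimal", "country": "string"},
    "freshness_sla": "available by 06:00 UTC daily",
    "quality_rules": [
        {"dimension": "uniqueness", "rule": "order_id is unique", "threshold": 1.0},
        {"dimension": "validity", "rule": "amount >= 0", "threshold": 0.999},
        {"dimension": "completeness", "rule": "country not null", "threshold": 0.98},
    ],
}
```

Because the contract is data, the same definition can drive tests in the pipeline and documentation for consumers.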
Place controls at multiple layers: at ingestion (schema enforcement, required-field and expectation checks), in transformation pipelines (business-rule tests, reconciliations, deduplication), and at the point of consumption (freshness and anomaly monitoring on the datasets that reports and models actually use).
Treat quality as observable system behavior: continuously publish freshness, volume, null-rate, and rule pass/fail metrics; alert when SLAs/SLOs are breached; and route incidents to the dataset owner, just as operational services are monitored.
When issues occur, define how corrections are delivered: whether bad records are quarantined, rejected, or flagged; how backfills and reprocessing are executed with auditable lineage; and how downstream consumers are notified that corrected data has replaced what they previously consumed.