Intuitas Data Modelling Framework

Updated 5/12/2025

This resource provides a lightweight framework for describing and developing data models. It draws from a range of suggested practices, with references provided where appropriate.

Note: See Modelling Standards and Conventions for notation and modelling standards.

Modelling Concepts

We distinguish between business information design and technical data implementation:

Information Models (Conceptual and Logical): Semantic models focused on business meaning, relationships, and rules. Technology-agnostic, designed for business stakeholders, analysts, and architects.
Data Models (Physical): Implementation-specific models focused on storage, performance, and technical constraints. Designed for database developers and engineers.

Some organisations use "data model" as an umbrella term for all three layers, which is also acceptable.

Concepts (Entities)

Concept (Entity) – The foundational unit of information modelling is the Core Business Concept, often referred to as an Entity. - A concept represents a real or abstract thing of business significance and is defined by: - A clear glossary definition - A set of attributes or properties - Relationships to other concepts, where applicable

Concepts may be organised from multiple architectural perspectives and expressed in different types of models, each serving a distinct purpose.

Perspectives

Domains decompose the enterprise into functional or capability-based areas of responsibility.

Concepts may be referenced within the scope of one or more domains
For governance purposes, each concept must have one and only one domain owner
Domain ownership establishes accountability for definition, quality, and change
See: Domain Models
See: Level 2: Domain Architecture

Subject areas describe the enterprise information space, organised by broad information topics.

Subject areas cut across domains and systems
They may loosely align to business functions in some cases, but are not synonymous with functions or organisational structures
Subject areas are primarily an information classification construct, not a governance boundary

The Enterprise Data Model (EDM) describes concepts that are materially relevant across the organisation.

Focus is placed on concepts, definitions, and standards that are:
Used by multiple domains
Implemented across multiple systems
Required for enterprise-wide integration and reporting
The EDM forms a primary foundation for interoperability, consistency, and reuse
See Level 1: Enterprise Architecture

A Solution (or System) Architecture is scoped to a specific business context, process set, and delivery objective.

Solutions refine and apply concepts within a defined scope
They frequently act as a validation and elaboration point for subject areas and enterprise concepts
Solution models must align to, and not redefine, enterprise and domain concepts

Modelling levels

Concepts may be expressed at multiple modelling levels, depending on intent and audience. Conceptual and logical modelling are informational in nature, even when labelled “data models”. Physical models are data models

Conceptual (Information) Models

For naming standards and conventions, see Conceptual Models in Modelling Standards and Conventions.

Conceptual models represent business meaning without implementation concerns.

What to Include:

Core business concepts (entities) that are identifiable, meaningful, and properly named
Key relationships between concepts
High-level business rules

What to Exclude:

Primary or surrogate keys
Data types
Precise cardinality (beyond "one" vs "many")

Use the business glossary to map domain-specific terms as synonyms to canonical enterprise terms.

Subtypes and Supertypes

Choosing the Right Level:

Too abstract: Not practical (e.g., "Thing")
Too granular: Should be attributes instead (e.g., "Customer Type")
Just right: Meaningful specialisations with distinct data requirements (e.g., "Individual Customer" vs "Corporate Customer")

Subtype Rules:

Mutually exclusive: Each instance belongs to one subtype only
Collectively exhaustive: Desirable but flexible—models evolve over time

Avoid multiple classification hierarchies; choose the primary classification dimension.

Logical (Information) Models

For naming standards and conventions, see Logical Models in Modelling Standards and Conventions.

Logical models extend conceptual models with structure and precision while remaining technology-independent.

What to Include:

Keys and identifiers: Natural and alternate keys
Cardinality: Precise relationships (1:1, 1:M, M:N)
Optionality: Mandatory vs optional relationships and attributes
Attribute definitions: Clear definitions and business rules
Normalisation: Applied where relevant for data integrity

Physical (Data) Models

For naming standards and conventions, see Physical Models in Modelling Standards and Conventions.

Physical models implement conceptual and logical models in specific technologies. They focus on storage, performance, and technical constraints for database developers and engineers.

Common Physical Formats:

Relational (3NF): Traditional normalised models for transactional systems (OLTP). Used in PostgreSQL, SQL Server, Oracle.
Dimensional: Star/snowflake schemas optimised for analytics (OLAP). See Dimensional Models.
Semi-Structured: JSON, Parquet, Avro supporting nested data. Used in data lakes, MongoDB, Delta Lake, Databricks.
Graph: Node/edge structures for relationship-heavy queries. Used in Neo4j, Amazon Neptune.
Key-Value: Simple stores optimised for speed and scale. Used in DynamoDB, Cosmos DB, Redis.

Note: Physical models often exist without documented conceptual/logical models, but these can be reverse-engineered to understand business meaning.

Modelling Lifecycle

Recommended approach:

Enterprise and domain models define the authoritative concepts, meaning, ownership, and standards for the organisation
Solution models apply and extend those concepts to meet specific project needs
Solutions depend on enterprise and domain models for consistency, reuse, and interoperability
Enterprise and domain models depend on solutions to validate, refine, and evolve based on real delivery experience
Governance controls the feedback loop, promoting solution learnings to domain or enterprise level where they are materially reusable
The relationship is iterative and cumulative: each project strengthens the enterprise baseline over time

Top down:

Start with industry reference model
Engage business domains to refine domain models
Review existing reports and documentation

Middle out:

Start with use case requirements
Deep dive business processes and associated concepts
Reconcile business concepts with actual source data

Bottom up:

Infer logical and conceptual models from applications, data and metadata

Enterprise Context

Enterprise-wide modelling is challenging due to semantic differences across functional areas. Instead, we focus on common touch points using Domain Models and Canonical Models where systems and domains integrate.

Domain Topology

Refer to Domain-Centric Design

Domains represent functional and organisational boundaries. Each domain encapsulates responsibilities, services, processes, information, expertise, and governance. Domains serve their own objectives while offering value to other domains and the enterprise.

A Domain Topology maps domains across the enterprise and their relationships, typically organised by functional areas (though other dimensions like region may apply).

Illustative example domains for a Healthcare organisation

Domain Models

For entity naming conventions in domain models, see Entity Naming in Modelling Standards and Conventions.

Under Domain-Centric Design, domains are authoritative for their information definitions. They own concept meanings, define lifecycle rules, and act as the system of semantic truth for their scope.

A Domain Model describes business concepts relevant to a domain, including:

Business Objects: Operational entities with identity, lifecycle, and direct participation in business processes (e.g., Patient, Order, Product)
Supporting Concepts: Qualities, rules, conditions, and classifications (e.g., Risk, Compliance, Status)

Relationship to Business Processes:

Domain Models define WHAT (objects, attributes, relationships)
Business Process Models define HOW (creation, transformation, consumption)
Processes reveal which objects matter, expose lifecycles, uncover rules, and clarify business language
Clear domain models enable process design through shared vocabulary

Terminology: We prefer "Domain" over "Subject Area" as it ties to governance and ownership boundaries, supporting clear accountability for data products and outcomes.

Canonical Models

For canonical entity naming conventions, see Entity Naming in Modelling Standards and Conventions.

Canonical models represent authoritative, agreed-upon definitions for concepts standardised across domains and systems (e.g., APIs, interoperability, conformed dimensions).

Characteristics:

Conceptual and logical at minimum; physical where applicable
Represent standardised information contracts
Support interoperability and consistent interpretation

Use canonical models where:

Multiple domains integrate
Semantic consistency is required
Cross-domain processes exchange information

Role in Cross-Domain Processes:

Define integration points and standardised entities (e.g., canonical "Customer")
Enable process flows across domains (e.g., Sales → Fulfilment → Finance)
Document data contracts at each handoff point

Specialised Model Types

Business Process Models

Business process models illustrate activity sequences, decision points, participants, and work flows within and across domains. They define business objects (e.g., Customer, Order, Invoice) that form the basis of conceptual models within Domain Models and Fact Tables and Business Processes in Dimensional Modeling.

Data Warehouse Models

Data warehouses transform data through layers optimised for performance and analysis while preserving conceptual and logical semantics:

Conform to canonical standards
Apply reference data and business terminology
Ensure data quality and reliability
Preserve history for time-based analysis
Enrich with metadata for discovery and governance
Optimise for consumption (dimensional, data vault)
Aggregate for analytical use

Each layer has distinct physical structures aligned to underlying logical models.

Data Vault Model

Data Vault models centre on business concepts and associations from Enterprise/Domain Conceptual Models. Refer to Data Vault 2.0 standards for guidance.

Dimensional (Kimball) Model

For dimensional model naming standards, SCD columns, and key conventions, see Dimensional Models in Modelling Standards and Conventions.

Kimball Dimensional Modelling creates Information Marts using Star Schemas:

Fact tables: Quantitative metrics/events from business processes (sales, shipments, payments)
Dimension tables: Descriptive context (customer, product, date, region)

Intentionally denormalised for query performance and user-friendly analysis.

Conformed Dimensions: Conformed dimensions are physical implementations of canonical models in Dimensional models:

Ensure consistent definitions across fact tables and marts (e.g., shared Customer dimension across Sales, Support, Marketing)
Enable cross-process analysis and drill-across queries
Achieve "single version of truth" across analytical systems

Dimensional Bus Matrix

A Bus Matrix maps intersections between business processes (facts) and dimensions for scalable warehouse design.

Structure:

Rows: Business processes (Orders, Shipments, Payments)
Columns: Dimensions (Customer, Product, Time, Geography)
Cells: ✓ indicates dimension usage

Benefits:

Identifies common/conformed dimensions for consistency
Supports modular, independent implementation
Enables cross-process analytics

Example:

Process \ Dimension	Customer	Product	Time
Orders	✓	✓	✓
Shipments	✓	✓	✓

Guides star schema implementation and dimension standardisation.

Semantic Layers

For business-facing naming standards used in semantic layers, see Business Names vs Physical Names in Modelling Standards and Conventions.

Semantic layers are business-oriented abstraction layers that present data in business terms rather than physical structures.

Purpose:

Translate physical structures into business concepts
Encapsulate business logic, measures, and calculations
Provide consistent metric definitions for analytics
Simplify access for reporting and self-service use

Examples: Power BI semantic models, dbt Semantic Layer

Semantic layers depend on domain and canonical models for base entities but establish authoritative definitions for derived metrics within their consumption context.

Measures and Metrics

For measures and metrics naming standards and suffix conventions, see Measures and Metrics and Quantities & Measures in Modelling Standards and Conventions.

Measures and metrics are first-class semantic layer concepts that quantify business performance.

Measures: Numeric values directly aggregated from data (e.g., Sales Amount, Bed Days)

Derived from fact events/transactions at a clear grain (per encounter, per day)
Aggregated using simple functions (SUM, COUNT, AVG)
Sensitive to filter context (time, domain, segment)

Metrics: Business expressions combining measures (e.g., Readmission Rate, Gross Margin %)

Express performance or quality vs raw volume
Often use complex logic (conditionals, windows, exclusions)
Require clear definitions and assumptions

Additivity: Classify measures by aggregation behaviour:

Additive: Sum across all dimensions (e.g., Sales Amount)
Semi-additive: Sum across some dimensions only (e.g., Inventory Level - products yes, time no)
Non-additive: Cannot sum meaningfully (e.g., ratios, rates)

Metrics inherit these behaviours. Record additivity as metadata to constrain aggregations in marts and BI tools.

Logical Definition: Measures/metrics should be defined against domain and canonical models and documented in the business glossary with:

Name, description, owning domain, stakeholders
Formal definition and calculation rule
Input entities, attributes, filters
Validity period and version history
Related KPIs, dashboards, reports

Physical Implementation:

Fact tables: Base measures as columns (e.g., amount, quantity); sometimes pre-aggregated metrics
dbt models: Centralised, version-controlled definitions aligned to glossary
BI tools (Power BI): Expressions (DAX) aligned with central definitions; push complex logic upstream when possible

Placement Principles:

Favour upstream (mart / dbt zone) when:

Reused across dashboards/domains
Enterprise KPI or regulatory use
Complex business rules requiring testing
Needs governance and discoverability

Allow BI-layer-only when:

Experimental or single-report scope
Simple presentation variant
Clearly not the system of record

Example: Base measure Total Encounters defined in dbt. Power BI DAX adds interactive filter-aware metric: Readmission Rate (Current Filters) = DIVIDE([Readmission Count], [Total Encounters])

Governance:

Govern like reference data with review/approval process
Retain effective dates and versions
Notify consumers of changes
Make discoverable via catalog/glossary (calculation, dependencies, usage)

Master Data and Reference Data

For reference data naming standards, logical modelling, and physical implementation conventions, see Reference Data in Modelling Standards and Conventions.

Master Data: Core business entities critical to operations and shared across systems (e.g., Patient, Product, Provider, Location)

Managed as authoritative "single source of truth"
Subject to governance and quality processes
Forms backbone of conformed dimensions in data warehousing

Reference Data: Stable, standardised values for categorisation and validation (e.g., Status codes, Country codes, Product Categories)

May be externally standardised (ISO) or internally maintained
Can be simple (code/name pairs) or hierarchical (see Hierarchies)
Managed enterprise-wide with optional source-specific variants

Logical Representation

Master Data: Patient, Product, Provider, Location, Organisation, Payer Reference Data: Status Code, Country Code, Product Category, Location Type, Therapeutic Class, Unit of Measure

Physical Implementation

Physical form depends on analytical use, query patterns, history requirements, complexity, and performance needs.

Physical Forms:

1. Simple Lookup Tables

Basic code-to-description translation
Minimal attributes (code, name, description)
No history tracking
Example: ref_status_code, ref_currency

2. Dimension Tables (Star Schema)

Rich attributes and hierarchies
History tracking via SCD (Slowly Changing Dimensions)
Used for slicing, filtering, grouping
Example: dim_product, dim_provider, dim_location

3. Embedded in Fact Tables (Denormalised)

Attributes embedded directly for performance
Trades storage for speed (avoids joins)
Example: Fact includes status_code, status_name columns

For reference data modelling standards, see Reference Data Standards and Conventions.

Hierarchies

Hierarchies are structured parent-child relationships representing natural groupings and aggregation levels.

Across Model Types:

Conceptual/Logical: Business classifications and organisational structures (product categories, regions, org units)
Physical: Implemented via self-referencing foreign keys, hierarchy tables, or path encodings
Dimensional: Enable drill-down/roll-up analysis (Day → Month → Quarter → Year)
Reference data: Standardised classification schemes

Modelling Hierarchies: Subtypes vs. Relationships

Use Subtypes for "Is A" Taxonomies:

Type distinctions with mutually exclusive subtypes
Example: Individual Customer is a Customer; Inpatient Encounter is an Encounter
Primary classification dimension for fundamentally different variants

Use Relationships for Other Hierarchies:

Compositional (Part Of), categorical (Belongs To), organisational (Reports To)
Support multiple concurrent hierarchies
Example: Location entities (Room, Ward, Building, Campus) related via "Part Of"

Examples

Relationship Hierarchy: Healthcare Facility Structure

Hospital Campus
├── Building A
│   ├── Ward 1 → Room 101, Room 102
│   └── Ward 2 → Room 201
└── Building B
    └── Outpatient Clinic → Consultation Room 1

Physical Representations:

1. Parent-Child (Adjacency List): Each record references its parent for recursive traversal.

location_id	location_name	location_type	parent_location_id
1	Hospital Campus	Facility	NULL
2	Building A	Facility	1
3	Ward 1	Ward	2
4	Room 101	Room	3
5	Room 102	Room	3

2. Flattened Dimension: Lowest-grain entity with hierarchy level columns.

room_key	campus_name	building_name	ward_name	room_name	effective_from_datetime	effective_to_datetime	updated_datetime
1001	Hospital Campus	Building A	Ward 1	Room 101	2024-01-01 00:00:00	NULL	2024-01-01 00:00:00
1002	Hospital Campus	Building A	Ward 1	Room 102	2024-01-01 00:00:00	NULL	2024-01-01 00:00:00

Adjacency lists enable flexible navigation; flattened structures optimise star schema query performance.

Reference Data Hierarchy: Product Classification

All Products
├── Medical Equipment
│   ├── Diagnostic Equipment → Imaging Systems (MRI, CT Scanners)
│   └── Therapeutic Equipment
└── Pharmaceuticals
    ├── Prescription Drugs
    └── Over-the-Counter