Intuitas Data Modelling Standards and Conventions

Return to home

Updated 5/12/2025

Related Documents: This document focuses on naming conventions and standards. For platform-specific implementation patterns and architectural details, see Standards and Conventions. For conceptual framework and model types, see Modelling Framework.

General Naming Conventions

Entity Naming (General Rules)

Use clear business language
Use terminology familiar to business stakeholders and consistent with the business glossary
In domain models, use terms appropriate to the business context and document synonyms in the business glossary
Canonical entities must use terms that are formally agreed and recognised across domains
Use singular nouns for entities (e.g., Customer, not Customers)
Plurality is allowed only in dimensional fact tables (e.g., fact_payments). Dimension tables use singular (e.g., dim_customer)
Avoid system-specific or technical terminology
Avoid abbreviations unless universally understood
Be specific: add context where needed (e.g., Admin User vs User)

Business Names vs Physical Names

Business names use natural language with spaces (e.g., Order Placed)
Business names are authoritative for semantic meaning
Machine-safe identifiers (e.g., GUIDs) are internal metadata only
Physical names follow platform-specific conventions (e.g., order_placed)
The business glossary is the semantic source of truth

Example:

Business Name: Order Placed
Internal Identifier: GUID-1234-5678
Physical Name: order_placed

Events and Transactions

Business events and transactions may be modelled in two valid ways:

Model the entity itself, treating its lifecycle state as an attribute (e.g., Order with Order Status as an attribute)
Model the entity in a state-specific form where the status is intrinsic to the concept (e.g., Order Placed, Invoice Issued)

Choose the approach that best represents how the business actually thinks and works with the concept.

Relationship Naming Conventions

Use business verbs, not technical terms
Names must read as a business sentence
Use present tense and active voice
Avoid vague terms (e.g., "related to", "linked to")
Choose precise verbs that express meaning
Reflect ownership or composition explicitly where applicable
Use domain-specific language
Direction should be clear

Examples:

Customer places Order
Policy covers Claim
Doctor refers Patient
Employer employs Worker
Patient receives Treatment
Supplier delivers Product

Avoid:

has_fk
references
linked to
related to

Relationship Directionality

Every relationship has one authoritative direction and name
Model one semantic relationship with one verb. Allow alternate read-paths in tooling or documentation, not duplicate relationships
Do not create passive inversions (e.g., "is placed by" instead of "places")
If you need a different semantic perspective, model it as a separate relationship

Relationship Cardinality

There are two options:

Use crow's foot notation (shows 'one' with a line, 'many' with a branching crow's foot; commonly denotes relationships such as one-to-many, many-to-many, etc.)
Use multiplicity notation (explicitly labels relationship ends with numbers or ranges, e.g., '0..1', '1', '0..', '1..', indicating how many entities participate in the relationship)

Single direction (preferred):

Customer places Order ✓

Avoid passive inversion (same relationship, just reversed grammar):

Order is placed by Customer ✗

Separate relationships (different business concepts):

Customer places Order (ordering action)
Customer is billed for Order (financial relationship)

Conceptual Models

For conceptual modelling framework and principles, see Conceptual Models in the Modelling Framework.

Entities (Conceptual)

Business-Facing format with capitalised first letters and spaces (e.g., Order Placed, Customer, Patient Encounter)
Optimised for clarity and business communication

Diagrammatic Representation (Conceptual)

Boxes represent business concepts (entities)
Lines represent relationships

Show:

Core concepts
Key relationships
Major business rules

Do not show:

Keys
Data types
Technical constraints
Detailed cardinality beyond simple one-to-many

Logical Models

For logical modelling framework and principles, see Logical Models in the Modelling Framework.

Entities (Logical)

Use the same format as conceptual models
Business-Facing format with capitalised first letters and spaces (e.g., Order Placed, Customer, Patient Encounter)
Optimised for clarity and business communication

Attributes

Use Business-Facing format with capitalised first letters and spaces
Names must be business-meaningful
Avoid embedding data types in names
Prefer semantic names over technical ones

Measures and Metrics

For measures and metrics framework, governance, and placement principles, see Measures and Metrics in the Modelling Framework.

Use Business-Facing format with capitalised first letters and spaces (e.g., Invoice Amount, Item Count)
Make names consistent with related entities and attributes
Use semantic suffixes Quantities & Measures in the Standard Suffix Inventory.

Example:

Measure: Total Sales Amount — aggregated sum of sales transactions
Metric: Average Order Value — calculated as Total Sales Amount / Order Count

Important:

Keep metric names dimension-agnostic: Avoid including dimension references like "by Product" or "by Region" in metric names—the dimensional model handles slicing (e.g., use Sales Amount not Sales Amount by Product by Time).
Exception—Intrinsic dimensions: Include dimensional qualifiers only when the dimension is intrinsic to the calculation logic and defines how the metric works (e.g., Customer Lifetime Value is calculated at customer grain by definition; Monthly Revenue Growth Rate compares month-to-month by definition).
Temporal metrics: Include time window qualifiers when the period defines the calculation logic (e.g., Year-to-Date Sales Amount accumulates from year start; 90-Day Rolling Average uses a 90-day window).
Aggregation behavior: Averages and ratios cannot be summed across periods; they must be recalculated at each grain. BI tools like Power BI default to SUM, producing incorrect results for non-additive metrics—define these explicitly with appropriate DAX measures (AVERAGE, DIVIDE) rather than column aggregations. Document additivity to prevent incorrect rollups.

Keys (Logical Models)

Key Types

Natural or Surrogate (implementation characteristic):

Natural Key: Uses a real-world business identifier (e.g., Invoice Number, Medicare Number)
Surrogate Key: System-generated identifier (e.g., auto-increment, UUID)
This is a metadata property, not encoded in the name

Primary or Alternate (role in the model):

Primary Key (PK): The chosen unique identifier for the entity
Alternate Key (AK): Any other unique identifier
Tag primary keys as (PK) in logical models
All other unique keys are considered alternate keys

Key Naming Pattern

Naming follows these patterns for readability, but metadata remains the source of truth for key roles (i.e., metadata—not naming—should be used to identify Primary Keys, Alternate Keys, and Foreign Keys).

Default: When source systems or business glossaries explicitly define key names, preserve their terminology even if it deviates from this convention. Whether a key is natural or surrogate, primary or alternate is captured as metadata

Missing name: Where a name hasn't been provided, then preference ID suffix with natural/business keys and Key suffix with surrogate warehouse keys.

Canonical Models Rule: In canonical/enterprise views, fields named \<Entity> ID must represent business-meaningful identifiers recognizable across domains, not warehouse-generated surrogates. Use \<Entity> Key for warehouse surrogate keys.

Scenario Example:

Source system (CRM): Provides a natural business identifier for the customer customer_number.
Logical model: The identifier is named Customer Number.
Physical staging layer: The identifier is stored as the column customer_number to reflect source system (CRM).
Canonical view: Exposes Customer Number with metadata indicating it is a natural primary key (NPK).
Dimensional model (Customer dimension): Introduces a warehouse surrogate key called Customer Key (physical column customer_key) to support SCD Type 2.
Fact tables (e.g. Sales fact): Store a foreign key column customer_key that references the Customer dimension.
Domain contracts (other marts using the conformed dimension): A Sales mart and a Billing mart both join their fact tables to the conformed Customer dimension via customer_key internally. The dimension still carries the natural identifier as Customer Number (physical column customer_number), and this is what domains expose in their canonical contracts so downstream consumers see a consistent business identifier rather than the surrogate key.

Examples:

Applying the "missing name" rule (when creating new identifiers):

Primary Keys:

Customer ID (PK) — natural business identifier (when no source name exists)
Customer Key (PK) — surrogate warehouse key (e.g., auto-increment, UUID)
Order ID (PK) — natural business identifier (when no source name exists)

Applying the "default" rule (preserving source system names):

Primary Keys from Source:

Customer Number (PK) — natural business identifier from source system
Account Code (PK) — natural business identifier from source system

Foreign Keys:

FKs can point to either natural or surrogate PKs and that the FK naming follows the target’s business term where possible:

Patient ID (FK) — references Patient ID from another entity (the original key may be natural or surrogate)
Customer Order Key (FK) — references the surrogate composite key

Warehouse Dimension Keys:

Customer Key — surrogate key for dimensional models (used for joins in the warehouse)
Physical mapping: customer_key

Other unique identifiers named according to business meaning:

Invoice Number (PK) — natural business reference number for an invoice
Medicare Number (AK) — natural alternate key for a patient
National Health Identifier (AK) — natural alternate key for a patient

Composite Keys:

Our preference is to give composite keys their own distinct name if the modelling tools allow for it.
Each component follows the same naming conventions (Default or Missing name rules apply)
If the composite is a natural key, use ID suffix; if any single component is surrogate use Key suffix

Examples:

Flight ID (Natural PK) = Flight Number + Departure Date — logical name for the natural composite (all components natural)
Enrollment ID (Natural PK) = Student Number + Course Number — logical name for natural composite (all components natural)
Daily Product Sales Key (Surrogate PK) = Product ID + Date Key (from DW) + Transaction Key (from DW) — logical name for surrogate composite (at least one surrogate component)
Customer Order Key (Surrogate PK) = Customer ID + Order Key (from DW) — logical name for surrogate composite (mixed: surrogate + natural)

Diagrammatic Representation (Logical)

Consistent with conceptual models, with additional details to show attributes, key types
Show relationships as lines and cardinality as crows foot or multiplicity notation.

Physical Models

For physical modelling framework and common formats, see Physical Models in the Modelling Framework.

Physical models represent implemented storage structures and must follow deterministic, platform-safe naming conventions.

Singular vs Plural Naming:

Default: Use singular nouns for table names (e.g., customer, invoice, payment)
Exception - Dimensional Facts: Use plural nouns for fact tables (e.g., fact_payments, fact_orders)
Dimensional Dimensions: Use singular nouns (e.g., dim_customer, dim_product)

Relational Structures

Naming

All physical objects must use lowercase with underscores (snake_case).

General Rules:

No mixed case
No spaces
Use singular table names (except for dimensional model facts - see below)
Prefer clarity over brevity
Avoid source-system naming where possible

Tables:

Express business meaning
Use prefixes where appropriate (e.g., fact_, dim_)
Examples: customer, invoice, fact_orders (dimensional fact), dim_patient (dimensional dimension)

Columns:

Map directly from logical attributes
Use semantic names
Do not encode data types or storage formats
There are two options for handling units:
A: embed the unit in the column name
B: separate value and unit type as separate columns
Examples: customer_id, invoice_number, order_datetime, total_amount, is_active

Diagrammatic Representation

Consistent with logical models
Show relationships as lines and cardinality as crows foot or multiplicity notation.

Databricks Conventions

For detailed conventions and examples see Databricks

Catalogs:

Catalogs represent domain-level scope (e.g., engineering__dev, corporate__dev)
May optionally include zone, layer, or subdomain in the catalog name depending on desired granularity for access and sharing controls
Naming is lowercase, underscore-separated, and meaningful
Pattern: {domain}{__env} (e.g., corporate__dev)

Schemas:

Schemas represent layers and source systems within a catalog
Use double underscores (__) as separators for schema components
Optional channel suffix as an additional __-separated component (avoid special characters in identifiers)
Pattern: {layer}{__source_system}{__source_schema}{__channel} (e.g., ods__fhirhouse__dbo__lakeflow)
Full example: engineering__dev.ods__fhirhouse__dbo__lakeflow.encounter

Tables and Views:

Tables for persisted data
Views for logical abstraction
Names reflect business meaning, not source-system terminology

Columns:

Use lowercase snake_case
No abbreviations unless standard

Dimensional Models

For dimensional modelling framework, conformed dimensions, and dimensional bus matrix, see Dimensional (Kimball) Model in the Modelling Framework.

For detailed conventions and examples including staging models see the Information Marts sections of Schema and Object Conventions

Naming:

Fact tables prefixed with fact_ and use plural entity names (e.g., fact_payments, fact_orders) (pluralisation here is an exception in naming)
Dimension tables prefixed with dim_ and use singular entity names (e.g., dim_customer, dim_date)
Warehouse dimension keys: <entity>_key (the surrogate key used for joins in the warehouse)

SCD Type 2 Columns:

Column	Description
`effective_from_datetime`	Timestamp when this record version became effective
`effective_to_datetime`	Timestamp when this record version expired (NULL for current records)
`updated_datetime`	Timestamp from the source system indicating when the record was last modified

Default dbt Snapshot Columns:

Column	Description
`dbt_scd_id`	Unique identifier for each snapshot record (surrogate key)
`dbt_valid_from`	Timestamp when this record version became effective
`dbt_valid_to`	Timestamp when this record version expired (NULL for current records)
`dbt_updated_at`	Timestamp from the source system indicating when the record was last modified
`dbt_deleted`	(Optional) Boolean flag indicating if a record has been deleted from the source system

Measures and Metrics:

Align to logical model naming but apply lowercase snake_case (e.g., invoice_amount)

Data Warehouse Keys

Key resolution

Mappings of keys are often required to resolve and integrate heterogeneous identifiers originating from multiple source systems into a consistent warehouse-wide identifier. Key mapping tables support integration by translating source-system identifiers into a common business key and/or surrogate key used by downstream dimensional models.

Example: - corporate__prod.edw.keys_employee (base mart keyset conforming SAP and Workday employee identifiers into a single enterprise employee key)

Business Keys

Business keys (BKs) represent the real-world identifier of an entity and are sourced from operational systems.

Business keys may be:

Single-column or composite
Source-specific or enterprise-conformed

The basis of our conformed BK pattern is: source_system|source_identifier in string format

Usage guidance:

Business keys may be used directly as dimension primary keys (and fact foreign keys) for simple Type 1 dimensions, however this creates inconsistency if other dimensions use surrogate keys for Type 2 tracking.
Business keys alone are insufficient for Type 2 dimensions because they do not distinguish historical versions of the same entity. Hence the need for Surrogate Keys.

Surrogate Keys

Surrogate keys (SKs) are system-generated identifiers that uniquely represent a specific version of a dimensional entity.

They are required for:

Type 2 (and higher) SCD dimensions
Stable fact-to-dimension joins
Decoupling fact tables from volatile or composite business keys

For Type 2 dimensions, the logical version grain is:

business_key + effective_from_datetime

The surrogate key is the physical primary key representing this grain and is used by fact tables as the foreign key.

Type 2 Fact Resolution

For Type 2 behaviour, facts must resolve the correct dimensional version using an effective-date window.

fact.business_key = dim.business_key
AND fact.event_timestamp >= dim.effective_from_datetime
AND fact.event_timestamp <  COALESCE(dim.effective_to_datetime, TIMESTAMP '9999-12-31 00:00:00')

Databricks Surrogate Keys

In Databricks, there are multiple approaches to creating surrogate keys.

IDENTITY

Databricks recommended approach here:

BIGINT GENERATED ALWAYS AS IDENTITY
Sequence- or merge-based incrementing keys

Characteristics:

High performance for joins and aggregations
No collision risk
Environment-specific values (acceptable in most warehouse designs)

Limitations:

Not compatible with dbt-managed tables
Concurrent writes to the table are not supported

Other valid approaches (depending on downstream requirements):

Natural key only (BK as PK/FK): valid for Type 1 dimensions; simplest, but typically wider keys and slower joins.

Type 2 composite key (string): store business_key | effective_from_datetime as a single concatenated key; simple and human readable, but increases key width and join cost.

Deterministic hash key: hash (business_key + effective_from_datetime) into a fixed-width value (string or numeric) for smaller keys and cross-environment stability; manage collision risk by choosing an appropriate hash and store BK/effective_from_datetime alongside for traceability.

dbt Surrogate Keys

dbt supports surrogate hash key generation out of the box:

dbt_utils.surrogate_key(...) (MD5 string hash)
Platform-specific numeric hashes (e.g. xxhash64) cast to BIGINT

Usage guidance

Hash-based SKs should include:

Business key
effective_from_datetime (for Type 2 dimensions)

Hash-based SKs enable deterministic keys across environments:

MD5 used by the dbt default macro returns a 32-character hex string. Binary is more performant but it requires tweaking.
Consider SHA256 if hash collision risk is a concern (it is unlikely for dimensions however)

Key Design Summary

Numeric (BIGINT) keys offer better join performance
Prefer numeric SKs for performance (especially with PowerBI); use deterministic SKs when cross-environment stability is required
Date dimension tip: Use integer with YYYYMMDD format as the surrogate key for optimal performance
Create keys at the layer that owns their grain and semantics i.e where are business keys resolved and where is Type-2 first defined? In most architectures, this will be the EDW (Silver) zone. However, if Type 2 semantics are applied later (e.g. in Infomart (Gold)), then surrogate keys must be created there instead.

Reference Data

For reference data and master data framework, logical representation, and physical implementation patterns, see Master Data and Reference Data in the Modelling Framework.

Reference data plays a critical role in conformance by providing standardised values that enable mapping of source-specific codes to canonical enterprise definitions.

Logical Modelling

Reference entities contain:

Natural keys (e.g., Country Code, Product Type Code)
Code and description attributes
Optional: effective datetimes, display sequences, parent references (hierarchies), business metadata
One-to-many relationships to domain entities

Example: Country Code Reference Entity

country_code	country_name	iso_code_3	display_sequence	effective_from_datetime	effective_to_datetime
AU	Australia	AUS	10	2020-01-01 00:00:00	9999-12-31 00:00:00
GB	United Kingdom	GBR	20	2020-01-01 00:00:00	9999-12-31 00:00:00
US	United States	USA	30	2020-01-01 00:00:00	9999-12-31 00:00:00
NZ	New Zealand	NZL	40	2020-01-01 00:00:00	9999-12-31 00:00:00
CA	Canada	CAN	50	2020-01-01 00:00:00	9999-12-31 00:00:00

Note: Additional standard attributes (audit columns: created_timestamp, created_by, modified_timestamp, modified_by; optional: parent_code, version, source_system_id) would be included in the physical implementation.

Physical Implementation

Raw Reference Data sourced from upstream systems may require their own staging and transformation pipelines in order to conform them to standard, preserve change history and capture required metadata.

Cleansed reference tables are stored in EDW (Silver) zone for wide availability following the reference data naming standard:

Schema naming convention: edw_ref{optional: __domain name}{optional: __subdomain name(s)}
Object naming convention: {reference_data_set_name}{optional:__source_system}{optional:__source_channel}
e.g.: corporate__dev.edw_ref.account_code
Effectivity: effective_from_date, effective_to_date, (is_active is derivable)
Audit: created_datetime, created_by, updated_datetime, updated_by
Hierarchy: parent_code
Business: description
Technical: version, source_system_id

Usage:

Mapping logic is applied in EDW (Silver) zone staging models during transformation for domain/enterprise-wide application.
Consumption: Post-mapped data are exposed in marts in EDW or indirectly in Infomart (Gold) (having passed through EDW).
As dimension attributes: Reference values embedded directly in dimension tables (e.g., Product Type Code/Description in Product dimension) for filtering and grouping.

Change tracking:

Implement Type 1, 2, 4, or 6 slowly changing dimension strategies based on business requirements for point-in-time accuracy. This is especially important when reference data changes frequently or has many attributes.
Consider using mini-dimensions/outriggers—separate dimension tables linked via foreign keys—to efficiently track history without excessive row growth in the main dimension.

Recommended Practices:

Store codes and descriptions in fact tables only when necessary for performance
Prefer dimension lookups to maintain single source of truth

Standard Suffix and Prefix Inventory

Identity & Keys

Suffix	Meaning	Logical/Conceptual	Physical
`ID`	Primary business identifier	`Customer ID`	`customer_id`
`Key`	Warehouse dimension key (dimensional models only)	`Customer Key`	`customer_key`
`Number`	Business reference number	`Invoice Number`	`invoice_number`
`Code`	Coded business value	`Diagnosis Code`	`diagnosis_code`
`Reference`	External reference	`External Reference`	`external_reference`
`Identifier`	Explicit identifier (potentially many identifiers)	`National Identifier`	`national_identifier`

Temporal Concepts

Suffix	Meaning	Logical/Conceptual	Physical
`Date`	Calendar date only	`Admission Date`	`admission_date`
`DateTime`	Timestamp (assumed UTC)	`Order DateTime`	`order_datetime`
`DateTime <Timezone>`	Timestamp in explicit timezone	`Order DateTime AEST`	`order_datetime_aest`
`Time`	Time only	`Appointment Time`	`appointment_time`
`From Date`	Validity start	`Policy From Date`	`policy_from_date`
`To Date`	Validity end	`Policy To Date`	`policy_to_date`

Example:

Business Name: Order DateTime
Physical Name (UTC, implied): order_datetime
Physical Name (Local timezone explicit): order_datetime_aest

State & Classification

Suffix	Meaning	Logical/Conceptual	Physical
`Status`	Lifecycle state	`Order Status`	`order_status`
`Type`	Classification	`Customer Type`	`customer_type`
`Category`	Grouping	`Product Category`	`product_category`
`Class`	Structural grouping	`Asset Class`	`asset_class`

Quantities & Measures

Suffix	Meaning	Logical/Conceptual	Physical
`Count`	Quantity	`Item Count`	`item_count`
`Amount`	Monetary value	`Invoice Amount`	`invoice_amount`
`Value`	General numeric	`Score Value`	`score_value`
`Rate`	Rate	`Interest Rate`	`interest_rate`
`Ratio`	Proportion	`Utilisation Ratio`	`utilisation_ratio`
`Percentage`	Percentage	`Discount Percentage`	`discount_percentage`

Boolean Indicators (Prefix Convention)

Prefix	Meaning	Logical/Conceptual	Physical
`Is`	State check	`Is Active`	`is_active`
`Has`	Ownership	`Has Consent`	`has_consent`
`Can`	Capability	`Can Transact`	`can_transact`

Audit & Control

Suffix	Meaning	Logical/Conceptual	Physical
`Created DateTime`	Creation timestamp	`Created DateTime`	`created_datetime`
`Updated DateTime`	Last update timestamp	`Updated DateTime`	`updated_datetime`
`Deleted DateTime`	Soft deletion	`Deleted DateTime`	`deleted_datetime`
`Effective From Date`	Valid from date	`Effective From Date`	`effective_from_date`
`Effective To Date`	Valid to date	`Effective To Date`	`effective_to_date`

Summary of Naming Rules

Suffixes encode meaning, not technical data type or implementation details.

Rule	Logical/Conceptual	Physical
Primary business identifier	`Customer ID`	`customer_id`
Warehouse dimension key	`Customer Key`	`customer_key`
Business reference	`Invoice Number`, `Diagnosis Code`	`invoice_number`, `diagnosis_code`
Temporal meaning	`Order DateTime` (assumed UTC)	`order_datetime`
Classification	`Order Status`, `Customer Type`	`order_status`, `customer_type`
Measures	`Invoice Amount`, `Interest Rate`	`invoice_amount`, `interest_rate`
Boolean indicators	`Is Active`, `Has Consent`	`is_active`, `has_consent`

Keys	Action
`?`	Open this help
`n`	Next page
`p`	Previous page
`s`	Search