Intuitas Data Modelling Standards and Conventions
Updated 5/12/2025
Related Documents: This document focuses on naming conventions and standards. For platform-specific implementation patterns and architectural details, see Standards and Conventions. For conceptual framework and model types, see Modelling Framework.
Table of Contents
- General Naming Conventions
- Conceptual Models
- Logical Models
- Physical Models
- Standard Suffix and Prefix Inventory
General Naming Conventions
Entity Naming (General Rules)
- Use clear business language
- Use terminology familiar to business stakeholders and consistent with the business glossary
- In domain models, use terms appropriate to the business context and document synonyms in the business glossary
- Canonical entities must use terms that are formally agreed and recognised across domains
- Use singular nouns for entities (e.g.,
Customer, notCustomers) - Plurality is allowed only in dimensional fact tables (e.g.,
fact_payments). Dimension tables use singular (e.g.,dim_customer) - Avoid system-specific or technical terminology
- Avoid abbreviations unless universally understood
- Be specific: add context where needed (e.g.,
Admin UservsUser)
Business Names vs Physical Names
- Business names use natural language with spaces (e.g.,
Order Placed) - Business names are authoritative for semantic meaning
- Machine-safe identifiers (e.g., GUIDs) are internal metadata only
- Physical names follow platform-specific conventions (e.g.,
order_placed) - The business glossary is the semantic source of truth
Example:
- Business Name:
Order Placed - Internal Identifier:
GUID-1234-5678 - Physical Name:
order_placed
Events and Transactions
Business events and transactions may be modelled in two valid ways:
- Model the entity itself, treating its lifecycle state as an attribute (e.g.,
OrderwithOrder Statusas an attribute) - Model the entity in a state-specific form where the status is intrinsic to the concept (e.g.,
Order Placed,Invoice Issued)
Choose the approach that best represents how the business actually thinks and works with the concept.
Relationship Naming Conventions
- Use business verbs, not technical terms
- Names must read as a business sentence
- Use present tense and active voice
- Avoid vague terms (e.g., "related to", "linked to")
- Choose precise verbs that express meaning
- Reflect ownership or composition explicitly where applicable
- Use domain-specific language
- Direction should be clear
Examples:
Customer places OrderPolicy covers ClaimDoctor refers PatientEmployer employs WorkerPatient receives TreatmentSupplier delivers Product
Avoid:
has_fkreferenceslinked torelated to
Relationship Directionality
- Every relationship has one authoritative direction and name
- Model one semantic relationship with one verb. Allow alternate read-paths in tooling or documentation, not duplicate relationships
- Do not create passive inversions (e.g., "is placed by" instead of "places")
- If you need a different semantic perspective, model it as a separate relationship
Relationship Cardinality
There are two options:
- Use crow's foot notation (shows 'one' with a line, 'many' with a branching crow's foot; commonly denotes relationships such as one-to-many, many-to-many, etc.)
- Use multiplicity notation (explicitly labels relationship ends with numbers or ranges, e.g., '0..1', '1', '0..', '1..', indicating how many entities participate in the relationship)
Single direction (preferred):
Customer places Order✓
Avoid passive inversion (same relationship, just reversed grammar):
Order is placed by Customer✗
Separate relationships (different business concepts):
Customer places Order(ordering action)Customer is billed for Order(financial relationship)
Conceptual Models
For conceptual modelling framework and principles, see Conceptual Models in the Modelling Framework.
Entities (Conceptual)
- Business-Facing format with capitalised first letters and spaces (e.g.,
Order Placed,Customer,Patient Encounter) - Optimised for clarity and business communication
Diagrammatic Representation (Conceptual)
- Boxes represent business concepts (entities)
- Lines represent relationships
Show:
- Core concepts
- Key relationships
- Major business rules
Do not show:
- Keys
- Data types
- Technical constraints
- Detailed cardinality beyond simple one-to-many
Logical Models
For logical modelling framework and principles, see Logical Models in the Modelling Framework.
Entities (Logical)
- Use the same format as conceptual models
- Business-Facing format with capitalised first letters and spaces (e.g.,
Order Placed,Customer,Patient Encounter) - Optimised for clarity and business communication
Attributes
- Use Business-Facing format with capitalised first letters and spaces
- Names must be business-meaningful
- Avoid embedding data types in names
- Prefer semantic names over technical ones
Measures and Metrics
For measures and metrics framework, governance, and placement principles, see Measures and Metrics in the Modelling Framework.
- Use Business-Facing format with capitalised first letters and spaces (e.g.,
Invoice Amount,Item Count) - Make names consistent with related entities and attributes
- Use semantic suffixes Quantities & Measures in the Standard Suffix Inventory.
Example:
- Measure:
Total Sales Amount— aggregated sum of sales transactions - Metric:
Average Order Value— calculated asTotal Sales Amount / Order Count
Important:
- Keep metric names dimension-agnostic: Avoid including dimension references like "by Product" or "by Region" in metric names—the dimensional model handles slicing (e.g., use
Sales AmountnotSales Amount by Product by Time). - Exception—Intrinsic dimensions: Include dimensional qualifiers only when the dimension is intrinsic to the calculation logic and defines how the metric works (e.g.,
Customer Lifetime Valueis calculated at customer grain by definition;Monthly Revenue Growth Ratecompares month-to-month by definition). - Temporal metrics: Include time window qualifiers when the period defines the calculation logic (e.g.,
Year-to-Date Sales Amountaccumulates from year start;90-Day Rolling Averageuses a 90-day window). - Aggregation behavior: Averages and ratios cannot be summed across periods; they must be recalculated at each grain. BI tools like Power BI default to SUM, producing incorrect results for non-additive metrics—define these explicitly with appropriate DAX measures (AVERAGE, DIVIDE) rather than column aggregations. Document additivity to prevent incorrect rollups.
Keys (Logical Models)
Key Types
Natural or Surrogate (implementation characteristic):
- Natural Key: Uses a real-world business identifier (e.g.,
Invoice Number,Medicare Number) - Surrogate Key: System-generated identifier (e.g., auto-increment, UUID)
- This is a metadata property, not encoded in the name
Primary or Alternate (role in the model):
- Primary Key (PK): The chosen unique identifier for the entity
- Alternate Key (AK): Any other unique identifier
- Tag primary keys as
(PK)in logical models - All other unique keys are considered alternate keys
Key Naming Pattern
Naming follows these patterns for readability, but metadata remains the source of truth for key roles (i.e., metadata—not naming—should be used to identify Primary Keys, Alternate Keys, and Foreign Keys).
Default: When source systems or business glossaries explicitly define key names, preserve their terminology even if it deviates from this convention. Whether a key is natural or surrogate, primary or alternate is captured as metadata
Missing name: Where a name hasn't been provided, then preference
IDsuffix with natural/business keys andKeysuffix with surrogate warehouse keys.Canonical Models Rule: In canonical/enterprise views, fields named \<Entity>
IDmust represent business-meaningful identifiers recognizable across domains, not warehouse-generated surrogates. Use \<Entity>Keyfor warehouse surrogate keys.
Scenario Example:
- Source system (CRM): Provides a natural business identifier for the customer
customer_number. - Logical model: The identifier is named
Customer Number. - Physical staging layer: The identifier is stored as the column
customer_numberto reflect source system (CRM). - Canonical view: Exposes
Customer Numberwith metadata indicating it is a natural primary key (NPK). - Dimensional model (Customer dimension): Introduces a warehouse surrogate key called
Customer Key(physical columncustomer_key) to support SCD Type 2. - Fact tables (e.g. Sales fact): Store a foreign key column
customer_keythat references theCustomerdimension. - Domain contracts (other marts using the conformed dimension): A
Salesmart and aBillingmart both join their fact tables to the conformedCustomerdimension viacustomer_keyinternally. The dimension still carries the natural identifier asCustomer Number(physical columncustomer_number), and this is what domains expose in their canonical contracts so downstream consumers see a consistent business identifier rather than the surrogate key.
Examples:
Applying the "missing name" rule (when creating new identifiers):
Primary Keys:
Customer ID(PK) — natural business identifier (when no source name exists)Customer Key(PK) — surrogate warehouse key (e.g., auto-increment, UUID)Order ID(PK) — natural business identifier (when no source name exists)
Applying the "default" rule (preserving source system names):
Primary Keys from Source:
Customer Number(PK) — natural business identifier from source systemAccount Code(PK) — natural business identifier from source system
Foreign Keys:
FKs can point to either natural or surrogate PKs and that the FK naming follows the target’s business term where possible:
Patient ID(FK) — referencesPatient IDfrom another entity (the original key may be natural or surrogate)Customer Order Key(FK) — references the surrogate composite key
Warehouse Dimension Keys:
Customer Key— surrogate key for dimensional models (used for joins in the warehouse)- Physical mapping:
customer_key
Other unique identifiers named according to business meaning:
Invoice Number(PK) — natural business reference number for an invoiceMedicare Number(AK) — natural alternate key for a patientNational Health Identifier(AK) — natural alternate key for a patient
Composite Keys:
- Our preference is to give composite keys their own distinct name if the modelling tools allow for it.
- Each component follows the same naming conventions (Default or Missing name rules apply)
- If the composite is a natural key, use
IDsuffix; if any single component is surrogate useKeysuffix
Examples:
Flight ID(Natural PK) =Flight Number+Departure Date— logical name for the natural composite (all components natural)Enrollment ID(Natural PK) =Student Number+Course Number— logical name for natural composite (all components natural)Daily Product Sales Key(Surrogate PK) =Product ID+Date Key (from DW)+Transaction Key (from DW)— logical name for surrogate composite (at least one surrogate component)Customer Order Key(Surrogate PK) =Customer ID+Order Key (from DW)— logical name for surrogate composite (mixed: surrogate + natural)
Diagrammatic Representation (Logical)
- Consistent with conceptual models, with additional details to show attributes, key types
- Show relationships as lines and cardinality as crows foot or multiplicity notation.
Physical Models
For physical modelling framework and common formats, see Physical Models in the Modelling Framework.
Physical models represent implemented storage structures and must follow deterministic, platform-safe naming conventions.
Singular vs Plural Naming:
- Default: Use singular nouns for table names (e.g.,
customer,invoice,payment) - Exception - Dimensional Facts: Use plural nouns for fact tables (e.g.,
fact_payments,fact_orders) - Dimensional Dimensions: Use singular nouns (e.g.,
dim_customer,dim_product)
Relational Structures
Naming
All physical objects must use lowercase with underscores (snake_case).
General Rules:
- No mixed case
- No spaces
- Use singular table names (except for dimensional model facts - see below)
- Prefer clarity over brevity
- Avoid source-system naming where possible
Tables:
- Express business meaning
- Use prefixes where appropriate (e.g.,
fact_,dim_) - Examples:
customer,invoice,fact_orders(dimensional fact),dim_patient(dimensional dimension)
Columns:
- Map directly from logical attributes
- Use semantic names
- Do not encode data types or storage formats
- There are two options for handling units:
- A: embed the unit in the column name
- B: separate value and unit type as separate columns
- Examples:
customer_id,invoice_number,order_datetime,total_amount,is_active
Diagrammatic Representation
- Consistent with logical models
- Show relationships as lines and cardinality as crows foot or multiplicity notation.
Databricks Conventions
For detailed conventions and examples see Databricks
Catalogs:
- Catalogs represent domain-level scope (e.g.,
engineering__dev,corporate__dev) - May optionally include zone, layer, or subdomain in the catalog name depending on desired granularity for access and sharing controls
- Naming is lowercase, underscore-separated, and meaningful
- Pattern:
{domain}{__env}(e.g.,corporate__dev)
Schemas:
- Schemas represent layers and source systems within a catalog
- Use double underscores (
__) as separators for schema components - Optional channel suffix as an additional
__-separated component (avoid special characters in identifiers) - Pattern:
{layer}{__source_system}{__source_schema}{__channel}(e.g.,ods__fhirhouse__dbo__lakeflow) - Full example:
engineering__dev.ods__fhirhouse__dbo__lakeflow.encounter
Tables and Views:
- Tables for persisted data
- Views for logical abstraction
- Names reflect business meaning, not source-system terminology
Columns:
- Use lowercase
snake_case - No abbreviations unless standard
Dimensional Models
For dimensional modelling framework, conformed dimensions, and dimensional bus matrix, see Dimensional (Kimball) Model in the Modelling Framework.
For detailed conventions and examples including staging models see the Information Marts sections of Schema and Object Conventions
Naming:
- Fact tables prefixed with
fact_and use plural entity names (e.g.,fact_payments,fact_orders) (pluralisation here is an exception in naming) - Dimension tables prefixed with
dim_and use singular entity names (e.g.,dim_customer,dim_date) - Warehouse dimension keys:
<entity>_key(the surrogate key used for joins in the warehouse)
SCD Type 2 Columns:
| Column | Description |
|---|---|
effective_from_datetime |
Timestamp when this record version became effective |
effective_to_datetime |
Timestamp when this record version expired (NULL for current records) |
updated_datetime |
Timestamp from the source system indicating when the record was last modified |
Default dbt Snapshot Columns:
| Column | Description |
|---|---|
dbt_scd_id |
Unique identifier for each snapshot record (surrogate key) |
dbt_valid_from |
Timestamp when this record version became effective |
dbt_valid_to |
Timestamp when this record version expired (NULL for current records) |
dbt_updated_at |
Timestamp from the source system indicating when the record was last modified |
dbt_deleted |
(Optional) Boolean flag indicating if a record has been deleted from the source system |
Measures and Metrics:
- Align to logical model naming but apply lowercase snake_case (e.g., invoice_amount)
Data Warehouse Keys
Key resolution
Mappings of keys are often required to resolve and integrate heterogeneous identifiers originating from multiple source systems into a consistent warehouse-wide identifier. Key mapping tables support integration by translating source-system identifiers into a common business key and/or surrogate key used by downstream dimensional models.
Example:
- corporate__prod.edw.keys_employee (base mart keyset conforming SAP and Workday employee identifiers into a single enterprise employee key)
Business Keys
Business keys (BKs) represent the real-world identifier of an entity and are sourced from operational systems.
Business keys may be:
- Single-column or composite
- Source-specific or enterprise-conformed
The basis of our conformed BK pattern is: source_system|source_identifier in string format
Usage guidance:
- Business keys may be used directly as dimension primary keys (and fact foreign keys) for simple Type 1 dimensions, however this creates inconsistency if other dimensions use surrogate keys for Type 2 tracking.
- Business keys alone are insufficient for Type 2 dimensions because they do not distinguish historical versions of the same entity. Hence the need for Surrogate Keys.
Surrogate Keys
Surrogate keys (SKs) are system-generated identifiers that uniquely represent a specific version of a dimensional entity.
They are required for:
- Type 2 (and higher) SCD dimensions
- Stable fact-to-dimension joins
- Decoupling fact tables from volatile or composite business keys
For Type 2 dimensions, the logical version grain is:
business_key + effective_from_datetime
The surrogate key is the physical primary key representing this grain and is used by fact tables as the foreign key.
Type 2 Fact Resolution
For Type 2 behaviour, facts must resolve the correct dimensional version using an effective-date window.
fact.business_key = dim.business_key
AND fact.event_timestamp >= dim.effective_from_datetime
AND fact.event_timestamp < COALESCE(dim.effective_to_datetime, TIMESTAMP '9999-12-31 00:00:00')
Databricks Surrogate Keys
In Databricks, there are multiple approaches to creating surrogate keys.
IDENTITY
Databricks recommended approach here:
BIGINT GENERATED ALWAYS AS IDENTITY- Sequence- or merge-based incrementing keys
Characteristics:
- High performance for joins and aggregations
- No collision risk
- Environment-specific values (acceptable in most warehouse designs)
Limitations:
- Not compatible with dbt-managed tables
- Concurrent writes to the table are not supported
Other valid approaches (depending on downstream requirements):
Natural key only (BK as PK/FK): valid for Type 1 dimensions; simplest, but typically wider keys and slower joins.
Type 2 composite key (string): store business_key | effective_from_datetime as a single concatenated key; simple and human readable, but increases key width and join cost.
Deterministic hash key: hash (business_key + effective_from_datetime) into a fixed-width value (string or numeric) for smaller keys and cross-environment stability; manage collision risk by choosing an appropriate hash and store BK/effective_from_datetime alongside for traceability.
dbt Surrogate Keys
dbt supports surrogate hash key generation out of the box:
dbt_utils.surrogate_key(...)(MD5 string hash)- Platform-specific numeric hashes (e.g.
xxhash64) cast toBIGINT
Usage guidance
Hash-based SKs should include:
- Business key
effective_from_datetime(for Type 2 dimensions)
Hash-based SKs enable deterministic keys across environments:
- MD5 used by the dbt default macro returns a 32-character hex string. Binary is more performant but it requires tweaking.
- Consider SHA256 if hash collision risk is a concern (it is unlikely for dimensions however)
Key Design Summary
- Numeric (BIGINT) keys offer better join performance
- Prefer numeric SKs for performance (especially with PowerBI); use deterministic SKs when cross-environment stability is required
- Date dimension tip: Use integer with YYYYMMDD format as the surrogate key for optimal performance
- Create keys at the layer that owns their grain and semantics i.e where are business keys resolved and where is Type-2 first defined? In most architectures, this will be the EDW (Silver) zone. However, if Type 2 semantics are applied later (e.g. in Infomart (Gold)), then surrogate keys must be created there instead.
Reference Data
For reference data and master data framework, logical representation, and physical implementation patterns, see Master Data and Reference Data in the Modelling Framework.
Reference data plays a critical role in conformance by providing standardised values that enable mapping of source-specific codes to canonical enterprise definitions.
Logical Modelling
Reference entities contain:
- Natural keys (e.g., Country Code, Product Type Code)
- Code and description attributes
- Optional: effective datetimes, display sequences, parent references (hierarchies), business metadata
- One-to-many relationships to domain entities
Example: Country Code Reference Entity
| country_code | country_name | iso_code_3 | display_sequence | effective_from_datetime | effective_to_datetime |
|---|---|---|---|---|---|
| AU | Australia | AUS | 10 | 2020-01-01 00:00:00 | 9999-12-31 00:00:00 |
| GB | United Kingdom | GBR | 20 | 2020-01-01 00:00:00 | 9999-12-31 00:00:00 |
| US | United States | USA | 30 | 2020-01-01 00:00:00 | 9999-12-31 00:00:00 |
| NZ | New Zealand | NZL | 40 | 2020-01-01 00:00:00 | 9999-12-31 00:00:00 |
| CA | Canada | CAN | 50 | 2020-01-01 00:00:00 | 9999-12-31 00:00:00 |
Note: Additional standard attributes (audit columns: created_timestamp, created_by, modified_timestamp, modified_by; optional: parent_code, version, source_system_id) would be included in the physical implementation.
Physical Implementation
Raw Reference Data sourced from upstream systems may require their own staging and transformation pipelines in order to conform them to standard, preserve change history and capture required metadata.
Cleansed reference tables are stored in EDW (Silver) zone for wide availability following the reference data naming standard:
- Schema naming convention:
edw_ref{optional: __domain name}{optional: __subdomain name(s)} - Object naming convention:
{reference_data_set_name}{optional:__source_system}{optional:__source_channel} -
e.g.:
corporate__dev.edw_ref.account_code -
Effectivity:
effective_from_date,effective_to_date, (is_activeis derivable) - Audit:
created_datetime,created_by,updated_datetime,updated_by - Hierarchy:
parent_code - Business:
description - Technical:
version,source_system_id
Usage:
- Mapping logic is applied in EDW (Silver) zone staging models during transformation for domain/enterprise-wide application.
- Consumption: Post-mapped data are exposed in marts in EDW or indirectly in Infomart (Gold) (having passed through EDW).
- As dimension attributes: Reference values embedded directly in dimension tables (e.g., Product Type Code/Description in Product dimension) for filtering and grouping.
Change tracking:
- Implement Type 1, 2, 4, or 6 slowly changing dimension strategies based on business requirements for point-in-time accuracy. This is especially important when reference data changes frequently or has many attributes.
- Consider using mini-dimensions/outriggers—separate dimension tables linked via foreign keys—to efficiently track history without excessive row growth in the main dimension.
Recommended Practices:
- Store codes and descriptions in fact tables only when necessary for performance
- Prefer dimension lookups to maintain single source of truth
Standard Suffix and Prefix Inventory
Identity & Keys
| Suffix | Meaning | Logical/Conceptual | Physical |
|---|---|---|---|
ID |
Primary business identifier | Customer ID |
customer_id |
Key |
Warehouse dimension key (dimensional models only) | Customer Key |
customer_key |
Number |
Business reference number | Invoice Number |
invoice_number |
Code |
Coded business value | Diagnosis Code |
diagnosis_code |
Reference |
External reference | External Reference |
external_reference |
Identifier |
Explicit identifier (potentially many identifiers) | National Identifier |
national_identifier |
Temporal Concepts
| Suffix | Meaning | Logical/Conceptual | Physical |
|---|---|---|---|
Date |
Calendar date only | Admission Date |
admission_date |
DateTime |
Timestamp (assumed UTC) | Order DateTime |
order_datetime |
DateTime <Timezone> |
Timestamp in explicit timezone | Order DateTime AEST |
order_datetime_aest |
Time |
Time only | Appointment Time |
appointment_time |
From Date |
Validity start | Policy From Date |
policy_from_date |
To Date |
Validity end | Policy To Date |
policy_to_date |
Example:
- Business Name:
Order DateTime - Physical Name (UTC, implied):
order_datetime - Physical Name (Local timezone explicit):
order_datetime_aest
State & Classification
| Suffix | Meaning | Logical/Conceptual | Physical |
|---|---|---|---|
Status |
Lifecycle state | Order Status |
order_status |
Type |
Classification | Customer Type |
customer_type |
Category |
Grouping | Product Category |
product_category |
Class |
Structural grouping | Asset Class |
asset_class |
Quantities & Measures
| Suffix | Meaning | Logical/Conceptual | Physical |
|---|---|---|---|
Count |
Quantity | Item Count |
item_count |
Amount |
Monetary value | Invoice Amount |
invoice_amount |
Value |
General numeric | Score Value |
score_value |
Rate |
Rate | Interest Rate |
interest_rate |
Ratio |
Proportion | Utilisation Ratio |
utilisation_ratio |
Percentage |
Percentage | Discount Percentage |
discount_percentage |
Boolean Indicators (Prefix Convention)
| Prefix | Meaning | Logical/Conceptual | Physical |
|---|---|---|---|
Is |
State check | Is Active |
is_active |
Has |
Ownership | Has Consent |
has_consent |
Can |
Capability | Can Transact |
can_transact |
Audit & Control
| Suffix | Meaning | Logical/Conceptual | Physical |
|---|---|---|---|
Created DateTime |
Creation timestamp | Created DateTime |
created_datetime |
Updated DateTime |
Last update timestamp | Updated DateTime |
updated_datetime |
Deleted DateTime |
Soft deletion | Deleted DateTime |
deleted_datetime |
Effective From Date |
Valid from date | Effective From Date |
effective_from_date |
Effective To Date |
Valid to date | Effective To Date |
effective_to_date |
Summary of Naming Rules
Suffixes encode meaning, not technical data type or implementation details.
| Rule | Logical/Conceptual | Physical |
|---|---|---|
| Primary business identifier | Customer ID |
customer_id |
| Warehouse dimension key | Customer Key |
customer_key |
| Business reference | Invoice Number, Diagnosis Code |
invoice_number, diagnosis_code |
| Temporal meaning | Order DateTime (assumed UTC) |
order_datetime |
| Classification | Order Status, Customer Type |
order_status, customer_type |
| Measures | Invoice Amount, Interest Rate |
invoice_amount, interest_rate |
| Boolean indicators | Is Active, Has Consent |
is_active, has_consent |