Standards and Conventions
These standards are opinionated and designed to ensure consistency, governance, and automation across an organisation. They largely reflect an Azure Databricks environment but can be adapted to other platforms; organisations should tailor them to fit their existing internal conventions.
Mesh
Domain Names
All lower case: {optional:organisation_}{functional area/domain}_{subdomain}
e.g: intuitas_corporate
Platform
Environment
- Environment name: dev/test/prod/sandbox/poc (pat, for production acceptance testing, may be added if preferred)
VNET
- Name:
vn-{organisation_name}-{domain_name} - e.g: vn-intuitas-corporate
Resource Groups
- Name:
rg-{organisation_name}-{domain_name} - e.g: rg-intuitas-corporate
Databricks workspace
- Name:
ws-{organisation_name}-{domain_name} - e.g: ws-intuitas-corporate
Key vault
- Name:
kv-{organisation_name}-{domain_name} - e.g: kv-intuitas-corporate
Secrets
- Name:
{secret_name}
Entra Group Names
- Name:
eg-{organisation_name}-{domain_name} - e.g: eg-intuitas-corporate
Azure Data Factory (ADF)
- Name:
adf-{organisation_name}-{domain_name} - e.g: adf-intuitas-corporate
SQL Server
- Name:
sql-{organisation_name}-{domain_name} - e.g: sql-intuitas-corporate
SQL Database
- Name:
sqldb-{purpose}-{organisation_name}-{domain_name}-{optional:environment} - e.g: sqldb-metadata-intuitas-corporate
Storage
This section describes naming standards and conventions for cloud storage resources.
Lakehouse storage
- Lakehouse storage account name:
dl{organisation_name}{domain_name}
Lakehouse storage containers
- Name:
{environment} (dev/test/preprod/prod)
Lakehouse storage folders
- Level 1 Name: {zone} (raw/edw/infomart), or (bronze/silver/gold) if using the medallion approach
- Level 2 Name: {layer_name}
e.g:
- raw/landing
- raw/ods
- raw/pds
- raw/schema (for autoloader metadata)
- raw/checkpoints (for autoloader metadata)
- edw/ (automatically determined by Unity Catalog)
- infomart/ (automatically determined by Unity Catalog)
Generic Blob storage
Generic Blob storage can be used for all non-lakehouse data; alternatively, such data can be stored within the lakehouse storage account in an appropriate container and folder.
- Resource: ADLSGen2
- Generic storage account name: sa{organisation_name}{domain_name}{functional_description}
- Tier: Standard/Premium (depends on workload)
- Redundancy:
  - Minimum ZRS or GRS for prod
  - Minimum LRS for poc, dev, test and preprod
Generic Blob files and folders
No standard naming conventions for files and folders.
Databricks
This section provides naming standards and conventions for Databricks.
Workspace and cluster names
- Workspace name: ws-{organisation_name}-{domain_name}
- Cluster name: {personal_name/shared_name} Cluster
- Workflow name: {dev/test} {workflow_name}
Jobs and Pipelines
Job names
- Job names: {domain}__{zone}__{purpose}__{source}{optional: __target}{optional: __schedule}{optional: __version}__{env}
- e.g. clinical__raw__ingest__fhircdr__dev
For Delta Live Table (DLT) Pipelines
- Pipeline names: {domain_name}__{zone}__pipeline__{dataset}{optional: __schedule}{optional: __version}__{env}
- e.g. clinical__raw__pipeline__fhircdr__dev
- e.g. supplychain__infomart__pipeline__inventorymart__prod
Note on zone terminology:
- This framework uses raw/edw/infomart terminology (aligns with enterprise data warehousing concepts)
- Databricks documentation uses bronze/silver/gold terminology (medallion architecture)
- Both are valid; choose one and use consistently:
- raw = bronze (raw ingested data)
- edw = silver (cleaned, conformed data)
- infomart = gold (business-ready aggregated data)
- Note on {zone}: If the 'business outcome' is Infomart, call it Infomart, even if it produces Raw + EDW on the way, i.e. "This is the production DLT pipeline in the supply chain domain, which builds and maintains the curated Infomart zone dataset called Inventory Mart."
- Include "pipeline" in the name so it is distinguishable from ad hoc jobs.
- Dataset can be a logical grouping (e.g., patient, encounter, claims).
Orchestration job names
- Orchestration job names: {domain}__orchestration__{workflow_name}{optional: __schedule}{optional: __version}__{env}
- e.g. clinical__orchestration__fhirworkflow__daily__dev
which then orchestrates:
- clinical__raw__pipeline__fhircdr__prod (or clinical__bronze__pipeline__fhircdr__prod if using medallion terminology)
- clinical__edw__pipeline__fhirclean__prod (or clinical__silver__pipeline__fhirclean__prod if using medallion terminology)
- clinical__infomart__pipeline__clinicalmart__prod (or clinical__gold__pipeline__clinicalmart__prod if using medallion terminology)
Job logging
- Event log catalog: {domain}__audit__{env}
- Event log schema: audit__event_log (see the example query below)
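A minimal sketch of querying a pipeline event log under this convention, assuming the log has been published to the audit catalog and schema above as a table named event_log (the table name and columns mirror the DLT event log layout and are assumptions here):

```sql
-- Query recent flow-progress events from the published event log.
-- Catalog and schema follow the convention above; the event_log table name
-- and the origin/details columns are assumptions based on the DLT event log layout.
SELECT
  timestamp,
  event_type,
  origin.pipeline_name AS pipeline_name,
  details
FROM clinical__audit__dev.audit__event_log.event_log
WHERE event_type = 'flow_progress'
ORDER BY timestamp DESC
LIMIT 100;
```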
Optional
- Versioning (if needed): add v1, v2 if a job is redesigned but old one stays around.
- Scheduling frequency (optional): suffix with _hourly, _daily, _weekly if relevant.
Catalog naming and conventions
Refer to Data zones and layers for further context and definitions applicable to this section.
Catalog name:
The choice of granularity depends on domain topology, zone and layer conventions, and the desired level of segregation for access and sharing controls (i.e. catalog or layer level).
- Minimum granularity (domain level): {domain_name}{__environment (dev/test/pat/prod)} (prod is implied if omitted) e.g: corporate__dev
- Optional granularity (domain-zone level): {domain_name}{_zone (raw/edw/infomart)}{__environment (dev/test/pat/prod)} e.g: corporate_raw__dev
- Optional granularity (domain-zone-layer level): {domain_name}{_zone (raw/edw/infomart)}{_layer (ods/pds/stg/ref/mart)}{__environment (dev/test/pat/prod)} e.g: corporate_raw_ods__dev
- Optional granularity (domain-layer level): {domain_name}{_layer (ods/pds/stg/ref/mart)}{__environment (dev/test/pat/prod)} e.g: corporate_ods__dev
- Optional granularity (subdomain-zone level): {domain_name}{_descriptor (subdomain/subject/project*)}{_zone (raw/edw/infomart)}{__environment (dev/test/pat/prod)} e.g: corporate_finance_raw__dev
In the examples provided, we have opted for the domain level, with schema separation for the lower levels of grain via prefixes, i.e. engineering__dev.ods__fhirhouse__dbo__lakeflow.encounter.
Note that projects are temporary constructs and hence are not recommended for use in names.
- Catalog storage root:
abfss://{environment}@dl{organisation_name}{domain_name}.dfs.core.windows.net/{domain_name}_{environment}_catalog
Externally mounted (lakehouse federation) Catalog Names
- Foreign catalog name: {domain_name (owner)}__fc__{source_system}{optional: __other_useful_descriptors e.g. __environment}
- e.g: corporate__fc__sqlonpremsource
- e.g: corporate__fc__sqlonpremsource__prod
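A minimal sketch of registering such a foreign catalog, assuming a Unity Catalog connection named sqlonpremsource_connection (hypothetical) already exists for the on-premises SQL Server:

```sql
-- Create a federated (foreign) catalog over an existing UC connection.
-- The connection name and source database name are illustrative assumptions.
CREATE FOREIGN CATALOG IF NOT EXISTS corporate__fc__sqlonpremsource
USING CONNECTION sqlonpremsource_connection
OPTIONS (database 'SourceDb');
```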
Catalog Metadata tags:
The following metadata should be added when creating a catalog (see the example below):
- Key: domain (owner): {domain_name}
- Key: environment: {environment}
- Key: managing_domain: {domain_name} e.g. if delegating to the engineering domain
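A minimal sketch of creating a domain-level catalog with the storage root convention and the tags above (the storage account name and tag values are illustrative):

```sql
-- Create the catalog with its storage root per the convention above.
CREATE CATALOG IF NOT EXISTS corporate__dev
MANAGED LOCATION 'abfss://dev@dlintuitascorporate.dfs.core.windows.net/corporate_dev_catalog';

-- Apply the recommended metadata as catalog tags.
ALTER CATALOG corporate__dev
SET TAGS ('domain' = 'corporate', 'environment' = 'dev', 'managing_domain' = 'engineering');
```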
Schema and object conventions
Refer to Data zones and layers for further context and definitions applicable to this section.
Refer to Physical Models for the standard relating to column naming, types, and conventions.
Terminology Hierarchy:
Data is organized in a three-level hierarchy:
- Zone = Top-level organizational boundary (Raw, EDW, Infomart); within each zone, data flows through layers
- Layer = Processing layers within a zone (landing, ods, pds, stg, ref, mart, etc.); each layer typically maps to a schema
- Schema = Unity Catalog schema that stores objects (ods, pds, edw_stg, edw_ref, edw, im_stg, im)
Example: The Raw zone contains the ODS layer, which maps to the ods schema.
Schema level external storage locations
Recommendations:
- For managed tables (default): do nothing. Let dbt create schemas without additional configuration; Databricks will manage storage and metadata. Objects will then be stored in the catalog storage root, e.g: abfss://dev@dlintuitas{domain}.dfs.core.windows.net/engineering__dev_catalog/__unitystorage/catalogs/catalog-guid/tables/object-guid
- For granular control over schema-level storage locations: pre-create schemas with LOCATION mapped to external paths, or configure the catalog-level location (see the sketch below).
- Ensure dbt's dbt_project.yml and environment variables align with storage locations.
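A minimal sketch of pre-creating a schema with an explicit storage location (the path is illustrative and must correspond to a registered external location):

```sql
-- Pre-create the ODS schema with a managed location so objects land in a
-- predictable path rather than the catalog storage root.
CREATE SCHEMA IF NOT EXISTS corporate__dev.ods
MANAGED LOCATION 'abfss://dev@dlintuitascorporate.dfs.core.windows.net/corporate_dev_catalog/ods';
```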
Metadata Schemas and Objects
Refer to Data zones and layers for further context and definitions applicable to this section.
This contains metadata that supports engineering and governance; the exact objects will vary depending on engineering and governance toolsets.
Engineering - ingestion framework:
- Schema naming convention: meta{optional: __function}
- Object naming convention: {function/descriptor}
- e.g: corporate__dev.meta.ingestion_control
- e.g: corporate__dev.meta__ingestion.ingestion_control
Raw (aka Bronze) (Data according to systems)
The Raw zone stores raw, immutable data as it is ingested from source systems. See Data zones and layers for definitions and context.
The Raw zone uses two primary schemas:
- ods - Operational Data Store (current state)
- pds - Persistent Data Store (historical snapshots)
In the examples provided, we have opted for domain-level catalogs, with schema separation for the lower levels of grain via prefixes, i.e. engineering__dev.ods__fhirhouse__dbo__lakeflow.encounter.
Note: Using consistent double underscore (__) delimiters throughout ensures compatibility with Databricks and avoids parsing issues. Parentheses notation (e.g., dbo(adf)) should be avoided in schema and object names.
Persistent Landing:
Persistent Landing uses Unity Catalog Volumes for storing raw files and unstructured data as they arrive from source systems.
Volume naming convention: [domain][__env].[layer][__source_system][__source_schema][__channel].[source_object/volume_name]
- [domain][__env]: Domain and environment identifier (e.g., corporate__dev, engineering__prod)
- [layer]: Data layer identifier within the Raw zone (e.g., landing)
- [__source_system]: Source system identifier (e.g., workdayapi, saphr)
- [__source_schema]: Optional source schema or subsystem identifier
- [__channel]: Optional ingestion channel/method (e.g., __adf, __fivetran, __api)
- [source_object/volume_name]: Descriptive volume name or source object identifier
- e.g: corporate__dev.landing__workdayapi.schedule_volume
- e.g: engineering__prod.landing__fhirapi__patients.raw_data
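A minimal sketch of creating a managed landing volume per the convention above (names follow the first example; the comment text is illustrative):

```sql
-- Create a managed volume for raw files landed from the Workday API.
CREATE VOLUME IF NOT EXISTS corporate__dev.landing__workdayapi.schedule_volume
COMMENT 'Raw schedule files landed from the Workday API';
```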
Operational Data Store (ODS):
The objective of Raw zone conventions is to provide clarity over which zone, layer, and schema an object belongs to, what the data relates to, where it was sourced from, and via which channel it arrived (as there may be nuances in data depending on its channel).
ODS data can be replicated from source systems, or prepared for use from semi/unstructured data via hard transformation, and hence has the following associated conventions:
Database replicated ODS (structured sources like SQL Server):
Format: [domain][__env].[layer][__source_system][__source_schema][__channel].[source_object/volume_name]
- [domain][__env]: Domain and environment (e.g., clinical__dev)
- [layer]: Data layer within the Raw zone (e.g., ods)
- [__source_system]: Source system identifier
- [__source_schema]: Source schema (if applicable, e.g., __dbo, __reporting)
- [__channel]: Optional ingestion channel (e.g., __adf, __fivetran, __lakeflow)
- [source_object/volume_name]: Table name as per source
- e.g: clinical__dev.ods__patientflowmanager01__reporting.patients
- e.g: clinical__dev.ods__fhirhouse__dbo__adf.encounter
Prepped semi/unstructured ODS data:
Format: [domain][__env].[layer][__source_system][__source_descriptor][__channel].[source_object/volume_name]
- [domain][__env]: Domain and environment (e.g., clinical__dev)
- [layer]: Data layer within the Raw zone (e.g., ods)
- [__source_system]: Source system identifier
- [__source_descriptor]: Source descriptor or subsystem identifier
- [__channel]: Optional ingestion channel (e.g., __kafka, __databricks, __api)
- [source_object/volume_name]: Named as per source or a unique assigned name (e.g., topic/folder name)
- e.g: clinical__dev.ods__ambosim__confluent__kafka.encounter
- e.g: corporate__dev.ods__workdayapi__employees.raw_data
Persistent Data Store (PDS):
PDS conventions will mirror ODS conventions:
Database replicated PDS (structured sources like SQL Server):
Format: [domain][__env].[layer][__source_system][__source_schema][__channel].[source_object/volume_name]
- [domain][__env]: Domain and environment (e.g., clinical__dev)
- [layer]: Data layer within the Raw zone (e.g., pds)
- [__source_system]: Source system identifier
- [__source_schema]: Source schema (if applicable, e.g., __dbo, __reporting)
- [__channel]: Optional ingestion channel (e.g., __adf, __fivetran, __lakeflow)
- [source_object/volume_name]: Table name as per source
- e.g: clinical__dev.pds__patientflowmanager01__reporting.patients
- e.g: clinical__dev.pds__fhirhouse__dbo__adf.encounter
Prepped semi/unstructured PDS data:
Format: [domain][__env].[layer][__source_system][__source_descriptor][__channel].[source_object/volume_name]
- [domain][__env]: Domain and environment (e.g., clinical__dev)
- [layer]: Data layer within the Raw zone (e.g., pds)
- [__source_system]: Source system identifier
- [__source_descriptor]: Source descriptor or subsystem identifier
- [__channel]: Optional ingestion channel (e.g., __kafka, __databricks, __api)
- [source_object/volume_name]: Named as per source or a unique assigned name (e.g., topic/folder name)
- e.g: clinical__dev.pds__ambosim__confluent__kafka.encounter
- e.g: corporate__dev.pds__workdayapi__employees.raw_data
EDW (aka Silver) (Data according to business entities)
The EDW (Enterprise Data Warehouse) zone focuses on transforming raw data into cleaned, enriched, and validated datasets that are the building blocks for downstream consumption and analysis.
Refer to Data zones and layers for further context and definitions applicable to this section.
The EDW zone uses three primary schemas:
- edw_stg - Source-centric and entity-centric staging objects (transformations, cleaning, normalization, integration)
- edw_ref - Reference data (curated, conformed reference entities and code sets intended for broad reuse)
- edw - Base marts (full-featured dimensions with SCD logic, facts - curated entities)
These marts are aligned to business entities and broad requirements; hence they must contain source-specific objects at the lowest grain. Further enrichment and joins may be applied across sources.
In the examples provided, we have opted for domain-level catalogs, with schema separation for the lower levels of grain via prefixes, i.e. engineering__dev.edw.dim_customer.
- All schemas may optionally be prefixed with additional descriptors if not already decomposed at the domain level, i.e. edw_stg__, edw_ref__, edw__
- All entity names which align to facts should be named in the plural.
- All entity names which align to dims should be named in the singular.
EDW Staging Objects: Staging models serve as intermediary models that transform source data into EDW models. According to dbt best practices, there is a distinction between Staging and Intermediate models. Under this blueprint the use of Intermediate models is optional. Reference
Two Complementary Staging Approaches:
EDW staging supports two patterns that work together:
- Source-Centric Staging - organized by source system:
  - Cleans and normalizes data from each source independently
  - May feed data quality test results to reports for actioning
  - Prepares reference data for broad use and maps it for mart conformance
  - Creates mappings of keys to resolve system keys to universal surrogate or business keys
- Entity-Centric Staging - organized by business entity:
  - Integrates data across multiple source-centric staging models
  - Applies cross-source deduplication and conflict resolution
  - Establishes unified entity keys and relationships
  - Creates a single, conformed view of each business entity
Typical Flow:
Raw Sources → Source-Centric Staging → Entity-Centric Staging → Base Marts (Dimensions/Facts)
Source-Centric Staging Approach:
Staging objects are organized around source systems and source objects, cleaning and normalizing data according to expectations/standards. This approach:
- Cleans and normalizes data from each source independently
- May feed data quality test results to reports for actioning
- Prepares reference data for broad use and maps it for mart conformance
- Creates mappings of keys to resolve and map system keys to universal surrogate keys or business keys, which can then be reused downstream for integration
Naming Convention:
[domain][__env] . [schema] . stg__[source_system](optional__[source_channel])__[source_object](optional__[other_descriptors])__[id]__[transformation_descriptors]
Where:
- [domain][__env] - Domain and environment (e.g., corporate__prod, corporate__dev)
- [schema] - EDW schema (edw_stg for staging, edw_ref for reference staging)
- stg__ - Staging object prefix
- [source_system] - Source system identifier (e.g., sap, workday, finance_system)
- (optional__[source_channel]) - Optional source channel (e.g., __adf, __api)
- [source_object] - Source object/table name
- (optional__[other_descriptors]) - Optional additional descriptors
- [id] - Unique identifier suffix
- [transformation_descriptors] - Numbered transformation step(s)
Standard Transformation Steps (optional, applied as needed):
- 00_keyed - Key assignment/generation
- 01_renamed_and_typed - Column renaming and type casting
- 02_deduped - Deduplication
- 03_cleaned - Data cleaning
- 04_filtered or 04_split - Filtering or splitting data
- 05_column_selected - Column selection
- 06_business_validated - Business rule validation
- 07_desensitised - Sensitive data masking/removal
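A minimal dbt sketch of the 01_renamed_and_typed step for a source-centric staging model (the source definition, column names, and config are illustrative assumptions):

```sql
-- models/edw/source/workday/stg/stg__workday__employees__01_renamed_and_typed.sql
-- Renames source columns and casts types; nothing else happens at this step.
{{ config(materialized='view', schema='edw_stg') }}

select
    cast(emp_id as bigint)   as employee_id,
    trim(preferred_name)     as employee_name,
    cast(hire_dt as date)    as hire_date
from {{ source('workday', 'employees') }}
```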
Reference Data Staging:
Reference data follows the same two-stage approach as marts when coming from multiple sources.
Use Entity-Centric Staging when:
- Multiple source systems provide the same business entity/code set, and you need a single conformed view
- You need cross-source mapping, hierarchy standardisation, or conflict resolution (e.g., code collisions, different naming, different rollups)
Source-Centric Staging (with transformations in edw_stg schema):
Cleans individual source system reference data:
- e.g: corporate__prod.edw_stg.stg__sap__facilities__01_renamed_and_typed
- e.g: corporate__prod.edw_stg.stg__sap__facilities__02_cleaned
- e.g: corporate__prod.edw_stg.stg__finance_system__adf__account_codes__01_renamed_and_typed
- e.g: corporate__prod.edw_stg.stg__workday__locations__01_renamed_and_typed
- e.g: corporate__prod.edw_stg.stg__workday__locations__02_deduped
Entity-Centric Staging (when unifying across sources in edw_stg schema):
Integrates reference data from multiple sources:
- e.g: corporate__prod.edw_stg.stg__facility__01_unified (combines SAP and Workday facilities)
- e.g: corporate__prod.edw_stg.stg__facility__02_mapped (standardizes codes/hierarchies)
- e.g: corporate__prod.edw_stg.stg__account_code__01_unified (combines old/new finance systems)
- e.g: corporate__prod.edw_stg.stg__location__01_unified (integrates location references)
Final Reference Data (cleaned and conformed in edw_ref schema):
Reference data in edw_ref is materialized as curated, conformed, reusable reference entities and code sets.
Typically, edw_ref models:
- Are keyed by natural/business keys (and may also include stable internal IDs where useful)
- Are lightweight and broadly reusable (avoid mart-specific denormalisation and calculations)
- Are often current-state, but may include effective dating / limited historisation where the reference itself changes over time (e.g., organisational hierarchies)
- e.g: corporate__prod.edw_ref.ref_facility (facility_id, facility_name, facility_code, location_key - simple key-value)
- e.g: corporate__prod.edw_ref.ref_location (location_id, location_name, country_code - basic attributes)
- e.g: corporate__prod.edw_ref.ref_account_code (account_code, account_name, hierarchy_level - minimal structure)
- e.g: corporate__prod.edw_ref.ref_country_codes (country_code, country_name - simple lookup)
Note on Reference Data vs Dimensions:
Reference data in edw_ref serves as a foundation and can be shaped into dimensions in the edw (marts) schema:
- edw_ref.ref_* - Conformed reference entities/code sets intended for broad reuse (generally natural/business keys, minimal mart-specific logic)
- edw.dim_* - Analytics-facing dimensional models that may add:
  - Surrogate keys (and persisted natural/business keys as attributes)
  - SCD Type 1 and/or Type 2 behaviour as required (e.g., valid_from, valid_to, is_current)
  - Denormalisation, role-playing dimensions, and mart-specific hierarchies/business rules
Examples of the relationship:
- edw_ref.ref_facility (conformed reference) → edw.dim_facility (often Type 1 if facility attributes are treated as current-state)
- edw_ref.ref_account_code (conformed codes/hierarchy) → edw.dim_account (often Type 2 if names/rollups change over time)
- edw_ref.ref_product_category (conformed categories) → edw.dim_product (often Type 2 if classifications change over time)
Base Mart (EDW) Staging:
EDW staging can follow two complementary patterns:
- Source-Centric Staging: Cleans and transforms data from individual sources
- Entity-Centric Staging: Integrates and unifies data across multiple sources for a business entity
Source-Centric Staging (with transformations in edw_stg schema):
Focused on cleaning individual source systems:
- e.g: corporate__prod.edw_stg.stg__sap__employees__01_renamed_and_typed
- e.g: corporate__prod.edw_stg.stg__workday__employees__01_renamed_and_typed
- e.g: corporate__prod.edw_stg.stg__workday__employees__02_cleaned
- e.g: corporate__prod.edw_stg.stg__crm__adf__customers__01_renamed_and_typed
- e.g: corporate__prod.edw_stg.stg__crm__adf__customers__02_cleaned
- e.g: corporate__prod.edw_stg.stg__new_finance_system__adf__payments__01_renamed_and_typed
- e.g: corporate__prod.edw_stg.stg__old_finance_system__adf__payments__01_renamed_and_typed
- e.g: corporate__prod.edw_stg.stg__finance_system__api__accounts__01_renamed_and_typed
Entity-Centric Staging (with transformations in edw_stg schema):
Focused on integrating and unifying data across sources for a specific business entity. These models:
- Union, join, or merge data from multiple source-centric staging models
- Apply cross-source deduplication and conflict resolution
- Establish entity-level keys and relationships
- Prepare unified entity views for final base marts
Naming Convention for Entity-Centric Staging:
[domain][__env] . [schema] . stg__[entity]__[transformation_descriptors]
Examples:
- e.g: corporate__prod.edw_stg.stg__employee__01_unified (unions SAP and Workday employees)
- e.g: corporate__prod.edw_stg.stg__employee__02_deduped (resolves duplicates across sources)
- e.g: corporate__prod.edw_stg.stg__employee__03_keyed (applies unified business keys)
- e.g: corporate__prod.edw_stg.stg__customer__01_unified (merges multiple CRM sources)
- e.g: corporate__prod.edw_stg.stg__customer__02_conflict_resolved (resolves conflicting attributes)
- e.g: corporate__prod.edw_stg.stg__payment__01_unified (combines old and new finance systems)
- e.g: health__prod.edw_stg.stg__patient__01_unified (integrates patient data across clinical systems)
These entity-centric staging models then feed into the final base marts (dimensions and facts).
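A minimal dbt sketch of an entity-centric 01_unified model that unions two source-centric staging models (model and column names are illustrative):

```sql
-- models/edw/mart/employee/stg/stg__employee__01_unified.sql
-- Unions SAP and Workday employees, tagging each row with its source system.
{{ config(materialized='view', schema='edw_stg') }}

select employee_id, employee_name, hire_date, 'sap' as source_system
from {{ ref('stg__sap__employees__01_renamed_and_typed') }}

union all

select employee_id, employee_name, hire_date, 'workday' as source_system
from {{ ref('stg__workday__employees__01_renamed_and_typed') }}
```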
Base Mart Keysets:
Keysets are created in the edw schema to resolve and map system keys to universal surrogate keys or business keys. These can be reused downstream for integration and conforming/deduplicating entities across multiple sources:
- e.g: corporate__prod.edw.keys__employee (conforms SAP and Workday employee keys)
- e.g: corporate__prod.edw.keys__customer (conforms multiple CRM source keys)
- e.g: corporate__prod.edw.keys__account (conforms finance system keys)
- e.g: health__prod.edw.keys__patient (conforms clinical system keys)
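A minimal sketch of a keyset model; deriving the universal key by hashing the source system and source key is one possible approach and an assumption here, not a mandated standard:

```sql
-- models/edw/mart/employee/keys__employee.sql
-- Maps each source system key to a universal surrogate key for downstream reuse.
{{ config(materialized='table', schema='edw') }}

select
    md5(concat(source_system, ':', employee_id)) as employee_key,  -- universal surrogate key
    source_system,
    employee_id                                  as source_employee_id
from {{ ref('stg__employee__02_deduped') }}
```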
Base Marts (Final Dimensional Models):
Base marts represent full-featured dimensional models with SCD logic, business rules, and complete attributes. These may be built from entity-centric staging, source-centric staging, or by shaping reference data from edw_ref.
Source-specific dimensions (preserving source identity):
- e.g: corporate__prod.edw.dim_employee__sap
- e.g: corporate__prod.edw.dim_employee__workday
- e.g: corporate__prod.edw.dim_account__old_finance_system
- e.g: corporate__prod.edw.dim_account__new_finance_system
Conformed dimensions (unified, non-source specific):
- e.g: corporate__prod.edw.dim_employee (Type 1 SCD - current state only)
- e.g: corporate__prod.edw.dim_employee__type2 (Type 2 SCD - with historical tracking)
- e.g: corporate__prod.edw.dim_customer
- e.g: corporate__prod.edw.dim_account (unified across finance systems)
Dimensions built from reference data (shaped with SCD logic):
- e.g: corporate__prod.edw.dim_facility (built from edw_ref.ref_facility, adds Type 1 or Type 2 SCD logic)
- e.g: corporate__prod.edw.dim_product_category (built from edw_ref.ref_product_category, with hierarchies)
- e.g: corporate__prod.edw.dim_location (built from edw_ref.ref_location, with geographic hierarchies)
Fact tables:
- e.g: corporate__prod.edw.fact_payment
- e.g: corporate__prod.edw.fact_transaction
- e.g: health__prod.edw.fact_encounter
Common Transformation Suffixes:
- __01_renamed or __01_typed
- __02_cleaned
- __03_deduped
- __04_filtered
- __05_split
- __06_validated
- __07_desensitised
Additional Notes:
For further context on reference data modeling, refer to Reference Data Standards and Conventions.
Raw Vault: Optional warehousing construct.
- Schema naming convention: edw_rv
- Object naming convention: {vault object named as per data vault standards}
- e.g: corporate__dev.edw_rv.hs_payments__finance_system__adf
Business Vault: Optional warehousing construct.
- Schema naming convention: edw_bv
- Object naming convention: {vault object named as per data vault standards}
- e.g: corporate__dev.edw_bv.hs_late_payments__finance_system__adf
Infomart (aka Gold) (Data according to requirements)
The Infomart zone focuses on requirement-aligned products (datasets, aggregations, and reporting structures). Products are predominantly source-agnostic; however, source-specific views can be produced when needed.
Refer to Data zones and layers for further context and definitions applicable to this section.
The Infomart zone uses two primary schemas:
- im_stg - Product mart staging (intermediate transformations)
- im - Information marts (final business-aligned products)
Naming Convention:
[domain][__env] . [schema] . [object_type][product_name/descriptor](__source)(__transformation)
Where:
- [domain][__env] - Domain and environment (e.g., corporate__prod, health__dev)
- [schema] - Infomart zone schema (im_stg for staging, im for final products)
- [object_type] - Object type prefix (e.g., stg_, fact_, dim_, obt_)
- [product_name/descriptor] - Product or business-aligned name
- (__source) - Optional source system identifier (only when a source-specific view is needed)
- (__transformation) - Optional transformation step (staging only)
Naming Guidelines:
- All entity names which align to facts should be named in the plural
- All entity names which align to dimensions should be named in the singular
- Product marts are business/requirement-aligned, not technical constructs
Infomart Staging Models:
Staging models serve as intermediary models that transform base marts (EDW) into requirement-aligned product marts. According to dbt best practices, there is a distinction between Staging and Intermediate models. Under this blueprint the use of Intermediate models is optional. Reference
These models exist in the im_stg schema to stage information marts with business-specific transformations:
Common Transformations:
- Pivoting
- Aggregation
- Joining across multiple base marts
- Business rule application
- Desensitization
- Complex calculations
Examples:
- e.g: corporate__prod.im_stg.stg_downtime_by_region__01_pivoted
- e.g: corporate__prod.im_stg.stg_downtime_by_region__02_desensitised
- e.g: corporate__prod.im_stg.stg_late_payments__01_aggregated
- e.g: corporate__prod.im_stg.stg_late_payments__02_joined
- e.g: corporate__prod.im_stg.stg_fte_calculations__01_pivoted
- e.g: corporate__prod.im_stg.stg_fte_calculations__02_aggregated
- e.g: health__prod.im_stg.stg_patient_outcomes__01_joined
- e.g: health__prod.im_stg.stg_patient_outcomes__02_desensitised
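A minimal sketch of the 01_pivoted step for an infomart staging model (the upstream fact_downtime model and its column names are illustrative assumptions):

```sql
-- models/infomart/mart/downtime_by_region/stg/stg_downtime_by_region__01_pivoted.sql
-- Pivots downtime hours by region into columns for reporting.
{{ config(materialized='view', schema='im_stg') }}

select
    event_month,
    sum(case when region = 'north' then downtime_hours else 0 end) as downtime_north,
    sum(case when region = 'south' then downtime_hours else 0 end) as downtime_south
from {{ ref('fact_downtime') }}
group by event_month
```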
Information Marts (Final Products):
Final business-aligned data products ready for consumption by analytics, reporting, and BI tools in the im schema.
Examples:
Fact tables (aggregated/business-focused):
- e.g: corporate__prod.im.fact_downtime_by_region
- e.g: corporate__prod.im.fact_late_payments
- e.g: corporate__prod.im.fact_regional_account_payments
- e.g: health__prod.im.fact_patient_outcomes
Dimension tables (product-specific):
- e.g: corporate__prod.im.dim_region
- e.g: corporate__prod.im.dim_payment_category
One Big Tables (OBT) - Denormalized aggregates:
- e.g: corporate__prod.im.obt_fte_calculations
- e.g: corporate__prod.im.obt_financial_summary
- e.g: corporate__prod.im.obt_executive_dashboard
Source-specific products (when needed):
- e.g: corporate__prod.im.fact_account_payments__old_finance_system
- e.g: corporate__prod.im.fact_account_payments__new_finance_system
- e.g: corporate__prod.im.obt_employee_metrics__sap
- e.g: corporate__prod.im.obt_employee_metrics__workday
Delta Sharing
- Share names: {domain_name}__{optional: subdomain_name}__{optional: purpose}__{schema_name or description}__{object_name or description}__{share_purpose and/or target_audience}
- e.g: intuitas_corporate__finance__reporting__account_payments__payments
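A minimal sketch of creating the share and adding an object to it (the table added is illustrative):

```sql
-- Create the share and add a business-ready infomart table to it.
CREATE SHARE IF NOT EXISTS intuitas_corporate__finance__reporting__account_payments__payments
COMMENT 'Account payments shared for finance reporting consumers';

ALTER SHARE intuitas_corporate__finance__reporting__account_payments__payments
ADD TABLE corporate__prod.im.fact_regional_account_payments;
```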
Azure Data Factory
- Linked service names: ls_{database_name} (if not already included in database_name, append {organisation_name}_{domain_name}) e.g: ls_financedb_intuitas_corporate
- Dataset names: ds_{database_name}_{object_name}
- Pipeline names: pl_{description} e.g: pl_copy_{source_name}_to_{destination_name}
- Trigger names: tr_{pipeline_name}_{optional: start_time/frequency}
Streaming
- Cluster name: {domain_name}__cluster__{optional: environment}
- Topic names: {domain_name}__{object/entity}__{optional: source_system}__{optional: source_channel}__{optional: environment}
- Consumer group names: {domain_name}__{unique_group_name}__{optional: environment}
dbt
The following standards and conventions relate to dbt projects.
Documentation and model metadata
Within each respective model folder (as needed)
- md: _{path to model folder using _ separators}__docs.md
  - e.g: models/edw/ambo_sim__kafka__local/_edw__ambo_sim__kafka__local__docs.md
- model yml: _{path to model folder using _ separators}__models.yml
  - e.g: models/edw/ambo_sim__kafka__local/_edw__ambo_sim__kafka__local__models.yml
Sources
- Folder: models/sources/{raw/edw/infomart}
- yml: {schema}__sources.yml (one for each source schema)
- e.g: ods__ambo_sim__kafka__sources.yml
Model and Folder Names
dbt model names are verbose (inclusive of schema and domain) to ensure global uniqueness and better traceability to folders. Actual object names should be aliased to match object naming standards.
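A minimal sketch of aliasing a verbose dbt model name back to the published object name (the file path and names are illustrative):

```sql
-- models/edw/source/fhirhouse__adf/stg/corporate__edw__stg__fhirhouse__adf__encounter__01_renamed_and_typed.sql
-- The verbose file name keeps the model globally unique within dbt, while the
-- alias controls the object name created in Unity Catalog.
{{ config(
    schema='edw_stg',
    alias='stg__fhirhouse__adf__encounter__01_renamed_and_typed'
) }}

select * from {{ source('fhirhouse', 'encounter') }}
```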
Raw
The Raw zone contains an operational data store (ODS) for current state and a persistent data store (PDS) for historical tracking.
ODS (Operational Data Store) - Current State:
- Folder: seeds/{optional: domain name}{optional: __subdomain name(s)}/ → materializes to [domain__env].ods
- Folder: sources/raw/{optional: domain name}{optional: __subdomain name(s)}/ → references external sources to [domain__env].ods
Seed Naming Convention:
Seeds (CSV files) should follow a descriptive naming pattern and materialize to the ODS schema:
- ref_[entity]_[descriptor].csv for reference/lookup data
- config_[descriptor].csv for configuration data
- e.g: ref_country_codes.csv → corporate__dev.ods.ref_country_codes
- e.g: ref_currency_rates.csv → corporate__dev.ods.ref_currency_rates
- e.g: config_holidays.csv → corporate__dev.ods.config_holidays
PDS (Persistent Data Store) - Historical Tracking:
- Folder: models/raw/{optional: domain name}{optional: __subdomain name(s)}/pds/ → materializes to [domain__env].pds
- Models in PDS capture type 2 SCD (slowly changing dimensions) or snapshots of ODS data
Type 2 Snapshot Model Examples:
- e.g: models/raw/pds/ref_country_codes_snapshot.sql → corporate__dev.pds.ref_country_codes_snapshot
- e.g: models/raw/pds/ref_currency_rates_snapshot.sql → corporate__dev.pds.ref_currency_rates_snapshot
- e.g: models/raw/pds/config_holidays_snapshot.sql → corporate__dev.pds.config_holidays_snapshot
These models typically reference their corresponding seeds from ODS and add historical tracking columns (valid_from, valid_to, is_current, etc.), as in the sketch below.
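A minimal sketch of such a model expressed as a dbt snapshot (dbt conventionally keeps snapshots in the snapshots/ folder; the check strategy and columns are illustrative assumptions):

```sql
-- ref_country_codes_snapshot.sql
-- Tracks Type 2 history over the ODS seed; dbt manages dbt_valid_from/dbt_valid_to.
{% snapshot ref_country_codes_snapshot %}

{{
    config(
        target_schema='pds',
        unique_key='country_code',
        strategy='check',
        check_cols=['country_name']
    )
}}

select country_code, country_name
from {{ ref('ref_country_codes') }}

{% endsnapshot %}
```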
EDW
EDW staging follows two complementary patterns:
- Source-Centric Staging: Organized by source system, cleaning individual sources
- Entity-Centric Staging: Organized by business entity, integrating across sources
Staging Source-Centric:
Focused on cleaning and transforming data from individual source systems:
- Folder: models/edw/{optional: domain name}{optional: __subdomain name(s)}/source/{source_system}{optional: __{source_channel}}/stg/
- Models: {optional: domain name}{optional: __subdomain name(s)}{optional: __edw__}stg__{source_system}__{source_channel}__{source_object}__{ordinal}_{transformation description}
e.g:
- edw/source/new_finance_system__adf/stg/corporate__edw__stg__new_finance_system__adf__accounts__01_renamed_and_typed.sql
- or edw/source/new_finance_system__adf/stg/stg__new_finance_system__adf__accounts__01_renamed_and_typed.sql
- materialises to: corporate__dev.edw_stg.stg__new_finance_system__adf__accounts__01_renamed_and_typed
Staging Entity-Centric:
Focused on integrating and unifying data across multiple sources for a specific business entity:
- Folder: models/edw/{optional: domain name}{optional: __subdomain name(s)}/mart/{entity}/stg/
- Models: {optional: domain name}{optional: __subdomain name(s)}{optional: __edw__}stg__{entity}__{ordinal}_{transformation description}
These models:
- Union, join, or merge data from multiple source-centric staging models
- Apply cross-source deduplication and conflict resolution
- Establish unified entity keys
e.g:
- edw/mart/employee/stg/stg__employee__01_unified.sql (unions SAP and Workday)
- edw/mart/employee/stg/stg__employee__02_deduped.sql (resolves duplicates)
- edw/mart/employee/stg/stg__employee__03_keyed.sql (applies unified keys)
- materialises to: corporate__dev.edw_stg.stg__employee__01_unified
Mart (Entity-Centric Base Marts):
Base marts are organized by business entity. Entity-centric staging (if needed) lives in a nested stg/ folder within each entity folder.
- Folder: models/edw/{optional: domain name}{optional: __subdomain name(s)}/mart/{entity}/
- Entity-centric staging subfolder: models/edw/.../mart/{entity}/stg/
- Models: {optional: domain name}{optional: __subdomain name(s)}{optional: __edw__}{dim_/fact_}{entity/object_description}{optional: __{source_system}__{source_channel}}
Source-specific marts (when preserving source identity):
- Models: {optional: __edw__}dim_{entity}__{source_system}
- e.g: dim_employee__sap.sql, dim_employee__workday.sql
Non-source specific (unified) marts:
- Models: {optional: __edw__}dim_{entity} or fact_{entity_plural}
- e.g: dim_employee.sql, fact_payments.sql
Reference Data:
Reference data follows the same pattern as marts - organized by reference entity with optional entity-centric staging.
- Folder: models/edw/{optional: domain name}{optional: __subdomain name(s)}/ref/{reference_entity}/
- Entity-centric staging subfolder: models/edw/.../ref/{reference_entity}/stg/ (when unifying multiple sources)
- Models: {optional: domain name}{optional: __subdomain name(s)}{optional: __edw__}ref_{reference data set name}{optional: __{source_system}__{source_channel}}
Examples:
- e.g: models/edw/ref/facility/ref_facility.sql → corporate__dev.edw_ref.ref_facility
- e.g: models/edw/ref/facility/stg/stg__facility__01_unified.sql → corporate__dev.edw_stg.stg__facility__01_unified (combines SAP + Workday)
- e.g: models/edw/ref/account_code/ref_account_code.sql → corporate__dev.edw_ref.ref_account_code
Raw Vault:
- Folder: models/edw/{optional: domain name}{optional: __subdomain name(s)}/edw/rv
- Models: edw_rv__{vault object named as per data vault standards}
Business Vault:
- Folder: models/edw/{optional: domain name}{optional: __subdomain name(s)}/edw/bv
- Models: edw_bv__{vault object named as per data vault standards}
Infomart
Staging:
- Folder: models/infomart/{optional: domain name}{optional: __subdomain name(s)}/mart/{entity / product description}/stg
- Models: {optional: domain name}{optional: __subdomain name(s)}{optional: __im__}stg_{entity / product description}__{ordinal}_{transformation description}{optional: __{source_system}__{source_channel}}
Dimensions:
- Folder: models/infomart/{optional: domain name}{optional: __subdomain name(s)}/mart/{entity / product description}
- Models: {optional: domain name}{optional: __subdomain name(s)}{optional: __im__}dim_{entity / product description}{optional: __{source_system}__{source_channel}}
Facts:
- Folder: models/infomart/{optional: domain name}{optional: __subdomain name(s)}/mart/{entity / product description}
- Models: {optional: domain name}{optional: __subdomain name(s)}{optional: __im__}fact_{entity / product description}{optional: __{source_system}__{source_channel}}
Denormalized (One Big Table):
- Folder: models/infomart/{optional: domain name}{optional: __subdomain name(s)}/mart/{entity / product description}
- Models: {optional: domain name}{optional: __subdomain name(s)}{optional: __im__}obt_{entity / product description}{optional: __{source_system}__{source_channel}}
Example dbt model structure:
The model structure below reflects a single catalog for domain+environment and schema separation for zones and layers:
{{domain/enterprise} _project_name}
├── README.md
├── analyses
├── seeds <ods>
│ └── ref_country_codes.csv <e.g: corporate__dev.ods.ref_country_codes>
│ └── config_holidays.csv <e.g: corporate__dev.ods.config_holidays>
├── dbt_project.yml
├── macros
│ └── custom_macro.sql
│ ├── utilities
│ └── all_dates.sql
├── models/raw
│ /{optional: domain and subdomains}
│ ├── _raw.md
│ ├── ods <ods>
│ | └── (typically empty - ODS populated by seeds/sources)
│ └── pds <pds>
│ └── ref_country_codes_snapshot.sql <e.g: corporate__dev.pds.ref_country_codes_snapshot>
│ └── config_holidays_snapshot.sql <e.g: corporate__dev.pds.config_holidays_snapshot>
├── models/edw/
│ ├── _edw.md
│ ├── source (source-centric staging - organized by source system) <edw_stg>
│ | /{optional: domain and subdomains}
│ | ├── source_system_1
│ | | ├── _source_system_1__docs.md
│ | | ├── _source_system_1__models.yml
│ | | └── stg
│ | | ├── stg__source_system_1__object__01_renamed_and_typed.sql <e.g: corporate__dev.edw_stg.stg__source_system_1__object__01_renamed_and_typed>
│ | | └── stg__source_system_1__object__02_cleaned.sql <e.g: corporate__dev.edw_stg.stg__source_system_1__object__02_cleaned>
│ | └── source_system_2__adf
│ | ├── _source_system_2__adf__docs.md
│ | ├── _source_system_2__adf__models.yml
│ | └── stg
│ | ├── stg__source_system_2__adf__accounts__01_renamed_and_typed.sql <e.g: corporate__dev.edw_stg.stg__source_system_2__adf__accounts__01_renamed_and_typed>
│ | └── stg__source_system_2__adf__accounts__02_cleaned.sql <e.g: corporate__dev.edw_stg.stg__source_system_2__adf__accounts__02_cleaned>
│ ├── mart (entity centric base marts and entity-centric staging) <edw> and <edw_stg>
│ | /{optional: domain and subdomains}
│ | └── employee
│ | ├── _employee__docs.md
│ | ├── _employee__models.yml
│ | ├── dim_employee.sql <e.g: corporate__dev.edw.dim_employee (built from stg__employee__03_keyed)>
│ | └── stg (entity-centric staging) <edw_stg>
│ | ├── stg__employee__01_unified.sql <e.g: corporate__dev.edw_stg.stg__employee__01_unified (unions source_system_1 & source_system_2)>
│ | ├── stg__employee__02_deduped.sql <e.g: corporate__dev.edw_stg.stg__employee__02_deduped>
│ | └── stg__employee__03_keyed.sql <e.g: corporate__dev.edw_stg.stg__employee__03_keyed>
│ | └── account
│ | └── dim_account.sql <e.g: corporate__dev.edw.dim_account>
│ | └── date
│ | └── dim_date.sql <e.g: corporate__dev.edw.dim_date>
│ ├── ref (reference data with optional entity-centric staging) <edw_ref> and <edw_stg>
│ | ├── _reference_data__docs.md
│ | ├── facility
│ | | ├── _facility__models.yml
│ | | ├── ref_facility.sql <e.g: corporate__dev.edw_ref.ref_facility>
│ | | └── stg (entity-centric staging when unifying multiple sources) <edw_stg>
│ | | ├── stg__facility__01_unified.sql <e.g: corporate__dev.edw_stg.stg__facility__01_unified>
│ | | └── stg__facility__02_mapped.sql <e.g: corporate__dev.edw_stg.stg__facility__02_mapped>
│ | └── country_code
│ | ├── _country_code__models.yml
│ | └── ref_country_code.sql <e.g: corporate__dev.edw_ref.ref_country_code (single source, no staging)>
│ └── sources
│ └── {optional: domain}
│ └── {optional: raw/edw/infomart}
│ └── _source_system_1__sources.yml
├── models/infomart <im>
│ /{optional: domain and subdomains}
│ ├── _infomart.md
│ └── domain_name e.g: finance
│ └── mart
│ ├── _finance__models.yml
│ ├── orders.sql <e.g: corporate__dev.im.orders>
│ └── payments.sql <e.g: corporate__dev.im.payments>
│ └── stg <im_stg>
│ └── stg_payments_pivoted_to_orders.sql <e.g: corporate__dev.im_stg.stg_payments_pivoted_to_orders>
├── packages.yml
├── snapshots
└── tests
└── assert_positive_value_for_total_amount.sql
dbt_project.yml
The yml structure below reflects a single catalog for domain+environment and schema separation for zones and layers:
```yml
models:
  health_lakehouse__engineering__dbt:
    +persist_docs: # enables injection of metadata into unity catalog
      relation: true
      columns: true
    raw:
      ods:
        +description: "Operational data store - current state"
        +schema: ods
        +materialized: table
      pds:
        +description: "Persistent data store - historical snapshots (Type 2 SCD)"
        +schema: pds
        +materialized: table
    edw:
      ref:
        +description: "Reference data (conformed lookups, master data)"
        +schema: edw_ref
        +materialized: table
        stg:
          +description: "Entity-centric reference staging (when unifying multiple sources)"
          +schema: edw_stg
          +materialized: view
      source:
        +description: "Source-centric staging organized by source system"
        +schema: edw_stg
        +materialized: view
      mart:
        +description: "Entity-centric base marts (dimensions, facts)"
        +schema: edw
        +materialized: table
        stg:
          +description: "Entity-centric staging (integrates across sources)"
          +schema: edw_stg
          +materialized: view
    infomart:
      +materialized: view # default for speed
      mart:
        +description: "Information marts (business-aligned products)"
        +schema: im
        +materialized: table
        stg:
          +description: "Information mart staging (intermediate transformations)"
          +schema: im_stg
          +materialized: view

seeds:
  health_lakehouse__engineering__dbt:
    +schema: ods
    +description: "Static reference and configuration data loaded from CSV files to ODS"
```
CI/CD
The following standards and conventions relate to Continuous Integration and Continuous Delivery (CI/CD) constructs.
Repository naming
- All lowercase with hyphens as separators
- Format:
{org}-{domain}-{purpose}-{optional:descriptor}
Examples:
- intuitas-corporate-dbt
- intuitas-corporate-ingestion-framework
- intuitas-corporate-cicd-templates
- intuitas-corporate-infrastructure
Branch naming
- All lowercase with hyphens as separators
- Naming convention:
{type}-{optional:ticket-id}-{description}
Types:
- feature: New functionality
- bugfix: Bug fixes
- hotfix: Critical fixes for production
- release: Release branches
- docs: Documentation updates only
- refactor: Code improvements with no functional changes
- test: Test-related changes
Examples:
- feature-eng123-add-new-data-source
- bugfix-eng456-fix-null-values
- hotfix-prod-outage-fix
- release-v2.1.0
- docs-update-readme
- refactor-optimize-transforms
- test-add-integration-tests
Branch lifecycle
Simple branch lifecycle:
- main/master: Primary branch
- branch: Short-lived development branches, merged or rebased to main/master
Comprehensive team branch lifecycle:
- Main/master: Primary branch
- Development: Active development branch
- Feature/bugfix: Short-lived branches merged to development
- Release: Created from development, merged to main/master
- Hotfix: Created from main/master for urgent fixes
Databricks Asset Bundles
Databricks asset bundles are encouraged for all Databricks projects.
- Project/bundle name:
  - {domain_name}__databricks (for general Databricks projects)
  - {domain_name}__dbt (for dbt Databricks bundles)
- Bundle tags:
  - Key: environment: {environment}
  - Key: manager: {team_name} and/or {email_address}
  - Key: managing_domain: {domain_name} e.g. if delegating to the engineering domain
  - Key: owner: {owner_name}
  - Key: owning_domain: {domain_name}
  - Key: dab: {bundle_name}
  - Key: project: {project_name}
e.g:
```yml
tags:
  environment: ${bundle.target}
  project: health-lakehouse
  dab: health_lakehouse__engineering__databricks
  owning_domain: intuitas_engineering
  owner: engineering-admin@intuitas.com
  manager: engineering-engineer@intuitas.com
  managing_domain: intuitas_engineering
```
- Resources:
  - Folder level 1: {meaningful sub-project name}
  - Folder level 2:
    - notebooks
    - workflows
- databricks.yml
  - For both dev and prod: root_path: /Workspace/Users/engineering-engineer@intuitas.com/.bundle/${bundle.name}/${bundle.target}
Example databricks.yml
```yml
# This is a Databricks asset bundle definition for health_lakehouse__engineering.
# See https://docs.databricks.com/dev-tools/bundles/index.html for documentation.
bundle:
  name: health_lakehouse__engineering__databricks

variables:
  default_cluster_id:
    value: "-----"

include:
  - resources/*.yml
  - resources/**/*.yml
  - resources/**/**/*.yml

targets:
  dev:
    mode: development
    default: true
    workspace:
      host: https://------.15.azuredatabricks.net
      root_path: /Workspace/Users/engineering-engineer@intuitas.com/.bundle/${bundle.name}/${bundle.target}

  prod:
    mode: production
    workspace:
      host: https://------.15.azuredatabricks.net
      root_path: /Workspace/Users/engineering-engineer@intuitas.com/.bundle/${bundle.name}/${bundle.target}
    permissions:
      - user_name: engineering-engineer@intuitas.com
        level: CAN_MANAGE
    run_as:
      user_name: engineering-engineer@intuitas.com
```
Security
The security standards and conventions provided here are a starter set; however, existing organisational and applicable industry standards should take precedence. Consult your cybersecurity advisor.
Entra
Under development. (Contact us to know more).
Most organisations will already have an established set of groups and conventions. Where there are gaps, the following can still be considered.
Recommended areas to align to organisational governance and cyber requirements:
- Naming conventions for admin, service, and user groups
- Role-based access alignment (least privilege, separation of duties)
- Alignment to domains (cross-domain vs. domain-specific group patterns)
Entra Group Names:
- Pattern: grp-<org>-<domain>-<plat>-<scope>-<role>-<env>[-<region>][-ext-<partner>][-idx]
- Lowercase, hyphen-separated; no spaces.
- Keep to ≤ 120 chars total.
- No PII in names.
- Use Security groups (not M365) for RBAC; enable PIM where appropriate, e.g. for Admins.
role:
- owner — full control of the named scope
- admin — administrative (non-ownership) rights
- contrib — create/modify within scope
- editor — modify data/artifacts, not permissions
- reader — read-only
- steward — governance/metadata rights
- custodian — key/secret/storage control
- operator — run/ops rights (pipelines, jobs)
- viewer — read dashboards/reports
plat:
- dbx (Databricks), uc (Unity Catalog), pbi (Power BI), adf (Data Factory),
- dlk (Data Lake), sql (Azure SQL), kva (Key Vault), syn (Synapse)
scope (or object):
- Databricks Workspace: ws-<workspace>
- Unity Catalog: uc-meta (metastore), uc-cat-<catalog>, uc-sch-<catalog>.<schema>, uc-obj-<catalog>.<schema>.<object>
- Power BI: pbi-ws-<workspace>
- Data Lake: dlk-path-<path> e.g. dlk-path-/curated/claims/
Examples:
- grp-intuitas-clin-dbx-ws-analytics-admin-prd
- grp-intuitas-clin-uc-uc-cat-claims-owner-prd
- grp-intuitas-clin-uc-uc-sch-claims.curated-reader-uat
- grp-intuitas-fin-pbi-pbi-ws-execdash-viewer-prd
- grp-intuitas-ent-kva-kva-keys-custodian-prd
- grp-intuitas-clin-dlk-dlk-path-/curated/claims/-reader-prd-aue
- grp-intuitas-clin-dbx-ws-partnerlake-reader-prd-ext-acme
Policies
Under development. (Contact us to know more).
Recommended areas to align to non-functional requirements:
- Data retention (duration, archival, legal hold)
- Key retention and rotation cycles
- Backup and recovery standards
- Incident response and escalation procedures
- Access review and recertification
Frameworks
Under development. (Contact us to know more).
Recommended areas to align to industry and cyber compliance:
- Engineering standards (e.g., code repositories, CI/CD security, IaC policies)
- Security frameworks (e.g., NIST, ISO 27001, CIS Benchmarks, Zero Trust)
- Compliance mappings (HIPAA, GDPR, SOC2, local regulatory obligations)