Schemas

Schemas are supported in data contracts.

Schemas in the data contract describe the syntax and semantics of provided datasets. Schemas are a way to ensure data quality and manage expectations.

The Data Mesh Manager supports several schema types.


dbt model

Use this schema when you use dbt.

Each table, view, or materialized view is described as a model in a dbt YAML file. Each model contains a name, description, a list of columns, and a list of tests.

Learn more: dbt model properties reference.

Example of a dbt schema in YAML

schema:
  type: dbt
  specification:
    version: 2
    models:
      - name: "My View"
        description: "My description"
        config:
          materialized: view
        columns:
          - name: "My column"
            data_type: text
            description: "My description"
            tests:
              - dbt_expectations.expect_column_to_exist
              - not_null

Google BigQuery

Use this schema when your data is stored in Google BigQuery.

The schema structure is defined by the Google BigQuery Table object. You can extract such a Table object via the tables.get endpoint.

Instead of providing a single Table object, you can also provide an array of such objects. Be aware that tables.list returns only a subset of each Table object. To get the full Table object, including the actual schema, call tables.get for each table.
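The list-then-get pattern described above could be sketched in Python as follows. This is a minimal sketch, not an official integration: it assumes the google-cloud-bigquery client library, and the helper names collect_full_tables and to_specification are hypothetical. The functions only rely on list_tables, get_table, and to_api_repr, so they work with any object that behaves like that client.

```python
import json


def collect_full_tables(client, dataset_id):
    """Return the full Table resource (as a dict) for every table in a dataset.

    `client` is assumed to behave like google.cloud.bigquery.Client:
    list_tables() yields lightweight list items without a schema, and
    get_table() fetches the complete Table, including its schema.
    """
    full_tables = []
    for item in client.list_tables(dataset_id):
        # tables.list omits the schema, so fetch each table individually.
        table = client.get_table(item.reference)
        full_tables.append(table.to_api_repr())
    return full_tables


def to_specification(tables):
    """Serialize the Table resources as a JSON array for the specification field."""
    return json.dumps(tables, indent=2)


# Usage (assumes google-cloud-bigquery is installed and credentials are set up;
# the dataset ID is a placeholder):
#   from google.cloud import bigquery
#   tables = collect_full_tables(bigquery.Client(), "my-project.my_dataset")
#   print(to_specification(tables))
```

The resulting JSON array can be pasted into the specification field of a schema with type bigquery.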

Learn more: Google BigQuery REST Reference v2.

Example of a Google BigQuery schema in JSON

schema:
  type: bigquery
  specification: >
    {
      "tableReference": {
        "projectId": "my-project",
        "datasetId": "my_dataset",
        "tableId": "my_table"
      },
      "description": "This is a description",
      "type": "TABLE",
      "schema": {
        "fields": [
          {
            "name": "name",
            "type": "STRING",
            "mode": "NULLABLE",
            "description": "This is a description"
          }
        ]
      }
    }

JSON Schema

Use this schema for general modeling, or when you already have JSON Schema definitions.

The schema is a data model represented by a JSON object.

You can also pass in an array of JSON objects for multiple models.
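As an illustration of the single-object versus array form, the following Python sketch builds the specification string from one or more models. The helper build_specification and the models customers and orders are hypothetical names chosen for this example; only the standard library is used.

```python
import json


def build_specification(models):
    """Serialize one or more JSON Schema objects for the specification field.

    A single model is emitted as a JSON object; several models are
    emitted as a JSON array of objects.
    """
    if len(models) == 1:
        return json.dumps(models[0], indent=2)
    return json.dumps(models, indent=2)


# Two hypothetical models, each a JSON Schema object.
customers = {
    "title": "customers",
    "type": "object",
    "description": "Description of the customers model",
    "properties": {"name": {"type": "string"}},
}
orders = {
    "title": "orders",
    "type": "object",
    "description": "Description of the orders model",
    "properties": {"order_id": {"type": "string"}},
}

specification = build_specification([customers, orders])
print(specification)
```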

Learn more: JSON schema specification.

Example of a JSON schema in JSON

schema:
  type: jsonschema
  specification: >
    {
        "$id": "https://example.com/my_table.schema.json",
        "$schema": "http://json-schema.org/draft-07/schema#",
        "title": "my_table",
        "type": "object",
        "description": "Description of the model",
        "properties": {
            "name": {
                "description": "Description of the column",
                "type": "string"
            }
        }
    }

Avro

Use this schema when working with messages coming from Kafka that use Apache Avro.

The schema is a data model represented by an Avro record.

You can also pass in an array of Avro records for multiple models.
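To make the mapping from Avro records to models and columns concrete, here is a small Python sketch that reads a specification string and summarizes each record. It assumes the record's doc attribute carries the model description and each field's doc carries the column description; the helper describe_models is a hypothetical name, and the code does not reflect how the Data Mesh Manager parses schemas internally.

```python
import json


def describe_models(specification):
    """Summarize the Avro record(s) in a specification string.

    Accepts either a single Avro record or a JSON array of records and
    returns one summary dict per record.
    """
    parsed = json.loads(specification)
    records = parsed if isinstance(parsed, list) else [parsed]
    summaries = []
    for record in records:
        summaries.append({
            "model": record["name"],
            "description": record.get("doc", ""),
            # Each Avro field becomes a (name, type, description) column entry.
            "columns": [
                (field["name"], field["type"], field.get("doc", ""))
                for field in record.get("fields", [])
            ],
        })
    return summaries


example = """
{
    "type": "record",
    "namespace": "com.example",
    "name": "my_table",
    "doc": "Description of the model",
    "fields": [{"name": "name", "type": "string", "doc": "Description of the column"}]
}
"""
summaries = describe_models(example)
print(summaries)
```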

Learn more: Avro Documentation.

Example of an Avro schema in JSON

schema:
  type: avro
  specification: >
    {
        "type": "record",
        "namespace": "com.example",
        "name": "my_table",
        "doc": "Description of the model",
        "fields": [{
            "name": "name",
            "type": "string",
            "doc": "Description of the column"
        }]
    }

Custom

Use a custom schema if none of the above schemas fit your needs. Be aware that the Data Mesh Manager will not be able to validate or parse the schema, and therefore, cannot provide any automation.

Example of a custom schema

schema:
  type: custom
  specification: >
    Any schema
    in any format
    as a string