Lineage

Entropy Data ingests OpenLineage events and renders an interactive lineage graph on data product pages, showing how datasets flow through your pipelines.

Overview

When OpenLineage events are submitted for a data product, a Lineage section appears on the data product detail page. It shows an interactive graph of pipeline jobs, input datasets, and output datasets. Click Expand to open the full-screen lineage visualizer.

OpenLineage is an open standard for pipeline metadata. Many tools emit OpenLineage events natively, including dbt, Airflow, Spark, Flink, and Dagster.


Linking events to data products

Every event must be linked to a data product so that the lineage graph appears on the correct data product page. The API returns 400 when dataProductId is missing and 422 when it does not match any data product in the organization. An outputPortId that does not exist on the resolved data product also returns 422. There are two ways to provide the identifiers:

Option 1 — Query parameters in the endpoint

Bake the identifiers into the endpoint field of your OpenLineage transport config. The openlineage-python HTTP transport concatenates url + endpoint and preserves the query string as-is:

openlineage.yml

transport:
  type: http
  url: https://api.entropy-data.com
  endpoint: api/v1/lineage?dataProductId=orders&outputPortId=snowflake_orders_v2
  auth:
    type: api_key
    apiKey: <entropy-data-api-key>

outputPortName is accepted as a deprecated alias for outputPortId for backwards compatibility.
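
If the identifiers vary per pipeline, the endpoint value can be generated rather than typed by hand. The following is a minimal sketch using only the Python standard library; the helper name build_lineage_endpoint is hypothetical and not part of any Entropy Data or OpenLineage package.

```python
from urllib.parse import urlencode


def build_lineage_endpoint(data_product_id, output_port_id=None):
    """Return a value for the `endpoint` field of the HTTP transport config.

    Hypothetical helper: it only assembles the path and query string
    described above.
    """
    if not data_product_id:
        raise ValueError("data_product_id is required")
    params = {"dataProductId": data_product_id}
    if output_port_id:
        params["outputPortId"] = output_port_id
    # urlencode percent-encodes characters that are unsafe in a query string.
    return "api/v1/lineage?" + urlencode(params)


print(build_lineage_endpoint("orders", "snowflake_orders_v2"))
```

For the identifiers used in this guide, the printed value matches the endpoint line in the openlineage.yml example.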

Option 2 — entropy_data run facet

Embed the link in the event JSON itself:

entropy_data run facet

{
  "run": {
    "runId": "...",
    "facets": {
      "entropy_data": {
        "_producer": "https://entropy-data.com",
        "_schemaURL": "https://entropy-data.com/spec/facets/1-0-0/EntropyDataRunFacet.json",
        "dataProductId": "orders",
        "outputPortId": "snowflake_orders_v2"
      }
    }
  }
}
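
For producers that assemble events in code, the facet can be attached while building the event. This is a sketch under assumptions: the function names and the EXAMPLE_PRODUCER URI are placeholders, and only the facet keys shown in the JSON above are taken from this guide.

```python
import json
import uuid
from datetime import datetime, timezone

FACET_PRODUCER = "https://entropy-data.com"
RUN_FACET_SCHEMA = (
    "https://entropy-data.com/spec/facets/1-0-0/EntropyDataRunFacet.json"
)
# Placeholder: replace with the URI that identifies your own producer.
EXAMPLE_PRODUCER = "https://example.com/my-pipeline"


def entropy_data_run_facet(data_product_id, output_port_id=None):
    """Build the entropy_data run facet; outputPortId is added only if given."""
    facet = {
        "_producer": FACET_PRODUCER,
        "_schemaURL": RUN_FACET_SCHEMA,
        "dataProductId": data_product_id,
    }
    if output_port_id is not None:
        facet["outputPortId"] = output_port_id
    return facet


def build_run_event(event_type, job_namespace, job_name,
                    data_product_id, output_port_id=None, run_id=None):
    """Build a minimal run event dict that carries the facet."""
    return {
        "eventType": event_type,
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "run": {
            # A fresh UUID is generated when the caller supplies no run ID.
            "runId": run_id or str(uuid.uuid4()),
            "facets": {
                "entropy_data": entropy_data_run_facet(
                    data_product_id, output_port_id
                )
            },
        },
        "job": {"namespace": job_namespace, "name": job_name},
        "producer": EXAMPLE_PRODUCER,
    }


event = build_run_event(
    "COMPLETE", "dbt", "analytics.dp_orders.stg_orders",
    "orders", "snowflake_orders_v2",
)
print(json.dumps(event, indent=2))
```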

Query parameters take precedence over the facet.
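
The precedence rule can be illustrated with a small client-side function. It is not the server implementation: the name resolve_link is hypothetical, and two details are assumptions that this guide does not state, namely that precedence is applied per field and that outputPortId wins over the deprecated outputPortName alias when both are present.

```python
def resolve_link(query_params, event):
    """Return (dataProductId, outputPortId) for a submitted event.

    Illustrative only. Assumes per-field precedence: a query parameter
    overrides the same field in the entropy_data run facet.
    """
    run = event.get("run") or {}
    facet = (run.get("facets") or {}).get("entropy_data") or {}

    data_product_id = (
        query_params.get("dataProductId") or facet.get("dataProductId")
    )
    output_port_id = (
        query_params.get("outputPortId")
        # Deprecated alias; checked after outputPortId (assumption).
        or query_params.get("outputPortName")
        or facet.get("outputPortId")
    )
    if not data_product_id:
        # The API answers this case with 400.
        raise ValueError("dataProductId is missing")
    return data_product_id, output_port_id


event = {
    "run": {
        "runId": "d46e465b-d358-4d32-83d4-df660ff614dd",
        "facets": {
            "entropy_data": {
                "dataProductId": "orders",
                "outputPortId": "snowflake_orders_v2",
            }
        },
    }
}
print(resolve_link({}, event))
print(resolve_link({"dataProductId": "customers", "outputPortId": "s3"}, event))
```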


Attributing datasets to data products

The lineage visualizer groups inputs and outputs into data product containers based on which data product produces each dataset. By default, a dataset is shown outside any container. To place it inside one, attach an entropy_data facet to the dataset that describes its owning data product, output port, data contract, and asset:

entropy_data dataset facet

{
  "inputs": [
    {
      "namespace": "snowflake://account.snowflakecomputing.com",
      "name": "raw_db.public.raw_orders",
      "facets": {
        "entropy_data": {
          "_producer": "https://entropy-data.com",
          "_schemaURL": "https://entropy-data.com/spec/facets/1-0-0/EntropyDataDatasetFacet.json",
          "dataProductId": "orders",
          "dataProductName": "Orders",
          "dataProductHref": "/dataproducts/orders",
          "outputPortId": "snowflake_orders_v2",
          "outputPortName": "Snowflake Orders v2",
          "outputPortHref": "/dataproducts/orders/outputports/snowflake_orders_v2",
          "dataContractId": "orders_snowflake_v2",
          "dataContractName": "Orders Snowflake v2",
          "dataContractHref": "/datacontracts/orders_snowflake_v2",
          "assetId": "raw-orders",
          "assetName": "RAW_ORDERS",
          "assetHref": "/assets/raw-orders"
        }
      }
    }
  ]
}

Attach the same facet to entries in outputs to attribute produced datasets the same way.

The fields follow a consistent Id/Name/Href triple for each entity:

  • dataProductId, dataProductName, dataProductHref — the data product the dataset belongs to. The visualizer uses dataProductId to place the dataset inside that data product's container, and the Name/Href for the chip label and link.
  • outputPortId, outputPortName, outputPortHref — the output port the dataset is published as. Shown in the dataset's references panel.
  • dataContractId, dataContractName, dataContractHref — the data contract that governs the dataset's schema and semantics.
  • assetId, assetName, assetHref — the registered asset the dataset corresponds to.

Within each triple, Id is the lookup key, while Name and Href are display overrides. When Name is omitted, the visualizer falls back to Id for the label; when Href is omitted, the entity is referenced by its Id only. Provide Name and Href whenever the referenced entity lives in a different Entropy Data instance or isn't registered locally.

Datasets without an entropy_data facet still appear in the graph but float outside any data product container, with a chip indicating which selected data product consumes them.
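
The Id/Name/Href convention lends itself to a small builder. The sketch below is an illustration, not an official client: dataset_facet and display_label are hypothetical names, and display_label only mirrors the label fallback described above.

```python
FACET_PRODUCER = "https://entropy-data.com"
DATASET_FACET_SCHEMA = (
    "https://entropy-data.com/spec/facets/1-0-0/EntropyDataDatasetFacet.json"
)
ENTITY_PREFIXES = ("dataProduct", "outputPort", "dataContract", "asset")


def dataset_facet(entities):
    """Build an entropy_data dataset facet.

    `entities` maps a prefix (for example "dataProduct") to a dict with a
    required "id" and optional "name" and "href".
    """
    facet = {"_producer": FACET_PRODUCER, "_schemaURL": DATASET_FACET_SCHEMA}
    for prefix, ref in entities.items():
        if prefix not in ENTITY_PREFIXES:
            raise ValueError(f"unknown entity prefix: {prefix}")
        facet[f"{prefix}Id"] = ref["id"]
        # Name and Href are optional display overrides; skip them if absent.
        for suffix in ("name", "href"):
            if ref.get(suffix):
                facet[prefix + suffix.capitalize()] = ref[suffix]
    return facet


def display_label(facet, prefix):
    """Mirror the documented fallback: use Name if present, otherwise Id."""
    return facet.get(f"{prefix}Name") or facet.get(f"{prefix}Id")


facet = dataset_facet({
    "dataProduct": {
        "id": "orders",
        "name": "Orders",
        "href": "/dataproducts/orders",
    },
    "asset": {"id": "raw-orders"},
})
dataset = {
    "namespace": "snowflake://account.snowflakecomputing.com",
    "name": "raw_db.public.raw_orders",
    "facets": {"entropy_data": facet},
}
print(display_label(facet, "dataProduct"), display_label(facet, "asset"))
```

The resulting dataset dict can be placed in either the inputs or the outputs array of an event.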


dbt integration

To send lineage events from dbt, install openlineage-dbt and configure the transport:

Install

pip install openlineage-dbt

Create an openlineage.yml in your dbt project root:

openlineage.yml

transport:
  type: http
  url: https://api.entropy-data.com
  endpoint: api/v1/lineage?dataProductId=orders&outputPortId=snowflake_orders_v2
  auth:
    type: api_key
    apiKey: <entropy-data-api-key>

Then run dbt through the dbt-ol wrapper that openlineage-dbt installs, for example dbt-ol run instead of dbt run. The integration emits START, COMPLETE, and FAIL events automatically.


Retention

Events older than 90 days are automatically deleted when new events are submitted.
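
To estimate which stored events fall outside the retention window, the cutoff can be computed locally. This sketch assumes that age is measured from the event's eventTime; this guide does not say whether the service uses eventTime or the time of ingestion, so treat the result as an approximation. The function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=90)


def parse_event_time(value):
    """Parse an ISO 8601 timestamp such as 2024-01-15T10:00:00.000Z."""
    # Older Python versions do not accept a trailing "Z" in fromisoformat.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def is_expired(event, now=None):
    """Return True if the event is older than the 90-day retention window."""
    now = now or datetime.now(timezone.utc)
    return now - parse_event_time(event["eventTime"]) > RETENTION


event = {"eventTime": "2024-01-15T10:00:00.000Z"}
reference = parse_event_time(event["eventTime"]) + timedelta(days=91)
print(is_expired(event, now=reference))
```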


OpenAPI Specification


POST /api/v1/lineage

Submit an OpenLineage event

Submit an OpenLineage RunEvent. Compatible with all OpenLineage producers (dbt, Airflow, Spark, Flink, etc.).

Required parameters

  • dataProductId (string): Data product external ID to associate the event with. Required as a URL query parameter or as the dataProductId field of the entropy_data run facet. 400 if missing, 422 if the value does not match any data product in this organization.

Optional parameters

  • outputPortId (string): Output port ID within the data product. Used to auto-resolve the data contract. 422 if the value does not match any output port on the resolved data product.
  • outputPortName (string): Deprecated alias for outputPortId. Accepted for backwards compatibility.

Request

POST
/api/v1/lineage
curl --request POST 'https://api.entropy-data.com/api/v1/lineage?dataProductId=orders&outputPortId=snowflake_orders_v2' \
  --header "x-api-key: $DMM_API_KEY" \
  --header "content-type: application/json" \
  --data @- << EOF
{
  "eventType": "COMPLETE",
  "eventTime": "2024-01-15T10:00:00.000Z",
  "run": {
    "runId": "d46e465b-d358-4d32-83d4-df660ff614dd"
  },
  "job": {
    "namespace": "dbt",
    "name": "analytics.dp_orders.stg_orders"
  },
  "inputs": [
    {
      "namespace": "snowflake://account.snowflakecomputing.com",
      "name": "raw_db.public.raw_orders"
    }
  ],
  "outputs": [
    {
      "namespace": "snowflake://account.snowflakecomputing.com",
      "name": "analytics_db.public.orders"
    }
  ],
  "producer": "https://github.com/OpenLineage/OpenLineage/tree/0.18.0/integration/dbt"
}
EOF
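
The same submission can be prepared from Python with the standard library. This is a sketch, not an official client: build_post_request and submit are hypothetical names, and the identifiers are passed as query parameters so that the event is linked to a data product.

```python
import json
import urllib.request
from urllib.parse import urlencode

BASE_URL = "https://api.entropy-data.com"


def build_post_request(event, api_key, data_product_id, output_port_id=None):
    """Prepare (but do not send) a POST /api/v1/lineage request."""
    query = {"dataProductId": data_product_id}
    if output_port_id:
        query["outputPortId"] = output_port_id
    return urllib.request.Request(
        f"{BASE_URL}/api/v1/lineage?{urlencode(query)}",
        data=json.dumps(event).encode("utf-8"),
        headers={"x-api-key": api_key, "content-type": "application/json"},
        method="POST",
    )


def submit(request):
    """Send the request and return the HTTP status code.

    urlopen raises urllib.error.HTTPError for 4xx responses such as the
    400 and 422 cases described in this guide.
    """
    with urllib.request.urlopen(request) as response:
        return response.status


request = build_post_request(
    {"eventType": "COMPLETE"},  # shortened; a real event needs all fields
    api_key="<entropy-data-api-key>",
    data_product_id="orders",
    output_port_id="snowflake_orders_v2",
)
print(request.get_method(), request.full_url)
```

Call submit(request) with a real API key and a complete event to send it.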

GET /api/v1/lineage

Get OpenLineage events

Retrieve stored OpenLineage events. All filters are optional; omit all to get every event.

Optional parameters

  • jobNamespace (string): Filter by job namespace.
  • jobName (string): Filter by job name.
  • runId (string): Filter by run ID.
  • eventType (string): Filter by event type: START, RUNNING, COMPLETE, ABORT, FAIL.
  • dataProductId (string): Filter by data product external ID.

Request

GET
/api/v1/lineage
curl --get https://api.entropy-data.com/api/v1/lineage \
  --header "x-api-key: $DMM_API_KEY" \
  --data-urlencode "dataProductId=orders"
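
When several filters are combined, building the query string in code avoids quoting mistakes. The following sketch validates filter names and the eventType value on the client before a request is made; build_get_url is a hypothetical helper, and the validation is a convenience rather than documented server behavior.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.entropy-data.com/api/v1/lineage"
ALLOWED_FILTERS = ("jobNamespace", "jobName", "runId", "eventType",
                   "dataProductId")
EVENT_TYPES = {"START", "RUNNING", "COMPLETE", "ABORT", "FAIL"}


def build_get_url(**filters):
    """Return the GET URL for the given filters; no filters means all events."""
    unknown = set(filters) - set(ALLOWED_FILTERS)
    if unknown:
        raise ValueError(f"unknown filters: {sorted(unknown)}")
    event_type = filters.get("eventType")
    if event_type is not None and event_type not in EVENT_TYPES:
        raise ValueError(f"unsupported eventType: {event_type}")
    # Iterate ALLOWED_FILTERS so the parameter order is deterministic.
    query = urlencode(
        {key: filters[key] for key in ALLOWED_FILTERS if filters.get(key)}
    )
    return f"{BASE_URL}?{query}" if query else BASE_URL


print(build_get_url(dataProductId="orders", eventType="COMPLETE"))
```

The returned URL can be requested with the same x-api-key header as in the curl example.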

DELETE /api/v1/lineage

Delete OpenLineage events

Delete events by run ID or by job namespace and job name. If no filters are provided, all events are deleted.

Optional parameters

  • runId (string): Delete events for this run ID.
  • jobNamespace (string): Delete events for this job namespace (requires jobName).
  • jobName (string): Delete events for this job name (requires jobNamespace).

Request

DELETE
/api/v1/lineage
curl --get --request DELETE https://api.entropy-data.com/api/v1/lineage \
  --header "x-api-key: $DMM_API_KEY" \
  --data-urlencode "runId=d46e465b-d358-4d32-83d4-df660ff614dd"
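
Because an unfiltered DELETE removes every event, a client-side guard can be useful. The sketch below assumes the filters are sent as query parameters, as in the GET endpoint; build_delete_request and its allow_delete_all flag are hypothetical and exist only in this example.

```python
import urllib.request
from urllib.parse import urlencode

BASE_URL = "https://api.entropy-data.com/api/v1/lineage"


def build_delete_request(api_key, run_id=None, job_namespace=None,
                         job_name=None, allow_delete_all=False):
    """Prepare (but do not send) a DELETE /api/v1/lineage request."""
    # jobNamespace and jobName are only valid as a pair.
    if (job_namespace is None) != (job_name is None):
        raise ValueError("jobNamespace and jobName must be given together")
    params = {}
    if run_id is not None:
        params["runId"] = run_id
    if job_namespace is not None:
        params["jobNamespace"] = job_namespace
        params["jobName"] = job_name
    if not params and not allow_delete_all:
        # Client-side safety net; the API itself deletes everything here.
        raise ValueError("no filters given; pass allow_delete_all=True")
    url = f"{BASE_URL}?{urlencode(params)}" if params else BASE_URL
    return urllib.request.Request(
        url, headers={"x-api-key": api_key}, method="DELETE"
    )


request = build_delete_request(
    "<entropy-data-api-key>",
    run_id="d46e465b-d358-4d32-83d4-df660ff614dd",
)
print(request.get_method(), request.full_url)
```

Pass the prepared request to urllib.request.urlopen to execute the deletion.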