Lineage

Entropy Data ingests OpenLineage events and renders an interactive lineage graph on data product pages, showing how datasets flow through your pipelines.

Overview

When OpenLineage events are submitted for a data product, a Lineage section appears on the data product detail page. It shows an interactive graph of pipeline jobs, input datasets, and output datasets. Click Expand to open the full-screen lineage visualizer.

OpenLineage is an open standard for pipeline metadata. Many tools emit OpenLineage events natively, including dbt, Airflow, Spark, Flink, and Dagster.


Linking events to data products

Events are linked to a data product and output port so that the lineage graph appears on the correct data product page. There are two options:

Option 1 — Query parameters

Configure urlParams in your OpenLineage transport config:

openlineage.yml

transport:
  type: http
  url: https://api.entropy-data.com
  endpoint: api/v1/lineage
  auth:
    type: api_key
    apiKey: <entropy-data-api-key>
  urlParams:
    dataProductId: orders
    outputPortName: snowflake_orders_v2
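
For producers that post events without the OpenLineage client, the same link can presumably be made by appending the two parameters to the endpoint URL. The following is a minimal sketch, assuming urlParams are sent as ordinary query-string parameters; the helper name lineage_url is hypothetical and not part of any Entropy Data SDK.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.entropy-data.com"
ENDPOINT = "api/v1/lineage"


def lineage_url(data_product_id: str, output_port_name: str) -> str:
    """Build the lineage endpoint URL with the two linking query parameters."""
    # Assumption: the values configured under urlParams travel as a plain query string.
    query = urlencode(
        {"dataProductId": data_product_id, "outputPortName": output_port_name}
    )
    return f"{BASE_URL}/{ENDPOINT}?{query}"


print(lineage_url("orders", "snowflake_orders_v2"))
```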

Option 2 — entropy_data run facet

Embed the link in the event JSON itself:

entropy_data run facet

{
  "run": {
    "runId": "...",
    "facets": {
      "entropy_data": {
        "_producer": "https://entropy-data.com",
        "_schemaURL": "https://entropy-data.com/spec/facets/1-0-0/EntropyDataRunFacet.json",
        "dataProductId": "orders",
        "outputPortName": "snowflake_orders_v2"
      }
    }
  }
}

If both are provided, the query parameters take precedence over the facet.
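
The facet option and the precedence rule can be sketched in a few lines of Python. This is an illustration only: with_entropy_data_facet and resolve_link are hypothetical helper names, and resolve_link assumes precedence is applied per field, which the text above does not specify.

```python
import copy

FACET_KEY = "entropy_data"
PRODUCER = "https://entropy-data.com"
SCHEMA_URL = "https://entropy-data.com/spec/facets/1-0-0/EntropyDataRunFacet.json"


def with_entropy_data_facet(event: dict, data_product_id: str, output_port_name: str) -> dict:
    """Return a copy of the event with the entropy_data run facet attached."""
    linked = copy.deepcopy(event)  # leave the caller's event untouched
    facets = linked.setdefault("run", {}).setdefault("facets", {})
    facets[FACET_KEY] = {
        "_producer": PRODUCER,
        "_schemaURL": SCHEMA_URL,
        "dataProductId": data_product_id,
        "outputPortName": output_port_name,
    }
    return linked


def resolve_link(query_params: dict, event: dict) -> tuple:
    """Illustrate the documented rule: query parameters win over the facet.

    Assumption: the rule is applied field by field.
    """
    facet = event.get("run", {}).get("facets", {}).get(FACET_KEY, {})
    return (
        query_params.get("dataProductId") or facet.get("dataProductId"),
        query_params.get("outputPortName") or facet.get("outputPortName"),
    )
```

Copying the event before adding the facet keeps the helper safe to use when the same event object is also logged or sent elsewhere.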


dbt integration

To send lineage events from dbt, install openlineage-dbt and configure the transport:

Install

pip install openlineage-dbt

Create an openlineage.yml in your dbt project root:

openlineage.yml

transport:
  type: http
  url: https://api.entropy-data.com
  endpoint: api/v1/lineage
  auth:
    type: api_key
    apiKey: <entropy-data-api-key>
  urlParams:
    dataProductId: orders
    outputPortName: snowflake_orders_v2

Then run dbt through the dbt-ol wrapper that openlineage-dbt installs, for example dbt-ol run in place of dbt run. The integration emits START, COMPLETE, and FAIL events automatically.


Retention

Events older than 90 days are deleted automatically. The cleanup is triggered when new events are submitted, so expired events can remain stored until the next submission.
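
To estimate which events fall outside the retention window, a client-side check along these lines may help. It is a sketch under one loud assumption: that age is measured from the event's eventTime. The source does not say whether eventTime or ingestion time is used, and is_past_retention is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=90)


def is_past_retention(event_time: str, now: datetime) -> bool:
    """Return True if an ISO 8601 eventTime lies more than 90 days before `now`."""
    # Assumption: retention is measured from eventTime, not from ingestion time.
    # fromisoformat() before Python 3.11 rejects a trailing "Z", so normalize it.
    parsed = datetime.fromisoformat(event_time.replace("Z", "+00:00"))
    return now - parsed > RETENTION


now = datetime(2024, 4, 15, 10, 0, tzinfo=timezone.utc)
print(is_past_retention("2024-01-15T10:00:00.000Z", now))
```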


OpenAPI Specification


POST /api/v1/lineage

Submit an OpenLineage event

Submit an OpenLineage RunEvent. Compatible with all OpenLineage producers (dbt, Airflow, Spark, Flink, etc.).

Optional parameters

  • dataProductId (string, optional)
    Data product external ID to associate the event with.

  • outputPortName (string, optional)
    Output port name within the data product. Used to auto-resolve the data contract.

Request

POST /api/v1/lineage
curl --request POST https://api.entropy-data.com/api/v1/lineage \
  --header "x-api-key: $DMM_API_KEY" \
  --header "content-type: application/json" \
  --data @- << EOF
{
  "eventType": "COMPLETE",
  "eventTime": "2024-01-15T10:00:00.000Z",
  "run": {
    "runId": "d46e465b-d358-4d32-83d4-df660ff614dd"
  },
  "job": {
    "namespace": "dbt",
    "name": "analytics.dp_orders.stg_orders"
  },
  "inputs": [
    {
      "namespace": "snowflake://account.snowflakecomputing.com",
      "name": "raw_db.public.raw_orders"
    }
  ],
  "outputs": [
    {
      "namespace": "snowflake://account.snowflakecomputing.com",
      "name": "analytics_db.public.orders"
    }
  ],
  "producer": "https://github.com/OpenLineage/OpenLineage/tree/0.18.0/integration/dbt"
}
EOF
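
The same request can be prepared from Python with only the standard library. This is a minimal sketch, not an official client: build_run_event and build_post_request are hypothetical names, and the event contains only the fields shown in the curl example.

```python
import json
import uuid
from datetime import datetime, timezone
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request

LINEAGE_URL = "https://api.entropy-data.com/api/v1/lineage"


def build_run_event(event_type: str, job: tuple, inputs: list, outputs: list, producer: str) -> dict:
    """Assemble a minimal RunEvent; job and datasets are (namespace, name) pairs."""
    return {
        "eventType": event_type,
        "eventTime": datetime.now(timezone.utc).isoformat(timespec="milliseconds"),
        "run": {"runId": str(uuid.uuid4())},  # fresh run ID for this sketch
        "job": {"namespace": job[0], "name": job[1]},
        "inputs": [{"namespace": ns, "name": name} for ns, name in inputs],
        "outputs": [{"namespace": ns, "name": name} for ns, name in outputs],
        "producer": producer,
    }


def build_post_request(event: dict, api_key: str, params: Optional[dict] = None) -> Request:
    """Prepare, but do not send, the POST request for one event."""
    # Optional linking parameters (dataProductId, outputPortName) go in the query string.
    url = f"{LINEAGE_URL}?{urlencode(params)}" if params else LINEAGE_URL
    return Request(
        url,
        data=json.dumps(event).encode("utf-8"),
        headers={"x-api-key": api_key, "content-type": "application/json"},
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen sends it. Keeping construction separate from sending makes the request easy to inspect or log first.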

GET /api/v1/lineage

Get OpenLineage events

Retrieve stored OpenLineage events. All filters are optional; omit all to get every event.

Optional parameters

  • jobNamespace (string, optional)
    Filter by job namespace.

  • jobName (string, optional)
    Filter by job name.

  • runId (string, optional)
    Filter by run ID.

  • eventType (string, optional)
    Filter by event type: START, RUNNING, COMPLETE, ABORT, FAIL.

  • dataProductId (string, optional)
    Filter by data product external ID.

Request

GET /api/v1/lineage
curl --get https://api.entropy-data.com/api/v1/lineage \
  --header "x-api-key: $DMM_API_KEY" \
  --data-urlencode "dataProductId=orders"
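
A query helper can validate the filters before the request leaves the client. The sketch below uses only the filter names and event types listed above; build_get_request is a hypothetical name, and the response format is not assumed because the source does not describe it.

```python
from urllib.parse import urlencode
from urllib.request import Request

LINEAGE_URL = "https://api.entropy-data.com/api/v1/lineage"
FILTERS = {"jobNamespace", "jobName", "runId", "eventType", "dataProductId"}
EVENT_TYPES = {"START", "RUNNING", "COMPLETE", "ABORT", "FAIL"}


def build_get_request(api_key: str, **filters) -> Request:
    """Prepare, but do not send, a GET request with the given optional filters."""
    # Drop filters that were passed as None so they are simply omitted.
    active = {key: value for key, value in filters.items() if value is not None}
    unknown = set(active) - FILTERS
    if unknown:
        raise ValueError(f"unknown filters: {sorted(unknown)}")
    if "eventType" in active and active["eventType"] not in EVENT_TYPES:
        raise ValueError(f"unsupported eventType: {active['eventType']}")
    query = urlencode(active)
    url = f"{LINEAGE_URL}?{query}" if query else LINEAGE_URL
    return Request(url, headers={"x-api-key": api_key}, method="GET")


print(build_get_request("example-key", dataProductId="orders").full_url)
```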

DELETE /api/v1/lineage

Delete OpenLineage events

Delete events by run ID or by job namespace and job name. If no filters are provided, all events are deleted.

Optional parameters

  • runId (string, optional)
    Delete events for this run ID.

  • jobNamespace (string, optional)
    Delete events for this job namespace (requires jobName).

  • jobName (string, optional)
    Delete events for this job name (requires jobNamespace).

Request

DELETE /api/v1/lineage
curl --get --request DELETE https://api.entropy-data.com/api/v1/lineage \
  --header "x-api-key: $DMM_API_KEY" \
  --data-urlencode "runId=d46e465b-d358-4d32-83d4-df660ff614dd"
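
Because a DELETE without filters removes every event, a client-side guard is worth considering. The sketch below assumes the filters are sent as query parameters; build_delete_request and its allow_delete_all flag are hypothetical and exist only in this example.

```python
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request

LINEAGE_URL = "https://api.entropy-data.com/api/v1/lineage"


def build_delete_request(
    api_key: str,
    run_id: Optional[str] = None,
    job_namespace: Optional[str] = None,
    job_name: Optional[str] = None,
    allow_delete_all: bool = False,
) -> Request:
    """Prepare, but do not send, a DELETE request with basic safety checks."""
    # jobNamespace and jobName are documented as requiring each other.
    if (job_namespace is None) != (job_name is None):
        raise ValueError("jobNamespace and jobName must be provided together")
    filters = {"runId": run_id, "jobNamespace": job_namespace, "jobName": job_name}
    active = {key: value for key, value in filters.items() if value is not None}
    # No filters means "delete everything", so require an explicit opt-in.
    if not active and not allow_delete_all:
        raise ValueError("no filters given; pass allow_delete_all=True to delete all events")
    query = urlencode(active)
    url = f"{LINEAGE_URL}?{query}" if query else LINEAGE_URL
    return Request(url, headers={"x-api-key": api_key}, method="DELETE")
```

The explicit opt-in is a design choice of this sketch rather than an API requirement: it turns an accidental empty filter set into an error instead of a full wipe.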