Lineage
Entropy Data ingests OpenLineage events and renders an interactive lineage graph on data product pages, showing how datasets flow through your pipelines.
The Lineage API is experimental. The endpoints and event schema may change.
Overview
When OpenLineage events are submitted for a data product, a Lineage section appears on the data product detail page. It shows an interactive graph of pipeline jobs, input datasets, and output datasets. Click Expand to open the full-screen lineage visualizer.
OpenLineage is an open standard for pipeline metadata. Many tools emit OpenLineage events natively, including dbt, Airflow, Spark, Flink, and Dagster.
Linking events to data products
Every event must be linked to a data product so that the lineage graph appears on the correct data product page. The API rejects events whose dataProductId is missing (400) or doesn't match any data product in the organization (422); supplying an outputPortId that doesn't exist on the resolved data product also returns 422. There are two options for providing the identifiers:
Option 1 — Query parameters in the endpoint
Bake the identifiers into the endpoint field of your OpenLineage transport config. The openlineage-python HTTP transport concatenates url + endpoint and preserves the query string as-is:
openlineage.yml
transport:
  type: http
  url: https://api.entropy-data.com
  endpoint: api/v1/lineage?dataProductId=orders&outputPortId=snowflake_orders_v2
  auth:
    type: api_key
    apiKey: <entropy-data-api-key>
outputPortName is accepted as a deprecated alias for outputPortId for backwards compatibility.
openlineage-python does not currently expose a top-level url_params (or urlParams) field on the HTTP transport — any such block in openlineage.yml is silently dropped. Encoding the parameters in endpoint is the supported workaround. If you need different identifiers per pipeline run, use Option 2 below.
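If the endpoint value is generated rather than typed by hand, the query string can be built with the standard library so that unusual characters in an identifier are encoded correctly. The following is a minimal sketch; `lineage_endpoint` is a hypothetical helper name, not part of openlineage-python or the Entropy Data API.

```python
from urllib.parse import urlencode


def lineage_endpoint(data_product_id, output_port_id=None, path="api/v1/lineage"):
    """Build the `endpoint` value with the identifiers encoded as a query string."""
    params = {"dataProductId": data_product_id}
    if output_port_id is not None:
        params["outputPortId"] = output_port_id
    # urlencode percent-encodes characters that are not safe in a query string.
    return f"{path}?{urlencode(params)}"


print(lineage_endpoint("orders", "snowflake_orders_v2"))
# api/v1/lineage?dataProductId=orders&outputPortId=snowflake_orders_v2
```

The printed value can be pasted into the endpoint field of openlineage.yml.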
Option 2 — entropy_data run facet
Embed the link in the event JSON itself:
entropy_data run facet
{
  "run": {
    "runId": "...",
    "facets": {
      "entropy_data": {
        "_producer": "https://entropy-data.com",
        "_schemaURL": "https://entropy-data.com/spec/facets/1-0-0/EntropyDataRunFacet.json",
        "dataProductId": "orders",
        "outputPortId": "snowflake_orders_v2"
      }
    }
  }
}
Query parameters take precedence over the facet.
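For producers that assemble events as plain dictionaries, the facet can be attached with a small helper. The sketch below uses hypothetical function names (`with_run_facet`, `effective_link`). The second function only illustrates the precedence rule and assumes it is applied per field, which the text above does not spell out.

```python
import copy

RUN_FACET_SCHEMA = "https://entropy-data.com/spec/facets/1-0-0/EntropyDataRunFacet.json"


def with_run_facet(event, data_product_id, output_port_id=None):
    """Return a copy of `event` whose run carries an entropy_data facet."""
    facet = {
        "_producer": "https://entropy-data.com",
        "_schemaURL": RUN_FACET_SCHEMA,
        "dataProductId": data_product_id,
    }
    if output_port_id is not None:
        facet["outputPortId"] = output_port_id
    linked = copy.deepcopy(event)  # leave the caller's event untouched
    linked.setdefault("run", {}).setdefault("facets", {})["entropy_data"] = facet
    return linked


def effective_link(query_params, event):
    """Illustrate precedence: a query parameter wins over the facet value.

    Assumption: precedence is resolved field by field.
    """
    facet = event.get("run", {}).get("facets", {}).get("entropy_data", {})
    return {
        key: query_params.get(key, facet.get(key))
        for key in ("dataProductId", "outputPortId")
    }
```

Under that assumption, an event whose facet names `orders` but which is posted with `?dataProductId=customers` would be linked to `customers`.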
Attributing datasets to data products
The lineage visualizer groups inputs and outputs into data product containers based on which data product produces each dataset. By default, a dataset is shown outside any container. To place it inside one, attach an entropy_data facet to the dataset that describes its owning data product, output port, data contract, and asset:
entropy_data dataset facet
{
  "inputs": [
    {
      "namespace": "snowflake://account.snowflakecomputing.com",
      "name": "raw_db.public.raw_orders",
      "facets": {
        "entropy_data": {
          "_producer": "https://entropy-data.com",
          "_schemaURL": "https://entropy-data.com/spec/facets/1-0-0/EntropyDataDatasetFacet.json",
          "dataProductId": "orders",
          "dataProductName": "Orders",
          "dataProductHref": "/dataproducts/orders",
          "outputPortId": "snowflake_orders_v2",
          "outputPortName": "Snowflake Orders v2",
          "outputPortHref": "/dataproducts/orders/outputports/snowflake_orders_v2",
          "dataContractId": "orders_snowflake_v2",
          "dataContractName": "Orders Snowflake v2",
          "dataContractHref": "/datacontracts/orders_snowflake_v2",
          "assetId": "raw-orders",
          "assetName": "RAW_ORDERS",
          "assetHref": "/assets/raw-orders"
        }
      }
    }
  ]
}
Attach the same facet to entries in outputs to attribute produced datasets the same way.
The fields follow a consistent Id/Name/Href triple for each entity:
- dataProductId, dataProductName, dataProductHref: the data product the dataset belongs to. The visualizer uses dataProductId to place the dataset inside that data product's container, and the Name/Href for the chip label and link.
- outputPortId, outputPortName, outputPortHref: the output port the dataset is published as. Shown in the dataset's references panel.
- dataContractId, dataContractName, dataContractHref: the data contract that governs the dataset's schema and semantics.
- assetId, assetName, assetHref: the registered asset the dataset corresponds to.
Within each triple, Id is the lookup key, and Name and Href are display overrides. When they are omitted, the visualizer falls back to Id for the label and references the entity (data product, output port, data contract, or asset) by ID only. Provide Name and Href whenever the referenced entity lives in a different Entropy Data instance or isn't registered locally.
Datasets without an entropy_data facet still appear in the graph but float outside any data product container, with a chip indicating which selected data product consumes them.
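The Id/Name/Href pattern lends itself to a small builder. The sketch below is illustrative only: `dataset_facet` and `display_label` are hypothetical helpers, and `display_label` merely mirrors the documented label fallback rather than reproducing the visualizer's actual code.

```python
DATASET_FACET_SCHEMA = "https://entropy-data.com/spec/facets/1-0-0/EntropyDataDatasetFacet.json"
ENTITIES = ("dataProduct", "outputPort", "dataContract", "asset")


def dataset_facet(**entities):
    """Build an entropy_data dataset facet.

    Each keyword is an entity prefix mapped to (id, name, href). name and
    href may be None, in which case only the Id field is written.
    """
    facet = {"_producer": "https://entropy-data.com", "_schemaURL": DATASET_FACET_SCHEMA}
    for prefix, (id_, name, href) in entities.items():
        if prefix not in ENTITIES:
            raise ValueError(f"unknown entity prefix: {prefix}")
        facet[f"{prefix}Id"] = id_
        if name is not None:
            facet[f"{prefix}Name"] = name
        if href is not None:
            facet[f"{prefix}Href"] = href
    return facet


def display_label(facet, prefix):
    """Mirror the documented fallback: use Name when present, otherwise Id."""
    return facet.get(f"{prefix}Name", facet[f"{prefix}Id"])


facet = dataset_facet(
    dataProduct=("orders", "Orders", "/dataproducts/orders"),
    asset=("raw-orders", None, None),
)
print(display_label(facet, "dataProduct"))  # Orders
print(display_label(facet, "asset"))        # raw-orders
```

The resulting dictionary goes under `facets.entropy_data` of an entry in inputs or outputs.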
dbt integration
To send lineage events from dbt, install openlineage-dbt and configure the transport:
Install
pip install openlineage-dbt
Create an openlineage.yml in your dbt project root:
openlineage.yml
transport:
  type: http
  url: https://api.entropy-data.com
  endpoint: api/v1/lineage?dataProductId=orders&outputPortId=snowflake_orders_v2
  auth:
    type: api_key
    apiKey: <entropy-data-api-key>
Then run dbt through the dbt-ol wrapper that openlineage-dbt installs, for example dbt-ol run in place of dbt run. The openlineage-dbt integration emits START, COMPLETE, and FAIL events automatically.
openlineage.yml does not expand ${VAR} placeholders, so the API key cannot be templated into the file. To keep the secret out of source control — for example in CI — omit apiKey from the YAML and set it via the OpenLineage env-var override:
export OPENLINEAGE__TRANSPORT__AUTH__APIKEY=<entropy-data-api-key>
The OpenLineage client merges env-var overrides into the transport config before validation.
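When dbt is launched from a Python-based orchestration script instead of a shell, the same override can be injected into the child process environment. This is a minimal sketch; `openlineage_env` and `run_with_lineage` are hypothetical helper names, and the only name taken from OpenLineage is the environment variable shown above.

```python
import os
import subprocess

API_KEY_OVERRIDE = "OPENLINEAGE__TRANSPORT__AUTH__APIKEY"


def openlineage_env(api_key, base_env=None):
    """Return a copy of the environment with the API-key override set."""
    env = dict(os.environ if base_env is None else base_env)
    env[API_KEY_OVERRIDE] = api_key
    return env


def run_with_lineage(command, api_key):
    """Run `command` (a list of arguments) with the override in its environment."""
    # check=True raises CalledProcessError if the command exits non-zero.
    return subprocess.run(command, env=openlineage_env(api_key), check=True)
```

A call such as `run_with_lineage(["dbt-ol", "run"], api_key)` would then start the dbt-ol wrapper shipped with openlineage-dbt with the secret present only in that process. Substitute whichever command your pipeline actually runs, and read `api_key` from your CI secret store.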
Retention
Events older than 90 days are automatically deleted when new events are submitted.
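To estimate on the client side which events are likely to have been purged, the 90-day rule can be approximated as below. This is only a sketch: it assumes the age is measured from the event's eventTime and that timestamps carry a time zone, neither of which is stated here. `is_past_retention` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=90)


def parse_event_time(value):
    """Parse an ISO 8601 timestamp such as 2024-01-15T10:00:00.000Z."""
    # Older Python versions do not accept a trailing "Z" in fromisoformat.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def is_past_retention(event_time, now=None):
    """Return True if the event is more than 90 days older than `now`."""
    now = now or datetime.now(timezone.utc)
    return now - parse_event_time(event_time) > RETENTION
```

Because deletion is triggered by new submissions, an event past the cutoff may remain retrievable until the next event arrives.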
OpenAPI Specification
Refer to the OpenAPI Specification for the full formal API documentation.
Submit an OpenLineage event
Submit an OpenLineage RunEvent. Compatible with all OpenLineage producers (dbt, Airflow, Spark, Flink, etc.).
Required parameters
- dataProductId (string, required): Data product external ID to associate the event with. Required as a URL query parameter or as the dataProductId field of the entropy_data run facet. 400 if missing, 422 if the value does not match any data product in this organization.
Optional parameters
- outputPortId (string): Output port ID within the data product. Used to auto-resolve the data contract. 422 if the value does not match any output port on the resolved data product.
- outputPortName (string): Deprecated alias for outputPortId. Accepted for backwards compatibility.
Request
curl --request POST "https://api.entropy-data.com/api/v1/lineage?dataProductId=orders" \
  --header "x-api-key: $DMM_API_KEY" \
  --header "content-type: application/json" \
  --data @- << EOF
{
  "eventType": "COMPLETE",
  "eventTime": "2024-01-15T10:00:00.000Z",
  "run": {
    "runId": "d46e465b-d358-4d32-83d4-df660ff614dd"
  },
  "job": {
    "namespace": "dbt",
    "name": "analytics.dp_orders.stg_orders"
  },
  "inputs": [
    {
      "namespace": "snowflake://account.snowflakecomputing.com",
      "name": "raw_db.public.raw_orders"
    }
  ],
  "outputs": [
    {
      "namespace": "snowflake://account.snowflakecomputing.com",
      "name": "analytics_db.public.orders"
    }
  ],
  "producer": "https://github.com/OpenLineage/OpenLineage/tree/0.18.0/integration/dbt"
}
EOF
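The same submission can be made from Python with only the standard library. The sketch below uses hypothetical helper names and passes the identifiers as query parameters. The mapping in `describe_status` paraphrases the status codes documented above, and the success status is not assumed.

```python
import json
import urllib.error
import urllib.request
from urllib.parse import urlencode

LINEAGE_URL = "https://api.entropy-data.com/api/v1/lineage"


def build_submit_request(event, api_key, data_product_id=None, output_port_id=None):
    """Build (but do not send) a POST request carrying one RunEvent."""
    params = {}
    if data_product_id is not None:
        params["dataProductId"] = data_product_id
    if output_port_id is not None:
        params["outputPortId"] = output_port_id
    url = f"{LINEAGE_URL}?{urlencode(params)}" if params else LINEAGE_URL
    return urllib.request.Request(
        url,
        data=json.dumps(event).encode("utf-8"),
        headers={"x-api-key": api_key, "content-type": "application/json"},
        method="POST",
    )


def submit(request):
    """Send the request and return the HTTP status, including error statuses."""
    try:
        with urllib.request.urlopen(request) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code


def describe_status(status):
    """Map the documented error codes to a short explanation."""
    if status == 400:
        return "dataProductId is missing"
    if status == 422:
        return "dataProductId or outputPortId does not match an existing entity"
    return f"HTTP {status}"
```

Building the request separately from sending it keeps the URL and payload easy to inspect or log before anything leaves the machine.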
Get OpenLineage events
Retrieve stored OpenLineage events. All filters are optional; omit all to get every event.
Optional parameters
- jobNamespace (string): Filter by job namespace.
- jobName (string): Filter by job name.
- runId (string): Filter by run ID.
- eventType (string): Filter by event type: START, RUNNING, COMPLETE, ABORT, FAIL.
- dataProductId (string): Filter by data product external ID.
Request
curl --get https://api.entropy-data.com/api/v1/lineage \
--header "x-api-key: $DMM_API_KEY" \
--data-urlencode "dataProductId=orders"
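When several filters are combined, it can help to validate them before building the URL. The helper below is a hypothetical sketch that accepts only the filter names listed above and rejects event types outside the documented set. The response format is not described in this section, so the sketch stops at the URL.

```python
from urllib.parse import urlencode

LINEAGE_URL = "https://api.entropy-data.com/api/v1/lineage"
FILTERS = ("jobNamespace", "jobName", "runId", "eventType", "dataProductId")
EVENT_TYPES = ("START", "RUNNING", "COMPLETE", "ABORT", "FAIL")


def events_url(**filters):
    """Return the GET URL for the given filters; no filters means every event."""
    unknown = set(filters) - set(FILTERS)
    if unknown:
        raise ValueError(f"unknown filters: {sorted(unknown)}")
    event_type = filters.get("eventType")
    if event_type is not None and event_type not in EVENT_TYPES:
        raise ValueError(f"unsupported eventType: {event_type}")
    # Emit parameters in the documented order so the URL is deterministic.
    params = [(name, filters[name]) for name in FILTERS if filters.get(name) is not None]
    return f"{LINEAGE_URL}?{urlencode(params)}" if params else LINEAGE_URL


print(events_url(dataProductId="orders", eventType="FAIL"))
# https://api.entropy-data.com/api/v1/lineage?eventType=FAIL&dataProductId=orders
```

The resulting URL is requested with the same x-api-key header as in the curl example.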
Delete OpenLineage events
Delete events by run ID or by job namespace and job name. If no filters are provided, all events are deleted.
Optional parameters
- runId (string): Delete events for this run ID.
- jobNamespace (string): Delete events for this job namespace (requires jobName).
- jobName (string): Delete events for this job name (requires jobNamespace).
Request
curl --request DELETE https://api.entropy-data.com/api/v1/lineage \
--header "x-api-key: $DMM_API_KEY" \
--data-urlencode "runId=d46e465b-d358-4d32-83d4-df660ff614dd"
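Because a request without filters removes every event, a client-side guard is a cheap safety net. The sketch below uses a hypothetical `delete_filters` helper that enforces the jobNamespace/jobName pairing described above and refuses an empty filter set unless the caller opts in explicitly. It only prepares the parameters and does not send anything.

```python
def delete_filters(run_id=None, job_namespace=None, job_name=None, allow_delete_all=False):
    """Validate delete filters and return them as a parameter dictionary."""
    # jobNamespace and jobName are only valid as a pair.
    if (job_namespace is None) != (job_name is None):
        raise ValueError("jobNamespace and jobName must be provided together")
    filters = {}
    if run_id is not None:
        filters["runId"] = run_id
    if job_namespace is not None:
        filters["jobNamespace"] = job_namespace
        filters["jobName"] = job_name
    # An empty filter set would delete everything, so require an explicit opt-in.
    if not filters and not allow_delete_all:
        raise ValueError("refusing to delete all events without allow_delete_all=True")
    return filters
```

The returned dictionary holds the same parameter names that the curl example passes with --data-urlencode.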