Usage Tracking

Track how your data products are being used by sending usage data to Entropy Data via the OpenTelemetry Traces API. This enables you to understand how consumers use your data product, identify unmanaged consumers, and track usage trends.

[Screenshot: Usage Details]

Overview

Entropy Data provides an OpenTelemetry-compatible API endpoint that accepts trace data in the OTLP (OpenTelemetry Protocol) JSON format. Usage data is stored for 30 days and automatically visualized in the data product usage dashboard.

The usage tracking system supports:

  • Query execution tracking (SELECT, INSERT, UPDATE, DELETE, etc.)
  • User and role attribution
  • Query result status (SUCCESS, FAILURE)
  • Row count metrics
  • Query preview text
  • Asset/table access tracking

API Endpoint

POST /api/v1/traces
GET /api/v1/traces
DELETE /api/v1/traces

Authentication

All requests require an API Key with organization scope.

curl -H "X-API-Key: YOUR_API_KEY" \
     https://api.entropy-data.com/api/v1/traces
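
When calling the API from code rather than curl, the key can be attached once per session. A minimal Python sketch using the requests library, with the same placeholder key and base URL as the curl example above:

import requests

# Attach the API key once; it is sent with every request on this session
session = requests.Session()
session.headers.update({"X-API-Key": "YOUR_API_KEY"})

response = session.get("https://api.entropy-data.com/api/v1/traces")
response.raise_for_status()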

Sending Usage Data

Request Format

Usage data is sent as OpenTelemetry traces using the OTLP JSON format. The request structure consists of:

  1. Resource attributes - Identify the data product and service
  2. Scope - Define the type of telemetry (use "usage" for query tracking)
  3. Spans - Individual usage events with detailed attributes

Example Request

curl -X POST https://api.entropy-data.com/api/v1/traces \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "resourceSpans": [
      {
        "resource": {
          "attributes": [
            {
              "key": "service.name",
              "value": { "stringValue": "snowflake" }
            },
            {
              "key": "dataproduct.id",
              "value": { "stringValue": "orders-data-product" }
            },
            {
              "key": "outputport.name",
              "value": { "stringValue": "orders_pii_v2" }
            },
            {
              "key": "outputport.version",
              "value": { "stringValue": "2.0.0" }
            }
          ]
        },
        "scopeSpans": [
          {
            "scope": {
              "name": "usage",
              "version": "1.0.0"
            },
            "spans": [
              {
                "traceId": "9f8c6b2e1d2b47b48a3aef4c2c5b7d10",
                "spanId": "123abc456def7890",
                "name": "usage",
                "kind": "SPAN_KIND_INTERNAL",
                "startTimeUnixNano": 1760488800000000000,
                "endTimeUnixNano": 1760488800050000000,
                "attributes": [
                  {
                    "key": "user",
                    "value": { "stringValue": "john.doe@example.com" }
                  },
                  {
                    "key": "role",
                    "value": { "stringValue": "DATA_ANALYST" }
                  },
                  {
                    "key": "query.type",
                    "value": { "stringValue": "SELECT" }
                  },
                  {
                    "key": "query.preview",
                    "value": {
                      "stringValue": "SELECT order_id, customer_id, total FROM orders WHERE date > '2024-01-01'"
                    }
                  },
                  {
                    "key": "result",
                    "value": { "stringValue": "SUCCESS" }
                  },
                  {
                    "key": "rows",
                    "value": { "intValue": 1523 }
                  },
                  {
                    "key": "access.id",
                    "value": { "stringValue": "047bde7c-87d4-488a-b6d2-cef6f6f60000" }
                  },
                  {
                    "key": "asset.ids",
                    "value": {
                      "arrayValue": {
                        "values": [
                          { "stringValue": "047bde7c-87d4-488a-b6d2-cef6f6f60000" }
                        ]
                      }
                    }
                  }
                ]
              }
            ]
          }
        ]
      }
    ]
  }'

Resource Attributes

Resource attributes identify the data product and service. These are set at the resource level.

Attribute          | Required    | Type   | Description
-------------------|-------------|--------|------------
service.name       | No          | string | The name of the service (e.g., "snowflake", "bigquery", "databricks")
dataproduct.id     | Recommended | string | The external ID of the data product
datacontract.id    | Optional    | string | The external ID of the data contract (alternative to dataproduct.id)
outputport.name    | Optional    | string | The name of the output port being accessed
outputport.version | Optional    | string | The version of the output port

Note: Either dataproduct.id or datacontract.id must be provided to associate the usage data with a specific resource.
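
For example, a resource block that attributes usage to a data contract rather than a data product ("orders-data-contract" is a placeholder ID):

"resource": {
  "attributes": [
    {
      "key": "service.name",
      "value": { "stringValue": "snowflake" }
    },
    {
      "key": "datacontract.id",
      "value": { "stringValue": "orders-data-contract" }
    }
  ]
}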

Scope Configuration

The scope defines the type of telemetry being sent. For usage tracking, use:

{
  "scope": {
    "name": "usage",
    "version": "1.0.0"
  }
}

Span Attributes

Span attributes capture details about individual usage events.

Core Attributes

Attribute     | Required    | Type             | Description
--------------|-------------|------------------|------------
user          | Recommended | string           | User identifier (email, username, or service account)
role          | Optional    | string           | User's role or permission level
query.type    | Recommended | string           | Type of query (SELECT, INSERT, UPDATE, DELETE, etc.)
query.preview | Recommended | string           | Preview of the query (typically the first 200 characters). Remove confidential information.
result        | Recommended | string           | Query execution result (SUCCESS, FAILURE)
rows          | Optional    | integer          | Number of rows returned or affected
access.id     | Optional    | string           | Reference to an access agreement
asset.ids     | Optional    | array of strings | IDs of assets/tables accessed in the query

Span Identifiers

Field             | Required | Type    | Description
------------------|----------|---------|------------
traceId           | Yes      | string  | Unique identifier for the trace
spanId            | Yes      | string  | Unique identifier for the span. Acts as the primary key: resubmitting an existing spanId overwrites that span.
name              | No       | string  | Name of the span (use "usage")
kind              | No       | string  | Span kind (use "SPAN_KIND_INTERNAL")
startTimeUnixNano | Yes      | integer | Start time in nanoseconds since the Unix epoch
endTimeUnixNano   | Yes      | integer | End time in nanoseconds since the Unix epoch
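
A minimal sketch of producing valid identifiers and timestamps in Python; uuid4 hex is one convenient source of the 32-character traceId and 16-character spanId, and time.time_ns() yields nanoseconds since the Unix epoch:

import time
import uuid

trace_id = uuid.uuid4().hex      # 32 lowercase hex characters
span_id = uuid.uuid4().hex[:16]  # 16 hex characters

start_time_unix_nano = time.time_ns()                    # nanoseconds since Unix epoch
end_time_unix_nano = start_time_unix_nano + 50_000_000   # e.g. a query that ran for 50 ms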

Retrieving Usage Data

You can retrieve stored usage data using the GET endpoint with optional filters.

Query Parameters

Parameter      | Type   | Description
---------------|--------|------------
scopeName      | string | Filter by scope name (e.g., "usage")
dataProductId  | string | Filter by data product external ID
dataContractId | string | Filter by data contract external ID

Example

Get usage data for a data product

curl -X GET "https://api.entropy-data.com/api/v1/traces?scopeName=usage&dataProductId=orders-data-product" \
  -H "X-API-Key: YOUR_API_KEY"

The response follows the same OTLP JSON structure as the POST request.
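
For example, a short Python sketch that fetches the stored traces and flattens them into one line per span; the nesting mirrors the POST payload shown earlier:

import requests

response = requests.get(
    "https://api.entropy-data.com/api/v1/traces",
    headers={"X-API-Key": "YOUR_API_KEY"},
    params={"scopeName": "usage", "dataProductId": "orders-data-product"},
)
response.raise_for_status()

# Walk the OTLP nesting: resourceSpans -> scopeSpans -> spans
for resource_span in response.json().get("resourceSpans", []):
    for scope_span in resource_span.get("scopeSpans", []):
        for span in scope_span.get("spans", []):
            attrs = {a["key"]: a["value"] for a in span.get("attributes", [])}
            print(span["spanId"], attrs.get("user"), attrs.get("query.type"))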

Deleting Usage Data

You can delete usage data using the DELETE endpoint with the same query parameters.

Example

Delete all usage data for a data product

curl -X DELETE "https://api.entropy-data.com/api/v1/traces?scopeName=usage&dataProductId=orders-data-product" \
  -H "X-API-Key: YOUR_API_KEY"

Data Retention

Usage data is automatically deleted after 30 days. The retention period is enforced when new traces are submitted: the system automatically cleans up traces older than 30 days for your organization.

Best Practices

  1. Use consistent identifiers: Ensure dataproduct.id matches the external ID in Entropy Data
  2. Include user context: Always populate the user attribute for access auditing
  3. Limit query preview size: Truncate queries to ~200 characters to avoid excessive storage
  4. Use appropriate query types: Standardize on query type values (SELECT, INSERT, UPDATE, DELETE)
  5. Track query results: Always include the result attribute to monitor failures
  6. Unique span IDs: Generate a unique spanId for each usage event; reuse a spanId only when you intend to overwrite (update) that event
  7. Batch submissions: For high-volume scenarios, batch multiple spans into a single request (see the sketch after this list)
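
A minimal batching sketch in Python, assuming spans have already been built as dictionaries in the OTLP shape shown above; the 500-span batch size is an illustrative assumption, not a documented limit:

import requests

def send_spans_batched(api_key, resource_attributes, spans, batch_size=500):
    # batch_size is an assumption; tune it to your payload sizes
    url = "https://api.entropy-data.com/api/v1/traces"
    headers = {"X-API-Key": api_key, "Content-Type": "application/json"}
    for start in range(0, len(spans), batch_size):
        payload = {
            "resourceSpans": [{
                "resource": {"attributes": resource_attributes},
                "scopeSpans": [{
                    "scope": {"name": "usage", "version": "1.0.0"},
                    "spans": spans[start:start + batch_size],
                }],
            }]
        }
        response = requests.post(url, headers=headers, json=payload)
        response.raise_for_status()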

Integration Examples

The following examples show how to track usage data for data products. The recommended workflow is:

  1. Query Entropy Data to get data products and their output ports
  2. Extract server configuration (database, schema) from output ports
  3. Query the data platform's query history filtered by database/schema
  4. Optionally filter by specific table names from data product assets
  5. Send filtered usage data to Entropy Data

Snowflake Query History

import snowflake.connector
import requests
import uuid

def get_data_products(api_key):
    """Fetch all data products from Entropy Data"""
    response = requests.get(
        "https://api.entropy-data.com/api/dataproducts",
        headers={"X-API-Key": api_key}
    )
    return response.json()

def get_snowflake_queries_for_output_port(snowflake_conn, database, schema, table_names=None, hours=24):
    """
    Query Snowflake query history filtered by database and schema.
    Optionally filter by specific table names from data product assets.
    """
    cursor = snowflake_conn.cursor()

    # Base query for query history
    query = """
        SELECT
            query_id,
            query_text,
            user_name,
            role_name,
            query_type,
            start_time,
            end_time,
            rows_produced,
            error_code,
            database_name,
            schema_name
        FROM snowflake.account_usage.query_history
        WHERE start_time >= DATEADD(hour, -%s, CURRENT_TIMESTAMP())
            AND database_name = %s
            AND schema_name = %s
    """

    params = [hours, database.upper(), schema.upper()]

    # Optionally filter by specific tables (from data product assets)
    if table_names:
        table_filters = " OR ".join(["query_text ILIKE %s" for _ in table_names])
        query += f" AND ({table_filters})"
        params.extend([f"%{table}%" for table in table_names])

    cursor.execute(query, params)

    columns = [desc[0].lower() for desc in cursor.description]
    results = []
    for row in cursor.fetchall():
        results.append(dict(zip(columns, row)))

    return results

def send_usage_to_entropy_data(api_key, data_product_id, output_port_name, queries):
    """Send filtered query history to Entropy Data as usage traces"""

    spans = []
    for query in queries:
        span = {
            "traceId": uuid.uuid4().hex[:32],
            "spanId": uuid.uuid4().hex[:16],
            "name": "usage",
            "kind": "SPAN_KIND_INTERNAL",
            "startTimeUnixNano": int(query['start_time'].timestamp() * 1_000_000_000),
            "endTimeUnixNano": int(query['end_time'].timestamp() * 1_000_000_000),
            "attributes": [
                {"key": "user", "value": {"stringValue": query['user_name']}},
                {"key": "role", "value": {"stringValue": query['role_name']}},
                {"key": "query.type", "value": {"stringValue": query['query_type']}},
                {"key": "query.preview", "value": {"stringValue": query['query_text'][:200]}},
                {"key": "result", "value": {"stringValue": "SUCCESS" if query['error_code'] is None else "FAILURE"}},
                {"key": "rows", "value": {"intValue": query['rows_produced'] or 0}}
            ]
        }
        spans.append(span)

    if not spans:
        return True

    payload = {
        "resourceSpans": [{
            "resource": {
                "attributes": [
                    {"key": "service.name", "value": {"stringValue": "snowflake"}},
                    {"key": "dataproduct.id", "value": {"stringValue": data_product_id}},
                    {"key": "outputport.name", "value": {"stringValue": output_port_name}}
                ]
            },
            "scopeSpans": [{
                "scope": {"name": "usage", "version": "1.0.0"},
                "spans": spans
            }]
        }]
    }

    response = requests.post(
        "https://api.entropy-data.com/api/v1/traces",
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        json=payload
    )

    return response.status_code == 200

# Main workflow
def sync_snowflake_usage(entropy_api_key, snowflake_conn):
    """
    Complete workflow to sync Snowflake usage data for all data products
    """
    # Step 1: Get all data products from Entropy Data
    data_products = get_data_products(entropy_api_key)

    for dp in data_products:
        data_product_id = dp['id']

        # Step 2: For each output port, get server configuration
        for output_port in dp.get('outputPorts', []):
            output_port_name = output_port['id']

            # Extract database and schema from server configuration
            server_config = output_port.get('server', {})
            database = server_config.get('database')
            schema = server_config.get('schema')

            if not database or not schema:
                print(f"Skipping {data_product_id}/{output_port_name}: missing database/schema")
                continue

            # Step 3: Optionally get table names from assets
            table_names = None
            if 'assets' in dp:
                table_names = [asset.get('name') for asset in dp['assets']
                             if asset.get('name')]

            # Step 4: Query Snowflake for relevant queries
            queries = get_snowflake_queries_for_output_port(
                snowflake_conn,
                database,
                schema,
                table_names,
                hours=24
            )

            print(f"Found {len(queries)} queries for {data_product_id}/{output_port_name}")

            # Step 5: Send to Entropy Data
            if queries:
                send_usage_to_entropy_data(
                    entropy_api_key,
                    data_product_id,
                    output_port_name,
                    queries
                )

# Usage
snowflake_conn = snowflake.connector.connect(
    user='YOUR_USER',
    password='YOUR_PASSWORD',
    account='YOUR_ACCOUNT'
)

sync_snowflake_usage('YOUR_ENTROPY_API_KEY', snowflake_conn)

Viewing Usage Data

Once usage data is submitted, the Usage KPI becomes available on the data product page in the Studio. The KPI is only displayed to data product owners and team members with edit permissions.

To view usage data:

  1. Navigate to your data product
  2. Click on the "Usage" tab
  3. View charts and detailed query logs for the last 30 days

Usage KPIs

The usage dashboard displays:

  • Query volume over time
  • Top users by query count
  • Query type distribution
  • Success/failure rates
  • Detailed query logs with all attributes

The detailed view shows individual query logs with all captured attributes:

[Screenshot: Usage Details]