Ingest Assets
Build your own integration to sync assets from your data platform to Entropy Data when prebuilt connectors don't fit your needs.
Before building your own integration, check if a prebuilt connector is available for your data platform (Databricks, Snowflake, AWS, Azure, Google Cloud, Kafka). Connectors provide the best experience with minimal setup.
Overview
Asset ingestion synchronizes metadata about your physical data sources (tables, views, schemas, topics, etc.) from your data platform to Entropy Data. This enables you to:
- Automatically generate data contracts from existing table structures
- Import assets as data products with minimal manual effort
- Link technical resources to business concepts (data products and output ports)
- Track data lineage across your data platform
You can implement asset ingestion using:
- SDK: Java library for building integrations (recommended for long-running applications)
- REST API: Direct API access for any programming language
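To illustrate the REST path, here is a minimal sketch that pushes a single asset with Python and the requests library. The endpoint path (`/api/assets/{id}`), the environment variables, and the bearer-token authentication are assumptions for illustration only; check the API reference for the actual routes and auth scheme.

```python
# Minimal sketch of asset ingestion over the REST API, assuming a
# hypothetical PUT /api/assets/{id} endpoint and bearer-token auth.
# Consult the Entropy Data API reference for the real routes.
import os
import requests

BASE_URL = os.environ["ENTROPY_API_URL"]   # assumed, e.g. https://app.example.com
API_KEY = os.environ["ENTROPY_API_KEY"]    # assumed auth mechanism

asset = {
    "id": "sales-customers",
    "info": {
        "source": "snowflake",
        "type": "snowflake_table",
        "name": "CUSTOMERS",
        "qualifiedName": "SALES_DB.PUBLIC.CUSTOMERS",
    },
    "properties": {
        "account": "mycompany",
        "database": "SALES_DB",
        "schema": "PUBLIC",
        "environment": "prod",
    },
}

# Upsert the asset by its id so that re-running the sync stays idempotent.
response = requests.put(
    f"{BASE_URL}/api/assets/{asset['id']}",
    json=asset,
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=30,
)
response.raise_for_status()
```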
Understanding the Asset Data Model
An asset in Entropy Data consists of three main parts:
1. Info Object
The info object contains metadata about the asset:
- `source`: Identifies the data source (e.g., `snowflake`, `unity`, `purview`, `postgres`). This is not about the data catalog, but rather the data source itself. Do not use values like `openmetadata` or `collibra` here, as they represent catalogs, not data sources.
- `type`: Describes the asset type, prefixed by the source (e.g., `snowflake_table`, `unity_schema`, `kafka_topic`)
- `name`: Name of the asset. It must be the last segment, not the fully qualified name (e.g., for a table, `CUSTOMERS`, not `SALES_DB.PUBLIC.CUSTOMERS`)
- `qualifiedName`: Unique identifier, often used to extract connection information (e.g., `SALES_DB.PUBLIC.CUSTOMERS` for Snowflake, `production.sales.customers` for Databricks Unity Catalog). If you provide a JDBC URL, the server details can be parsed automatically.
- `properties`: Key-value map for additional metadata and connection details. Provide server details here if they are not included in `qualifiedName`. For example, for Snowflake, include `account`, `database`, and `schema` as properties so that they are picked up when converting an asset to a data product. Use the `environment` property to indicate which environment the asset belongs to (e.g., `dev`, `staging`, `prod`).
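As a concrete illustration of these rules, the sketch below derives `name` as the last segment of a dot-separated `qualifiedName` and supplies the Snowflake connection details as `properties`. The helper function and its id scheme are illustrative only and are not part of the SDK or API.

```python
# Sketch: build the info and properties objects for a Snowflake table.
# The helper is illustrative; the field names follow the examples in this guide.
def snowflake_table_asset(account: str, qualified_name: str, environment: str) -> dict:
    database, schema, table = qualified_name.split(".")
    return {
        "id": qualified_name.lower().replace(".", "-"),
        "info": {
            "source": "snowflake",          # the data source, not the catalog
            "type": "snowflake_table",      # <source>_<asset_type>
            "name": table,                  # last segment only, never the full name
            "qualifiedName": qualified_name,
        },
        "properties": {
            "account": account,             # server details not present in qualifiedName
            "database": database,
            "schema": schema,
            "environment": environment,     # e.g. dev, staging, prod
        },
    }

asset = snowflake_table_asset("mycompany", "SALES_DB.PUBLIC.CUSTOMERS", "prod")
```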
2. Columns
For table-like assets, the columns array defines the schema structure. Use the native types of the data source (e.g., for Snowflake, the SQL types Snowflake supports). This enables automatic generation of data contracts: the types become the physicalType in the data contract, and the logicalType is derived automatically. One way to populate the array is sketched below.
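The sketch assumes a DB-API-style cursor (for example from the Snowflake Python connector) and Snowflake's INFORMATION_SCHEMA.COLUMNS view; the `%s` placeholder style depends on your driver, and other platforms expose equivalent metadata views.

```python
# Sketch: derive the columns array from the source's information schema,
# keeping the native SQL types so they become physicalType in the contract.
def fetch_columns(cursor, database: str, schema: str, table: str) -> list[dict]:
    cursor.execute(
        f"""
        SELECT column_name, data_type, is_nullable, comment
        FROM {database}.INFORMATION_SCHEMA.COLUMNS
        WHERE table_schema = %s AND table_name = %s
        ORDER BY ordinal_position
        """,
        (schema, table),
    )
    return [
        {
            "name": name,
            # data_type is the bare native type (e.g. NUMBER, TEXT); append
            # precision/scale from the information schema if you need it.
            "type": data_type,
            "description": comment or "",
            "required": is_nullable == "NO",
        }
        for name, data_type, is_nullable, comment in cursor.fetchall()
    ]
```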
3. Relationships
Assets can have hierarchical relationships using the relationships array. Common patterns include:
- Two-tier: Schema → Table
- Three-tier: Database → Schema → Table
Declare the relationship on the child asset, pointing to its parent with `relationshipType: "parent"`, as shown in the examples below (see also the programmatic sketch after them).
Example: Databricks Catalog (Top-Level Parent)
```json
{
"id": "prod-catalog",
"info": {
"source": "databricks",
"type": "databricks_catalog",
"name": "production",
"qualifiedName": "production"
},
"properties": {
"catalog": "production",
"environment": "prod"
}
}
```
Example: Databricks Schema (Child of Catalog)
```json
{
"id": "prod-sales-schema",
"info": {
"source": "databricks",
"type": "databricks_schema",
"name": "sales",
"qualifiedName": "production.sales"
},
"properties": {
"catalog": "production",
"schema": "sales",
"environment": "prod"
},
"relationships": [
{
"assetId": "prod-catalog",
"relationshipType": "parent"
}
]
}
```
Example: Databricks Table (Child of Schema)
```json
{
"id": "prod-sales-customers",
"info": {
"source": "databricks",
"type": "databricks_table",
"name": "customers",
"qualifiedName": "production.sales.customers"
},
"properties": {
"catalog": "production",
"schema": "sales",
"environment": "prod"
},
"relationships": [
{
"assetId": "prod-sales-schema",
"relationshipType": "parent"
}
]
}
```
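If you enumerate many tables, you can generate the same three-tier hierarchy programmatically. The sketch below builds catalog, schema, and table assets from fully qualified table names and wires each child to its parent via `relationshipType: "parent"`. The id scheme is just one possible convention, not something prescribed by the API.

```python
# Sketch: build a Databricks catalog -> schema -> table hierarchy with
# parent relationships, given fully qualified table names.
def build_hierarchy(table_names: list[str], environment: str = "prod") -> dict[str, dict]:
    assets: dict[str, dict] = {}

    for fqn in table_names:                          # e.g. "production.sales.customers"
        catalog, schema, table = fqn.split(".")
        catalog_id = catalog
        schema_id = f"{catalog}-{schema}"
        table_id = f"{catalog}-{schema}-{table}"

        assets.setdefault(catalog_id, {
            "id": catalog_id,
            "info": {"source": "databricks", "type": "databricks_catalog",
                     "name": catalog, "qualifiedName": catalog},
            "properties": {"catalog": catalog, "environment": environment},
        })
        assets.setdefault(schema_id, {
            "id": schema_id,
            "info": {"source": "databricks", "type": "databricks_schema",
                     "name": schema, "qualifiedName": f"{catalog}.{schema}"},
            "properties": {"catalog": catalog, "schema": schema, "environment": environment},
            "relationships": [{"assetId": catalog_id, "relationshipType": "parent"}],
        })
        assets.setdefault(table_id, {
            "id": table_id,
            "info": {"source": "databricks", "type": "databricks_table",
                     "name": table, "qualifiedName": fqn},
            "properties": {"catalog": catalog, "schema": schema, "environment": environment},
            "relationships": [{"assetId": schema_id, "relationshipType": "parent"}],
        })

    return assets

hierarchy = build_hierarchy(["production.sales.customers", "production.sales.orders"])
```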
Mapping Your Data Source to Assets
Select the right values for source and type that represent your data platform. Use server types from the Open Data Contract Standard when available.
| Platform | source | type examples | Common properties | Notes |
|---|---|---|---|---|
| API | api | api_endpoint | location | |
| AWS Athena | athena | athena_database, athena_table | schema, catalog, stagingDir, regionName | |
| AWS Glue | glue | glue_database, glue_table | account, database, location, format | Also used for Glue Catalog |
| AWS Kinesis | kinesis | kinesis_stream | stream, region, format | |
| AWS Redshift | redshift | redshift_database, redshift_schema, redshift_table | host, database, schema, region, account | |
| AWS S3 | s3 | s3_bucket, s3_folder, s3_object | location, format, delimiter, endpointUrl | |
| Azure Storage | azure | azure_container, azure_blob | location, format, delimiter | |
| Azure Synapse | synapse | synapse_database, synapse_schema, synapse_table | host, port, database | |
| ClickHouse | clickhouse | clickhouse_database, clickhouse_table | host, port, database | |
| Custom Platform | custom | custom_<type> | Define your own properties | |
| Databricks | databricks | databricks_catalog, databricks_schema, databricks_table, databricks_view | host, catalog, schema | Modern Databricks with Unity Catalog |
| Dremio | dremio | dremio_source, dremio_schema, dremio_table | host, port, schema | |
| DuckDB | duckdb | duckdb_schema, duckdb_table | database, schema | |
| Google BigQuery | bigquery | bigquery_project, bigquery_dataset, bigquery_table, bigquery_view | project, dataset | |
| Google Cloud SQL | cloudsql | cloudsql_database, cloudsql_table | host, port, database, schema | |
| Google Pub/Sub | pubsub | pubsub_topic, pubsub_subscription | project | |
| IBM DB2 | db2 | db2_database, db2_schema, db2_table | host, port, database, schema | |
| Kafka | kafka | kafka_cluster, kafka_topic | host, format | |
| Microsoft Purview | purview | purview_database, purview_schema, purview_table | Varies by source | Prefer underlying data source type (e.g., sqlserver) when possible |
| MySQL | mysql | mysql_database, mysql_schema, mysql_table | host, port, database | |
| Oracle | oracle | oracle_database, oracle_schema, oracle_table | host, port, serviceName | |
| PostgreSQL | postgresql | postgresql_database, postgresql_schema, postgresql_table, postgresql_view | host, port, database, schema | |
| Presto | presto | presto_catalog, presto_schema, presto_table | host, catalog, schema | |
| SFTP | sftp | sftp_folder, sftp_file | location, format, delimiter | |
| Snowflake | snowflake | snowflake_database, snowflake_schema, snowflake_table, snowflake_view | account, database, schema, warehouse | |
| SQL Server | sqlserver | sqlserver_database, sqlserver_schema, sqlserver_table | host, port, database, schema | |
| Trino | trino | trino_catalog, trino_schema, trino_table | host, port, catalog, schema | |
| Vertica | vertica | vertica_database, vertica_schema, vertica_table | host, port, database, schema | |
Naming Conventions
For `source`:
- Use the server type from ODCS when available
- Use lowercase, with no spaces or special characters
- Prefer the data source name over the catalog name (e.g., `sqlserver`, not `purview`)

For `type`:
- Follow the pattern `<source>_<asset_type>`
- Common asset types: `database`, `catalog`, `schema`, `table`, `view`, `topic`, `bucket`, `folder`, `file`
- Examples: `snowflake_table`, `kafka_topic`, `s3_bucket`
- Use the same terminology as your data platform (e.g., Databricks uses `catalog`, Kafka uses `topic`)
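A small guard like the sketch below can keep generated values consistent with these conventions. The validation rules simply mirror the bullets above; whether the API enforces them server-side is not covered here.

```python
import re

# Sketch: compose and sanity-check source/type values following the
# conventions above (lowercase, no spaces, <source>_<asset_type> pattern).
VALID_NAME = re.compile(r"^[a-z][a-z0-9]*$")

def asset_type(source: str, kind: str) -> str:
    if not VALID_NAME.match(source):
        raise ValueError(f"source must be lowercase without spaces: {source!r}")
    if not VALID_NAME.match(kind):
        raise ValueError(f"asset type must be lowercase without spaces: {kind!r}")
    return f"{source}_{kind}"

assert asset_type("snowflake", "table") == "snowflake_table"
assert asset_type("kafka", "topic") == "kafka_topic"
```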
Complete Examples
Example 1: Snowflake Table Asset
```json
{
"id": "sales-customers",
"info": {
"source": "snowflake",
"type": "snowflake_table",
"name": "CUSTOMERS",
"qualifiedName": "SALES_DB.PUBLIC.CUSTOMERS",
"description": "Customer master data"
},
"properties": {
"account": "mycompany",
"database": "SALES_DB",
"schema": "PUBLIC",
"environment": "prod"
},
"columns": [
{
"name": "CUSTOMER_ID",
"type": "NUMBER(38,0)",
"description": "Unique customer identifier",
"required": true
},
{
"name": "EMAIL",
"type": "VARCHAR(255)",
"description": "Customer email address",
"required": true
},
{
"name": "FIRST_NAME",
"type": "VARCHAR(100)",
"description": "Customer first name",
"required": false
},
{
"name": "LAST_NAME",
"type": "VARCHAR(100)",
"description": "Customer last name",
"required": false
},
{
"name": "CREATED_AT",
"type": "TIMESTAMP_NTZ(9)",
"description": "Account creation timestamp",
"required": true
}
]
}
```
Example 2: Databricks Unity Table Asset
```json
{
"id": "prod-sales-customers",
"info": {
"source": "unity",
"type": "unity_table",
"name": "customers",
"qualifiedName": "production.sales.customers",
"description": "Customer dimension table"
},
"properties": {
"host": "adb-1234567890.5.azuredatabricks.net",
"path": "/mnt/production/sales/customers",
"format": "delta",
"environment": "prod"
},
"columns": [
{
"name": "customer_id",
"type": "BIGINT",
"description": "Unique customer identifier",
"required": true
},
{
"name": "email",
"type": "STRING",
"description": "Customer email address",
"required": true
},
{
"name": "signup_date",
"type": "DATE",
"description": "Date customer signed up",
"required": true
}
],
"relationships": [
{
"assetId": "prod-sales-schema",
"relationshipType": "parent"
}
]
}
```
Example 3: Kafka Topic Asset
```json
{
"id": "orders-topic",
"info": {
"source": "kafka",
"type": "kafka_topic",
"name": "orders",
"qualifiedName": "prod-cluster.orders",
"description": "Order events stream"
},
"properties": {
"bootstrap_servers": "kafka-1.example.com:9092,kafka-2.example.com:9092",
"cluster": "prod-cluster",
"partitions": "12",
"replication_factor": "3",
"environment": "prod"
},
"relationships": [
{
"assetId": "prod-kafka-cluster",
"relationshipType": "parent"
}
]
}
```
Example 4: Custom Data Platform
```json
{
"id": "customer-dataset",
"info": {
"source": "custom_platform",
"type": "custom_dataset",
"name": "customers",
"qualifiedName": "custom://prod/sales/customers",
"description": "Customer dataset"
},
"properties": {
"location": "s3://my-bucket/datasets/customers",
"format": "parquet",
"environment": "prod"
},
"columns": [
{
"name": "id",
"type": "STRING",
"description": "Customer ID",
"required": true
},
{
"name": "name",
"type": "STRING",
"description": "Customer name",
"required": true
}
]
}
```
Next Steps
Once you've ingested your assets, you can convert them into data products, generate data contracts from their table structures, and link them to output ports.
For questions or support, refer to the SDK documentation or API reference.