Assets

Assets are an optional concept in Entropy Data that lets you integrate your data products with your data platform, especially with an existing data catalog.

Entropy Data uses the term "asset" for a technical representation of your data, such as a table, schema, or topic in your data platform. We chose this term to be platform-agnostic, so it can be used for any data platform, such as Databricks, Snowflake, AWS, Azure, Google Cloud, or Kafka, independent of the terminology your existing data catalog uses.

Assets, and thereby your data, can be linked to data products and output ports to provide a clear mapping between the technical representation of your data and those business concepts. This helps you keep track of data lineage and reduces manual effort when creating data contracts and output ports.

What features are available for Assets?

Entropy Data provides a set of features to manage assets and their relationships with data products and output ports. The main features are:

Generate models for data contracts
Import data from external data platforms to support the creation of data products
Import data from external data platforms to support the creation of output ports
Align the entities inside of Entropy Data with the entities of your external data platform(s)
Track the technical representation of your data products on your data platform

Assets

Because assets represent data that exists externally, they should be synced with your data platform(s).

That is why we intentionally do not support creating assets in the UI: the real power of this feature comes into play when you use one of the available automation features. In the UI, you can view assets and their relationships, but you cannot create or change them.

To make it as easy as possible to integrate your existing data platform(s) with the asset concept, we support the following ways to add them:

Connector

The first and recommended option is to use one of our connectors. For details on how connectors fit into the overall architecture, see the Architecture page. In short, a connector runs in your secure network environment and connects to your data platform(s) to fetch the assets and their relationships. The connector then syncs the assets with Entropy Data:

[Figure: Connector Architecture]

You can find the list of available connectors in the Integrations section, together with information on how to configure them for your setup. The following connectors are available at the moment:

SDK

You can use our SDK to create assets programmatically. The SDK is available for Java.

REST-API

You can also create your own integration with our REST-API. You can find further information about the API in the API documentation.

Decide which integration you want to use based on your needs. If one of our connectors is available, you should use it, as it will provide you with the best experience and most features. If you want to create assets programmatically, you can use our SDK or the REST-API, based on your preferred programming language.

Asset Model

You can find the complete data structure of an asset in the API documentation.

Three parts are of special interest when adding assets, because these properties are relevant when generating data contracts or when importing data products and output ports:

Info: General metadata about an asset is stored as an object in this property. Fields you might want to set are source and type. They determine which data platform you are using and what kind of structure the asset represents.
Columns: If your asset is a table-like structure, add its columns here to enable automatic generation of models in data contracts.
Relationships: Relationships can be established between assets. For example, they make it possible to generate multiple models from a schema or a schema-like structure. You can find more information in the chapter Asset Structure: Child Assets.

Hint: To support the import of connection information for the underlying servers, you can store it as a URI in info.qualifiedName, or you can use the properties map. When importing, we try to map certain keywords to connection information. The Server Properties section at the end of this article documents this mapping for different types of assets.

Asset Structure: Child Assets

Assets can have relationships with each other. You are completely free in how you describe the edge between two assets. For example, you could model something like:

On the left-hand side you find a two-tier relationship structure with a parent schema "Sales Schema" and a child table "Sales Table". The parent relationship indicates that the "Sales Table" is part of the "Sales Schema". On the right-hand side you find a three-tier structure that adds a "catalog" level above the schema. This kind of structure is used in Databricks Unity Catalog, for example. If you only use a single catalog, you can leave this level out.

Please note that asset relationships are not foreign keys between tables. They represent the technical structure of your data platform. For a relational database, that would be something like Database -> Schema -> Table. For a Kafka cluster, it would be something like Cluster -> Topic.

You can name the asset relationshipType freely, but if you want to use our data contract creation, you have to use the relation parent to support the creation of data contracts with multiple models.
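The three-tier structure described above can be sketched in a few lines of Python. This is only an illustration: the asset records are simplified, the field names id and relatedAssetId are hypothetical, and the sketch assumes that the child asset carries the parent relation pointing to its parent. The actual asset structure is defined in the API documentation.

```python
# Hypothetical, simplified asset records (not the real API schema).
# Assumption: each child holds a "parent" relationship pointing to its parent.
assets = [
    {"id": "sales_catalog",
     "info": {"source": "unity", "type": "unity_catalog"},
     "relationships": []},
    {"id": "sales_schema",
     "info": {"source": "unity", "type": "unity_schema"},
     "relationships": [
         {"relationshipType": "parent", "relatedAssetId": "sales_catalog"}]},
    {"id": "sales_table",
     "info": {"source": "unity", "type": "unity_table"},
     "relationships": [
         {"relationshipType": "parent", "relatedAssetId": "sales_schema"}]},
]


def children_of(parent_id, all_assets):
    """Return the ids of all assets that name parent_id as their parent."""
    return [
        asset["id"]
        for asset in all_assets
        if any(
            rel["relationshipType"] == "parent"
            and rel["relatedAssetId"] == parent_id
            for rel in asset["relationships"]
        )
    ]


# The tables below a schema are the candidates for models in a data contract.
tables_in_schema = children_of("sales_schema", assets)
```

Walking the parent relation in this way is what allows one schema asset to yield several models, one per child table.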

Recommendations for naming your asset source and type

We recommend using the following names for your asset source:

Databricks: unity
Snowflake: snowflake
Microsoft Purview: purview

For the asset type we have some special handling in place for the following key identifiers:

Databricks: unity_catalog, unity_schema, unity_table
Snowflake: snowflake_schema, snowflake_table
Microsoft Purview: purview_schema, purview_database, purview_table

If your asset type is not listed above but contains table or schema, we will try to provide the same functionality.

Server Properties

When importing assets, the system maps server-related properties from the source asset to the output port's connection properties. The mapping logic depends on the asset's source and the format of its qualifiedName. Below are the rules and examples for each supported source.

General Rules

If a property can be extracted from the qualifiedName, it takes precedence.
If not, the system falls back to the values in asset.properties.
For unknown sources, the location property is set to the qualifiedName or asset.properties["location"].
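The precedence rule above can be illustrated with a small Python helper. This is a sketch of the described behavior, not the actual implementation; the function name resolve_server_fields and its parameters are hypothetical.

```python
def resolve_server_fields(extracted, properties, keys):
    """Prefer values parsed from qualifiedName, then fall back to asset.properties.

    extracted:  values parsed from qualifiedName (may be incomplete)
    properties: the asset's properties map
    keys:       the server fields relevant for the asset's source
    """
    resolved = {}
    for key in keys:
        if extracted.get(key) is not None:
            # A value extracted from qualifiedName takes precedence.
            resolved[key] = extracted[key]
        elif key in properties:
            # Otherwise fall back to the properties map.
            resolved[key] = properties[key]
    return resolved


fields = resolve_server_fields(
    extracted={"schema": "from_qualified_name"},
    properties={"schema": "from_properties", "host": "db.example.com"},
    keys=["host", "schema", "port"],
)
```

In this example, schema comes from the qualifiedName, host comes from the properties map, and port is omitted because neither source provides it.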

Source: Databricks / Unity

If qualifiedName has 2 or 3 parts (e.g., catalog.schema or catalog.schema.table):
catalog and schema are extracted from qualifiedName.
host is taken from asset.properties["host"].
Otherwise:
host, catalog, and schema are taken from asset.properties.

Example:

{
  "qualifiedName": "mycatalog.myschema",
  "properties": {
    "host": "adb-1234.5.azuredatabricks.net"
  }
}

Resulting output port server fields:

host: adb-1234.5.azuredatabricks.net
catalog: mycatalog
schema: myschema
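
As a rough sketch, the Databricks / Unity rule could be expressed in Python as follows. The function name map_unity_server is hypothetical, and the code only mirrors the rule as documented here; the actual import logic may differ in details.

```python
def map_unity_server(asset):
    """Map a Databricks / Unity asset to output port server fields."""
    properties = asset.get("properties", {})
    parts = (asset.get("qualifiedName") or "").split(".")

    if len(parts) in (2, 3) and all(parts):
        # catalog.schema or catalog.schema.table
        catalog, schema = parts[0], parts[1]
    else:
        # Fall back to the properties map.
        catalog = properties.get("catalog")
        schema = properties.get("schema")

    # The host is always read from the properties map.
    return {"host": properties.get("host"), "catalog": catalog, "schema": schema}


server = map_unity_server({
    "qualifiedName": "mycatalog.myschema",
    "properties": {"host": "adb-1234.5.azuredatabricks.net"},
})
```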

Source: Purview

If qualifiedName matches mssql://host:port/database/schema/table:
Extract host, port, database, and schema from qualifiedName.
Otherwise:
Use values from asset.properties.

Example:

{
  "qualifiedName": "mssql://sqlserver.example.com:1433/mydb/myschema/mytable"
}

Resulting output port server fields:

host: sqlserver.example.com
port: 1433
database: mydb
schema: myschema
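
The Purview rule could be sketched in Python with a regular expression. The pattern and the function name map_purview_server are assumptions made for illustration; in particular, the sketch returns the port as an integer, which the documentation does not specify.

```python
import re

# Assumed pattern for mssql://host:port/database/schema/table
MSSQL_PATTERN = re.compile(
    r"^mssql://(?P<host>[^:/]+):(?P<port>\d+)"
    r"/(?P<database>[^/]+)/(?P<schema>[^/]+)/(?P<table>[^/]+)$"
)


def map_purview_server(asset):
    """Map a Purview asset to output port server fields."""
    match = MSSQL_PATTERN.match(asset.get("qualifiedName") or "")
    if match:
        return {
            "host": match["host"],
            "port": int(match["port"]),
            "database": match["database"],
            "schema": match["schema"],
        }
    # qualifiedName has a different format: use the properties map instead.
    properties = asset.get("properties", {})
    return {key: properties.get(key) for key in ("host", "port", "database", "schema")}


server = map_purview_server({
    "qualifiedName": "mssql://sqlserver.example.com:1433/mydb/myschema/mytable"
})
```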

Source: Snowflake

All values are taken from asset.properties:
account, database, schema, host, port

Example:

{
  "properties": {
    "account": "myaccount",
    "database": "MYDB",
    "schema": "PUBLIC"
  }
}

Resulting output port server fields:

account: myaccount
database: MYDB
schema: PUBLIC
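
For Snowflake, the mapping amounts to copying the known keys from the properties map. A minimal Python sketch, with the hypothetical function name map_snowflake_server, and assuming that keys missing from the properties map are simply omitted:

```python
SNOWFLAKE_KEYS = ("account", "database", "schema", "host", "port")


def map_snowflake_server(asset):
    """Copy the Snowflake server fields that are present in asset.properties."""
    properties = asset.get("properties", {})
    return {key: properties[key] for key in SNOWFLAKE_KEYS if key in properties}


server = map_snowflake_server({
    "properties": {"account": "myaccount", "database": "MYDB", "schema": "PUBLIC"}
})
```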

Source: Postgres

If qualifiedName matches postgresql://host:port/dbs/database/schemas/schema/tables/table:
Extract host, port (default 5432 if missing), database, and schema.
Otherwise:
Use values from asset.properties.

Example:

{
  "qualifiedName": "postgresql://db.example.com:5432/dbs/mydb/schemas/public/tables/mytable"
}

Resulting output port server fields:

host: db.example.com
port: 5432
database: mydb
schema: public
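
The Postgres rule, including the default port, could be sketched in Python like this. The regular expression and the function name map_postgres_server are assumptions for illustration and only mirror the rule as described above.

```python
import re

# Assumed pattern for
# postgresql://host:port/dbs/database/schemas/schema/tables/table
# with an optional port.
POSTGRES_PATTERN = re.compile(
    r"^postgresql://(?P<host>[^:/]+)(?::(?P<port>\d+))?"
    r"/dbs/(?P<database>[^/]+)"
    r"/schemas/(?P<schema>[^/]+)"
    r"/tables/(?P<table>[^/]+)$"
)
DEFAULT_POSTGRES_PORT = 5432


def map_postgres_server(asset):
    """Map a Postgres asset to output port server fields."""
    match = POSTGRES_PATTERN.match(asset.get("qualifiedName") or "")
    if match:
        port = int(match["port"]) if match["port"] else DEFAULT_POSTGRES_PORT
        return {
            "host": match["host"],
            "port": port,
            "database": match["database"],
            "schema": match["schema"],
        }
    # qualifiedName has a different format: use the properties map instead.
    properties = asset.get("properties", {})
    return {key: properties.get(key) for key in ("host", "port", "database", "schema")}


server = map_postgres_server({
    "qualifiedName": "postgresql://db.example.com:5432/dbs/mydb/schemas/public/tables/mytable"
})
```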

Other Sources

The location property is set to the asset's qualifiedName or, if not present, to asset.properties["location"].
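
A minimal Python sketch of this fallback, using the hypothetical function name map_other_server:

```python
def map_other_server(asset):
    """Set location from qualifiedName, or from properties["location"] if absent."""
    location = asset.get("qualifiedName") or asset.get("properties", {}).get("location")
    return {"location": location}


from_name = map_other_server({"qualifiedName": "s3://my-bucket/sales"})
from_properties = map_other_server({"properties": {"location": "s3://my-bucket/orders"}})
```

The S3 paths are made-up example values; any string stored in qualifiedName or properties["location"] is passed through unchanged.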