Data Products
A data product is a logical data unit on the data platform that is actively managed by a team (owner).
A data product exposes its data through output ports, ensuring downstream users or systems can easily consume it. An output port is specified by a data contract, which defines the schema, semantics, and quality for using the data, along with the terms of use.
Data Product Types
Data Mesh Manager supports the following data product types:
- Source-aligned Data Product: Data Products that are closely aligned to the entities or events generated in corresponding operational systems without significant transformations.
- Aggregated Data Product: Data Products that are built on top of one or more source-aligned data products and provide aggregated or transformed data for multiple use cases.
- Consumer-aligned Data Product: Data Products that are designed to meet the specific needs of a particular consumer or group of consumers, often involving significant transformations or aggregations.
In addition, you can add the following systems, which also provide or consume data but are usually not considered data products:
- Data Consumer: A system that consumes data from data products, but does not provide data itself. Examples: reports, dashboards, notebooks, BI tools
- Application: A system or software that implements business processes and generates or consumes data. Applications typically have an API. Examples: operational systems, microservices, databases, CRM, MDM, source systems, external data providers
With these, you can model the data landscape of your organization and understand how data flows between different systems, in both the operational and the analytical realm.
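As a rough illustration of such a landscape model, the sketch below represents the types listed above as a Python enum and records data flows as directed provider-to-consumer pairs. All class, function, and node names are hypothetical and are not part of Data Mesh Manager.

```python
from dataclasses import dataclass
from enum import Enum


class NodeType(Enum):
    SOURCE_ALIGNED = "source-aligned"
    AGGREGATED = "aggregated"
    CONSUMER_ALIGNED = "consumer-aligned"
    DATA_CONSUMER = "data consumer"
    APPLICATION = "application"


# Types that count as data products; the other two only provide or consume data.
DATA_PRODUCT_TYPES = {
    NodeType.SOURCE_ALIGNED,
    NodeType.AGGREGATED,
    NodeType.CONSUMER_ALIGNED,
}


@dataclass(frozen=True)
class Node:
    name: str
    type: NodeType

    @property
    def is_data_product(self) -> bool:
        return self.type in DATA_PRODUCT_TYPES


def downstream(flows: list[tuple[Node, Node]], start: Node) -> set[str]:
    """Return the names of all nodes reachable from `start` along the data flows."""
    seen: set[str] = set()
    stack = [start]
    while stack:
        current = stack.pop()
        for provider, consumer in flows:
            if provider == current and consumer.name not in seen:
                seen.add(consumer.name)
                stack.append(consumer)
    return seen


# Hypothetical example landscape: application -> data products -> report.
crm = Node("crm", NodeType.APPLICATION)
customers = Node("customers", NodeType.SOURCE_ALIGNED)
customer_360 = Node("customer_360", NodeType.AGGREGATED)
churn_report = Node("churn_report", NodeType.DATA_CONSUMER)

flows = [(crm, customers), (customers, customer_360), (customer_360, churn_report)]

print(sorted(downstream(flows, crm)))
# ['churn_report', 'customer_360', 'customers']
```

Keeping applications and data consumers in the same graph as the data products is what makes it possible to follow a flow from the operational source all the way to the report that depends on it.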
Output Ports
A data product can have zero, one, or multiple output ports. An output port is the technical endpoint to a specific dataset.
An output port is usually the combination of:
- data model (e.g., one or multiple tables, PII, non-PII)
- version (v1, v2)
- server technology (Databricks, Snowflake, S3, Kafka, etc.)
- environment (prod, dev, test)
An output port has a server that a data consumer connects to in order to access the data (e.g., identified by the hostname, database, and schema name in Snowflake).
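The combination described above can be sketched as a small data structure. This is only an illustration under assumed names: the classes, fields, identifier scheme, and the example hostname are hypothetical and do not reflect the internal model of Data Mesh Manager.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Server:
    # Connection details a consumer needs; fields follow the Snowflake example.
    host: str
    database: str
    schema: str


@dataclass(frozen=True)
class OutputPort:
    data_model: tuple[str, ...]  # e.g. one or more table names
    version: str                 # e.g. "v1"
    technology: str              # e.g. "snowflake"
    environment: str             # e.g. "prod"
    server: Server
    contains_pii: bool = False

    @property
    def port_id(self) -> str:
        """Derive an identifier from the parts that distinguish the port."""
        return f"{self.technology}_{self.environment}_{self.version}"


# Hypothetical output port for an orders data product.
port = OutputPort(
    data_model=("orders", "order_items"),
    version="v1",
    technology="snowflake",
    environment="prod",
    server=Server(
        host="example.snowflakecomputing.com",
        database="SALES",
        schema="ORDERS_V1",
    ),
)

print(port.port_id)
# snowflake_prod_v1
```

Because each of the four parts can vary independently, a new version or a second environment results in an additional output port rather than a change to an existing one.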
An output port can be specified by a data contract, which defines the schema, semantics, and quality for using the data, along with the terms of use.
Data consumers (users, teams, and other data products) request access to a specific output port.
Input Ports
Input ports represent upstream data products or applications that provide the source data for the data product.
To add an input port, add or request access for the consuming data product. The input port is then created automatically.
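The relationship between access and input ports can be modeled roughly as follows. This sketch only mirrors the behavior described above; the names `DataProduct` and `grant_access` are hypothetical, and it is not how Data Mesh Manager implements the feature.

```python
from dataclasses import dataclass, field


@dataclass
class DataProduct:
    id: str
    output_ports: list[str] = field(default_factory=list)
    input_ports: list[str] = field(default_factory=list)


def grant_access(provider: DataProduct, output_port: str, consumer: DataProduct) -> None:
    """Record access to `output_port` and derive the consumer's input port from it."""
    if output_port not in provider.output_ports:
        raise ValueError(f"{provider.id} has no output port {output_port!r}")
    source = f"{provider.id}/{output_port}"
    if source not in consumer.input_ports:  # granting twice adds no duplicate
        consumer.input_ports.append(source)


# Hypothetical provider and consumer.
orders = DataProduct("orders", output_ports=["snowflake_prod_v1"])
churn = DataProduct("churn_prediction")

grant_access(orders, "snowflake_prod_v1", churn)
print(churn.input_ports)
# ['orders/snowflake_prod_v1']
```

The point of the sketch is that the input port is derived from the access relationship, so it does not need to be maintained by hand on the consuming data product.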
Assets (optional)
Internal components that are not relevant for data consumers, such as data pipelines, raw and intermediary tables, ingestion methods, test code, and infrastructure details, are usually not part of the data product in Data Mesh Manager. However, these assets can be assigned to data products or output ports for documentation, navigation, search, and lineage purposes.
Costs (optional)
Infrastructure costs and other expenses can be assigned to a data product to track what it costs to build and run. This cost information can be used for data product controlling, for example to evaluate the business value of the data product against its costs.
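A minimal sketch of such cost tracking is shown below. The cost items, categories, and amounts are made-up example values, and the function name is hypothetical.

```python
from collections import defaultdict
from decimal import Decimal

# (data product id, cost category, amount); all values are invented examples.
cost_items = [
    ("orders", "compute", Decimal("120.50")),
    ("orders", "storage", Decimal("30.25")),
    ("customer_360", "compute", Decimal("410.00")),
]


def total_cost_per_product(items):
    """Sum all assigned cost items per data product."""
    totals = defaultdict(Decimal)  # Decimal() starts each total at 0
    for product_id, _category, amount in items:
        totals[product_id] += amount
    return dict(totals)


totals = total_cost_per_product(cost_items)
print(totals["orders"])
# 150.75
```

`Decimal` is used instead of `float` so that monetary amounts add up without binary rounding errors.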
Data Product Specification
A data product can be edited in the YAML editor or, for automated provisioning, managed through the API as JSON. In both cases, it follows the Data Product Specification.
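For automated provisioning, a data product description might be assembled in code and serialized to JSON before it is sent to the API. The sketch below shows only the serialization step. The field names (`id`, `info`, `inputPorts`, `outputPorts`) and all values are assumptions for illustration and should be checked against the Data Product Specification; the API call itself is deliberately left out.

```python
import json

# Hypothetical data product description; verify field names against the
# Data Product Specification before using them.
data_product = {
    "id": "orders",
    "info": {"title": "Orders", "owner": "checkout-team"},
    "inputPorts": [],
    "outputPorts": [
        {
            "id": "snowflake_prod_v1",
            "server": {"database": "SALES", "schema": "ORDERS_V1"},
        }
    ],
}


def to_json(product: dict) -> str:
    """Serialize a data product after a basic sanity check on assumed fields."""
    missing = [key for key in ("id", "info") if key not in product]
    if missing:
        raise ValueError(f"missing fields: {', '.join(missing)}")
    return json.dumps(product, indent=2)


payload = to_json(data_product)
print(payload)
```

Because JSON and YAML can express the same structure, a description built this way corresponds to what would otherwise be maintained in the YAML editor.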