AWS Glue Integration

Integration of AWS Glue Data Catalog with Entropy Data.

The AWS Glue integration is managed within Entropy Data. Configure the connection and sync schedule to start syncing with AWS Glue Data Catalog.

No additional deployments are needed.

Features

  • Asset Synchronization: Sync databases and tables from AWS Glue Data Catalog to Entropy Data as Assets.

Prerequisites

You need an Entropy Data Enterprise License or the Cloud Edition. To enable the integration, set APPLICATION_INGESTIONS_ENABLED to true in your environment. See Configuration for more information.

To start, navigate to Settings > Integrations > Add Integration. This opens a wizard that guides you through configuring the integration.

Configuration

Select the Integration Type

Configure the Credentials

The integration uses AWS IAM credentials for authentication. Refer to the AWS documentation for creating a new access key. While creating the access key, select the use case 'Other'.

The IAM user or role associated with the access key needs permissions for glue:GetDatabases and glue:GetTables. You may use the following AWS policy to grant them:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "glue:GetTables",
                "glue:GetDatabases"
            ],
            "Resource": "*"
        }
    ]
}
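Before entering the access key in the wizard, it can help to confirm that it really can list databases and tables. The following is a minimal sketch, not part of Entropy Data: it assumes the boto3 library is installed, and the helper names `check_glue_read_access` and `make_glue_client` are hypothetical. It only reads the first page of results.

```python
def check_glue_read_access(glue_client):
    """Exercise glue:GetDatabases and glue:GetTables once and report what was seen.

    Any permission problem surfaces as an exception raised by the client.
    """
    # Requires glue:GetDatabases
    databases = glue_client.get_databases().get("DatabaseList", [])
    database_names = [db["Name"] for db in databases]

    table_names = []
    if database_names:
        # Requires glue:GetTables; only the first database is probed
        response = glue_client.get_tables(DatabaseName=database_names[0])
        table_names = [table["Name"] for table in response.get("TableList", [])]

    return {"databases": database_names, "first_database_tables": table_names}


def make_glue_client(access_key_id, secret_access_key, region_name):
    """Build a Glue client from an access key (hypothetical helper)."""
    import boto3  # imported lazily so the check above can be tested without boto3

    return boto3.client(
        "glue",
        aws_access_key_id=access_key_id,
        aws_secret_access_key=secret_access_key,
        region_name=region_name,
    )
```

Calling `check_glue_read_access(make_glue_client(key_id, secret, region))` with your own values should return the visible database names. An access-denied error at this point suggests the policy is not attached to the IAM user or role behind the key.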

Note: Credentials are stored encrypted in the Entropy Data database. To enable encryption in your environment, set APPLICATION_ENCRYPTION_KEYS to a 64-character hexadecimal string (see Configuration).
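One way to produce a value of the required length is Python's standard `secrets` module. This is a sketch under the assumption that any random 64-character hexadecimal string is acceptable; the function names are illustrative, and the Configuration page remains the authority on the exact format.

```python
import secrets
import string


def generate_encryption_key():
    """Return 32 random bytes encoded as 64 hexadecimal characters."""
    return secrets.token_hex(32)


def is_valid_key(value):
    """Check the shape described above: exactly 64 hex characters."""
    return len(value) == 64 and all(c in string.hexdigits for c in value)


if __name__ == "__main__":
    # Print a fresh key to paste into the environment configuration
    print(generate_encryption_key())
```

Treat the generated value like any other secret and keep it out of version control.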

Configure Filters

Configure filters to limit which assets are synchronized. Both include and exclude filters are supported. For AWS Glue, filters can be applied to Databases and Tables.

Filters support '*' as a wildcard character to match any number of characters.
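To reason about which names a set of filters would select, the wildcard rule can be sketched in a few lines. This is an illustration only, not the matcher Entropy Data uses. It assumes that an empty include list means "include everything", that exclude filters take precedence over include filters, and that matching is case-sensitive; none of these are confirmed here.

```python
import re


def wildcard_to_regex(pattern):
    """Translate a filter into a regex where only '*' is special."""
    # Escape everything between the wildcards so characters like '.' stay literal
    literal_parts = [re.escape(part) for part in pattern.split("*")]
    return re.compile(".*".join(literal_parts))


def matches_any(name, patterns):
    """True if the whole name matches at least one filter."""
    return any(wildcard_to_regex(p).fullmatch(name) for p in patterns)


def is_synced(name, include, exclude):
    """Apply include filters first, then let exclude filters veto (assumed order)."""
    included = not include or matches_any(name, include)
    return included and not matches_any(name, exclude)
```

For example, with include `["sales_*"]` and exclude `["*_tmp"]`, `is_synced("sales_orders", ...)` is true under these assumptions, while `is_synced("sales_tmp", ...)` is false.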

Configure Schedule

Set a schedule to automatically synchronize assets. You can choose from predefined schedules or define a custom schedule using the cron expression format.

Note: All schedules use the UTC timezone, so take this into account when configuring your schedule. Please synchronize assets at most once or twice per day. We reserve the right to disable integrations that synchronize more often. If you need an immediate update, you can trigger a synchronization manually.
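Because schedules run in UTC, a local start time has to be converted before it goes into the hour and minute of a custom schedule. The sketch below shows that arithmetic with a fixed UTC offset; the function name is illustrative, and since a fixed offset ignores daylight saving time, the result shifts by an hour when your local offset changes.

```python
from datetime import datetime, timedelta, timezone


def local_time_to_utc(hour, minute, utc_offset_hours):
    """Convert a local wall-clock time to the (hour, minute) it falls on in UTC."""
    local_tz = timezone(timedelta(hours=utc_offset_hours))
    # The date is arbitrary; only the time of day matters for a daily schedule
    local_time = datetime(2024, 1, 1, hour, minute, tzinfo=local_tz)
    utc_time = local_time.astimezone(timezone.utc)
    return utc_time.hour, utc_time.minute
```

For a sync at 06:00 local time in a UTC+1 zone, `local_time_to_utc(6, 0, 1)` gives `(5, 0)`, so the schedule should use hour 5 and minute 0. Note that the conversion can cross midnight, in which case the run falls on the neighbouring calendar day in UTC.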

Complete the Integration Configuration

Choose a unique name for the integration, review your configuration, and click Create Integration.

Next Steps

The integration is now configured and will run according to the schedule. To check the integration status, navigate to Settings > Integrations. Here you'll find the current status and the last 10 integration runs.

You can adjust the integration configuration and credentials at any time. The configuration is saved in YAML format with syntax validation support in the editor.

Note: The previously stored secret access key is not displayed in the edit view for security reasons.

Deselecting the Enabled checkbox disables the automatic schedule. Manual integration runs are still possible.