Bringing pipeline run details into Atlan's Asset Profiles (using custom metadata attributes)

🤓 Why?

Is the data in this table actually updated?

When was the last time this data was updated?

Can I use this data for my analysis?

Do these questions seem familiar? For data consumers, the data pipeline can be a black box.

Sometimes you might finish an entire analysis project only to find that the data was outdated. Or worse, an executive dashboard might be shipped with the wrong data. 😨 Meanwhile, data engineers waste a ton of time responding to Slack messages and emails from panicking data consumers.

💡

To solve this once and for all, some amazing DataOps champions integrate metadata from their data pipeline into Atlan to create a single source of truth.

📄 How data pipeline metadata can help data consumers

Data pipelines can be a rich source of metadata. When that metadata is easily accessible to end data consumers, it can significantly improve trust in your data team.

Here are a few typical kinds of metadata we see DataOps champions bring into Atlan's data asset profiles from their data pipelines:

  • Data freshness (e.g. last updated date and time)
  • Pipeline run status (e.g. success or failure)
  • Links to the pipeline (e.g. a link to the relevant Airflow DAG for troubleshooting)
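
As a rough illustration, the three kinds of metadata above could be gathered into one small record at the end of a run. This is only a sketch: the class, function, and field names (`PipelineRunMetadata`, `build_run_metadata`, `last_updated_at`, `run_status`, `pipeline_url`) are hypothetical and would need to mirror whatever attributes you actually define in Atlan.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass
class PipelineRunMetadata:
    """Hypothetical record of what one pipeline run reports about a table."""
    last_updated_at: str  # data freshness, as an ISO 8601 timestamp in UTC
    run_status: str       # "success" or "failure"
    pipeline_url: str     # link to the pipeline, e.g. the Airflow DAG page


def build_run_metadata(succeeded: bool, pipeline_url: str, finished_at: datetime) -> dict:
    """Turn the outcome of a run into a plain dict of attribute values."""
    record = PipelineRunMetadata(
        last_updated_at=finished_at.astimezone(timezone.utc).isoformat(),
        run_status="success" if succeeded else "failure",
        pipeline_url=pipeline_url,
    )
    return asdict(record)
```

Keeping the values in a plain dict makes them easy to serialize later, whatever request format the receiving API expects.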

👉 TL;DR: Integrate your pipeline metadata into Atlan asset profiles

1️⃣ Create the list of attributes that you want to add from your pipeline.

This uses Atlan's DIY "Custom metadata attributes" editor. Just go to the admin panel's "Business Metadata" section, and define the attributes you want to bring in from your data pipeline and display on an Atlan Asset Profile.

2️⃣ Integrate your pipeline runs with Atlan's APIs.

Use this endpoint to post metadata from your ETL tool into Atlan. You can plug this API call in at the end of each pipeline's run, so that the data in Atlan is updated automatically whenever a pipeline run happens.
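
Here is a minimal sketch of what such a call might look like, using only Python's standard library. The endpoint URL, the bearer-token header, and the payload shape are all placeholders, not Atlan's documented API; swap in the endpoint and request format from Atlan's API reference. The helper names (`build_request`, `post_run_metadata`) are hypothetical too.

```python
import json
import urllib.request

# Placeholder values: replace with the endpoint and credentials for your Atlan instance.
ATLAN_ENDPOINT = "https://your-tenant.example.com/api/placeholder-endpoint"
API_TOKEN = "replace-me"


def build_request(asset_name: str, attributes: dict) -> urllib.request.Request:
    """Prepare (but do not send) a POST carrying custom metadata for one asset."""
    # Assumed payload shape: an asset identifier plus a dict of attribute values.
    body = json.dumps({"asset": asset_name, "attributes": attributes}).encode("utf-8")
    return urllib.request.Request(
        ATLAN_ENDPOINT,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_TOKEN}",  # assumed auth scheme
        },
    )


def post_run_metadata(asset_name: str, attributes: dict, timeout: float = 10.0) -> int:
    """Send the request and return the HTTP status code."""
    request = build_request(asset_name, attributes)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Calling something like `post_run_metadata("orders", {"run_status": "success"})` as the last step of a pipeline would keep the asset profile in sync with each run. Splitting request building from sending also lets you inspect the payload without touching the network.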

🔗 Related reads

⚡️ Resources

πŸƒ
Integrate your Airflow DAG pipeline runs with Atlan