Is the data in this table actually updated?
When was the last time this data was updated?
Can I use this data for my analysis?
Do these questions seem familiar? For data consumers, the data pipeline can be a black box.
Sometimes you might finish an entire analysis project only to find that the data was outdated. Or worse, an executive dashboard might be shipped with the wrong data. 😨 Meanwhile, data engineers waste a ton of time responding to Slack messages and emails from panicking data consumers.
📄 How data pipeline metadata can help data consumers
Data pipelines can be a rich source of metadata. When this metadata is made easily accessible to end data consumers, it can significantly improve their trust in the data and in your data team.
Here are a few typical kinds of metadata we see DataOps champions bring into Atlan's data asset profiles from their data pipelines:
- Data freshness (e.g. last updated date and time)
- Pipeline run status (e.g. success or failure)
- Links to the pipeline (e.g. a link to the relevant Airflow DAG for troubleshooting)
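To make the three kinds of metadata above concrete, here is a minimal Python sketch of what a single pipeline run might report about a table. The class name, field names, and the example Airflow URL are hypothetical and chosen for illustration only; they are not Atlan's attribute names.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass
class PipelineRunMetadata:
    """Hypothetical record of what one pipeline run reports about a table."""
    last_updated_at: str  # data freshness, as an ISO 8601 timestamp in UTC
    run_status: str       # pipeline run status: "success" or "failure"
    pipeline_url: str     # link to the pipeline, e.g. the Airflow DAG page


def describe_run(succeeded: bool, pipeline_url: str, finished_at: datetime) -> dict:
    """Turn the outcome of a pipeline run into a plain dict of attributes."""
    record = PipelineRunMetadata(
        last_updated_at=finished_at.astimezone(timezone.utc).isoformat(),
        run_status="success" if succeeded else "failure",
        pipeline_url=pipeline_url,
    )
    return asdict(record)


# Example values are made up for illustration.
example = describe_run(
    succeeded=True,
    pipeline_url="https://airflow.example.com/dags/orders_daily",
    finished_at=datetime(2021, 3, 1, 6, 30, tzinfo=timezone.utc),
)
print(example)
```

Keeping the record as a flat dict of strings makes it easy to map each key to one attribute on an asset profile.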
👉 TL;DR: Integrate your pipeline metadata into Atlan asset profiles
1️⃣ Create the list of attributes that you want to add from your pipeline.
This uses Atlan's DIY "Custom metadata attributes" editor. Just go to the admin panel's "Business Metadata" section, and define the attributes you want to bring in from your data pipeline and display on an Atlan Asset Profile.
2️⃣ Integrate your pipeline runs with Atlan's APIs.
Use this endpoint to post metadata from your ETL tool into Atlan. You can plug this API call in at the end of each pipeline run, so that the metadata in Atlan is updated automatically whenever the pipeline runs.
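As a rough sketch of what that final step could look like, the Python snippet below builds a JSON POST request and sends it once the pipeline finishes. The endpoint URL, the payload shape, and the bearer-token header are all placeholders assumed for illustration; take the real values from Atlan's API documentation for your instance.

```python
import json
import urllib.request

# Hypothetical placeholders: substitute the real endpoint and credentials
# from your own Atlan instance.
ATLAN_ENDPOINT = "https://your-tenant.example.com/api/placeholder-metadata-endpoint"
API_KEY = "replace-me"


def build_request(asset_name: str, attributes: dict) -> urllib.request.Request:
    """Build a JSON POST request carrying the pipeline metadata (assumed payload shape)."""
    body = json.dumps({"asset": asset_name, "attributes": attributes}).encode("utf-8")
    return urllib.request.Request(
        ATLAN_ENDPOINT,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",  # assumed auth scheme
        },
    )


def report_run(asset_name: str, attributes: dict, send=urllib.request.urlopen) -> int:
    """Send the metadata and return the HTTP status code.

    `send` is injectable so the call can be exercised without a network.
    """
    request = build_request(asset_name, attributes)
    with send(request, timeout=10) as response:
        return response.status
```

Calling something like `report_run("orders", {"run_status": "success"})` as the last task of a pipeline would refresh the asset profile on every run. You may also want a failure handler that posts a "failure" status, so that a broken run does not leave stale metadata looking healthy.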