How to prioritize documentation efforts



🤓 Why?

Imagine you’re writing a book: starting with a blank page is daunting. Implementing a metadata catalog in your organization can give you that same feeling. An empty catalog makes it hard to see past how much there is to do and to figure out where to start.

These are some of the best practices we have seen data champions adopt in their organizations.

🌟 Best practices

As with any prioritization task, it helps to think of the classic 2x2 matrix. Make items in the top-right corner the highest priority, and those in the lower-left corner the lowest priority.

Do not assume you must document everything, especially from day 1.

Example of prioritizing your data assets

Think about the task of documentation from two perspectives:

  • Looking backward, to document existing assets and capture existing knowledge.
  • Looking forward, to reduce the burden of looking backward over time.

⬅️ Looking backward

1️⃣ Backlog

When writing a book, you might start with outlining its key themes, events, and examples. The equivalent in our world is a backlog. If your organization has already identified assets with knowledge issues or documentation gaps, then you have a backlog.

Use the 2x2 matrix to prioritize that backlog.
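
As a rough illustration, the matrix can be applied to a backlog with a few lines of Python. This is only a sketch under stated assumptions: the article does not name the matrix axes, so the code assumes the common impact-versus-effort layout (which matches "big bets" sitting top-left and the highest priority sitting top-right). The 1 to 5 scoring scale, the threshold, the "quick win" label, the relative order of the two middle quadrants, and all asset names are hypothetical.

```python
from dataclasses import dataclass

# Quadrant order, highest priority first. "quick win" is the conventional
# name for the top-right quadrant and is an assumption here, as is ranking
# "big bet" ahead of "incremental / filler task".
RANK = {
    "quick win": 0,
    "big bet": 1,
    "incremental / filler task": 2,
    "thankless task / money pit": 3,
}


@dataclass
class BacklogItem:
    name: str
    impact: int  # hypothetical scale: 1 (low) to 5 (high)
    effort: int  # hypothetical scale: 1 (low) to 5 (high)


def quadrant(item, threshold=3):
    """Place an item in one of the four quadrants of the matrix."""
    high_impact = item.impact >= threshold
    high_effort = item.effort >= threshold
    if high_impact and not high_effort:
        return "quick win"
    if high_impact and high_effort:
        return "big bet"
    if not high_impact and not high_effort:
        return "incremental / filler task"
    return "thankless task / money pit"


def prioritize(items):
    """Sort by quadrant, then by higher impact, then by lower effort."""
    return sorted(items, key=lambda i: (RANK[quadrant(i)], -i.impact, i.effort))


# Hypothetical backlog for illustration only.
backlog = [
    BacklogItem("legacy_archive", impact=1, effort=5),
    BacklogItem("tmp_export", impact=1, effort=1),
    BacklogItem("orders_table", impact=5, effort=4),
    BacklogItem("revenue_dashboard", impact=5, effort=2),
]

for item in prioritize(backlog):
    print(f"{item.name}: {quadrant(item)}")
```

The scores themselves are a judgment call for your team; the value of the exercise is in agreeing on them, not in the arithmetic.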

2️⃣ High impact

Another place to start is high-impact assets like BI reports and dashboards. How many times have you been challenged on a particular figure on these reports? Answering those challenges is difficult without a clear definition of the figure and the inputs used to calculate it. In this case, start by documenting these figures and the assets involved in their calculations.

Another dimension of this same approach is documenting the assets that are most used. For example, document tables on which users run the most queries.
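
One possible way to find the most-queried tables, sketched in Python. It assumes you can export a query log as a list of records with a "table" field; that record shape, the field names, and the sample data are hypothetical and will differ depending on your warehouse or catalog.

```python
from collections import Counter


def most_queried_tables(query_log, top_n=3):
    """Return up to top_n (table, query_count) pairs, most queried first."""
    counts = Counter(entry["table"] for entry in query_log)
    return counts.most_common(top_n)


# Hypothetical query log for illustration only.
query_log = [
    {"user": "ana", "table": "sales.orders"},
    {"user": "ben", "table": "sales.customers"},
    {"user": "ana", "table": "sales.orders"},
    {"user": "cy", "table": "sales.refunds"},
    {"user": "ben", "table": "sales.orders"},
    {"user": "cy", "table": "sales.customers"},
]

for table, count in most_queried_tables(query_log, top_n=2):
    print(f"{table}: {count} queries")
```

The tables at the top of this list are reasonable candidates to document first.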

Both of these could be considered some of the “big bet” priorities (top-left corner of the prioritization matrix).

3️⃣ The “bus factor”

One more dimension to consider is the 🚌 factor. If you have a single team member who knows everything about an asset, what happens if that person disappears tomorrow? (They could leave your company, for example.)

Prioritize assets whose knowledge is held by only one team member, or by no one at all. These would be “big bet” priorities.
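
A minimal sketch of a bus-factor check in Python, assuming you keep (or can survey) a mapping from each asset to the team members who understand it. The mapping, the two-person minimum, and the names are hypothetical.

```python
def bus_factor_risks(knowledge_map, min_people=2):
    """Return (asset, people_count) pairs for assets known by fewer than
    min_people team members, with the least-known assets first."""
    at_risk = []
    for asset, people in knowledge_map.items():
        count = len(set(people))  # ignore duplicate names
        if count < min_people:
            at_risk.append((asset, count))
    return sorted(at_risk, key=lambda pair: (pair[1], pair[0]))


# Hypothetical mapping of assets to the people who know them.
knowledge_map = {
    "sales.orders": ["ana", "ben"],
    "finance.ledger": ["cy"],
    "legacy.archive": [],
}

for asset, count in bus_factor_risks(knowledge_map):
    print(f"{asset}: known by {count} team member(s)")
```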

➡️ Looking forward

4️⃣ Onboarding of new assets

In combination with the techniques above, use a DataDoc template to streamline onboarding of new assets. This doesn’t clear an existing backlog, but ensures that backlog does not continue to grow with each new asset.

This assumes all new assets are of value. If you are onboarding experimental assets, be careful. Classifying or certifying the experimental assets may be enough initially. Too much up-front documentation for assets that may be deleted anyway could be a “thankless task/money pit” priority.

Focus on the documentation of new data products to avoid this pitfall.

5️⃣ Crowdsourcing

It is still worth capturing shared knowledge when it exists in the team. New team members, for example, can use this captured knowledge to learn during onboarding. If many team members are able to contribute, crowdsourcing should be an “incremental / filler task” priority.

Use gamification to further invigorate or incentivize crowdsourcing documentation.

📔 Resources

🔗 Related reads

Running a weekly "Documentation Hour" for data teams
Creating a DataDoc template to make documentation easy (and standardized!)
Using a Gamification Planner to drive the maximum value from your gamification drive