A bit of a boring but necessary topic today, SQL documentation. It's like cleaning your kitchen or flossing. Like those, it's not the most thrilling task, but it's important. Avoiding it might not have immediate consequences, but over time it can get complicated. I'll be diving into this mundane but important task.
What can it include?
Data dictionaries that define the tables, columns, data types, and constraints of the database
Diagrams that illustrate the relationships between your tables
Documents with old versions of SQL queries (more on that in a bit).
There are various tools you can use like text files, spreadsheets, diagrams, wikis or a specific software. If your company already has a documentation tool in place it's best to use that. Feel free to explore others (especially if you're in smaller company) to see if there's anything that can work better for your company. A few popular documentation tools are: Confluence, Notion, or specific SQL documentation like Dataedo.
The most important thing is to make sure your documentation is up-to-date, accurate, and accessible.
What about Queries?
Most of what I listed above is information about the database. Like what tables mean, the relationships between your tables, and constraints of the database. I want to talk about documenting your SQL queries.
Specifically including information like:
Explanation of any complex SQL functions used
Concerns related to the query (like if it's resource intensive)
Note whether the query is used in any reports
Why Document?
Now why would you want to document your SQL queries? If you already have documentation on your databases. A few reasons:
Make it easier for others to understand, maintain and update your queries
Help with collaboration
Keep data integrity (governance)
Help explain purpose and logic of your queries
Easy to reuse across different projects and teams
Encourages consistent SQL queries
How to Document SQL Queries
Great, now that we know the why let's move onto the how. There are two main ways of doing this:
Commenting in the query itself
Documenting changes, updates, etc. (aka version control)
Comments
The first step is writing comments on your SQL queries.
Comments are a great way to add notes or annotations to SQL queries to explain or clarify the code. A header (something you put at the beginning) can include descriptions and parameters. Use inline codes to explain specific lines or sections of the query and block comments to explain larger chunks.
For example:
/*
This query gets the total purchases per day for each location
Columns: date, location, total purchases
*/
SELECT
--Date of Purchases
date,
location,
--This is the total purchases
SUM(purchase_amount) AS total_purchases,
SUM(order) AS count_of_orders
FROM
table1
GROUP BY
column1,
column2
Important to note: write your comment and documentation while you're writing your code, not after. This way you can explain the context while it's fresh in your mind. Depending on the complexity of the query you can add more comments, if it's less then you can reduce your comments. Use your judgement.
For your query you should have a consistent naming (which may already be documented at your company) and formatting conventions.
Documenting Changes (Version Control)
What happens if you update or change the SQL query? You need to make sure you have some sort of version control. This is different for every organization. You can use tools like Github or Gitlab.
If you don't have that, then when you make changes to a SQL query make sure to keep the old SQL query. It can be as simple as copying and pasting the old SQL into a Word document (see example below). In your documentation make a note of any changes made, why, and when the changes were made.
For example:
(V2) [This is the New Query]
Updated 8/23/23. Added in a column to calculate the total number of orders
SELECT
--Date of Purchases
date,
location,
--This is the total purchases
SUM(purchase_amount) AS total_purchases,
SUM(order) AS count_of_orders
FROM
table1
GROUP BY
column1,
column2
Old Query (V1) [This was the old query]
Removed 8/23/23.
SELECT
--Date of Purchases
date,
location,
--This is the total purchases
SUM(purchase_amount) AS total_purchases
FROM
table1
GROUP BY
column1,
column2
Now you have a history of your changes. It can also help you if you need to go back to a previous version.
General Tips
Below are some general tips when documenting your SQL queries.
Gather feedback - ask your team to provide feedback on the documentation. What's unclear? What needs more explanation?
Documentation Template - having a template for your documentation can make things go quicker
Review - make sure to get a colleague to look at your work for a second set of eyes
For dashboards - make a note of the SQL queries used and provide a brief description on how it works
Conclusion
SQL skills are important for querying data but it's equally important to effectively document your work. Without proper documentation you may lose time (forgetting why you made a change and what changes you made) and it might lead to inefficient workflows and miscommunication within the team.