I recently read SQL for Data Analysis by Cathy Tanimura, and I wanted to share a brief review of my thoughts. For context, I've been a professional data analyst for a year and a half, so not very long, but SQL is the tool I use most at work. I work at a startup in the social gaming industry. This is just to give some background on where I'm coming from with this review.
Note: The links to the books mentioned in this blog post are Amazon affiliate links, so if you buy the book through this link I earn a small commission at no extra cost to you. This helps me create more content.
Overview
I’ve divided this review into four sections:
What I liked – Highlights of the book and its strengths.
What I didn’t – Areas that could be improved, with suggestions.
How I used – Ways I applied lessons from the book in my work.
Who this is for – The person I think will benefit most from this book.
What I liked
Real Examples
The book excelled at showing real analyses such as retention and customer lifetime value. It explained not only the SQL techniques but also the reasoning behind them and how they apply to the business. Cathy Tanimura didn't just focus on making the SQL work; she kept the emphasis on solving real business problems. That was refreshing, since this focus is missing from many data analytics resources.
Less Commonly Used SQL Techniques
The book covered less commonly used SQL techniques like CROSS JOINs and self-joins, showing how they can be viable alternatives to window functions. This gave me a better understanding of how to approach analyses from different angles. For example, it demonstrated using CROSS JOINs to fill gaps in a dataset during time series analysis, making sure every date is accounted for even if a user wasn't active on that day.
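To make the gap-filling idea concrete, here is a minimal sketch in the SQLite dialect, run through Python's built-in sqlite3 module so it is self-contained. It is not the book's code. The tables `dates`, `users`, and `activity` and their contents are hypothetical. The CROSS JOIN pairs every user with every date, and a LEFT JOIN then attaches activity wherever it exists.

```python
import sqlite3

# Hypothetical schema: activity only has rows for days a user was active.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dates (day TEXT PRIMARY KEY);
CREATE TABLE users (user_id INTEGER PRIMARY KEY);
CREATE TABLE activity (user_id INTEGER, day TEXT, sessions INTEGER);

INSERT INTO dates VALUES ('2024-01-01'), ('2024-01-02'), ('2024-01-03');
INSERT INTO users VALUES (1), (2);
INSERT INTO activity VALUES
  (1, '2024-01-01', 3),
  (1, '2024-01-03', 1),
  (2, '2024-01-02', 5);
""")

# CROSS JOIN builds every (user, day) pair; the LEFT JOIN keeps pairs with
# no matching activity, and COALESCE turns those missing days into 0.
rows = conn.execute("""
SELECT u.user_id, d.day, COALESCE(a.sessions, 0) AS sessions
FROM users u
CROSS JOIN dates d
LEFT JOIN activity a
  ON a.user_id = u.user_id AND a.day = d.day
ORDER BY u.user_id, d.day
""").fetchall()

for row in rows:
    print(row)
```

Each user now has one row per date, including the inactive days.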
Methodology
Cathy’s step-by-step process for writing queries was one of the highlights. She didn’t just present complicated queries but broke them down into manageable parts, explaining how to approach each section logically. This method is especially useful for beginners who might feel overwhelmed by large SQL scripts.
Favorite Chapters
Personally, my favorite chapters were:
Chapter 3: Time Series Analysis - It taught practical approaches to analyze data over time, like sales trends across months. The examples were relevant to my work, and the concepts were easy to apply.
Chapter 4: Cohort Analysis - The detailed breakdown of retention and survivorship analysis was insightful, making this chapter my go-to resource for future cohort-related projects.
Chapter 7: Experiment Analysis - The gaming-related dataset made this chapter stand out for me. It aligned closely with my industry, and I could directly apply its lessons to my job.
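As a rough illustration of the retention analysis covered in the cohort chapter, here is a simplified sketch, again in SQLite via Python's sqlite3 module. It is not the book's query. The `users` and `logins` tables are hypothetical, cohorts are defined by signup date, and retention is the share of each cohort that logs in N days after signing up.

```python
import sqlite3

# Hypothetical data: three users across two signup-date cohorts.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY, signup_date TEXT);
CREATE TABLE logins (user_id INTEGER, login_date TEXT);

INSERT INTO users VALUES
  (1, '2024-01-01'), (2, '2024-01-01'), (3, '2024-01-02');
INSERT INTO logins VALUES
  (1, '2024-01-01'), (1, '2024-01-02'),
  (2, '2024-01-01'),
  (3, '2024-01-02'), (3, '2024-01-03');
""")

rows = conn.execute("""
WITH cohort_sizes AS (
    -- Number of users who signed up on each date
    SELECT signup_date, COUNT(*) AS cohort_size
    FROM users
    GROUP BY signup_date
),
login_days AS (
    -- Days elapsed between signup and each login
    SELECT u.user_id, u.signup_date,
           CAST(julianday(l.login_date) - julianday(u.signup_date) AS INTEGER) AS day_number
    FROM users u
    JOIN logins l ON l.user_id = u.user_id
)
SELECT d.signup_date, d.day_number,
       ROUND(1.0 * COUNT(DISTINCT d.user_id) / c.cohort_size, 2) AS retention
FROM login_days d
JOIN cohort_sizes c ON c.signup_date = d.signup_date
GROUP BY d.signup_date, d.day_number, c.cohort_size
ORDER BY d.signup_date, d.day_number
""").fetchall()

for row in rows:
    print(row)
```

One caveat of this simplified version: a day on which nobody in a cohort logged in produces no row at all, not a row with 0% retention. The CROSS JOIN gap-filling approach mentioned earlier is one way to make those zero days explicit.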
Business Applications
What stood out most was the focus on solving practical business problems. The examples weren’t just about writing SQL but about understanding the “why” behind the queries, which I believe is a key skill for any data analyst.
What I didn’t
Not Beginner Friendly: Although it's marketed toward beginners, the book dives straight into SQL techniques after the first chapter. For beginners, I would've preferred a more comprehensive introduction to foundational concepts, such as generating basic statistical summaries of a dataset.
The Second Chapter: The second chapter was less structured and felt like a mix of an introduction and analysis techniques. Frankly, it was a bit confusing. Splitting it into two chapters, one for beginner-friendly analyses and another for data preparation, could improve its readability.
Overuse of Subqueries: The author primarily used subqueries instead of Common Table Expressions (CTEs). While she explained her preference (CTEs weren't standard when she learned SQL), I find CTEs easier to read and explain, especially in complex queries.
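To show what I mean, here is the same query written both ways, as a small sketch in SQLite via Python's sqlite3 module. The `purchases` table and the spend threshold are hypothetical. The subquery version computes per-user totals inline inside FROM. The CTE version names that step up front and then queries it, which reads top to bottom.

```python
import sqlite3

# Hypothetical purchases: user totals are 15, 20 and 2.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE purchases (user_id INTEGER, amount REAL);
INSERT INTO purchases VALUES (1, 10.0), (1, 5.0), (2, 20.0), (3, 2.0);
""")

# Subquery style: the intermediate result is nested inside FROM.
subquery_sql = """
SELECT AVG(total) AS avg_spend
FROM (
    SELECT user_id, SUM(amount) AS total
    FROM purchases
    GROUP BY user_id
) AS user_totals
WHERE total >= 5
"""

# CTE style: the intermediate result is named first, then queried.
cte_sql = """
WITH user_totals AS (
    SELECT user_id, SUM(amount) AS total
    FROM purchases
    GROUP BY user_id
)
SELECT AVG(total) AS avg_spend
FROM user_totals
WHERE total >= 5
"""

subquery_result = conn.execute(subquery_sql).fetchall()
cte_result = conn.execute(cte_sql).fetchall()
print(subquery_result, cte_result)
```

Both return the same result. The difference is purely readability, which grows as more intermediate steps get stacked.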
Irrelevant Datasets: Some of the datasets used lacked practical relevance. For instance, analyzing extraterrestrial sightings isn't applicable to most analysts. Datasets tied to real business problems, such as product reviews or customer feedback, would have made the examples more impactful.
Query Tips for Beginners: SQL best practices, like using comments and optimizing queries, were covered at the back of the book. These tips would have been more helpful earlier, especially for readers new to structuring and debugging queries.
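For readers wondering what a "basic statistical summary" of a dataset looks like in SQL, here is a minimal profiling sketch. It is my own illustration rather than anything from the book. It uses a hypothetical `orders` table in SQLite, run through Python's sqlite3 module. Comparing `COUNT(*)` with `COUNT(column)` is a quick way to spot NULLs.

```python
import sqlite3

# Hypothetical table with one missing amount.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, amount REAL);
INSERT INTO orders (amount) VALUES (12.0), (30.0), (NULL), (18.0);
""")

# COUNT(*) counts every row; COUNT(amount), MIN, MAX and AVG ignore NULLs.
summary = conn.execute("""
SELECT COUNT(*)      AS row_count,
       COUNT(amount) AS non_null_amounts,
       MIN(amount)   AS min_amount,
       MAX(amount)   AS max_amount,
       AVG(amount)   AS avg_amount
FROM orders
""").fetchone()

print(summary)
```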
How I Used This
While not every chapter was applicable to my role, I focused on sections that could make the biggest impact, like the chapter on cohort analysis to improve retention metrics. My general strategy for technical books is to thoroughly read chapters that are directly relevant to my work while skimming less applicable sections.
I see this book as a reference that I’ll keep in my office to review frequently. Here are two others I use as well:
Practical Statistics for Data Scientists (2nd Edition) by Peter Bruce, Andrew Bruce, Peter Gedeck
Essential Mathematics for Data Science by Thomas Nield
Who This Is For
This book is best for intermediate SQL users looking to improve their skills with practical business applications, like cohort or experiment analysis. However, I wouldn't recommend it to complete beginners, as it assumes familiarity with SQL basics. For beginners, I'd recommend the following books:
Practical SQL (2nd Edition) by Anthony DeBarros
Learning SQL by Alan Beaulieu
Getting Started with SQL by Thomas Nield
Summary
SQL for Data Analysis by Cathy Tanimura is a great supplementary resource for leveling up your SQL skills. While it’s not beginner-friendly, the techniques and methods presented can be directly applied to solving real business problems. Despite some flaws, I found it valuable for refining my approach to SQL analysis.