I love spreadsheets. I have a spreadsheet for almost every area of my life. From time I spend learning, to how far I've taken to learn a skill, and even my weightlifting stats. I love collecting and analyzing data. One of the easiest ways to do this is by using spreadsheets like Excel or Google Sheets. Many of us use spreadsheets track our budget, calculate how much we owe on our mortgage, and much more. Excel/Google Sheets are the most widely used tools for creating and maintaining a spreadsheet.
If we want to quickly look at the information we merely scroll through the spreadsheet. We can use tools like: sorting to arrange the table in a certain way; functions to calculate specific things; and filters to find certain values. Spreadsheets work well with a small amount of data, like 50 - 1000 entries (but it depends on other conditions). But what happens when you have a data that contains 10,000s or 100,000s or more entries? At that point it would be difficult to use an Excel spreadsheet. Not only because it would be slow to do calculate anything (unless you have a very powerful computer) but it wouldn't be efficient.
So what do you do instead?
Databases
First off we need to understand where this data would be storied. In a database. Which is a program that helps store data and provides functionality for adding, modifying, and querying that data. And doing it all fast. The most popular type of database is a relational database. Which stores each kind of data in a table (similar to a spreadsheet in Excel). A row represents an item and a column represents properties about that item. It also makes it easy to form relationships between tables.
For example a book publishing company may have a authors table with information about the authors (unique id, name, location, email, books published, genre, etc.). Then in order to store information about their books they would have a books table with information about the book (unqiue id, title, publishing date, author, language, pages, etc.). Then they would have a books_authors table which remembers which author has written each book by mapping author ID to book IDs.
This is more efficient than having to repeat everything about the book and everything about the author.
How to Interact with a Database
Most databases come with a query language to interact with the database. The most common is SQL or Structured Query Language. These larger databases are used in companies like Meta (formerly known as Facebook), Khan Academy, Google and many more! What kind of data is stored in these companies? For something like Facebook they store data about: users, user_friends, user_posts, user_likes. for Khan Academy it would be: users, badges, user_progress, topics and more.
Using SQL you can: create tables, change data, and get back the data we're interested in. It's a way for a person to talk to a database.
There are different versions of SQL used like: MySQL, SQL Lite, SQL Server. But in this post I am going to go go over the major commands that almost all of them use.
Syntax
First is the syntax for SQL Statements, which is the specific language you use to query data.
SELECT column1, column2...,
FROM table_name;
In all of the queries you need to SELECT the columns or in some cases all of the columns using *. And you need to specify which table you are getting the data from using the FROM statement. These two statements are used in every query you will write.
Also for some SQL versions you need to add a semicolon at the end of the statement. In these examples I will add a semicolon at the end.
Functions
Below is the list of all of the functions we will go over:
Example:
I'm going to be using this example for the rest of the article. Here we have a books table with information on books. In this table I've only included 5 rows (entries). But imagine this is a much larger database with 10,000s of entries. Each book as an id, title, author, genre, language, copies sold, and publishing year
id | title | author | genre | language | copies_sold | publish_year | pages |
1 | Harry Potter and the Philosopher's Stone | J. K. Rowling | Fantasy | English | 120000000 | 1997 | 223 |
2 | The Diary of a Young Girl | Anne Frank | Autobiography | Dutch | 35000000 | 1947 | NULL |
3 | The Girl with the Dragon Tattoo | Stieg Larsson | Thriller | Swedish | 30000000 | 2005 | 480 |
4 | The Da Vinci Code | Dan Brown | Mystery | English | 80000000 | 2003 | 689 |
5 | The Little Prince | Antoine de Saint-Exupery | Novella | French | 100000000 | 1943 | 96 |
For each function I will be including:
A basic overview of what it is and how to use it.
An example using the above books table.
What the example means in words.
The output the example would give us.
SELECT
Specific Column/s
If you want to return a specific column or columns you include the column/s name after the SELECT keyword.
Example
SELECT title
FROM books
What it means:
Select the title column from the books table.
Output:
Harry Potter and the Philosopher's Stone
The Diary of a Young Girl
The Girl with the Dragon Tattoo
The Da Vinci Code
The Little Prince
All Columns
In the case you want to return all columns you use *.
Example
SELECT *
FROM books
What it means:
Select all the column from the books table
Output:
SELECT DISTINCT
If you want to return all unique values in a column, meaning no duplicates. You use the DISTINCT keyword.
Example
SELECT DISTINCT language
FROM books;
What it means:
Select the unique languages from the language column in the books table.
Output:
WHERE
When you want filter out specific information you can use the WHERE clause. It can be used to extract only entries that fulfill as specific condition. You use the following operators for the WHERE clause:
= equal
> greater than
< less than
>= greater than or equal
<= less than or equal
<> or != not equal
BETWEEN a certain range (see the next section for this)
LIKE search for a pattern (see the next section for this)
IN to specific multiple possible values for a column (see the next section for this)
Example 1:
SELECT *
FROM books
WHERE publish_year > 2000
What it means:
Select all of the columns from the books table where the publishing year was after 2000.
Output:
Example 2:
SELECT *
FROM books
WHERE publish_year = 1947
What it means:
Select all of the columns from the books table where the publishing year was 1947.
Output:
AND, OR, BETWEEN, IN, NOT
Often you'll want to select data based on multiple conditions. You can build up your WHERE queries by combining multiple conditions.
AND
Example
SELECT title
FROM books
WHERE publish_year > 1996
AND genre = 'Fantasy'
What it means:
You're selecting the title column in the books table where the publishing year is after 1996 and it's in the Fantasy genre.
Output:
Harry Potter and the Philosopher's Stone
OR
For selecting rows based on multiple conditions where some but not all of the conditions need to be met. Use the OR operator.
Example:
SELECT title
FROM books
WHERE publish_year = 1997
OR publish_year = 2003;
What it means:
Select from the title column in the books table, where the publishing year is either 1994 or 2000.
Output:
Harry Potter and the Philosopher's Stone
The Da Vinci Code
BETWEEN
To check for ranges use the BETWEEN keyword to filter values within a range.
Example:
SELECT title
FROM books
WHERE publish_year
BETWEEN 1994 AND 2000;
What it means:
Select from the title column in the books table, where the publishing year is between the year 1994 and 2000.
Output:
Harry Potter and the Philosopher's Stone
IN
If you want to filter based on many conditions use the IN operator. Which allows you to specify multiple values in a WHERE clause. It's easy to specify multiple OR conditions. Instead of having to have several OR conditions.
Example:
SELECT title
FROM books
WHERE publish_year IN (1997, 2003, 2005)
What it means:
Select from the title column in the books table, where the publishing year is either 1997, 2003 or 2005.
Output:
Harry Potter and the Philosopher's Stone
The Da Vinci Code
The Girl with the Dragon Tattoo
NOT
If you want to display record if the condition is NOT TRUE, you use the NOT operator.
Example:
SELECT title
FROM books
WHERE NOT publish_year = 1997
What it means:
Select from the title column in the books table, to list all titles that the publishing year is not in 1997.
Output:
The Diary of a Young Girl
The Girl with the Dragon Tattoo
The Da Vinci Code
The Little Prince
NULL, IS NULL
Null represents an unknown or missing value. You can check for NULL values using the expression IS NULL.
Example:
SELECT title, author
FROM books
WHERE pages IS NULL
What it means:
Select from the title and author columns in the books table, where there is no publishing year.
Output:
LIKE, NOT LIKE
LIKE
To search for a patter in a column you can use the LIKE operator in the WHERE clause. To do this you use a wildcard, as a placeholder for some other value.
%
The % wildcard will match zero, one, or many characters in text.
Example:
SELECT title
FROM books
WHERE title LIKE 'The%'
What it means:
Select from the title column in the books table, where the beginning of the title starts with 'The'.
Output:
The Diary of a Young Girl
The Girl with the Dragon Tattoo
The Da Vinci Code
The Little Prince
_
While the _ wildcard will match a single character.
Example:
SELECT title
FROM books
WHERE title LIKE 'The Li_tle'
What it means:
Select from the title column in the books table, where the title matches titles like 'The Li_tle'.
Output:
The Little Prince
NOT LIKE
You can also use NOT LIKE operator to find records that don't match the pattern you specify.
SELECT title
FROM books
WHERE title NOT LIKE 'H%'
What it means:
Select from the title column in the books table, where the title does not start with the letter H.
Output:
The Diary of a Young Girl
The Girl with the Dragon Tattoo
The Da Vinci Code
The Little Prince
AS
To avoid situations where you will get two columns with the same name. You do something called aliasing. Which means as you assign a temporary name to something. To alias, you use the AS keyword.
Example:
SELECT MAX(copies_sold) AS max_sold,
MAX(pages) AS max_pages
FROM books
What it means:
Select the maximum number of copies sold and rename the column as max_sold, and grab the max number of pages and rename the column as max_pages from the books table.
Output:
ORDER BY
If you want to sort your results in either ascending (A-Z or 1-10) or descending (Z-A, 10-1) according to the values of one or more columns order you will need to use the ORDER BY keyword.
ASC
By default ORDER BY will sort in ascending order.
Example:
SELECT title
FROM books
ORDER BY publish_year;
What it means:
Select from the title column in the books table and order all of it by the publishing year in ascending order.
Output:
The Girl with the Dragon Tattoo
The Da Vinci Code
Harry Potter and the Philosopher's Stone
The Diary of a Young Girl
The Little Prince
DESC
If you want to sort it in descending order, you can use the DESC keyword.
Example:
SELECT title
FROM books
ORDER BY publish_year DESC;
What it means:
Select from the title column in the books table and order all of it by the publishing year in descending order.
Output:
The Little Prince
The Diary of a Young Girl
Harry Potter and the Philosopher's Stone
The Da Vinci Code
The Girl with the Dragon Tattoo
GROUP BY
When you want to aggregate results you can use GROUP BY. This groups certain values together based on one or more properties (columns).
Example:
SELECT language, count(*)
FROM books
GROUP BY language;
What it means:
Select the language column and count the number of times a language appears, then group by the language.
Output:
Conclusion
That was a basic overview of how to use SQL statements to query data. Now you can select specific columns of data in a table. Along with adding conditions to check for (operators) and how to order the data you queried.
This is part 1 in my SQL blog post series which goes over the functions used in SQL. There will be a part 2 to this post, that will be linked below once it's published. It will go over more intermediate functions like: joins (inner, left, right); group by; min and max; count, avg, sum. If you want to be notified when that blog post is published subscribe to my newsletter Kelly's Bytes.