path,title,body
/tutorials/clean-data,Cleaning data with sqlite-utils and Datasette,"Tutorials > Cleaning data with sqlite-utils and Datasette

Cleaning data with sqlite-utils and Datasette

This tutorial will show you how to use the sqlite-utils command line tool import data into SQLite, clean it and then explore it with Datasette.

The concepts in this tutorial are also demonstrated in this video of a talk I gave at the HYTRADBOI conference in April 2022.


Start with some CSV data
The Florida Fish and Wildlife Conservation Commission maintain a database of manatee carcass recovery locations, with details dating back to April 1974.
You can request the latest copy as a CSV file from their website - or download this copy of the CSV: Manatee_Carcass_Recovery_Locations_in_Florida.csv generated on July 31st 2022.
Install sqlite-utils
Here are the installation instructions. If you are on a Mac with Homebrew installed you can run brew install sqlite-utils. Python users can run pip install sqlite-utils.
Import the CSV into a SQLite database
sqlite-utils is a command line tool, so you'll need to open up a terminal on your computer and change directory to the place where you saved the CSV file.
You can create a new manatees.db SQLite database like this:
sqlite-utils insert manatees.db locations \
  Manatee_Carcass_Recovery_Locations_in_Florida.csv --csv -d

This sqlite-utils insert command inserts data into a database table, creating the table if necessary.

manatees.db is the SQLite database file we are using - this will be created if it doesn't exist yet.
locations is the name of the table we are creating
Manatee_Carcass_Recovery_Locations_in_Florida.csv is the CSV file we are importing
--csv tells the tool to treat the incoming data as CSV data
The -d option tells the tool to automatically detect the types of data. Without this, every column will be treated as text. With this option columns that have numeric data in will be used to populate numeric columns.

Having created the database, we can use the schema command to view its schema:
sqlite-utils schema manatees.db

The output of that command looks like this:
CREATE TABLE ""locations"" (
   [X] FLOAT,
   [Y] FLOAT,
   [OBJECTID] INTEGER,
   [FIELDID] TEXT,
   [REPDATE] TEXT,
   [REPYEAR] INTEGER,
   [REPMONTH] INTEGER,
   [REPDAY] INTEGER,
   [SEX] TEXT,
   [TLENGTH] FLOAT,
   [STATE] TEXT,
   [COUNTY] TEXT,
   [LAT] FLOAT,
   [LONG_] FLOAT,
   [DCODE] INTEGER,
   [MORTALITY] TEXT,
   [created_user] TEXT,
   [created_date] TEXT,
   [last_edited_user] TEXT,
   [last_edited_date] TEXT
);

The locations table here has been created with columns that match the columns that were present in the CSV file. The columns have the appropriate types thanks to the -d option we passed to sqlite-utils insert.
We can use the sqlite-utils tables command to see those tables:
sqlite-utils tables manatees.db --counts

Output is:
[{""table"": ""locations"", ""count"": 13568}]

The --counts option adds a count of the rows in each table. That's over 13,000 deceased manatees!
By default, the tables command outputs JSON. You can change this to be a table format using the --table or -t options:
sqlite-utils tables manatees.db --counts -t
table        count
---------  -------
locations    13568

Querying the data
Let's run a SQL query against your data. The sqlite-utils command to do that looks like this:
sqlite-utils manatees.db ""select * from locations limit 2""

This outputs JSON, showing the first two rows and every column (thanks to that select *):
[{""X"": -80.1176098026034, ""Y"": 26.1004059191504, ""OBJECTID"": 149433, ""FIELDID"": ""M7401"", ""REPDATE"": ""1974/04/03 00:00:00+00"", ""REPYEAR"": 1974, ""REPMONTH"": 4, ""REPDAY"": 3, ""SEX"": ""F"", ""TLENGTH"": 260.0, ""STATE"": ""FL"", ""COUNTY"": ""Broward"", ""LAT"": 26.100401, ""LONG_"": -80.117607, ""DCODE"": 9, ""MORTALITY"": ""Undetermined: Other"", ""created_user"": ""FWC"", ""created_date"": ""2021/10/15 11:30:55+00"", ""last_edited_user"": ""FWC"", ""last_edited_date"": ""2021/10/15 11:30:55+00""},
 {""X"": -80.1664354041335, ""Y"": 25.8337161801256, ""OBJECTID"": 149434, ""FIELDID"": ""M7402"", ""REPDATE"": ""1974/06/27 00:00:00+00"", ""REPYEAR"": 1974, ""REPMONTH"": 6, ""REPDAY"": 27, ""SEX"": ""M"", ""TLENGTH"": 290.0, ""STATE"": ""FL"", ""COUNTY"": ""Miami-Dade"", ""LAT"": 25.833711, ""LONG_"": -80.166433, ""DCODE"": 1, ""MORTALITY"": ""Human Related: Watercraft Collision"", ""created_user"": ""FWC"", ""created_date"": ""2021/10/15 11:30:55+00"", ""last_edited_user"": ""FWC"", ""last_edited_date"": ""2021/10/15 11:30:55+00""}]

Let's specify some columns, limit to 10 and add -t to output the results as a table.
sqlite-utils manatees.db -t \
  ""select REPDATE, MORTALITY, COUNTY from locations limit 10""

This outputs:
REPDATE                 MORTALITY                            COUNTY
----------------------  -----------------------------------  ------------
1974/04/03 00:00:00+00  Undetermined: Other                  Broward
1974/06/27 00:00:00+00  Human Related: Watercraft Collision  Miami-Dade
1974/08/20 00:00:00+00  Human Related: Watercraft Collision  Miami-Dade
1974/10/23 00:00:00+00  Human Related: Other                 Volusia
1974/11/11 00:00:00+00  Human Related: Other                 Brevard
1974/12/25 00:00:00+00  Human Related: Watercraft Collision  Citrus
1974/12/27 00:00:00+00  Undetermined: Too Decomposed         Indian River
1975/01/01 00:00:00+00  Verified: Not Necropsied             Brevard
1975/01/19 00:00:00+00  Human Related: Watercraft Collision  Miami-Dade
1975/01/31 00:00:00+00  Verified: Not Necropsied             Brevard

Opening it in Datasette
Datasette provides a browser-based interface for exploring a SQLite database. Install that, (brew install datasette works for Homebrew users) and run it against your database:
datasette manatees.db

Then navigate to http://localhost:8001/ to start exploring your data.
Transforming the columns
Exploring the data helps identify improvements we can make to the way it is structured.
The LAT and LONG_ column look like they might be latitudes and longitudes. Renaming those columns to latitude and longitude will make that more obvious, and will also let us visualize the data using a Datasette plugin in a moment.
The created_user, last_edited_user and STATE columns aren't interesting: they always store the same value.
The X and Y columns are duplicates of the latitude and longitude.
Finally, the FIELDID column looks to be a unique identifier for every row in the database. This would make a good primary key for our table.
The sqlite-utils transform command can be used to apply transformations to a table - rename columns, dropping columns, assigning primary keys and more.
The following command will make a bunch of changes to the table structure in one go:
sqlite-utils transform manatees.db locations \
  --rename LAT latitude \
  --rename LONG_ longitude \
  --drop created_user \
  --drop last_edited_user \
  --drop X \
  --drop Y \
  --drop STATE \
  --drop OBJECTID \
  --pk FIELDID

Now run sqlite-utils schema manatees.db to see the result of those changes:
CREATE TABLE ""locations"" (
   [FIELDID] TEXT PRIMARY KEY,
   [REPDATE] TEXT,
   [REPYEAR] INTEGER,
   [REPMONTH] INTEGER,
   [REPDAY] INTEGER,
   [SEX] TEXT,
   [TLENGTH] FLOAT,
   [COUNTY] TEXT,
   [latitude] FLOAT,
   [longitude] FLOAT,
   [DCODE] INTEGER,
   [MORTALITY] TEXT,
   [created_date] TEXT,
   [last_edited_date] TEXT
);

Visualizing the locations on a map
Datasette supports plugins, which enable extra features such as data visualizations.
The datasette-cluster-map plugin runs against any table with a latitude and longitude column, displaying them as points on a map. This is why we renamed those LAT and LONG_ columns earlier!
To install the plugin, quit the Datasette server (using Ctrl+C in the terminal window) and run the following:
datasette install datasette-cluster-map

Then start Datasette running again:
datasette manatees.db

If you visit http://localhost:8001/manatees/locations you will now see the first 1,000 rows as points on the map:

Click the ""load all"" button to load and display all 13,000+ locations.
Fixing the date columns
The date columns currently look like this:
2021/10/15 11:30:55+00

This is using YYYY/MM/DD format (at least it's not that confusing - for Europeans - American MM/DD/YYYY thing). I prefer the ISO standard format YYYY-MM-DD - so let's convert it.
The sqlite-utils transform command can be used to apply a transformation function to the values in one or more columns.
Here's how to use that to parse miscellaneous date and time formats and turn them into that standard ISO format:
sqlite-utils convert manatees.db locations \
  REPDATE created_date last_edited_date \
  'r.parsedatetime(value)'

This runs against the REPDATE, created_date and last_edited_date columns. It will use the built-in r.parsedatetime() recipe (documented here) to convert those values and save them back to the same column.
Run that command, then view the data in Datasette again. Those columns have now been converted!
Extracting columns into a separate table
There's one more improvement we will make to our data.
Open the table in Datasette, then use either the suggested facets list or the cog menu at the top of the columns to facet by both DCODE and MORTALITY.

The result counts shown for the two facets are the same. This suggests that these two columns represent the same data - a DCODE of 1 corresponds to a MORTALITY of ""Human Related: Watercraft Collision"", 2 means ""Human Related: Flood Gate/Canal Lock"" and so on.
We can use the sqlite-utils extract command to extract these columns into a separate table.
sqlite-utils extract manatees.db locations DCODE MORTALITY \
  --rename MORTALITY name \
  --table mortality

Running this will create a new table called mortality and populate it with one row for each of the DCODE and MORTALITY pairs in the database. It removes those columns from the locations table and replaces them with a mortality_id foreign key column pointing to the new table.
Here's the output of sqlite-utils schema manatees.db showing the impact of those changes on the schema:
CREATE TABLE [mortality] (
  [id] INTEGER PRIMARY KEY,
  [DCODE] INTEGER,
  [name] TEXT
);
CREATE TABLE ""locations"" (
  [FIELDID] TEXT PRIMARY KEY,
  [REPDATE] TEXT,
  [REPYEAR] INTEGER,
  [REPMONTH] INTEGER,
  [REPDAY] INTEGER,
  [SEX] TEXT,
  [TLENGTH] FLOAT,
  [COUNTY] TEXT,
  [latitude] FLOAT,
  [longitude] FLOAT,
  [mortality_id] INTEGER,
  [created_date] TEXT,
  [last_edited_date] TEXT,
  FOREIGN KEY([mortality_id]) REFERENCES [mortality]([id])
);
CREATE UNIQUE INDEX [idx_mortality_DCODE_name]
   ON [mortality] ([DCODE], [name]);

Further reading
sqlite-utils offers a lot more tools for cleaning up data. Check out this documentation and these blog posts for more details.

sqlite-utils command line tool documentation
Executing advanced ALTER TABLE operations in SQLite
Apply conversion functions to data in SQLite columns with the sqlite-utils CLI tool
Refactoring databases with sqlite-utils extract"
/tutorials/explore,Exploring a database with Datasette,"Tutorials > Exploring a database with Datasette

Exploring a database with Datasette
This tutorial will teach you how to use Datasette to explore a new database.
Datasette offers flexible tools for exploring data tables. It's always worth spending time familiarizing yourself with data in its raw, tabula form before thinking about ways to apply more sophisticated analysis or visualization.
The congress-legislators data
We'll be using an example database of Members of the United States Congress, 1789 to present. I built this example using data from the unitedstates/congress-legislators project on GitHub, maintained by Joshua Tauberer, Eric Mill and over 100 other contributors.
Access the example in Datasette here: https://congress-legislators.datasettes.com/legislators

Understanding tables
The key to understanding relational databases such as SQLite (which we are using here), MySQL, PostgreSQL, Oracle and so on is to learn to think in tables.
Database tables are like spreadsheets - they have columns and rows, and each row provides a value for every column. Unlike spreadsheets you can't merge cells together or apply formatting - a database table is a straight-forward two-dimensional grid. It's just the data.
Each column in a table has a type. In SQLite these types can be one of text, integer, real (for floating point values) and blob (for binary data). The type of the column matters because it affects what happens when you sort the table by that column, or which mathematical operations you can use against those values.  
Browsing tables
The tables in our example database are:

legislators - 12,590 rows - individuals who have served as senators or representatives
legislator_terms - 44,542 rows - terms served by legislators, back to 1789
executives - 80 rows - individuals who have served as President or Vice President
executive_terms - 129 rows - terms served by executives
offices - 1,192 rows - district officers maintained by current legislators
social_media - 525 rows - social media accounts for current legislators

Follow these links now and explore the tables to get a feel for the way the data is organized in this database.
Tables can relate to each other
Sometimes you'll see a row in a table that links to another table, for example the legislator_id column in the legislator_terms table:

Clicking one of the linked names in that column will take you to a page for that individual legislator row that looks like this:

The ""Links from other tables"" section shows how many rows in other tables refer back to this legislator.
In a database these are called ""foreign keys"" - they work by storing the ID of a row from another table in a dedicated foreign key column.
Foreign keys are key to understanding why relational databases are so much more powerful than separate, stand-alone data files stored in something like CSV.
Using facets
Facets are one of the most powerful Datasette features. They can help you take a table with thousands of rows and start quickly identifying interesting trends and patterns within that data.
I'll show some examples using the executive_terms table.
Facets can be applied in two ways: you can select an option from the ""suggested facets"" list, or you can select the ""Facet by this"" option from the cog menu next to each column.

Once selected, the facet interface is shown above the table, like in this example:

Each facet shows a list of the most common values for that column, with a total count number for each of those values.
These numbers will update as you further filter the data: if you select ""Democratic"" you will see just the rows matching that party, and the ""type"" facet will update to show you that there have been 21 presidentals terms and 19 vice presidental terms for that party.

Exercises

Apply the type, state and party facets to the legislator_terms table
Use them to filter just to Republican Senators from North Carolina (NC)
Figure out which state has had the highest number of senatorial terms held by a member of the Republican party. (solution if you get stuck)

Using filters
When you select a facet, you're applying a filter to the data. These are reflected in the boxes at the top of the page:

You don't have to use faceting for these - you can edit them directly.
To see Democratic Vice Presidents just for the 1800s, we can add a filter for rows where the 'start' column begins with '18':

Here's the result of that query.

Exercises
Using the legislator_terms table, find:

Every senatorial term served in Delaware (DE)
... that started in a year starting with 18
... and then facet by party to see which parties had the most senatorial terms

Sharing links
Every page within Datasette is designed to be shared. Copy and paste the page URL to share it with someone else.
This includes applied filters and facets - so if you find an interesting way to explore the data you can share that directly by sharing the URL.
Some examples:

Every senatorial term in the state of Texas
Every executive term held by a member of the Democratic-Republican party

Exercise: share some links!
Use the above exercises to find something interesting in the data. Copy and paste that URL to your notes, or share it with a friend, or post about it on Twitter.
Exporting data
If you can see it in Datasette, you can export the raw data. This is a fundamental principle of the project.
Look for the CSV and .json links on any page to export the data in those formats.

The ""Advanced export"" box provides additional options. Use ""download file"" to download the CSV file to your computer. 
The ""expand labels"" option will add an extra column with the label associated with any foreign key columns - in this example, it will add legislator_id_label to accompany the ID in legislator_id.
Additional formats can be enabled using plugins. The example instance is running the datasette-copyable plugin, which provides a `""copyable"" link which can be used to export the data in different formats using copy-and-paste!
Datasette can also be configured to allow users to download the entire database as a single file. You can find that legislators.db download option at the bottom of https://congress-legislators.datasettes.com/legislators - the entire database is only 5.2MB.
Exercises

Download a CSV file of Democratic Vice Presidents in the 1800s, using the ""download file"" checkbox in the ""Advanced export"" box.
Use the ""copyable"" option to copy and paste that data directly into Google Sheets, Excel or Numbers.

Data visualization via plugins
You may have noticed this already on the offices table, which shows an interactive map of all of the office locations:

You can click on clusters of markers to zoom in, and click on individual markers to see the details of the row.
It's fun to use these maps to spot obvious outliers - in this case, the office of representative Aumua Amata Radewagen in American Samoa instantly stands out.
This is another example of a Datasette plugin - in this case datasette-cluster-map. Plugins can add additional features to Datasette, such as visualizations or alternative export methods (see ""copyable"" described above).
For a larger example of a map visualization, take a look at this map of 33,000+ global power plants visualized using datasette-cluster-map.
Next steps
Now that you know how to explore tables using Datasette, the next covers how to use Datasette to learn SQL."
/tutorials/learn-sql,Learn SQL with Datasette,"Tutorials > Learn SQL with Datasette

Learn SQL with Datasette
This tutorial follows the tutorial on exploring data with Datasette and shows how you can use Datasette to start learning to write custom SQL queries of your own.
We will be using the same example database as that tutorial:

https://congress-legislators.datasettes.com/legislators

This database contains information about US Presidents, Vice Presidents and Members of Congress from 1789 to today. You can follow the previous tutorial to become familiar with the data.
SQL and SQLite
Datasette is software that runs on top of a SQL database. There are many different SQL database systems, such as MySQL, PostgreSQL, Microsoft SQL Server and Oracle. The database used by Datasette is called SQLite.
You may not realize it, but you already use SQLite every day: it is built into many popular applications, including Google Chrome and Firefox, and runs on laptops, iPhones, Android phones and all sorts of other smaller devices.
SQL stands for Structured Query Language - it is a text language for running queries against a database. Every database implements a slightly different dialect of SQL - for this tutorial I will try to keep to the subset of SQLite's dialect that is most likely to work across other databases as well.
View and edit SQL
Every table page in Datasette - such as this one - includes a ""View and edit SQL"" link that looks like this:

Click that link to see the SQL query that was used for the current page - including any filters that have been applied, in this case the filter for bio_gender = ""F"".
You can then edit that query to make it do something else!

Exercises
Start with that example, then:

Click ""Format SQL"" to tidy this up and make it easier to read and edit
Modify that query to return only the id and name columns
Get it to order by name instead of sorting by id
Change the p0 value to M to return male instead of female legislators
Try replacing the list of columns with select * from - a shortcut for all columns

The structure of a basic select query
Here's the formatted version of the above query:
select
  id,
  name,
  id_bioguide,
  id_govtrack,
  id_icpsr,
  id_wikipedia,
  id_wikidata,
  id_google_entity_id,
  name_first,
  name_last,
  bio_birthday,
  bio_gender,
  id_house_history,
  name_middle,
  name_nickname,
  id_ballotpedia,
  name_suffix,
  id_bioguide_previous,
  id_house_history_alternate,
  other_names,
  id_thomas,
  id_cspan,
  id_votesmart,
  id_lis,
  name_official_full,
  id_opensecrets,
  id_fec,
  id_maplight,
  leadership_roles,
  family
from
  legislators
where
  ""bio_gender"" = :p0
order by
  id
limit
  101

select
The select section specifies which columns you would like to return. Each column name is separated by a comma - but if you have a comma following the last column you you will get an error message.

near ""from"": syntax error

from
The from section specifies which table the records should be selected from - here we want the legislators table.
where
The where section adds filter conditions. These can be combined using and, for example this query will select just legislators who are male and have a Jr. name suffix:
where
  ""bio_gender"" = 'M'
  and ""name_suffix"" = 'Jr.'

This section is optional - if you do not include a where clause you will get back every row in the table.
order by
The optional order by clause specifies the order you woud like the rows to be returned in. This would order them alphabetically by name:
order by
  name

Or add desc to reverse the order. This query returns the youngest legislators:
select
  *
from
  legislators
order by
  bio_birthday desc

limit
The limit 101 clause limits the query to returning just the first 101 results. In most SQL databases omitting this will cause all results will be returned - but Datasette applies an additional limit of 1,000 (example here) to prevent large queries from causing performance issues.
Named parameters
In the previous tutorial we used filters to list presidental terms that occurred in the 1800s, by filtering on rows where the start column began with the string 18 and the type column equals prez.
Here's that example as a filtered table.

Clicking that ""View and edit SQL"" link (and then clicking ""Format SQL"") shows this query:

select
  rowid,
  type,
  start,
  [end],
  party,
  how,
  executive_id
from
  executive_terms
where
  ""start"" like :p0
  and ""type"" = :p1

The :p0 and :p1 fields here correspond to ""named parameters"" - they provide a safe way to pass values to a SQL query, without having to worry about SQL injection.
These field names are extracted from the query. If you change the where clause to look like this:
where
  ""start"" like :century
  and ""type"" = :type

Then fields called century and type will be displayed instead of p0 and p1.
Since they are form fields, you can change those values too. Try changing prez to viceprez to see the vice presidential terms for the 1800s instead.
If you weren't using named parameters, the where clause of the query could look like this instead:
where
  ""start"" like '18%'
  and ""type"" = 'prez'

String values like '18%' and 'prez' must be enclosed in single or double quotes (single quotes are preferred), if you are not using the :p0 syntax.
The double quotes around the column names here are optional - they are only required if the column has a name that might clash with an existing SQL keyword, such as select or where - or if the column name contains a space. So the following where clause would work the same way:
where
  start like '18%'
  and type = 'prez'

SQL LIKE queries
To find rows where the start column begins with the string 18 we use this where filter:
where
  ""start"" like '18%'

The SQL like operator applies wildcards to strings - % means ""match anything"" and _ (underscore) means ""match a single character"".
Exercise

Use a LIKE filter to find presidential terms that occurred in the 90s decade of any century - you'll need to use both _ and % for this. (solution)

SQL Joins
The executive_terms table has a column called executive_id which displays both a numeric ID and a link containing the name of the executive:

But... when you click the ""View and edit SQL"" link the resulting query returns just the ID, not the name.

A SQL join can be used to combine the data from multiple tables. Here's a query that uses a join to show the name pulled from the executives table:
select
  executive_terms.type,
  executive_terms.start,
  executive_terms.[end],
  executive_terms.party,
  executive_terms.how,
  executives.name
from
  executive_terms
  join executives
    on executive_terms.executive_id = executives.id

Try that here.

Since there are now two tables involved in the query, it's a good idea to include explicit table names as part of the select clause:
select
  executive_terms.type,
  executive_terms.start,
  executive_terms.[end],
  executive_terms.party,
  executive_terms.how,
  executives.name

If a column exists in only one of the two tables you can use its name without specifying the table, but this can quickly get confusing so it's better to always use the table names when you are executing a join.
The from clause is where the join is defined. This describes how the two tables should be combined together:
from
  executive_terms
  join executives
    on executive_terms.executive_id = executives.id

We are joining executive_terms to executives, using the fact that the executive_id column in executive_terms contains values from the id column in executives.
This is why IDs and foreign keys are important concepts to understand in SQL databases!
This kind of join is also known as an ""inner join"" - it is the most commonly used join. Other join types include outer joins and full joins, but those are beyond the scope of this tutorial.
GROUP BY / COUNT
A common operation in SQL is to ask for a count of the most popular values in a column.
Datasette exposes this capability in its faceting feature. Under the hood, that feature works by executing a group by / count query.
The following query answers the question: which party has had the most presidental terms?
select
  party,
  count(*)
from
  executive_terms
where
  type = 'prez'
group by
  party

Try that query here.

The select clause asks for the party column and the result of running a count(*). count() is a SQL aggregate function, which operates on a group of results. So we need to define a group.
The group by party clause at the end creates those groups. The combination of the two produces the desired result.
You can add this order by clause at the end to see the groups sorted from highest to lowest:
order by
  count(*) desc

Here's the result.
WHERE ... IN
SQL queries can be nested together in a number of interesting ways.
One of the most useful for ad-hoc data analysis is the where column in (select ...) pattern.
Let's build a query to see everyone who has been president and has also been vice president.
We'll start with a query to return the names of every vice president. This requires a join against executives, because the executive_terms table includes the presidential and vice presidential terms, but doesn't include the names of those individuals.
select
  executives.name
from
  executive_terms
  join executives
    on executive_terms.executive_id = executives.id
where
  type = 'viceprez'

This query returns 61 rows, including some duplicate rows for individuals who served more than one term as VP.
We could use select distinct to remove those duplicates if we wanted to.
If we want to see everyone who has been a president and ALSO been a vice-president, we can combine two queries together like this:
select
  distinct executives.name
from
  executive_terms
  join executives
    on executive_terms.executive_id = executives.id
where
  type = 'prez'
  and name in (
    select
      executives.name
    from
      executive_terms
      join executives
        on executive_terms.executive_id = executives.id
    where
      type = 'viceprez'
  )

This query returns 15 names, starting with John Adams and finishing with Joseph Biden.

⚠️ An earlier version of this example revealed a subtle bug: there were two separate individuals in the executives table with the same name, George Bush and George Bush! So even though the query appeared to return the correct results it actually contained a bug. This is a useful demonstration of how important it is to work with unique, de-duplicated identifiers where-ever possible, rather than assuming that things like names are unique.

Common Table Expressions
This is a more advanced SQL technique that I find myself using every day: it can make complex SQL queries much easier to write and understand.
Common Table Expressions, or CTEs, allow you to define a temporary alias for a select that lasts for the duration of the current query. You can then treat it as if it was a regular table.
Here's an example that creates two CTEs, one called presidents and one called vice_presidents, and then uses them to answer the earlier question about presidents who had also served as vice presidents:
with presidents as (
  select
    executives.name
  from
    executive_terms
    join executives
      on executive_terms.executive_id = executives.id
  where
    executive_terms.type = 'prez'
),
vice_presidents as (
  select
    executives.name
  from
    executive_terms
    join executives
      on executive_terms.executive_id = executives.id
  where
    executive_terms.type = 'viceprez'
)
select
  distinct name
from
  presidents
where name in vice_presidents

Try this CTE query here.
The two CTEs are defined using with alias_name as (select ...), second_alias as (select ...) - then the query that returns the final results is added on at the end.
Creating bookmarkable apps
Since every page on Datasette can be linked to (see Sharing links) and named parameters automatically add form input to the query page, you can combine these two features to create bookmarkable apps.
This is particularly useful for collaborating with others who don't yet know SQL. You can write SQL queries that solve problems for them, add parameters that they can customize and then send them the entire custom application as a link that they can bookmark.
Let's create an application that returns the president and vice president for any given date.
Here's the SQL query we will be using:
select
  executives.name,
  executive_terms.type,
  executive_terms.start,
  executive_terms.end,
  executive_terms.party
from
  executive_terms
  join executives
    on executive_terms.executive_id = executives.id
where
  start <= :date
  and end > :date

This query takes a :date parameter, which should be of the format yyyy-mm-dd - for example 2016-01-01.
It joins against the executives table to get their name, and then filters to rows where the term start is less than or equal to the specified date, and the end is greater than that date. We use <= for one of these to ensure there are no gaps or overlaps.
If we execute the query without a date it will return 0 results, but give us a form field to enter the date:

Entering a date - for example 2016-01-01 - returns the president and vice president for that date:

We've built an application! We can send this link to anyone right now and they'll be able to run that same query.
The SQL at the top of the page could feel a little intimidating though. That's what the ""hide"" link is for - clicking on that hides the SQL query, providing a new link that you can share that won't cause the user to scroll down past the SQL in order to interact with the query.
One last note: the application we have built also doubles up as an API. Add .json to the path portion of the URL, or click the .json link, to get back the data as JSON. Add &_shape=array to the URL to get back a more compact form of JSON, ideal for integrating with other applications.
Next steps
Fully mastering SQL can take years. This tutorial has attempted to cover SQL basics and introduce some more advanced techniques to help manage larger queries, but there's plenty more to learn. The SQLite SELECT documentation offers a comprehensive reference to SQL as understood by SQLite, but the best way to learn is to keep on trying new things against different data, looking things up as you go along."