tutorials
3 rows
This data as json, CSV (advanced)
path ▼ | title | body |
---|---|---|
/tutorials/clean-data | Cleaning data with sqlite-utils and Datasette | Tutorials > Cleaning data with sqlite-utils and Datasette Cleaning data with sqlite-utils and Datasette This tutorial will show you how to use the sqlite-utils command line tool import data into SQLite, clean it and then explore it with Datasette. The concepts in this tutorial are also demonstrated in this video of a talk I gave at the HYTRADBOI conference in April 2022. Start with some CSV data The Florida Fish and Wildlife Conservation Commission maintain a database of manatee carcass recovery locations, with details dating back to April 1974. You can request the latest copy as a CSV file from their website - or download this copy of the CSV: Manatee_Carcass_Recovery_Locations_in_Florida.csv generated on July 31st 2022. Install sqlite-utils Here are the installation instructions. If you are on a Mac with Homebrew installed you can run brew install sqlite-utils. Python users can run pip install sqlite-utils. Import the CSV into a SQLite database sqlite-utils is a command line tool, so you'll need to open up a terminal on your computer and change directory to the place where you saved the CSV file. You can create a new manatees.db SQLite database like this: sqlite-utils insert manatees.db locations \ Manatee_Carcass_Recovery_Locations_in_Florida.csv --csv -d This sqlite-utils insert command inserts data into a database table, creating the table if necessary. manatees.db is the SQLite database file we are using - this will be created if it doesn't exist yet. locations is the name of the table we are creating Manatee_Carcass_Recovery_Locations_in_Florida.csv is the CSV file we are importing --csv tells the tool to treat the incoming data as CSV data The -d option tells the tool to automatically detect the types of data. Without this, every column will be treated as text. With this option columns that have numeric data in will be used to populate numeric columns. Having created the database, we can use the schema command to view its schema: sqlite-utils schema manatees.db The output of that command looks… |
/tutorials/explore | Exploring a database with Datasette | Tutorials > Exploring a database with Datasette Exploring a database with Datasette This tutorial will teach you how to use Datasette to explore a new database. Datasette offers flexible tools for exploring data tables. It's always worth spending time familiarizing yourself with data in its raw, tabula form before thinking about ways to apply more sophisticated analysis or visualization. The congress-legislators data We'll be using an example database of Members of the United States Congress, 1789 to present. I built this example using data from the unitedstates/congress-legislators project on GitHub, maintained by Joshua Tauberer, Eric Mill and over 100 other contributors. Access the example in Datasette here: https://congress-legislators.datasettes.com/legislators Understanding tables The key to understanding relational databases such as SQLite (which we are using here), MySQL, PostgreSQL, Oracle and so on is to learn to think in tables. Database tables are like spreadsheets - they have columns and rows, and each row provides a value for every column. Unlike spreadsheets you can't merge cells together or apply formatting - a database table is a straight-forward two-dimensional grid. It's just the data. Each column in a table has a type. In SQLite these types can be one of text, integer, real (for floating point values) and blob (for binary data). The type of the column matters because it affects what happens when you sort the table by that column, or which mathematical operations you can use against those values. Browsing tables The tables in our example database are: legislators - 12,590 rows - individuals who have served as senators or representatives legislator_terms - 44,542 rows - terms served by legislators, back to 1789 executives - 80 rows - individuals who have served as President or Vice President executive_terms - 129 rows - terms served by executives offices - 1,192 rows - district officers maintained by current legislators social_media - 525 rows - social media accounts for current legislators… |
/tutorials/learn-sql | Learn SQL with Datasette | Tutorials > Learn SQL with Datasette Learn SQL with Datasette This tutorial follows the tutorial on exploring data with Datasette and shows how you can use Datasette to start learning to write custom SQL queries of your own. We will be using the same example database as that tutorial: https://congress-legislators.datasettes.com/legislators This database contains information about US Presidents, Vice Presidents and Members of Congress from 1789 to today. You can follow the previous tutorial to become familiar with the data. SQL and SQLite Datasette is software that runs on top of a SQL database. There are many different SQL database systems, such as MySQL, PostgreSQL, Microsoft SQL Server and Oracle. The database used by Datasette is called SQLite. You may not realize it, but you already use SQLite every day: it is built into many popular applications, including Google Chrome and Firefox, and runs on laptops, iPhones, Android phones and all sorts of other smaller devices. SQL stands for Structured Query Language - it is a text language for running queries against a database. Every database implements a slightly different dialect of SQL - for this tutorial I will try to keep to the subset of SQLite's dialect that is most likely to work across other databases as well. View and edit SQL Every table page in Datasette - such as this one - includes a "View and edit SQL" link that looks like this: Click that link to see the SQL query that was used for the current page - including any filters that have been applied, in this case the filter for bio_gender = "F". You can then edit that query to make it do something else! Exercises Start with that example, then: Click "Format SQL" to tidy this up and make it easier to read and edit Modify that query to return only the id and name columns Get it to order by name instead of sorting by id Change the p0 value to M to return male instead of female legislators Try replacing the list of columns with select * from - a shortcut for all columns The structure of a basic select q… |
Advanced export
JSON shape: default, array, newline-delimited, object
CREATE TABLE [tutorials] ( [path] TEXT PRIMARY KEY, [title] TEXT, [body] TEXT );