id,node_id,name,full_name,private,owner,html_url,description,fork,created_at,updated_at,pushed_at,homepage,size,stargazers_count,watchers_count,language,has_issues,has_projects,has_downloads,has_wiki,has_pages,forks_count,archived,disabled,open_issues_count,license,topics,forks,open_issues,watchers,default_branch,permissions,temp_clone_token,organization,network_count,subscribers_count,readme,readme_html,allow_forking,visibility,is_template,template_repository,web_commit_signoff_required,has_discussions
195145678,MDEwOlJlcG9zaXRvcnkxOTUxNDU2Nzg=,sqlite-diffable,simonw/sqlite-diffable,0,9599,https://github.com/simonw/sqlite-diffable,Tools for dumping/loading a SQLite database to diffable directory structure,0,2019-07-04T00:58:46Z,2022-07-12T17:00:19Z,2022-08-18T22:49:29Z,,30,42,42,Python,1,1,1,1,0,3,0,0,3,apache-2.0,"[""datasette-io"", ""datasette-tool"", ""sqlite""]",3,3,42,main,"{""admin"": false, ""maintain"": false, ""push"": false, ""triage"": false, ""pull"": false}",,,3,1,"# sqlite-diffable
[](https://pypi.org/project/sqlite-diffable/)
[](https://github.com/simonw/sqlite-diffable/releases)
[](https://github.com/simonw/sqlite-diffable/blob/main/LICENSE)
Tools for dumping/loading a SQLite database to a diffable directory structure
## Installation
pip install sqlite-diffable
## Demo
The repository at [simonw/simonwillisonblog-backup](https://github.com/simonw/simonwillisonblog-backup) contains a backup of the database on my blog, https://simonwillison.net/ - created using this tool.
## Dumping a database
Given a SQLite database called `fixtures.db` containing a table `facetable`, the following will dump out that table to the `dump/` directory:
sqlite-diffable dump fixtures.db dump/ facetable
To dump out every table in that database, use `--all`:
sqlite-diffable dump fixtures.db dump/ --all
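Each table is dumped as a pair of files named after the table, described under Storage format below. After the `facetable` example above, the `dump/` directory should contain:
```
dump/facetable.metadata.json
dump/facetable.ndjson
```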
## Loading a database
To load a previously dumped database, run the following:
sqlite-diffable load restored.db dump/
This will show an error if any of the tables that are being restored already exist in the database file.
You can replace those tables (dropping them before restoring them) using the `--replace` option:
sqlite-diffable load restored.db dump/ --replace
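If you have the `sqlite3` command-line shell installed, you can confirm which tables were restored like this:
sqlite3 restored.db '.tables'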
## Converting to JSON objects
Table rows are stored in the `.ndjson` files as newline-delimited JSON arrays, like this:
```
[""a"", ""a"", ""a-a"", 63, null, 0.7364712141640124, ""$null""]
[""a"", ""b"", ""a-b"", 51, null, 0.6020187290499803, ""$null""]
```
Sometimes it can be more convenient to work with a list of JSON objects.
The `sqlite-diffable objects` command can read a `.ndjson` file and its accompanying `.metadata.json` file and output JSON objects to standard output:
sqlite-diffable objects fixtures.db dump/sortable.ndjson
The output of that command looks something like this:
```
{""pk1"": ""a"", ""pk2"": ""a"", ""content"": ""a-a"", ""sortable"": 63, ""sortable_with_nulls"": null, ""sortable_with_nulls_2"": 0.7364712141640124, ""text"": ""$null""}
{""pk1"": ""a"", ""pk2"": ""b"", ""content"": ""a-b"", ""sortable"": 51, ""sortable_with_nulls"": null, ""sortable_with_nulls_2"": 0.6020187290499803, ""text"": ""$null""}
```
Add `-o` to write that output to a file:
sqlite-diffable objects fixtures.db dump/sortable.ndjson -o output.txt
Add `--array` to output a JSON array of objects, as opposed to a newline-delimited file:
sqlite-diffable objects fixtures.db dump/sortable.ndjson --array
Output:
```
[
{""pk1"": ""a"", ""pk2"": ""a"", ""content"": ""a-a"", ""sortable"": 63, ""sortable_with_nulls"": null, ""sortable_with_nulls_2"": 0.7364712141640124, ""text"": ""$null""},
{""pk1"": ""a"", ""pk2"": ""b"", ""content"": ""a-b"", ""sortable"": 51, ""sortable_with_nulls"": null, ""sortable_with_nulls_2"": 0.6020187290499803, ""text"": ""$null""}
]
```
## Storage format
Each table is represented as two files. The first, `table_name.metadata.json`, contains metadata describing the structure of the table. For a table called `redirects_redirect` that file might look like this:
```json
{
""name"": ""redirects_redirect"",
""columns"": [
""id"",
""domain"",
""path"",
""target"",
""created""
],
""schema"": ""CREATE TABLE [redirects_redirect] (\n [id] INTEGER PRIMARY KEY,\n [domain] TEXT,\n [path] TEXT,\n [target] TEXT,\n [created] TEXT\n)""
}
```
It is an object with three keys: `name` is the name of the table, `columns` is an array of column name strings and `schema` is the SQL schema text used for that table.
The second file, `table_name.ndjson`, contains [newline-delimited JSON](http://ndjson.org/) for every row in the table. Each row is represented as a JSON array with items corresponding to each of the columns defined in the metadata.
The `redirects_redirect.ndjson` file for that table might look like this:
```
[1, ""feeds.simonwillison.net"", ""swn-everything"", ""https://simonwillison.net/atom/everything/"", ""2017-10-01T21:11:36.440537+00:00""]
[2, ""feeds.simonwillison.net"", ""swn-entries"", ""https://simonwillison.net/atom/entries/"", ""2017-10-01T21:12:32.478849+00:00""]
[3, ""feeds.simonwillison.net"", ""swn-links"", ""https://simonwillison.net/atom/links/"", ""2017-10-01T21:12:54.820729+00:00""]
```
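Since both files are plain JSON, the format can be read back without this tool. Here is a minimal Python sketch (the file paths are illustrative) that combines the two files into dictionaries, similar to what the `objects` command does:
```python
import json

# Column names are listed in the .metadata.json file
with open('dump/redirects_redirect.metadata.json') as fp:
    columns = json.load(fp)['columns']

# Each line of the .ndjson file is a JSON array of values,
# in the same order as those columns
with open('dump/redirects_redirect.ndjson') as fp:
    for line in fp:
        print(dict(zip(columns, json.loads(line))))
```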
","
sqlite-diffable
Tools for dumping/loading a SQLite database to diffable directory structure
Each table is represented as two files. The first, table_name.metadata.json, contains metadata describing the structure of the table. For a table called redirects_redirect that file might look like this:
It is an object with three keys: name is the name of the table, columns is an array of column strings and schema is the SQL schema text used for tha table.
The second file, table_name.ndjson, contains newline-delimited JSON for every row in the table. Each row is represented as a JSON array with items corresponding to each of the columns defined in the metadata.
That file for the redirects_redirect.ndjson table might look like this:
",1,public,0,,0,
213286752,MDEwOlJlcG9zaXRvcnkyMTMyODY3NTI=,pocket-to-sqlite,dogsheep/pocket-to-sqlite,0,53015001,https://github.com/dogsheep/pocket-to-sqlite,Create a SQLite database containing data from your Pocket account,0,2019-10-07T03:24:14Z,2022-08-21T21:11:59Z,2022-08-22T16:21:34Z,,20,63,63,Python,1,1,1,1,0,3,0,0,5,apache-2.0,"[""datasette"", ""datasette-io"", ""datasette-tool"", ""dogsheep"", ""pocket"", ""pocket-api"", ""sqlite""]",3,5,63,main,"{""admin"": false, ""maintain"": false, ""push"": false, ""triage"": false, ""pull"": false}",,53015001,3,4,"# pocket-to-sqlite
[](https://pypi.org/project/pocket-to-sqlite/)
[](https://github.com/dogsheep/pocket-to-sqlite/releases)
[](https://github.com/dogsheep/pocket-to-sqlite/actions?query=workflow%3ATest)
[](https://github.com/dogsheep/pocket-to-sqlite/blob/main/LICENSE)
Create a SQLite database containing data from your [Pocket](https://getpocket.com/) account.
## How to install
$ pip install pocket-to-sqlite
## Usage
You will need to first obtain a valid OAuth token for your Pocket account. You can do this by running the `auth` command and following the prompts:
$ pocket-to-sqlite auth
Visit this page and sign in with your Pocket account:
https://getpocket.com/auth/author...
Once you have signed in there, hit <enter> to continue
Authentication tokens written to auth.json
Now you can fetch all of your items from Pocket like this:
$ pocket-to-sqlite fetch pocket.db
The first time you run this command it will fetch all of your items, displaying a progress bar while it does so.
On subsequent runs it will only fetch new items.
You can force it to fetch everything from the beginning again using `--all`. Use `--silent` to disable the progress bar.
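For example, to re-fetch everything from the beginning without a progress bar:
$ pocket-to-sqlite fetch pocket.db --all --silent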
## Using with Datasette
The SQLite database produced by this tool is designed to be browsed using [Datasette](https://datasette.readthedocs.io/). Use the [datasette-render-timestamps](https://github.com/simonw/datasette-render-timestamps) plugin to improve the display of the timestamp values.
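For example, assuming you have Datasette installed:
$ datasette install datasette-render-timestamps
$ datasette pocket.db
Datasette will start a local web server for browsing the data.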
","
pocket-to-sqlite
Create a SQLite database containing data from your Pocket account.
How to install
$ pip install pocket-to-sqlite
Usage
You will need to first obtain a valid OAuth token for your Pocket account. You can do this by running the auth command and following the prompts:
$ pocket-to-sqlite auth
Visit this page and sign in with your Pocket account:
https://getpocket.com/auth/author...
Once you have signed in there, hit <enter> to continue
Authentication tokens written to auth.json
Now you can fetch all of your items from Pocket like this:
$ pocket-to-sqlite fetch pocket.db
The first time you run this command it will fetch all of your items, and display a progress bar while it does it.
On subsequent runs it will only fetch new items.
You can force it to fetch everything from the beginning again using --all. Use --silent to disable the progress bar.
Using with Datasette
The SQLite database produced by this tool is designed to be browsed using Datasette. Use the datasette-render-timestamps plugin to improve the display of the timestamp values.
",1,public,0,,0,
237321267,MDEwOlJlcG9zaXRvcnkyMzczMjEyNjc=,geojson-to-sqlite,simonw/geojson-to-sqlite,0,9599,https://github.com/simonw/geojson-to-sqlite,CLI tool for converting GeoJSON files to SQLite (with SpatiaLite),0,2020-01-30T22:51:05Z,2022-03-05T00:40:56Z,2022-04-13T23:39:25Z,,117,34,34,Python,1,1,1,1,0,3,0,0,4,apache-2.0,"[""datasette-io"", ""datasette-tool"", ""geojson"", ""gis"", ""sqlite""]",3,4,34,main,"{""admin"": false, ""maintain"": false, ""push"": false, ""triage"": false, ""pull"": false}",,,3,3,"# geojson-to-sqlite
[](https://pypi.org/project/geojson-to-sqlite/)
[](https://github.com/simonw/geojson-to-sqlite/releases)
[](https://github.com/simonw/geojson-to-sqlite/actions?query=workflow%3ATest)
[](https://github.com/simonw/geojson-to-sqlite/blob/main/LICENSE)
CLI tool for converting GeoJSON to SQLite (optionally with SpatiaLite)
[RFC 7946: The GeoJSON Format](https://tools.ietf.org/html/rfc7946)
## How to install
$ pip install geojson-to-sqlite
## How to use
You can run this tool against a GeoJSON file like so:
$ geojson-to-sqlite my.db features features.geojson
This will load all of the features from the `features.geojson` file into a table called `features`.
Each row will have a `geometry` column containing the feature geometry, and columns for each of the keys found in any `properties` attached to those features. (To bundle all properties into a single JSON object, use the `--properties` flag.)
The table will be created the first time you run the command.
On subsequent runs you can use the `--alter` option to add any new columns that are missing from the table.
You can pass more than one GeoJSON file, in which case the contents of all of the files will be inserted into the same table.
If your features have an `""id""` property it will be used as the primary key for the table. You can also use `--pk=PROPERTY` with the name of a different property to use that as the primary key instead. If you don't want to use the `""id""` as the primary key (maybe it contains duplicate values) you can use `--pk ''` to specify no primary key.
Specifying a primary key will also allow you to upsert data into existing rows, instead of inserting new rows each time.
If no primary key is specified, a SQLite `rowid` column will be used.
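For illustration, a minimal GeoJSON file containing a single feature (the values here are made up) might look like this:
```json
{
  ""type"": ""FeatureCollection"",
  ""features"": [{
    ""type"": ""Feature"",
    ""id"": 1,
    ""properties"": {""name"": ""Example point""},
    ""geometry"": {""type"": ""Point"", ""coordinates"": [-122.3, 47.6]}
  }]
}
```
Loading this file would create a table with `id` as the primary key, plus a `name` column and a `geometry` column.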
You can use `-` as the filename to import from standard input. For example:
$ curl https://eric.clst.org/assets/wiki/uploads/Stuff/gz_2010_us_040_00_20m.json \
| geojson-to-sqlite my.db states - --pk GEO_ID
## Using with SpatiaLite
By default, the `geometry` column will contain JSON.
If you have installed the [SpatiaLite](https://www.gaia-gis.it/fossil/libspatialite/index) module for SQLite you can instead import the geometry into a geospatially indexed column.
You can do this using the `--spatialite` option, like so:
$ geojson-to-sqlite my.db features features.geojson --spatialite
The tool will search for the SpatiaLite module in the following locations:
- `/usr/lib/x86_64-linux-gnu/mod_spatialite.so`
- `/usr/local/lib/mod_spatialite.dylib`
If you have installed the module in another location, you can use the `--spatialite_mod=xxx` option to specify where:
$ geojson-to-sqlite my.db features features.geojson \
--spatialite_mod=/usr/lib/mod_spatialite.dylib
You can create a SpatiaLite spatial index on the `geometry` column using the `--spatial-index` option:
$ geojson-to-sqlite my.db features features.geojson --spatial-index
Using this option implies `--spatialite` so you do not need to add that.
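Once the index exists, SpatiaLite can use it to speed up bounding-box queries. This is a sketch of the usual SpatiaLite `SpatialIndex` pattern, assuming a table called `features` and illustrative coordinates:
```sql
select rowid, AsGeoJSON(geometry) from features
where rowid in (
  select rowid from SpatialIndex
  where f_table_name = 'features'
  and search_frame = BuildMbr(-123.0, 47.0, -122.0, 48.0)
)
```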
## Streaming large datasets
For large datasets, consider using newline-delimited JSON to stream features into the database without loading the entire feature collection into memory.
For example, to load a day of earthquake reports from USGS:
$ geojson-to-sqlite quakes.db quakes tests/quakes.ndjson \
--nl --pk=id --spatialite
When using newline-delimited JSON, the table will also be created from the first feature alone, instead of guessing column types based on the first 100 features.
If you want to use a larger subset of your data to guess column types (for example, if some fields are inconsistent) you can use [fiona](https://fiona.readthedocs.io/en/latest/cli.html) to collect features into a single collection:
$ head tests/quakes.ndjson | fio collect | \
geojson-to-sqlite quakes.db quakes - --spatialite
This will take the first 10 lines from `tests/quakes.ndjson`, pass them to `fio collect`, which turns them into a single feature collection, and pass that, in turn, to `geojson-to-sqlite`.
## Using this with Datasette
Databases created using this tool can be explored and published using [Datasette](https://datasette.readthedocs.io/).
The Datasette documentation includes a section on [how to use it to browse SpatiaLite databases](https://datasette.readthedocs.io/en/stable/spatialite.html).
The [datasette-leaflet-geojson](https://datasette.io/plugins/datasette-leaflet-geojson) plugin can be used to visualize columns containing GeoJSON geometries on a [Leaflet](https://leafletjs.com/) map.
If you are using SpatiaLite you will need to output the geometry as GeoJSON in order for that plugin to work. You can do that using the SpatiaLite `AsGeoJSON()` function - something like this:
```sql
select rowid, AsGeoJSON(geometry) from mytable limit 10
```
The [datasette-geojson-map](https://datasette.io/plugins/datasette-geojson-map) plugin is an alternative which automatically renders SpatiaLite geometries as a Leaflet map on the corresponding table page, without you needing to call `AsGeoJSON(geometry)`.
","