name,summary,classifiers,description,author,author_email,description_content_type,home_page,keywords,license,maintainer,maintainer_email,package_url,platform,project_url,project_urls,release_url,requires_dist,requires_python,version,yanked,yanked_reason airtable-export,Export Airtable data to files on disk,[],"# airtable-export [![PyPI](https://img.shields.io/pypi/v/airtable-export.svg)](https://pypi.org/project/airtable-export/) [![Changelog](https://img.shields.io/github/v/release/simonw/airtable-export?include_prereleases&label=changelog)](https://github.com/simonw/airtable-export/releases) [![Tests](https://github.com/simonw/airtable-export/workflows/Test/badge.svg)](https://github.com/simonw/airtable-export/actions?query=workflow%3ATest) [![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](https://github.com/simonw/airtable-export/blob/master/LICENSE) Export Airtable data to files on disk ## Installation Install this tool using `pip`: $ pip install airtable-export ## Usage You will need to know the following information: - Your Airtable base ID - this is a string starting with `app...` - Your Airtable API key - this is a string starting with `key...` - The names of each of the tables that you wish to export You can export all of your data to a folder called `export/` by running the following: airtable-export export base_id table1 table2 --key=key This example would create two files: `export/table1.yml` and `export/table2.yml`. Rather than passing the API key using the `--key` option you can set it as an environment variable called `AIRTABLE_KEY`. ## Export options By default the tool exports your data as YAML. You can also export as JSON or as [newline delimited JSON](http://ndjson.org/) using the `--json` or `--ndjson` options: airtable-export export base_id table1 table2 --key=key --ndjson You can pass multiple format options at once. 
This command will create a `.json`, `.yml` and `.ndjson` file for each exported table: airtable-export export base_id table1 table2 \ --key=key --ndjson --yaml --json ### SQLite database export You can export tables to a SQLite database file using the `--sqlite database.db` option: airtable-export export base_id table1 table2 \ --key=key --sqlite database.db This can be combined with other format options. If you only specify `--sqlite`, the export directory argument will be ignored. The SQLite database will have a table created for each table you export. Those tables will have a primary key column called `airtable_id`. If you run this command against an existing SQLite database, records with matching primary keys will be overwritten by new records from the export. ## Request options By default the tool uses [python-httpx](https://www.python-httpx.org)'s default configurations. You can override the `user-agent` using the `--user-agent` option: airtable-export export base_id table1 table2 --key=key --user-agent ""Airtable Export Robot"" You can override the [timeout during a network read operation](https://www.python-httpx.org/advanced/#fine-tuning-the-configuration) using the `--http-read-timeout` option. If not set, this defaults to 5s. airtable-export export base_id table1 table2 --key=key --http-read-timeout 60 ## Running this using GitHub Actions [GitHub Actions](https://github.com/features/actions) is GitHub's workflow automation product. You can use it to run `airtable-export` in order to back up your Airtable data to a GitHub repository. Doing this gives you a visible commit history of changes you make to your Airtable data - like [this one](https://github.com/natbat/rockybeaches/commits/main/airtable). To run this for your own Airtable database, you'll first need to add the following secrets to your GitHub repository:
AIRTABLE_BASE_ID
The base ID, a string beginning with `app...`
AIRTABLE_KEY
Your Airtable API key
AIRTABLE_TABLES
A space-separated list of the Airtable tables that you want to back up. If any of these contain spaces you will need to enclose them in single quotes, e.g. 'My table with spaces in the name' OtherTableWithNoSpaces
Once you have set those secrets, add the following as a file called `.github/workflows/backup-airtable.yml`: ```yaml name: Backup Airtable on: workflow_dispatch: schedule: - cron: '32 0 * * *' jobs: build: runs-on: ubuntu-latest steps: - name: Check out repo uses: actions/checkout@v2 - name: Set up Python uses: actions/setup-python@v2 with: python-version: 3.8 - uses: actions/cache@v2 name: Configure pip caching with: path: ~/.cache/pip key: ${{ runner.os }}-pip- restore-keys: | ${{ runner.os }}-pip- - name: Install airtable-export run: | pip install airtable-export - name: Backup Airtable to backups/ env: AIRTABLE_BASE_ID: ${{ secrets.AIRTABLE_BASE_ID }} AIRTABLE_KEY: ${{ secrets.AIRTABLE_KEY }} AIRTABLE_TABLES: ${{ secrets.AIRTABLE_TABLES }} run: |- airtable-export backups $AIRTABLE_BASE_ID $AIRTABLE_TABLES -v - name: Commit and push if it changed run: |- git config user.name ""Automated"" git config user.email ""actions@users.noreply.github.com"" git add -A timestamp=$(date -u) git commit -m ""Latest data: ${timestamp}"" || exit 0 git push ``` This will run once a day (at 32 minutes past midnight UTC) and will also run if you manually click the ""Run workflow"" button, see [GitHub Actions: Manual triggers with workflow_dispatch](https://github.blog/changelog/2020-07-06-github-actions-manual-triggers-with-workflow_dispatch/). ## Development To contribute to this tool, first checkout the code. 
Then create a new virtual environment: cd airtable-export python -mvenv venv source venv/bin/activate Or if you are using `pipenv`: pipenv shell Now install the dependencies and tests: pip install -e '.[test]' To run the tests: pytest ",Simon Willison,,text/markdown,https://github.com/simonw/airtable-export,,"Apache License, Version 2.0",,,https://pypi.org/project/airtable-export/,,https://pypi.org/project/airtable-export/,"{""CI"": ""https://github.com/simonw/airtable-export/actions"", ""Changelog"": ""https://github.com/simonw/airtable-export/releases"", ""Homepage"": ""https://github.com/simonw/airtable-export"", ""Issues"": ""https://github.com/simonw/airtable-export/issues""}",https://pypi.org/project/airtable-export/0.7.1/,"[""click"", ""PyYAML"", ""httpx"", ""sqlite-utils"", ""pytest ; extra == 'test'"", ""pytest-mock ; extra == 'test'""]",,0.7.1,0, csv-diff,Python CLI tool and library for diffing CSV and JSON files,"[""Development Status :: 4 - Beta"", ""Intended Audience :: Developers"", ""Intended Audience :: End Users/Desktop"", ""Intended Audience :: Science/Research"", ""License :: OSI Approved :: Apache Software License"", ""Programming Language :: Python :: 3.6"", ""Programming Language :: Python :: 3.7""]","# csv-diff [![PyPI](https://img.shields.io/pypi/v/csv-diff.svg)](https://pypi.org/project/csv-diff/) [![Changelog](https://img.shields.io/github/v/release/simonw/csv-diff?include_prereleases&label=changelog)](https://github.com/simonw/csv-diff/releases) [![Tests](https://github.com/simonw/csv-diff/workflows/Test/badge.svg)](https://github.com/simonw/csv-diff/actions?query=workflow%3ATest) [![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](https://github.com/simonw/csv-diff/blob/main/LICENSE) Tool for viewing the difference between two CSV, TSV or JSON files. 
See [Generating a commit log for San Francisco’s official list of trees](https://simonwillison.net/2019/Mar/13/tree-history/) (and the [sf-tree-history repo commit log](https://github.com/simonw/sf-tree-history/commits)) for background information on this project. ## Installation pip install csv-diff ## Usage Consider two CSV files: `one.csv` id,name,age 1,Cleo,4 2,Pancakes,2 `two.csv` id,name,age 1,Cleo,5 3,Bailey,1 `csv-diff` can show a human-readable summary of differences between the files: $ csv-diff one.csv two.csv --key=id 1 row changed, 1 row added, 1 row removed 1 row changed Row 1 age: ""4"" => ""5"" 1 row added id: 3 name: Bailey age: 1 1 row removed id: 2 name: Pancakes age: 2 The `--key=id` option means that the `id` column should be treated as the unique key, to identify which records have changed. The tool will automatically detect if your files are comma- or tab-separated. You can over-ride this automatic detection and force the tool to use a specific format using `--format=tsv` or `--format=csv`. You can also feed it JSON files, provided they are a JSON array of objects where each object has the same keys. Use `--format=json` if your input files are JSON. 
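The key-based comparison at the heart of the tool can be sketched in a few lines of plain Python. This is a simplified illustration only, not the library's actual implementation; the `simple_diff` helper and its exact output shape are invented for this sketch:

```python
def simple_diff(old_rows, new_rows, key):
    # Index each list of row dictionaries by the unique key column
    old = {row[key]: row for row in old_rows}
    new = {row[key]: row for row in new_rows}
    changed = {}
    for k in old.keys() & new.keys():
        # Collect (before, after) pairs for any columns whose values differ
        diffs = {
            column: (old[k][column], new[k][column])
            for column in old[k]
            if old[k][column] != new[k].get(column)
        }
        if diffs:
            changed[k] = diffs
    return {
        'added': [new[k] for k in sorted(new.keys() - old.keys())],
        'removed': [old[k] for k in sorted(old.keys() - new.keys())],
        'changed': changed,
    }
```

With the `one.csv` and `two.csv` examples above this would report one added row (id 3), one removed row (id 2) and one change (the `age` column for id 1). The real tool also tracks added and removed columns and excludes them from per-row changes.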
Use `--show-unchanged` to include full details of the unchanged values for rows with at least one change in the diff output: % csv-diff one.csv two.csv --key=id --show-unchanged 1 row changed id: 1 age: ""4"" => ""5"" Unchanged: name: ""Cleo"" You can use the `--json` option to get a machine-readable difference: $ csv-diff one.csv two.csv --key=id --json { ""added"": [ { ""id"": ""3"", ""name"": ""Bailey"", ""age"": ""1"" } ], ""removed"": [ { ""id"": ""2"", ""name"": ""Pancakes"", ""age"": ""2"" } ], ""changed"": [ { ""key"": ""1"", ""changes"": { ""age"": [ ""4"", ""5"" ] } } ], ""columns_added"": [], ""columns_removed"": [] } ## As a Python library You can also import the Python library into your own code like so: from csv_diff import load_csv, compare diff = compare( load_csv(open(""one.csv""), key=""id""), load_csv(open(""two.csv""), key=""id"") ) `diff` will now contain the same data structure as the output in the `--json` example above. If the columns in the CSV have changed, those added or removed columns will be ignored when calculating changes made to specific rows. ",Simon Willison,,text/markdown,https://github.com/simonw/csv-diff,,"Apache License, Version 2.0",,,https://pypi.org/project/csv-diff/,,https://pypi.org/project/csv-diff/,"{""Homepage"": ""https://github.com/simonw/csv-diff""}",https://pypi.org/project/csv-diff/1.1/,"[""click"", ""dictdiffer"", ""pytest ; extra == 'test'""]",,1.1,0, datasette-clone,Create a local copy of database files from a Datasette instance,[],"# datasette-clone [![PyPI](https://img.shields.io/pypi/v/datasette-clone.svg)](https://pypi.org/project/datasette-clone/) [![CircleCI](https://circleci.com/gh/simonw/datasette-clone.svg?style=svg)](https://circleci.com/gh/simonw/datasette-clone) [![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](https://github.com/simonw/datasette-clone/blob/master/LICENSE) Create a local copy of database files from a Datasette instance. 
See [datasette-clone](https://simonwillison.net/2020/Apr/14/datasette-clone/) on my blog for background on this project. ## How to install $ pip install datasette-clone ## Usage This only works against Datasette instances running immutable databases (with the `-i` option). Databases published using the `datasette publish` command should be compatible with this tool. To download copies of all `.db` files from an instance, run: datasette-clone https://latest.datasette.io You can provide an optional second argument to specify a directory: datasette-clone https://latest.datasette.io /tmp/here-please The command stores its own copy of a `databases.json` manifest and uses it to only download databases that have changed the next time you run the command. It also stores a copy of the instance's `metadata.json` to ensure you have a copy of any source and licensing information for the downloaded databases. If your instance is protected by an API token, you can use `--token` to provide it: datasette-clone https://latest.datasette.io --token=xyz For verbose output showing what the tool is doing, use `-v`. 
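The skip-unchanged logic can be sketched as a comparison between the manifest saved on the previous run and the one just fetched. This is an illustration only - the function name and the simplified name-to-hash manifest shape are assumptions, not the tool's actual code:

```python
def databases_to_download(previous, current):
    # Each manifest maps database name -> content hash reported by the
    # Datasette instance. Anything new, or whose hash has changed since
    # our stored copy, needs to be downloaded again.
    return [
        name
        for name, content_hash in current.items()
        if previous.get(name) != content_hash
    ]
```

Databases whose hash matches the stored manifest are skipped entirely, which is what makes repeat runs cheap.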
",Simon Willison,,text/markdown,https://github.com/simonw/datasette-clone,,"Apache License, Version 2.0",,,https://pypi.org/project/datasette-clone/,,https://pypi.org/project/datasette-clone/,"{""Homepage"": ""https://github.com/simonw/datasette-clone""}",https://pypi.org/project/datasette-clone/0.5/,"[""requests"", ""click"", ""pytest ; extra == 'test'"", ""requests-mock ; extra == 'test'""]",,0.5,0, db-to-sqlite,CLI tool for exporting tables or queries from any SQL database to a SQLite file,"[""Development Status :: 3 - Alpha"", ""Intended Audience :: Developers"", ""Intended Audience :: End Users/Desktop"", ""Intended Audience :: Science/Research"", ""License :: OSI Approved :: Apache Software License"", ""Programming Language :: Python :: 3.6"", ""Programming Language :: Python :: 3.7"", ""Topic :: Database""]","# db-to-sqlite [![PyPI](https://img.shields.io/pypi/v/db-to-sqlite.svg)](https://pypi.python.org/pypi/db-to-sqlite) [![Changelog](https://img.shields.io/github/v/release/simonw/db-to-sqlite?include_prereleases&label=changelog)](https://github.com/simonw/db-to-sqlite/releases) [![Tests](https://github.com/simonw/db-to-sqlite/workflows/Test/badge.svg)](https://github.com/simonw/db-to-sqlite/actions?query=workflow%3ATest) [![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](https://github.com/simonw/db-to-sqlite/blob/main/LICENSE) CLI tool for exporting tables or queries from any SQL database to a SQLite file. ## Installation Install from PyPI like so: pip install db-to-sqlite If you want to use it with MySQL, you can install the extra dependency like this: pip install 'db-to-sqlite[mysql]' Installing the `mysqlclient` library on OS X can be tricky - I've found [this recipe](https://gist.github.com/simonw/90ac0afd204cd0d6d9c3135c3888d116) to work (run that before installing `db-to-sqlite`). 
For PostgreSQL, use this: pip install 'db-to-sqlite[postgresql]' ## Usage Usage: db-to-sqlite [OPTIONS] CONNECTION PATH Load data from any database into SQLite. PATH is a path to the SQLite file to create, e.g. /tmp/my_database.db CONNECTION is a SQLAlchemy connection string, for example: postgresql://localhost/my_database postgresql://username:passwd@localhost/my_database mysql://root@localhost/my_database mysql://username:passwd@localhost/my_database More: https://docs.sqlalchemy.org/en/13/core/engines.html#database-urls Options: --version Show the version and exit. --all Detect and copy all tables --table TEXT Specific tables to copy --skip TEXT When using --all skip these tables --redact TEXT... (table, column) pairs to redact with *** --sql TEXT Optional SQL query to run --output TEXT Table in which to save --sql query results --pk TEXT Optional column to use as a primary key --index-fks / --no-index-fks Should foreign keys have indexes? Default on -p, --progress Show progress bar --postgres-schema TEXT PostgreSQL schema to use --help Show this message and exit. For example, to save the content of the `blog_entry` table from a PostgreSQL database to a local file called `blog.db`, you could do this: db-to-sqlite ""postgresql://localhost/myblog"" blog.db \ --table=blog_entry You can specify `--table` more than once. You can also save the data from all of your tables, effectively creating a SQLite copy of your entire database. Any foreign key relationships will be detected and added to the SQLite database. 
For example: db-to-sqlite ""postgresql://localhost/myblog"" blog.db \ --all When running `--all` you can specify tables to skip using `--skip`: db-to-sqlite ""postgresql://localhost/myblog"" blog.db \ --all \ --skip=django_migrations If you want to save the results of a custom SQL query, do this: db-to-sqlite ""postgresql://localhost/myblog"" output.db \ --output=query_results \ --sql=""select id, title, created from blog_entry"" \ --pk=id The `--output` option specifies the table that should contain the results of the query. ## Using db-to-sqlite with PostgreSQL schemas If the tables you want to copy from your PostgreSQL database aren't in the default schema, you can specify an alternate one with the `--postgres-schema` option: db-to-sqlite ""postgresql://localhost/myblog"" blog.db \ --all \ --postgres-schema my_schema ## Using db-to-sqlite with Heroku Postgres If you run an application on [Heroku](https://www.heroku.com/) using their [Postgres database product](https://www.heroku.com/postgres), you can use the `heroku config` command to access a compatible connection string: $ heroku config --app myappname | grep HEROKU_POSTG HEROKU_POSTGRESQL_OLIVE_URL: postgres://username:password@ec2-xxx-xxx-xxx-x.compute-1.amazonaws.com:5432/dbname You can pass this to `db-to-sqlite` to create a local SQLite database with the data from your Heroku instance. You can even do this using a bash one-liner: $ db-to-sqlite $(heroku config --app myappname | grep HEROKU_POSTG | cut -d: -f 2-) \ /tmp/heroku.db --all -p 1/23: django_migrations ... 17/23: blog_blogmark [####################################] 100% ... ## Related projects * [Datasette](https://github.com/simonw/datasette): A tool for exploring and publishing data. Works great with SQLite files generated using `db-to-sqlite`. * [sqlite-utils](https://github.com/simonw/sqlite-utils): Python CLI utility and library for manipulating SQLite databases. 
* [csvs-to-sqlite](https://github.com/simonw/csvs-to-sqlite): Convert CSV files into a SQLite database. ## Development To set up this tool locally, first check out the code. Then create a new virtual environment: cd db-to-sqlite python3 -mvenv venv source venv/bin/activate Or if you are using `pipenv`: pipenv shell Now install the dependencies and test dependencies: pip install -e '.[test]' To run the tests: pytest This will skip tests against MySQL or PostgreSQL if you do not have their additional dependencies installed. You can install those extra dependencies like so: pip install -e '.[test_mysql,test_postgresql]' You can alternatively use `pip install psycopg2-binary` if you cannot install the `psycopg2` dependency used by the `test_postgresql` extra. See [Running a MySQL server using Homebrew](https://til.simonwillison.net/homebrew/mysql-homebrew) for tips on running the tests against MySQL on macOS, including how to install the `mysqlclient` dependency. The PostgreSQL and MySQL tests default to expecting to run against servers on localhost. You can use environment variables to point them at different test database servers: - `MYSQL_TEST_DB_CONNECTION` - defaults to `mysql://root@localhost/test_db_to_sqlite` - `POSTGRESQL_TEST_DB_CONNECTION` - defaults to `postgresql://localhost/test_db_to_sqlite` The database you indicate in the environment variable - `test_db_to_sqlite` by default - will be deleted and recreated on every test run. 
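At its core the tool reads every row from each source table and writes them into SQLite. This stdlib-only sketch shows that shape, with a second SQLite connection standing in for the source database (the real tool reads from the source via SQLAlchemy and writes with `sqlite-utils`, so none of the names below are its actual internals):

```python
import sqlite3

def copy_table(source, dest, table):
    # Read all rows plus column names from the source table
    cursor = source.execute(f'select * from {table}')
    columns = [description[0] for description in cursor.description]
    # Create a matching table in the destination, then bulk-insert
    col_list = ', '.join(columns)
    placeholders = ', '.join('?' for _ in columns)
    dest.execute(f'create table if not exists {table} ({col_list})')
    dest.executemany(
        f'insert into {table} values ({placeholders})', cursor.fetchall()
    )
    dest.commit()
```

The real implementation adds type mapping, primary keys and foreign key detection on top of this basic copy loop.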
",Simon Willison,,text/markdown,https://github.com/simonw/db-to-sqlite,,"Apache License, Version 2.0",,,https://pypi.org/project/db-to-sqlite/,,https://pypi.org/project/db-to-sqlite/,"{""CI"": ""https://travis-ci.com/simonw/db-to-sqlite"", ""Changelog"": ""https://github.com/simonw/db-to-sqlite/releases"", ""Documentation"": ""https://github.com/simonw/db-to-sqlite/blob/main/README.md"", ""Homepage"": ""https://github.com/simonw/db-to-sqlite"", ""Issues"": ""https://github.com/simonw/db-to-sqlite/issues"", ""Source code"": ""https://github.com/simonw/db-to-sqlite""}",https://pypi.org/project/db-to-sqlite/1.4/,"[""sqlalchemy"", ""sqlite-utils (>=2.9.1)"", ""click"", ""mysqlclient ; extra == 'mysql'"", ""psycopg2 ; extra == 'postgresql'"", ""pytest ; extra == 'test'"", ""pytest ; extra == 'test_mysql'"", ""mysqlclient ; extra == 'test_mysql'"", ""pytest ; extra == 'test_postgresql'"", ""psycopg2 ; extra == 'test_postgresql'""]",,1.4,0, dbf-to-sqlite,"CLCLI tool for converting DBF files (dBase, FoxPro etc) to SQLite","[""Development Status :: 3 - Alpha"", ""Intended Audience :: Developers"", ""Intended Audience :: End Users/Desktop"", ""Intended Audience :: Science/Research"", ""License :: OSI Approved :: Apache Software License"", ""Programming Language :: Python :: 3.6"", ""Programming Language :: Python :: 3.7"", ""Topic :: Database""]","# dbf-to-sqlite [![PyPI](https://img.shields.io/pypi/v/dbf-to-sqlite.svg)](https://pypi.python.org/pypi/dbf-to-sqlite) [![Travis CI](https://travis-ci.com/simonw/dbf-to-sqlite.svg?branch=master)](https://travis-ci.com/simonw/dbf-to-sqlite) [![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](https://github.com/simonw/dbf-to-sqlite/blob/master/LICENSE) CLI tool for converting DBF files (dBase, FoxPro etc) to SQLite. $ dbf-to-sqlite --help Usage: dbf-to-sqlite [OPTIONS] DBF_PATHS... 
SQLITE_DB Convert DBF files (dBase, FoxPro etc) to SQLite https://github.com/simonw/dbf-to-sqlite Options: --version Show the version and exit. --table TEXT Table name to use (only valid for single files) -v, --verbose Show what's going on --help Show this message and exit. Example usage: $ dbf-to-sqlite *.DBF database.db This will create a new SQLite database called `database.db` containing one table for each of the `DBF` files in the current directory. Looking for DBF files to try this out on? Try downloading the [Himalayan Database](http://himalayandatabase.com/) of all expeditions that have climbed in the Nepal Himalaya. ",Simon Willison,,text/markdown,https://github.com/simonw/dbf-to-sqlite,,"Apache License, Version 2.0",,,https://pypi.org/project/dbf-to-sqlite/,,https://pypi.org/project/dbf-to-sqlite/,"{""Homepage"": ""https://github.com/simonw/dbf-to-sqlite""}",https://pypi.org/project/dbf-to-sqlite/0.1/,"[""dbf (==0.97.11)"", ""click"", ""sqlite-utils""]",,0.1,0, dogsheep-beta,Build a search index across content from multiple SQLite database tables and run faceted searches against it using Datasette,[],"# dogsheep-beta [![PyPI](https://img.shields.io/pypi/v/dogsheep-beta.svg)](https://pypi.org/project/dogsheep-beta/) [![Changelog](https://img.shields.io/github/v/release/dogsheep/beta?include_prereleases&label=changelog)](https://github.com/dogsheep/beta/releases) [![Tests](https://github.com/dogsheep/beta/workflows/Test/badge.svg)](https://github.com/dogsheep/beta/actions?query=workflow%3ATest) [![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](https://github.com/dogsheep/beta/blob/main/LICENSE) Build a search index across content from multiple SQLite database tables and run faceted searches against it using Datasette ## Example A live example of this plugin is running at https://datasette.io/-/beta - configured using [this YAML file](https://github.com/simonw/datasette.io/blob/main/templates/dogsheep-beta.yml). 
Read more about how this example works in [Building a search engine for datasette.io](https://simonwillison.net/2020/Dec/19/dogsheep-beta/). ## Installation Install this tool like so: $ pip install dogsheep-beta ## Usage Run the indexer using the `dogsheep-beta` command-line tool: $ dogsheep-beta index dogsheep.db config.yml The `config.yml` file contains details of the databases and document types that should be indexed: ```yaml twitter.db: tweets: sql: |- select tweets.id as key, 'Tweet by @' || users.screen_name as title, tweets.created_at as timestamp, tweets.full_text as search_1 from tweets join users on tweets.user = users.id users: sql: |- select id as key, name || ' @' || screen_name as title, created_at as timestamp, description as search_1 from users ``` This will create a `search_index` table in the `dogsheep.db` database populated by data from those SQL queries. By default the search index that this tool creates will be configured for Porter stemming. This means that searches for words like `run` will match documents containing `runs` or `running`. If you don't want to use Porter stemming, use the `--tokenize none` option: $ dogsheep-beta index dogsheep.db config.yml --tokenize none You can pass other SQLite tokenize arguments here, see [the SQLite FTS tokenizers documentation](https://www.sqlite.org/fts5.html#tokenizers). ## Columns The columns that can be returned by our query are: - `key` - a unique (within that type) primary key - `title` - the title for the item - `timestamp` - an ISO8601 timestamp, e.g. `2020-09-02T21:00:21` - `search_1` - a larger chunk of text to be included in the search index - `category` - an integer category ID, see below - `is_public` - an integer (0 or 1, defaults to 0 if not set) specifying if this is public or not Public records are things like your public tweets, blog posts and GitHub commits. ## Categories Indexed items can be assigned a category. 
Categories are integers that correspond to records in the `categories` table, which defaults to containing the following: | id | name | |------|------------| | 1 | created | | 2 | saved | | 3 | received | `created` is for items that have been created by the Dogsheep instance owner. `saved` is for items that they have saved, liked or favourited. `received` is for items that have been specifically sent to them by other people - incoming emails or direct messages for example. ## Datasette plugin Run `datasette install dogsheep-beta` (or use `pip install dogsheep-beta` in the same environment as Datasette) to install the Dogsheep Beta Datasette plugin. Once installed, a custom search interface will be made available at `/-/beta`. You can use this interface to execute searches. The Datasette plugin has some configuration options. You can set these by adding the following to your `metadata.json` configuration file: ```json { ""plugins"": { ""dogsheep-beta"": { ""database"": ""beta"", ""config_file"": ""dogsheep-beta.yml"", ""template_debug"": true } } } ``` The configuration settings for the plugin are: - `database` - the database file that contains your search index. If the file is `beta.db` you should set `database` to `beta`. - `config_file` - the YAML file containing your Dogsheep Beta configuration. - `template_debug` - set this to `true` to enable debugging output if errors occur in your custom templates, see below. ## Custom results display Each indexed item type can define custom display HTML as part of the `config.yml` file. It can do this using a `display` key containing a fragment of Jinja template, and optionally a `display_sql` key with extra SQL to execute to fetch the data to display. 
Here's how to define a custom display template for a tweet: ```yaml twitter.db: tweets: sql: |- select tweets.id as key, 'Tweet by @' || users.screen_name as title, tweets.created_at as timestamp, tweets.full_text as search_1 from tweets join users on tweets.user = users.id display: |-

<p>{{ title }} - tweeted at {{ timestamp }}</p>
<blockquote>{{ search_1 }}</blockquote>
``` This example reuses the values that were stored in the `search_index` table when the indexing query was run. To load in extra values to display in the template, use a `display_sql` query like this: ```yaml twitter.db: tweets: sql: |- select tweets.id as key, 'Tweet by @' || users.screen_name as title, tweets.created_at as timestamp, tweets.full_text as search_1 from tweets join users on tweets.user = users.id display_sql: |- select users.screen_name, tweets.full_text, tweets.created_at from tweets join users on tweets.user = users.id where tweets.id = :key display: |-

<p>{{ display.screen_name }} - tweeted at {{ display.created_at }}</p>
<blockquote>{{ display.full_text }}</blockquote>
``` The `display_sql` query will be executed for every search result, passing the key value from the `search_index` table as the `:key` parameter and the user's search term as the `:q` parameter. This performs well because [many small queries are efficient in SQLite](https://www.sqlite.org/np1queryprob.html). If an error occurs while rendering one of your templates the search results page will return a 500 error. You can use the `template_debug` configuration setting described above to instead output debugging information for the search results item that experienced the error. ## Displaying maps This plugin will eventually include a number of useful shortcuts for rendering interesting content. The first available shortcut is for displaying maps. Make your custom content output something like this: ```html
<div
    data-map-latitude=""{{ display.latitude }}""
    data-map-longitude=""{{ display.longitude }}""
    style=""display: none; float: right; width: 250px; height: 200px; background-color: #ccc;""
></div>
``` JavaScript on the page will look for any elements with `data-map-latitude` and `data-map-longitude` and, if it finds any, will load Leaflet and convert those elements into maps centered on that location. The default zoom level will be 12, or you can set a `data-map-zoom` attribute to customize this. ## Development To set up this plugin locally, first checkout the code. Then create a new virtual environment: cd dogsheep-beta python3 -mvenv venv source venv/bin/activate Or if you are using `pipenv`: pipenv shell Now install the dependencies and tests: pip install -e '.[test]' To run the tests: pytest ",Simon Willison,,text/markdown,https://github.com/dogsheep/beta,,"Apache License, Version 2.0",,,https://pypi.org/project/dogsheep-beta/,,https://pypi.org/project/dogsheep-beta/,"{""CI"": ""https://github.com/dogsheep/beta/actions"", ""Changelog"": ""https://github.com/dogsheep/beta/releases"", ""Homepage"": ""https://github.com/dogsheep/beta"", ""Issues"": ""https://github.com/dogsheep/beta/issues""}",https://pypi.org/project/dogsheep-beta/0.10.2/,"[""datasette (>=0.50.2)"", ""click"", ""PyYAML"", ""sqlite-utils (>=3.0)"", ""pytest ; extra == 'test'"", ""pytest-asyncio ; extra == 'test'"", ""httpx ; extra == 'test'"", ""beautifulsoup4 ; extra == 'test'"", ""html5lib ; extra == 'test'""]",,0.10.2,0, download-tiles,Download map tiles and store them in an MBTiles database,[],"# download-tiles [![PyPI](https://img.shields.io/pypi/v/download-tiles.svg)](https://pypi.org/project/download-tiles/) [![Changelog](https://img.shields.io/github/v/release/simonw/download-tiles?include_prereleases&label=changelog)](https://github.com/simonw/download-tiles/releases) [![Tests](https://github.com/simonw/download-tiles/workflows/Test/badge.svg)](https://github.com/simonw/download-tiles/actions?query=workflow%3ATest) [![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](https://github.com/simonw/download-tiles/blob/master/LICENSE) Download map tiles and store them 
in an MBTiles database ## Installation Install this tool using `pip`: $ pip install download-tiles ## Usage This tool downloads tiles from a specified [TMS (Tile Map Server)](https://wiki.openstreetmap.org/wiki/TMS) server for a specified bounding box and range of zoom levels and stores those tiles in an MBTiles SQLite database. It is a command-line wrapper around the [Landez](https://github.com/makinacorpus/landez) Python library. **Please use this tool responsibly**. Consult the usage policies of the tile servers you are interacting with, for example the [OpenStreetMap Tile Usage Policy](https://operations.osmfoundation.org/policies/tiles/). Running the following will download zoom levels 0-3 of OpenStreetMap, 85 tiles total, and store them in a SQLite database called `world.mbtiles`: download-tiles world.mbtiles You can customize which tiles and zoom levels are downloaded using command options: `--zoom-levels=0-3` or `-z=0-3` The different zoom levels to download. Specify a single number, e.g. `15`, or a range of numbers e.g. `0-4`. Be careful with this setting as you can easily go over the limits requested by the underlying tile server. `--bbox=3.9,-6.3,14.5,10.2` or `-b=3.9,-6.3,14.5,10.2` The bounding box to fetch. Should be specified as `min-lon,min-lat,max-lon,max-lat`. You can use [bboxfinder.com](http://bboxfinder.com/) to find these for different areas. `--city=london` or `--country=madagascar` These options can be used instead of `--bbox`. The city or country specified will be looked up using the [Nominatim API](https://nominatim.org/release-docs/latest/api/Search/) and used to derive a bounding box. `--show-bbox` Use this option to output the bounding box that was retrieved for the `--city` or `--country` without downloading any tiles. `--name=Name` A name for this tile collection, used for the `name` field in the `metadata` table. 
If not specified a UUID will be used, or if you used `--city` or `--country` the name will be set to the full name of that place. `--attribution=""Attribution string""` Attribution string to bake into the `metadata` table. This will default to `© OpenStreetMap contributors` unless you use `--tiles-url` to specify an alternative tile server, in which case you should specify a custom attribution string. You can use the `--attribution=osm` shortcut to specify the `© OpenStreetMap contributors` value without having to type it out in full. `--tiles-url=https://...` The tile server URL to use. This should include `{z}` and `{x}` and `{y}` specifiers, and can optionally include `{s}` for subdomains. The default URL used here is for OpenStreetMap, `http://{s}.tile.openstreetmap.org/{z}/{x}/{y}.png` `--tiles-subdomains=a,b,c` A comma-separated list of subdomains to use for the `{s}` parameter. `--verbose` Use this option to turn on verbose logging. `--cache-dir=/tmp/tiles` Provide a directory to cache downloaded tiles between runs. This can be useful if you are worried you might not have used the correct options for the bounding box or zoom levels. ## Development To contribute to this tool, first checkout the code. 
Then create a new virtual environment: cd download-tiles python -mvenv venv source venv/bin/activate Or if you are using `pipenv`: pipenv shell Now install the dependencies and tests: pip install -e '.[test]' To run the tests: pytest ",Simon Willison,,text/markdown,https://github.com/simonw/download-tiles,,"Apache License, Version 2.0",,,https://pypi.org/project/download-tiles/,,https://pypi.org/project/download-tiles/,"{""CI"": ""https://github.com/simonw/download-tiles/actions"", ""Changelog"": ""https://github.com/simonw/download-tiles/releases"", ""Homepage"": ""https://github.com/simonw/download-tiles"", ""Issues"": ""https://github.com/simonw/download-tiles/issues""}",https://pypi.org/project/download-tiles/0.4.1/,"[""click"", ""requests"", ""landez (==2.5.0)"", ""pytest ; extra == 'test'"", ""requests-mock ; extra == 'test'""]",>=3.6,0.4.1,0, evernote-to-sqlite,Tools for converting Evernote content to SQLite,[],"# evernote-to-sqlite [![PyPI](https://img.shields.io/pypi/v/evernote-to-sqlite.svg)](https://pypi.org/project/evernote-to-sqlite/) [![Changelog](https://img.shields.io/github/v/release/dogsheep/evernote-to-sqlite?include_prereleases&label=changelog)](https://github.com/dogsheep/evernote-to-sqlite/releases) [![Tests](https://github.com/dogsheep/evernote-to-sqlite/workflows/Test/badge.svg)](https://github.com/dogsheep/evernote-to-sqlite/actions?query=workflow%3ATest) [![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](https://github.com/dogsheep/evernote-to-sqlite/blob/master/LICENSE) Tools for converting Evernote content to SQLite. See [Building an Evernote to SQLite exporter](https://simonwillison.net/2020/Oct/16/building-evernote-sqlite-exporter/) for background on this project. ## Installation Install this tool using `pip`: $ pip install evernote-to-sqlite ## Usage Currently the only available command is `evernote-to-sqlite enex`, which converts Evernote's ENEX export files into a SQLite database. 
You can create [an ENEX export](https://help.evernote.com/hc/en-us/articles/209005557-Export-notes-and-notebooks-as-ENEX-or-HTML) in the Evernote desktop application by selecting some notes (or all of your notes) and using the `File -> Export Notes...` menu option. Evernote used to be able to export everything in one go, but more recent versions only allow exporting up to fifty notes at a time; alternatively, you can export an entire notebook by right-clicking on the notebook and selecting ""Export notebook..."". You can convert that file to SQLite like so: $ evernote-to-sqlite enex evernote.db MyNotes.enex This will display a progress bar and create a SQLite database file called `evernote.db`. ### Limitations Unfortunately the ENEX export format does not include a unique identifier for each note. This means you cannot use this tool to re-import notes after they have been updated - you should consider this tool to be a one-time transformation of an ENEX file into an equivalent SQLite database. ENEX exports also do not include details of which notebook a note belongs to. ## Development To contribute to this tool, first checkout the code. 
Then create a new virtual environment: cd evernote-to-sqlite python -mvenv venv source venv/bin/activate Or if you are using `pipenv`: pipenv shell Now install the dependencies and tests: pip install -e '.[test]' To run the tests: pytest ",Simon Willison,,text/markdown,https://github.com/dogsheep/evernote-to-sqlite,,"Apache License, Version 2.0",,,https://pypi.org/project/evernote-to-sqlite/,,https://pypi.org/project/evernote-to-sqlite/,"{""CI"": ""https://github.com/dogsheep/evernote-to-sqlite/actions"", ""Changelog"": ""https://github.com/dogsheep/evernote-to-sqlite/releases"", ""Homepage"": ""https://github.com/dogsheep/evernote-to-sqlite"", ""Issues"": ""https://github.com/dogsheep/evernote-to-sqlite/issues""}",https://pypi.org/project/evernote-to-sqlite/0.3.2/,"[""click"", ""sqlite-utils (>=3.0)"", ""pytest ; extra == 'test'""]",,0.3.2,0, fec-to-sqlite,Save FEC campaign finance data to a SQLite database,[],"# fec-to-sqlite [![PyPI](https://img.shields.io/pypi/v/fec-to-sqlite.svg)](https://pypi.org/project/fec-to-sqlite/) [![CircleCI](https://circleci.com/gh/simonw/fec-to-sqlite.svg?style=svg)](https://circleci.com/gh/simonw/fec-to-sqlite) [![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](https://github.com/simonw/fec-to-sqlite/blob/master/LICENSE) Create a SQLite database using FEC campaign contributions data. This tool builds on [fecfile](https://github.com/esonderegger/) by Evan Sonderegger. ## How to install $ pip install fec-to-sqlite ## Usage $ fec-to-sqlite filings filings.db 1146148 This fetches the filing with ID `1146148` and stores it in tables in a SQLite database called `filings.db`. It will create any tables it needs. You can pass more than one filing ID, separated by spaces. 
",Simon Willison,,text/markdown,https://github.com/dogsheep/fec-to-sqlite,,"Apache License, Version 2.0",,,https://pypi.org/project/fec-to-sqlite/,,https://pypi.org/project/fec-to-sqlite/,"{""Homepage"": ""https://github.com/dogsheep/fec-to-sqlite""}",https://pypi.org/project/fec-to-sqlite/0.2/,"[""sqlite-utils"", ""click"", ""requests"", ""fecfile"", ""tqdm"", ""pytest ; extra == 'test'""]",,0.2,0, git-history,Tools for analyzing Git history using SQLite,[],"# git-history [![PyPI](https://img.shields.io/pypi/v/git-history.svg)](https://pypi.org/project/git-history/) [![Changelog](https://img.shields.io/github/v/release/simonw/git-history?include_prereleases&label=changelog)](https://github.com/simonw/git-history/releases) [![Tests](https://github.com/simonw/git-history/workflows/Test/badge.svg)](https://github.com/simonw/git-history/actions?query=workflow%3ATest) [![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](https://github.com/simonw/git-history/blob/master/LICENSE) Tools for analyzing Git history using SQLite For background on this project see [git-history: a tool for analyzing scraped data collected using Git and SQLite](https://simonwillison.net/2021/Dec/7/git-history/) ## Installation Install this tool using `pip`: $ pip install git-history ## Demos [git-history-demos.datasette.io](http://git-history-demos.datasette.io/) hosts three example databases created using this tool: - [pge-outages](https://git-history-demos.datasette.io/pge-outages) shows a history of PG&E (the electricity supplier) [outages](https://pgealerts.alerts.pge.com/outagecenter/), using data collected in [simonw/pge-outages](https://github.com/simonw/pge-outages) converted using [pge-outages.sh](https://github.com/simonw/git-history/blob/main/demos/pge-outages.sh) - [ca-fires](https://git-history-demos.datasette.io/ca-fires) shows a history of fires in California reported on [fire.ca.gov/incidents](https://www.fire.ca.gov/incidents/), from data in 
[simonw/ca-fires-history](https://github.com/simonw/ca-fires-history) converted using [ca-fires.sh](https://github.com/simonw/git-history/blob/main/demos/ca-fires.sh) - [sf-bay-511](https://git-history-demos.datasette.io/sf-bay-511) has records of San Francisco Bay Area traffic and transit incident data from [511.org](https://511.org/), collected in [dbreunig/511-events-history](https://github.com/dbreunig/511-events-history) converted using [sf-bay-511.sh](https://github.com/simonw/git-history/blob/main/demos/sf-bay-511.sh) The demos are deployed using [Datasette](https://datasette.io/) on [Google Cloud Run](https://cloud.google.com/run/) by [this GitHub Actions workflow](https://github.com/simonw/git-history/blob/main/.github/workflows/deploy-demos.yml). ## Usage This tool can be run against a Git repository that holds a file that contains JSON, CSV/TSV or some other format and which has multiple versions tracked in the Git history. Read [Git scraping: track changes over time by scraping to a Git repository](https://simonwillison.net/2020/Oct/9/git-scraping/) to understand how you might create such a repository. The `file` command analyzes the history of an individual file within the repository, and generates a SQLite database table that represents the different versions of that file over time. The file is assumed to contain multiple objects - for example, the results of scraping an electricity outage map or a CSV file full of records. Assuming you have a file called `incidents.json` that is a JSON array of objects, with multiple versions of that file recorded in a repository. 
Each version of that file might look something like this: ```json [ { ""IncidentID"": ""abc123"", ""Location"": ""Corner of 4th and Vermont"", ""Type"": ""fire"" }, { ""IncidentID"": ""cde448"", ""Location"": ""555 West Example Drive"", ""Type"": ""medical"" } ] ``` Change directory into the GitHub repository in question and run the following: git-history file incidents.db incidents.json This will create a new SQLite database in the `incidents.db` file with three tables: - `commits` containing a row for every commit, with a `hash` column, the `commit_at` date and a foreign key to a `namespace`. - `item` containing a row for every item in every version of the `incidents.json` file - with an extra `_commit` column that is a foreign key back to the `commits` table. - `namespaces` containing a single row. This allows you to build multiple tables for different files, using the `--namespace` option described below. The database schema for this example will look like this: ```sql CREATE TABLE [namespaces] ( [id] INTEGER PRIMARY KEY, [name] TEXT ); CREATE UNIQUE INDEX [idx_namespaces_name] ON [namespaces] ([name]); CREATE TABLE [commits] ( [id] INTEGER PRIMARY KEY, [namespace] INTEGER REFERENCES [namespaces]([id]), [hash] TEXT, [commit_at] TEXT ); CREATE UNIQUE INDEX [idx_commits_namespace_hash] ON [commits] ([namespace], [hash]); CREATE TABLE [item] ( [IncidentID] TEXT, [Location] TEXT, [Type] TEXT ); ``` If you have 10 historic versions of the `incidents.json` file and each one contains 30 incidents, you will end up with 10 * 30 = 300 rows in your `item` table. ### Track the history of individual items using IDs If your objects have a unique identifier - or multiple columns that together form a unique identifier - you can use the `--id` option to de-duplicate and track changes to each of those items over time. This provides a much more interesting way to apply this tool. 
If there is a unique identifier column called `IncidentID` you could run the following: git-history file incidents.db incidents.json --id IncidentID The database schema used here is very different from the one used without the `--id` option. If you have already imported history, the command will skip any commits that it has seen already and just process new ones. This means that even though an initial import could be slow subsequent imports should run a lot faster. This command will create six tables - `commits`, `item`, `item_version`, `columns`, `item_changed` and `namespaces`. Here's the full schema: ```sql CREATE TABLE [namespaces] ( [id] INTEGER PRIMARY KEY, [name] TEXT ); CREATE UNIQUE INDEX [idx_namespaces_name] ON [namespaces] ([name]); CREATE TABLE [commits] ( [id] INTEGER PRIMARY KEY, [namespace] INTEGER REFERENCES [namespaces]([id]), [hash] TEXT, [commit_at] TEXT ); CREATE UNIQUE INDEX [idx_commits_namespace_hash] ON [commits] ([namespace], [hash]); CREATE TABLE [item] ( [_id] INTEGER PRIMARY KEY, [_item_id] TEXT , [IncidentID] TEXT, [Location] TEXT, [Type] TEXT, [_commit] INTEGER); CREATE UNIQUE INDEX [idx_item__item_id] ON [item] ([_item_id]); CREATE TABLE [item_version] ( [_id] INTEGER PRIMARY KEY, [_item] INTEGER REFERENCES [item]([_id]), [_version] INTEGER, [_commit] INTEGER REFERENCES [commits]([id]), [IncidentID] TEXT, [Location] TEXT, [Type] TEXT, [_item_full_hash] TEXT ); CREATE TABLE [columns] ( [id] INTEGER PRIMARY KEY, [namespace] INTEGER REFERENCES [namespaces]([id]), [name] TEXT ); CREATE UNIQUE INDEX [idx_columns_namespace_name] ON [columns] ([namespace], [name]); CREATE TABLE [item_changed] ( [item_version] INTEGER REFERENCES [item_version]([_id]), [column] INTEGER REFERENCES [columns]([id]), PRIMARY KEY ([item_version], [column]) ); CREATE VIEW item_version_detail AS select commits.commit_at as _commit_at, commits.hash as _commit_hash, item_version.*, ( select json_group_array(name) from columns where id in ( select column from 
item_changed where item_version = item_version._id ) ) as _changed_columns from item_version join commits on commits.id = item_version._commit; CREATE INDEX [idx_item_version__item] ON [item_version] ([_item]); ``` #### item table The `item` table will contain the most recent version of each row, de-duplicated by ID, plus the following additional columns: - `_id` - a numeric integer primary key, used as a foreign key from the `item_version` table. - `_item_id` - a hash of the values of the columns specified using the `--id` option to the command. This is used for de-duplication when processing new versions. - `_commit` - a foreign key to the `commits` table, representing the most recent commit to modify this item. #### item_version table The `item_version` table will contain a row for each captured differing version of that item, plus the following columns: - `_id` - a numeric ID for the item version record. - `_item` - a foreign key to the `item` table. - `_version` - the numeric version number, starting at 1 and incrementing for each captured version. - `_commit` - a foreign key to the `commits` table. - `_item_full_hash` - a hash of this version of the item. This is used internally by the tool to identify items that have changed between commits. The other columns in this table represent columns in the original data that have changed since the previous version. If the value has not changed, it will be represented by a `null`. If a value was previously set but has been changed back to `null` it will still be represented as `null` in the `item_version` row. You can identify these using the `item_changed` many-to-many table described below. You can use the `--full-versions` option to store full copies of the item at each version, rather than just storing the columns that have changed. 
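The `_item_id` de-duplication idea can be sketched in a few lines: hash the values of the `--id` columns so that the same logical record maps to the same key in every commit, no matter which other fields changed. This is only an illustration of the concept - the `item_id` helper below is hypothetical, not git-history's actual hashing code.

```python
import hashlib
import json

def item_id(record, id_columns):
    # Hash only the identifying columns, so other fields can change
    # without changing the item's identity. Illustrative only - the
    # real tool's hashing scheme may differ.
    values = [record[column] for column in id_columns]
    return hashlib.sha1(json.dumps(values).encode("utf-8")).hexdigest()

v1 = {"IncidentID": "abc123", "Location": "Corner of 4th and Vermont", "Type": "fire"}
v2 = {"IncidentID": "abc123", "Location": "Corner of 4th and Vermont", "Type": "medical"}

# Same --id values -> same item, so v2 becomes a new version of v1
print(item_id(v1, ["IncidentID"]) == item_id(v2, ["IncidentID"]))  # prints True
```

Because the hash covers only the identifying columns, a changed `Type` produces a new row in `item_version` rather than a brand new item.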
#### item_version_detail view This SQL view joins `item_version` against `commits` to add three further columns: `_commit_at` with the date of the commit, `_commit_hash` with the Git commit hash, and `_changed_columns` with a JSON array of the names of the columns that changed in that version. #### item_changed This many-to-many table indicates exactly which columns were changed in an `item_version`. - `item_version` is a foreign key to a row in the `item_version` table. - `column` is a foreign key to a row in the `columns` table. This table will have the largest number of rows, which is why it stores just two integers in order to save space. #### columns The `columns` table stores column names. It is referenced by `item_changed`. - `id` - an integer ID. - `name` - the name of the column. - `namespace` - a foreign key to `namespaces`, used when multiple file histories share the same database. #### Reserved column names Note that `_id`, `_item_full_hash`, `_item`, `_item_id`, `_version`, `_commit`, `_commit_at`, `_commit_hash`, `_changed_columns`, `rowid` are considered reserved column names for the purposes of this tool. If your data contains any of these they will be renamed to add a trailing underscore, for example `_id_`, `_item_`, `_version_`, to avoid clashing with the reserved columns. If you have a column with a name such as `_commit_` it will be renamed too, adding an additional trailing underscore, so `_commit_` becomes `_commit__` and `_commit__` becomes `_commit___`. ### Additional options - `--repo DIRECTORY` - the path to the Git repository, if it is not the current working directory. - `--branch TEXT` - the Git branch to analyze - defaults to `main`. - `--id TEXT` - as described above: pass one or more columns that uniquely identify a record, so that changes to that record can be calculated over time. - `--full-versions` - instead of recording just the columns that have changed in the `item_version` table record a full copy of each version of the item. 
- `--ignore TEXT` - one or more columns to ignore - they will not be included in the resulting database. - `--csv` - treat the data as CSV or TSV rather than JSON, and attempt to guess the correct dialect - `--dialect` - use a specific CSV dialect. Options are `excel`, `excel-tab` and `unix` - see [the Python CSV documentation](https://docs.python.org/3/library/csv.html#csv.excel) for details. - `--skip TEXT` - one or more full Git commit hashes that should be skipped. You can use this if some of the data in your revision history is corrupted in a way that prevents this tool from working. - `--start-at TEXT` - skip commits prior to the specified commit hash. - `--start-after TEXT` - skip commits up to and including the specified commit hash, then start processing from the following commit. - `--convert TEXT` - custom Python code for a conversion, described below. - `--import TEXT` - additional Python modules to import for `--convert`. - `--ignore-duplicate-ids` - if a single version of a file has the same ID in it more than once, the tool will exit with an error. Use this option to ignore this and instead pick just the first of the two duplicates. - `--namespace TEXT` - use this if you wish to include the history of multiple different files in the same database. The default is `item` but you can set it to something else, which will produce tables with names like `yournamespace` and `yournamespace_version`. - `--wal` - Enable WAL mode on the created database file. Use this if you plan to run queries against the database while `git-history` is creating it. - `--silent` - don't show the progress bar. ### CSV and TSV data If the data in your repository is a CSV or TSV file you can process it by adding the `--csv` option. This will attempt to detect which delimiter is used by the file, so the same option works for both comma- and tab-separated values. git-history file trees.db trees.csv --id TreeID You can also specify the CSV dialect using the `--dialect` option. 
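The delimiter detection behind `--csv` can be sketched with the standard library: check the header line for a tab, fall back to a comma, then parse with `csv.DictReader`. This is a deliberately simplified stand-in for the tool's actual detection logic, just to show why one flag can cover both CSV and TSV.

```python
import csv
import io

def read_rows(content: bytes):
    # Crude dialect guess: a tab in the header line means TSV,
    # otherwise assume CSV. Illustrative only.
    text = content.decode("utf-8")
    delimiter = "\t" if "\t" in text.splitlines()[0] else ","
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))

csv_rows = read_rows(b"TreeID,Species\n1,Oak\n2,Elm\n")
tsv_rows = read_rows(b"TreeID\tSpecies\n1\tOak\n2\tElm\n")
print(csv_rows == tsv_rows)  # prints True - same records either way
```

Either way the result is the list-of-dictionaries shape that git-history stores.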
### Custom conversions using --convert If your data is not already either CSV/TSV or a flat JSON array, you can reshape it using the `--convert` option. The format needed by this tool is an array of dictionaries, as demonstrated by the `incidents.json` example above. If your data does not fit this shape, you can provide a snippet of Python code that converts the on-disk content of each stored file into a Python list of dictionaries. For example, if your stored files each look like this: ```json { ""incidents"": [ { ""id"": ""552"", ""name"": ""Hawthorne Fire"", ""engines"": 3 }, { ""id"": ""556"", ""name"": ""Merlin Fire"", ""engines"": 1 } ] } ``` You could use the following Python snippet to convert them to the required format: ```python json.loads(content)[""incidents""] ``` (The `json` module is exposed to your custom function by default.) You would then run the tool like this: git-history file database.db incidents.json \ --id id \ --convert 'json.loads(content)[""incidents""]' The `content` variable is always a `bytes` object representing the content of the file at a specific moment in the repository's history. You can import additional modules using `--import`. This example shows how you could read a CSV file that uses `;` as the delimiter: git-history file trees.db ../sf-tree-history/Street_Tree_List.csv \ --repo ../sf-tree-history \ --import csv \ --import io \ --convert ' fp = io.StringIO(content.decode(""utf-8"")) return list(csv.DictReader(fp, delimiter="";"")) ' \ --id TreeID You can import nested modules such as [ElementTree](https://docs.python.org/3/library/xml.etree.elementtree.html) using `--import xml.etree.ElementTree`, then refer to them in your function body as `xml.etree.ElementTree`. 
For example, if your tracked data was in an `items.xml` file that looked something like this: ```xml <items> <item id=""1"" name=""Gin"" /> <item id=""2"" name=""Tonic"" /> </items> ``` You could load it using the following `--convert` script: ``` git-history file items.db items.xml --convert ' tree = xml.etree.ElementTree.fromstring(content) return [el.attrib for el in tree.iter(""item"")] ' --import xml.etree.ElementTree --id id ``` If your Python code spans more than one line it needs to include a `return` statement. You can also use Python generators in your `--convert` code, for example: git-history file stats.db package-stats/stats.json \ --repo package-stats \ --convert ' data = json.loads(content) for key, counts in data.items(): for date, count in counts.items(): yield { ""package"": key, ""date"": date, ""count"": count } ' --id package --id date This conversion function expects data that looks like this: ```json { ""airtable-export"": { ""2021-05-18"": 66, ""2021-05-19"": 60, ""2021-05-20"": 87 } } ``` ## Development To contribute to this tool, first checkout the code. Then create a new virtual environment: cd git-history python -m venv venv source venv/bin/activate Or if you are using `pipenv`: pipenv shell Now install the dependencies and test dependencies: pip install -e '.[test]' To run the tests: pytest To update the schema examples in this README file: cog -r README.md ",Simon Willison,,text/markdown,https://github.com/simonw/git-history,,"Apache License, Version 2.0",,,https://pypi.org/project/git-history/,,https://pypi.org/project/git-history/,"{""CI"": ""https://github.com/simonw/git-history/actions"", ""Changelog"": ""https://github.com/simonw/git-history/releases"", ""Homepage"": ""https://github.com/simonw/git-history"", ""Issues"": ""https://github.com/simonw/git-history/issues""}",https://pypi.org/project/git-history/0.6.1/,"[""click"", ""GitPython"", ""sqlite-utils (>=3.19)"", ""pytest ; extra == 'test'"", ""cogapp ; extra == 'test'""]",>=3.6,0.6.1,0, google-drive-to-sqlite,Create a SQLite database containing metadata from Google 
Drive,[],"# google-drive-to-sqlite [![PyPI](https://img.shields.io/pypi/v/google-drive-to-sqlite.svg)](https://pypi.org/project/google-drive-to-sqlite/) [![Changelog](https://img.shields.io/github/v/release/simonw/google-drive-to-sqlite?include_prereleases&label=changelog)](https://github.com/simonw/google-drive-to-sqlite/releases) [![Tests](https://github.com/simonw/google-drive-to-sqlite/workflows/Test/badge.svg)](https://github.com/simonw/google-drive-to-sqlite/actions?query=workflow%3ATest) [![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](https://github.com/simonw/google-drive-to-sqlite/blob/master/LICENSE) Create a SQLite database containing metadata from [Google Drive](https://www.google.com/drive) If you use Google Drive, and especially if you have shared drives with other people there's a good chance you have hundreds or even thousands of files that you may not be fully aware of. This tool can download metadata about those files - their names, sizes, folders, content types, permissions, creation dates and more - and store them in a SQLite database. This lets you use SQL to analyze your Google Drive contents, using [Datasette](https://datasette.io/) or the SQLite command-line tool or any other SQLite database browsing software. ## Installation Install this tool using `pip`: pip install google-drive-to-sqlite ## Quickstart Authenticate with Google Drive by running: google-drive-to-sqlite auth Now create a SQLite database with metadata about all of the files you have starred using: google-drive-to-sqlite files starred.db --starred You can explore the resulting database using [Datasette](https://datasette.io/): $ pip install datasette $ datasette starred.db INFO: Started server process [24661] INFO: Uvicorn running on http://127.0.0.1:8001 ## Authentication > :warning: **This application has not yet been verified by Google** - you may find you are unable to authenticate until that verification is complete. 
[#10](https://github.com/simonw/google-drive-to-sqlite/issues/10) > > You can work around this issue by [creating your own OAuth client ID key](https://til.simonwillison.net/googlecloud/google-oauth-cli-application) and passing it to the `auth` command using `--google-client-id` and `--google-client-secret`. First, authenticate with Google Drive using the `auth` command: $ google-drive-to-sqlite auth Visit the following URL to authenticate with Google Drive https://accounts.google.com/o/oauth2/v2/auth?... Then return here and paste in the resulting code: Paste code here: Follow the link, sign in with Google Drive and then copy and paste the resulting code back into the tool. This will save an authentication token to the file called `auth.json` in the current directory. To specify a different location for that file, use the `--auth` option: google-drive-to-sqlite auth --auth ~/google-drive-auth.json The `auth` command also provides options for using a different scope, Google client ID and Google client secret. You can use these to create your own custom authentication tokens that can work with other Google APIs, see [issue #5](https://github.com/simonw/google-drive-to-sqlite/issues/5) for details. Full `--help`: ``` Usage: google-drive-to-sqlite auth [OPTIONS] Authenticate user and save credentials Options: -a, --auth FILE Path to save token, defaults to auth.json --google-client-id TEXT Custom Google client ID --google-client-secret TEXT Custom Google client secret --scope TEXT Custom token scope --help Show this message and exit. ``` To revoke the token that is stored in `auth.json`, such that it cannot be used to access Google Drive in the future, run the `revoke` command: google-drive-to-sqlite revoke Or if your token is stored in another location: google-drive-to-sqlite revoke -a ~/google-drive-auth.json You will need to obtain a fresh token using the `auth` command in order to continue using this tool. 
## google-drive-to-sqlite files To retrieve metadata about the files in your Google Drive, or a folder or search within it, use the `google-drive-to-sqlite files` command. This will default to writing details about every file in your Google Drive to a SQLite database: google-drive-to-sqlite files files.db Files and folders will be written to database tables, which will be created if they do not yet exist. The database schema is [shown below](#database-schema). If a file or folder already exists, based on a matching `id`, it will be replaced with fresh data. Instead of writing to SQLite you can use `--json` to output as JSON, or `--nl` to output as newline-delimited JSON: google-drive-to-sqlite files --nl Use `--folder ID` to retrieve everything in a specified folder and its sub-folders: google-drive-to-sqlite files files.db --folder 1E6Zg2X2bjjtPzVfX8YqdXZDCoB3AVA7i Use `-q QUERY` to use a [custom search query](https://developers.google.com/drive/api/v3/reference/query-ref): google-drive-to-sqlite files files.db -q ""viewedByMeTime > '2022-01-01'"" The following shortcut options help build queries: - `--full-text TEXT` to search for files where the full text matches a search term - `--starred` for files and folders you have starred - `--trashed` for files and folders in the trash - `--shared-with-me` for files and folders that have been shared with you - `--apps` for Google Apps documents, spreadsheets, presentations and drawings (equivalent to setting all of the next four options) - `--docs` for Google Apps documents - `--sheets` for Google Apps spreadsheets - `--presentations` for Google Apps presentations - `--drawings` for Google Apps drawings You can combine these - for example, this returns all files that you have starred and that were shared with you: google-drive-to-sqlite files highlights.db \ --starred --shared-with-me Multiple options are treated as AND, with the exception of the Google Apps options which are treated as OR - so the following would 
retrieve all spreadsheets and presentations that have also been starred: google-drive-to-sqlite files highlights.db \ --starred --sheets --presentations You can use `--stop-after X` to stop after retrieving X files, useful for trying out a new search pattern and seeing results straight away. The `--import-json` and `--import-nl` options are mainly useful for testing and developing this tool. They allow you to replay the JSON or newline-delimited JSON that was previously fetched using `--json` or `--nl` and use it to create a fresh SQLite database, without needing to make any outbound API calls: # Fetch all starred files from the API, write to starred.json google-drive-to-sqlite files -q 'starred = true' --json > starred.json # Now import that data into a new SQLite database file google-drive-to-sqlite files starred.db --import-json starred.json Full `--help`: ``` Usage: google-drive-to-sqlite files [OPTIONS] [DATABASE] Retrieve metadata for files in Google Drive, and write to a SQLite database or output as JSON. 
google-drive-to-sqlite files files.db Use --json to output JSON, --nl for newline-delimited JSON: google-drive-to-sqlite files files.db --json Use a folder ID to recursively fetch every file in that folder and its sub- folders: google-drive-to-sqlite files files.db --folder 1E6Zg2X2bjjtPzVfX8YqdXZDCoB3AVA7i Fetch files you have starred: google-drive-to-sqlite files starred.db --starred Options: -a, --auth FILE Path to auth.json token file --folder TEXT Files in this folder ID and its sub-folders -q TEXT Files matching this query --full-text TEXT Search for files with text match --starred Files you have starred --trashed Files in the trash --shared-with-me Files that have been shared with you --apps Google Apps docs, spreadsheets, presentations and drawings --docs Google Apps docs --sheets Google Apps spreadsheets --presentations Google Apps presentations --drawings Google Apps drawings --json Output JSON rather than write to DB --nl Output newline-delimited JSON rather than write to DB --stop-after INTEGER Stop paginating after X results --import-json FILE Import from this JSON file instead of the API --import-nl FILE Import from this newline-delimited JSON file -v, --verbose Send verbose output to stderr --help Show this message and exit. ``` ## google-drive-to-sqlite download FILE_ID The `download` command can be used to download files from Google Drive. You'll need one or more file IDs, which look something like `0B32uDVNZfiEKLUtIT1gzYWN2NDI4SzVQYTFWWWxCWUtvVGNB`. To download the file, run this: google-drive-to-sqlite download 0B32uDVNZfiEKLUtIT1gzYWN2NDI4SzVQYTFWWWxCWUtvVGNB This will detect the content type of the file and use that as the extension - so if this file is a JPEG the file would be downloaded as: 0B32uDVNZfiEKLUtIT1gzYWN2NDI4SzVQYTFWWWxCWUtvVGNB.jpeg You can pass multiple file IDs to the command at once. To hide the progress bar and filename output, use `-s` or `--silent`. 
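The automatic extension picking described above can be approximated with Python's `mimetypes` module. This sketch only illustrates the content-type-to-extension idea and is not this tool's implementation (the tool itself writes `.jpeg` for JPEGs, for example); the `filename_for` helper is hypothetical.

```python
import mimetypes

def filename_for(file_id, content_type):
    # Map a MIME type such as "application/pdf" to a file extension;
    # fall back to ".bin" for unrecognised types. Illustrative only.
    extension = mimetypes.guess_extension(content_type) or ".bin"
    return file_id + extension

print(filename_for("MY_FILE_ID", "application/pdf"))  # MY_FILE_ID.pdf
```

`MY_FILE_ID` here is a placeholder, standing in for a real Drive file ID.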
If you are downloading a single file you can use the `-o` option to specify a filename and location: google-drive-to-sqlite download 0B32uDVNZfiEKLUtIT1gzYWN2NDI4SzVQYTFWWWxCWUtvVGNB \ -o my-image.jpeg Use `-o -` to write the file contents to standard output: google-drive-to-sqlite download 0B32uDVNZfiEKLUtIT1gzYWN2NDI4SzVQYTFWWWxCWUtvVGNB \ -o - > my-image.jpeg Full `--help`: ``` Usage: google-drive-to-sqlite download [OPTIONS] FILE_IDS... Download one or more files to disk, based on their file IDs. The file content will be saved to a file with the name: FILE_ID.ext Where the extension is automatically picked based on the type of file. If you are downloading a single file you can specify a filename with -o: google-drive-to-sqlite download MY_FILE_ID -o myfile.txt Options: -a, --auth FILE Path to auth.json token file -o, --output FILE File to write to, or - for standard output -s, --silent Hide progress bar and filename --help Show this message and exit. ``` ## google-drive-to-sqlite export FORMAT FILE_ID The `export` command can be used to export Google Docs documents, spreadsheets and presentations in a number of different formats. You'll need one or more document IDs, which look something like `10BOHGDUYa7lBjUSo26YFCHTpgEmtXabdVFaopCTh1vU`. You can find these by looking at the URL of your document on the Google Docs site. To export that document as PDF, run this: google-drive-to-sqlite export pdf 10BOHGDUYa7lBjUSo26YFCHTpgEmtXabdVFaopCTh1vU The file will be exported as: 10BOHGDUYa7lBjUSo26YFCHTpgEmtXabdVFaopCTh1vU-export.pdf You can pass multiple file IDs to the command at once. 
For the `FORMAT` option you can use any of the mime type options listed [on this page](https://developers.google.com/drive/api/v3/ref-export-formats) - for example, to export as an Open Office document you could use: google-drive-to-sqlite export \ application/vnd.oasis.opendocument.text \ 10BOHGDUYa7lBjUSo26YFCHTpgEmtXabdVFaopCTh1vU For convenience the following shortcuts for common file formats are provided: - Google Docs: `html`, `txt`, `rtf`, `pdf`, `doc`, `zip`, `epub` - Google Sheets: `xls`, `pdf`, `csv`, `tsv`, `zip` - Presentations: `ppt`, `pdf`, `txt` - Drawings: `jpeg`, `png`, `svg` The `zip` option returns a zip file of HTML. `txt` returns plain text. The others should be self-evident. To hide the filename output, use `-s` or `--silent`. If you are exporting a single file you can use the `-o` option to specify a filename and location: google-drive-to-sqlite export pdf 10BOHGDUYa7lBjUSo26YFCHTpgEmtXabdVFaopCTh1vU \ -o my-document.pdf Use `-o -` to write the file contents to standard output: google-drive-to-sqlite export pdf 10BOHGDUYa7lBjUSo26YFCHTpgEmtXabdVFaopCTh1vU \ -o - > my-document.pdf Full `--help`: ``` Usage: google-drive-to-sqlite export [OPTIONS] FORMAT FILE_IDS... Export one or more files to the specified format. Usage: google-drive-to-sqlite export pdf FILE_ID_1 FILE_ID_2 The file content will be saved to a file with the name: FILE_ID-export.ext Where the extension is based on the format you specified. Available export formats can be seen here: https://developers.google.com/drive/api/v3/ref-export-formats Or you can use one of the following shortcuts: - Google Docs: html, txt, rtf, pdf, doc, zip, epub - Google Sheets: xls, pdf, csv, tsv, zip - Presentations: ppt, pdf, txt - Drawings: jpeg, png, svg ""zip"" returns a zip file of HTML. 
If you are exporting a single file you can specify a filename with -o: google-drive-to-sqlite export zip MY_FILE_ID -o myfile.zip Options: -a, --auth FILE Path to auth.json token file -o, --output FILE File to write to, or - for standard output -s, --silent Hide progress bar and filename --help Show this message and exit. ``` ## google-drive-to-sqlite get URL The `get` command makes authenticated requests to the specified URL, using credentials derived from the `auth.json` file. For example: $ google-drive-to-sqlite get 'https://www.googleapis.com/drive/v3/about?fields=*' { ""kind"": ""drive#about"", ""user"": { ""kind"": ""drive#user"", ""displayName"": ""Simon Willison"", # ... If the resource you are fetching supports pagination you can use `--paginate key` to paginate through all of the rows in a specified key. For example, the following API has a `nextPageToken` key and a `files` list, suggesting it supports pagination: $ google-drive-to-sqlite get https://www.googleapis.com/drive/v3/files { ""kind"": ""drive#fileList"", ""nextPageToken"": ""~!!~AI9...wogHHYlc="", ""incompleteSearch"": false, ""files"": [ { ""kind"": ""drive#file"", ""id"": ""1YEsITp_X8PtDUJWHGM0osT-TXAU1nr0e7RSWRM2Jpyg"", ""name"": ""Title of a spreadsheet"", ""mimeType"": ""application/vnd.google-apps.spreadsheet"" }, To paginate through everything in the `files` list you would use `--paginate files` like this: $ google-drive-to-sqlite get https://www.googleapis.com/drive/v3/files --paginate files [ { ""kind"": ""drive#file"", ""id"": ""1YEsITp_X8PtDUJWHGM0osT-TXAU1nr0e7RSWRM2Jpyg"", ""name"": ""Title of a spreadsheet"", ""mimeType"": ""application/vnd.google-apps.spreadsheet"" }, # ... 
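Under the hood, `--paginate` amounts to following the `nextPageToken` value until the API stops returning one. A minimal sketch of that loop, using canned responses in place of real Drive API calls:

```python
def paginate(fetch_page, key):
    # Keep requesting pages, yielding the items stored under `key`,
    # until a response arrives without a nextPageToken.
    token = None
    while True:
        page = fetch_page(token)
        yield from page.get(key, [])
        token = page.get("nextPageToken")
        if token is None:
            break

# Two fake pages standing in for https://www.googleapis.com/drive/v3/files
fake_pages = {
    None: {"files": [{"id": "a"}, {"id": "b"}], "nextPageToken": "t1"},
    "t1": {"files": [{"id": "c"}]},
}
rows = list(paginate(lambda token: fake_pages[token], "files"))
```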
Add `--nl` to stream paginated data as newline-delimited JSON: $ google-drive-to-sqlite get https://www.googleapis.com/drive/v3/files --paginate files --nl {""kind"": ""drive#file"", ""id"": ""1YEsITp_X8PtDUJWHGM0osT-TXAU1nr0e7RSWRM2Jpyg"", ""name"": ""Title of a spreadsheet"", ""mimeType"": ""application/vnd.google-apps.spreadsheet""} {""kind"": ""drive#file"", ""id"": ""1E6Zg2X2bjjtPzVfX8YqdXZDCoB3AVA7i"", ""name"": ""Subfolder"", ""mimeType"": ""application/vnd.google-apps.folder""} Add `--stop-after 5` to stop after 5 records - useful for testing. Full `--help`: ``` Usage: google-drive-to-sqlite get [OPTIONS] URL Make an authenticated HTTP GET to the specified URL Options: -a, --auth FILE Path to auth.json token file --paginate TEXT Paginate through all results in this key --nl Output paginated data as newline-delimited JSON --stop-after INTEGER Stop paginating after X results -v, --verbose Send verbose output to stderr --help Show this message and exit. ``` ## Database schema The database created by this tool has the following schema: ```sql CREATE TABLE [drive_users] ( [permissionId] TEXT PRIMARY KEY, [kind] TEXT, [displayName] TEXT, [photoLink] TEXT, [me] INTEGER, [emailAddress] TEXT ); CREATE TABLE [drive_folders] ( [id] TEXT PRIMARY KEY, [_parent] TEXT, [_owner] TEXT, [lastModifyingUser] TEXT, [kind] TEXT, [name] TEXT, [mimeType] TEXT, [starred] INTEGER, [trashed] INTEGER, [explicitlyTrashed] INTEGER, [parents] TEXT, [spaces] TEXT, [version] TEXT, [webViewLink] TEXT, [iconLink] TEXT, [hasThumbnail] INTEGER, [thumbnailVersion] TEXT, [viewedByMe] INTEGER, [createdTime] TEXT, [modifiedTime] TEXT, [modifiedByMe] INTEGER, [shared] INTEGER, [ownedByMe] INTEGER, [viewersCanCopyContent] INTEGER, [copyRequiresWriterPermission] INTEGER, [writersCanShare] INTEGER, [folderColorRgb] TEXT, [quotaBytesUsed] TEXT, [isAppAuthorized] INTEGER, [linkShareMetadata] TEXT, FOREIGN KEY([_parent]) REFERENCES [drive_folders]([id]), FOREIGN KEY([_owner]) REFERENCES 
[drive_users]([permissionId]), FOREIGN KEY([lastModifyingUser]) REFERENCES [drive_users]([permissionId]) ); CREATE TABLE [drive_files] ( [id] TEXT PRIMARY KEY, [_parent] TEXT, [_owner] TEXT, [lastModifyingUser] TEXT, [kind] TEXT, [name] TEXT, [mimeType] TEXT, [starred] INTEGER, [trashed] INTEGER, [explicitlyTrashed] INTEGER, [parents] TEXT, [spaces] TEXT, [version] TEXT, [webViewLink] TEXT, [iconLink] TEXT, [hasThumbnail] INTEGER, [thumbnailVersion] TEXT, [viewedByMe] INTEGER, [createdTime] TEXT, [modifiedTime] TEXT, [modifiedByMe] INTEGER, [shared] INTEGER, [ownedByMe] INTEGER, [viewersCanCopyContent] INTEGER, [copyRequiresWriterPermission] INTEGER, [writersCanShare] INTEGER, [quotaBytesUsed] TEXT, [isAppAuthorized] INTEGER, [linkShareMetadata] TEXT, FOREIGN KEY([_parent]) REFERENCES [drive_folders]([id]), FOREIGN KEY([_owner]) REFERENCES [drive_users]([permissionId]), FOREIGN KEY([lastModifyingUser]) REFERENCES [drive_users]([permissionId]) ); ``` ## Thumbnails You can construct a thumbnail image for a known file ID using the following URL: https://drive.google.com/thumbnail?sz=w800-h800&id=FILE_ID Users who are signed into Google Drive and have permission to view a file will be redirected to a thumbnail version of that file. You can tweak the `w800` and `h800` parameters to request different thumbnail sizes. ## Privacy policy This tool requests access to your Google Drive account in order to retrieve metadata about your files there. It also offers a feature that can download the content of those files. The credentials used to access your account are stored in the auth.json file on your computer. The metadata and content retrieved from Google Drive is also stored only on your own personal computer. At no point do the developers of this tool gain access to any of your data. ## Development To contribute to this tool, first checkout the code. 
Then create a new virtual environment: cd google-drive-to-sqlite python -m venv venv source venv/bin/activate Or if you are using `pipenv`: pipenv shell Now install the dependencies and test dependencies: pip install -e '.[test]' To run the tests: pytest ",Simon Willison,,text/markdown,https://github.com/simonw/google-drive-to-sqlite,,"Apache License, Version 2.0",,,https://pypi.org/project/google-drive-to-sqlite/,,https://pypi.org/project/google-drive-to-sqlite/,"{""CI"": ""https://github.com/simonw/google-drive-to-sqlite/actions"", ""Changelog"": ""https://github.com/simonw/google-drive-to-sqlite/releases"", ""Homepage"": ""https://github.com/simonw/google-drive-to-sqlite"", ""Issues"": ""https://github.com/simonw/google-drive-to-sqlite/issues""}",https://pypi.org/project/google-drive-to-sqlite/0.4/,"[""click"", ""httpx"", ""sqlite-utils"", ""pytest ; extra == 'test'"", ""pytest-httpx ; extra == 'test'"", ""pytest-mock ; extra == 'test'"", ""cogapp ; extra == 'test'""]",>=3.6,0.4,0, hacker-news-to-sqlite,Create a SQLite database containing data pulled from Hacker News,[],"# hacker-news-to-sqlite [![PyPI](https://img.shields.io/pypi/v/hacker-news-to-sqlite.svg)](https://pypi.org/project/hacker-news-to-sqlite/) [![Changelog](https://img.shields.io/github/v/release/dogsheep/hacker-news-to-sqlite?include_prereleases&label=changelog)](https://github.com/dogsheep/hacker-news-to-sqlite/releases) [![Tests](https://github.com/dogsheep/hacker-news-to-sqlite/workflows/Test/badge.svg)](https://github.com/dogsheep/hacker-news-to-sqlite/actions?query=workflow%3ATest) [![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](https://github.com/simonw/hacker-news-to-sqlite/blob/main/LICENSE) Create a SQLite database containing data fetched from [Hacker News](https://news.ycombinator.com/). 
## How to install $ pip install hacker-news-to-sqlite ## Usage $ hacker-news-to-sqlite user hacker-news.db your-username Importing items: 37%|███████████ | 845/2297 [05:09<11:02, 2.19it/s] Imports all of your Hacker News submissions and comments into a SQLite database called `hacker-news.db`. $ hacker-news-to-sqlite trees hacker-news.db 22640038 22643218 Fetches the entire comments tree in which any of those content IDs appears. ## Browsing your data with Datasette You can use [Datasette](https://datasette.readthedocs.org/) to browse your data. Install Datasette like this: $ pip install datasette Now run it against your `hacker-news.db` file like so: $ datasette hacker-news.db Visit `http://localhost:8001/` to search and explore your data. You can improve the display of your data using the [datasette-render-timestamps](https://github.com/simonw/datasette-render-timestamps) and [datasette-render-html](https://github.com/simonw/datasette-render-html) plugins. Install them like this: $ pip install datasette-render-timestamps datasette-render-html Now save the following configuration in a file called `metadata.json`: ```json { ""databases"": { ""hacker-news"": { ""tables"": { ""items"": { ""plugins"": { ""datasette-render-html"": { ""columns"": [ ""text"" ] }, ""datasette-render-timestamps"": { ""columns"": [ ""time"" ] } } }, ""users"": { ""plugins"": { ""datasette-render-timestamps"": { ""columns"": [ ""created"" ] } } } } } } } ``` Run Datasette like this: $ datasette -m metadata.json hacker-news.db The timestamp columns will now be rendered as human-readable dates, and any HTML in your posts will be displayed as rendered HTML. 
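The `trees` command walks the comment graph recursively. A simplified, downward-only sketch (the real command also walks up to the thread's root; the `kids` field matches the item format of the public Hacker News API):

```python
def fetch_tree(item_id, get_item):
    # Collect an item plus every descendant comment, depth-first.
    item = get_item(item_id)
    items = [item]
    for kid_id in item.get("kids", []):
        items.extend(fetch_tree(kid_id, get_item))
    return items

# A fake three-item thread in place of live API calls:
fake_items = {
    1: {"id": 1, "kids": [2, 3]},
    2: {"id": 2},
    3: {"id": 3},
}
thread = fetch_tree(1, fake_items.__getitem__)
```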
",Simon Willison,,text/markdown,https://github.com/dogsheep/hacker-news-to-sqlite,,"Apache License, Version 2.0",,,https://pypi.org/project/hacker-news-to-sqlite/,,https://pypi.org/project/hacker-news-to-sqlite/,"{""Homepage"": ""https://github.com/dogsheep/hacker-news-to-sqlite""}",https://pypi.org/project/hacker-news-to-sqlite/0.4/,"[""sqlite-utils"", ""click"", ""requests"", ""tqdm"", ""pytest ; extra == 'test'"", ""requests-mock ; extra == 'test'""]",,0.4,0, inaturalist-to-sqlite,Create a SQLite database containing your observation history from iNaturalist,[],"# inaturalist-to-sqlite [![PyPI](https://img.shields.io/pypi/v/inaturalist-to-sqlite.svg)](https://pypi.org/project/inaturalist-to-sqlite/) [![CircleCI](https://circleci.com/gh/dogsheep/inaturalist-to-sqlite.svg?style=svg)](https://circleci.com/gh/dogsheep/inaturalist-to-sqlite) [![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](https://github.com/dogsheep/inaturalist-to-sqlite/blob/master/LICENSE) Create a SQLite database containing your observation history from [iNaturalist](https://www.inaturalist.org/). ## How to install $ pip install inaturalist-to-sqlite ## Usage $ inaturalist-to-sqlite inaturalist.db yourusername (Or try `simonw` if you don't yet have an iNaturalist account) This will import all of your iNaturalist observations into a SQLite database called `inaturalist.db`. 
",Simon Willison,,text/markdown,https://github.com/dogsheep/inaturalist-to-sqlite,,"Apache License, Version 2.0",,,https://pypi.org/project/inaturalist-to-sqlite/,,https://pypi.org/project/inaturalist-to-sqlite/,"{""Homepage"": ""https://github.com/dogsheep/inaturalist-to-sqlite""}",https://pypi.org/project/inaturalist-to-sqlite/0.2.1/,"[""sqlite-utils (>=2.0)"", ""click"", ""requests"", ""pytest ; extra == 'test'""]",,0.2.1,0, markdown-to-sqlite,CLI tool for loading markdown files into a SQLite database,"[""Intended Audience :: Developers"", ""Intended Audience :: End Users/Desktop"", ""Intended Audience :: Science/Research"", ""License :: OSI Approved :: Apache Software License"", ""Programming Language :: Python :: 3.6"", ""Programming Language :: Python :: 3.7"", ""Topic :: Database""]","# markdown-to-sqlite [![PyPI](https://img.shields.io/pypi/v/markdown-to-sqlite.svg)](https://pypi.python.org/pypi/markdown-to-sqlite) [![Changelog](https://img.shields.io/github/v/release/simonw/markdown-to-sqlite?include_prereleases&label=changelog)](https://github.com/simonw/markdown-to-sqlite/releases) [![Tests](https://github.com/simonw/markdown-to-sqlite/workflows/Test/badge.svg)](https://github.com/simonw/markdown-to-sqlite/actions?query=workflow%3ATest) [![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](https://github.com/simonw/markdown-to-sqlite/blob/main/LICENSE) CLI tool for loading markdown files into a SQLite database. YAML embedded in the markdown files will be used to populate additional columns. Usage: markdown-to-sqlite [OPTIONS] DBNAME TABLE PATHS... For example: $ markdown-to-sqlite docs.db documents file1.md file2.md ## Breaking change Prior to version 1.0 this argument order was different - markdown files were listed before the database and table. 
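The YAML-to-columns behaviour (the tool itself depends on the `yamldown` package for the real parsing) can be approximated with a stdlib-only front matter splitter. This simplified sketch only handles flat `key: value` pairs:

```python
def split_front_matter(text):
    # Split a markdown document into (metadata, body). Metadata comes
    # from a leading block delimited by --- lines; nested YAML is not
    # handled in this sketch.
    if not text.startswith("---\n"):
        return {}, text
    header, _, body = text[4:].partition("\n---\n")
    meta = {}
    for line in header.splitlines():
        if ":" in line:
            key, _, value = line.partition(":")
            meta[key.strip()] = value.strip()
    return meta, body

doc = "---\ntitle: Hello\ntags: docs\n---\n# Hello\n\nBody text.\n"
meta, body = split_front_matter(doc)
```

Each key in `meta` would become an extra column alongside the document body.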
",Simon Willison,,text/markdown,https://github.com/simonw/markdown-to-sqlite,,"Apache License, Version 2.0",,,https://pypi.org/project/markdown-to-sqlite/,,https://pypi.org/project/markdown-to-sqlite/,"{""CI"": ""https://github.com/simonw/markdown-to-sqlite/actions"", ""Changelog"": ""https://github.com/simonw/markdown-to-sqlite/releases"", ""Homepage"": ""https://github.com/simonw/markdown-to-sqlite"", ""Issues"": ""https://github.com/simonw/markdown-to-sqlite/issues""}",https://pypi.org/project/markdown-to-sqlite/1.0/,"[""yamldown"", ""markdown"", ""sqlite-utils"", ""click"", ""pytest ; extra == 'test'""]",>=3.6,1.0,0, mbox-to-sqlite,Load email from .mbox files into SQLite,[],"# mbox-to-sqlite [![PyPI](https://img.shields.io/pypi/v/mbox-to-sqlite.svg)](https://pypi.org/project/mbox-to-sqlite/) [![Changelog](https://img.shields.io/github/v/release/simonw/mbox-to-sqlite?include_prereleases&label=changelog)](https://github.com/simonw/mbox-to-sqlite/releases) [![Tests](https://github.com/simonw/mbox-to-sqlite/workflows/Test/badge.svg)](https://github.com/simonw/mbox-to-sqlite/actions?query=workflow%3ATest) [![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](https://github.com/simonw/mbox-to-sqlite/blob/master/LICENSE) Load email from .mbox files into SQLite ## Installation Install this tool using `pip`: pip install mbox-to-sqlite ## Usage Use the `mbox` command to import a `.mbox` file into a SQLite database: mbox-to-sqlite mbox emails.db path/to/messages.mbox You can try this out against an example containing a sample of 3,266 emails from the [Enron corpus](https://en.wikipedia.org/wiki/Enron_Corpus) like this: curl -O https://raw.githubusercontent.com/ivanhb/EMA/master/server/data/mbox/enron/mbox-enron-white-s-all.mbox mbox-to-sqlite mbox enron.db mbox-enron-white-s-all.mbox You can then explore the resulting database using [Datasette](https://datasette.io/): datasette enron.db ## Development To contribute to this tool, first checkout the code. 
Then create a new virtual environment: cd mbox-to-sqlite python -m venv venv source venv/bin/activate Now install the dependencies and test dependencies: pip install -e '.[test]' To run the tests: pytest ",Simon Willison,,text/markdown,https://github.com/simonw/mbox-to-sqlite,,"Apache License, Version 2.0",,,https://pypi.org/project/mbox-to-sqlite/,,https://pypi.org/project/mbox-to-sqlite/,"{""CI"": ""https://github.com/simonw/mbox-to-sqlite/actions"", ""Changelog"": ""https://github.com/simonw/mbox-to-sqlite/releases"", ""Homepage"": ""https://github.com/simonw/mbox-to-sqlite"", ""Issues"": ""https://github.com/simonw/mbox-to-sqlite/issues""}",https://pypi.org/project/mbox-to-sqlite/0.1a0/,"[""click"", ""sqlite-utils"", ""pytest ; extra == 'test'""]",>=3.7,0.1a0,0, pocket-to-sqlite,Create a SQLite database containing data from your Pocket account,"[""License :: OSI Approved :: Apache Software License""]","# pocket-to-sqlite [![PyPI](https://img.shields.io/pypi/v/pocket-to-sqlite.svg)](https://pypi.org/project/pocket-to-sqlite/) [![Changelog](https://img.shields.io/github/v/release/dogsheep/pocket-to-sqlite?include_prereleases&label=changelog)](https://github.com/dogsheep/pocket-to-sqlite/releases) [![Tests](https://github.com/dogsheep/pocket-to-sqlite/workflows/Test/badge.svg)](https://github.com/dogsheep/pocket-to-sqlite/actions?query=workflow%3ATest) [![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](https://github.com/dogsheep/pocket-to-sqlite/blob/main/LICENSE) Create a SQLite database containing data from your [Pocket](https://getpocket.com/) account. ## How to install $ pip install pocket-to-sqlite ## Usage You will need to first obtain a valid OAuth token for your Pocket account. You can do this by running the `auth` command and following the prompts: $ pocket-to-sqlite auth Visit this page and sign in with your Pocket account: https://getpocket.com/auth/author... 
Once you have signed in there, hit <enter> to continue Authentication tokens written to auth.json Now you can fetch all of your items from Pocket like this: $ pocket-to-sqlite fetch pocket.db The first time you run this command it will fetch all of your items, and display a progress bar while it does it. On subsequent runs it will only fetch new items. You can force it to fetch everything from the beginning again using `--all`. Use `--silent` to disable the progress bar. ## Using with Datasette The SQLite database produced by this tool is designed to be browsed using [Datasette](https://datasette.readthedocs.io/). Use the [datasette-render-timestamps](https://github.com/simonw/datasette-render-timestamps) plugin to improve the display of the timestamp values. 
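The incremental fetch described above is the usual since-token pattern: record the `since` value the API returns, then send it back on the next run so only newer items come back. A hedged sketch with a stand-in API function (not pocket-to-sqlite's actual code):

```python
def fetch_new_items(api_fetch, state):
    # Ask the API only for items newer than the `since` value stored
    # on the previous run, then record the new offset.
    response = api_fetch(since=state.get("since", 0))
    state["since"] = response["since"]
    return response["list"]

calls = []
def fake_api(since):
    # Stand-in for the real Pocket retrieve endpoint.
    calls.append(since)
    return {"since": 200, "list": [{"item_id": "1"}]}

state = {}
items = fetch_new_items(fake_api, state)
```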
[![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](https://github.com/simonw/pypi-to-sqlite/blob/master/LICENSE) Load data about Python packages from PyPI into SQLite ## Installation Install this tool using `pip`: pip install pypi-to-sqlite ## Usage To create a SQLite database with details of one or more packages, run: pypi-to-sqlite pypi.db datasette sqlite-utils You can also process JSON that you have previously saved to disk like so: curl -o datasette.json https://pypi.org/pypi/datasette/json pypi-to-sqlite pypi.db -f datasette.json The tool will create three tables: `packages`, `versions` and `releases`. The full table schema is shown below. To create the tables with a prefix, use `--prefix prefix`. For example: pypi-to-sqlite pypi.db datasette --prefix pypi_ This will create tables called `pypi_packages`, `pypi_versions` and `pypi_releases`. ## Database schema ```sql CREATE TABLE [packages] ( [name] TEXT PRIMARY KEY, [summary] TEXT, [classifiers] TEXT, [description] TEXT, [author] TEXT, [author_email] TEXT, [description_content_type] TEXT, [home_page] TEXT, [keywords] TEXT, [license] TEXT, [maintainer] TEXT, [maintainer_email] TEXT, [package_url] TEXT, [platform] TEXT, [project_url] TEXT, [project_urls] TEXT, [release_url] TEXT, [requires_dist] TEXT, [requires_python] TEXT, [version] TEXT, [yanked] INTEGER, [yanked_reason] TEXT ); CREATE TABLE [versions] ( [id] TEXT PRIMARY KEY, [package] TEXT REFERENCES [packages]([name]), [name] TEXT ); CREATE TABLE [releases] ( [md5_digest] TEXT PRIMARY KEY, [package] TEXT REFERENCES [packages]([name]), [version] TEXT REFERENCES [versions]([id]), [packagetype] TEXT, [filename] TEXT, [comment_text] TEXT, [digests] TEXT, [has_sig] INTEGER, [python_version] TEXT, [requires_python] TEXT, [size] INTEGER, [upload_time] TEXT, [upload_time_iso_8601] TEXT, [url] TEXT, [yanked] INTEGER, [yanked_reason] TEXT ); ``` ## pypi-to-sqlite --help ``` Usage: pypi-to-sqlite [OPTIONS] DB_PATH [PACKAGE]... 
Load data about Python packages from PyPI into SQLite Usage example: pypi-to-sqlite pypy.db datasette sqlite-utils Use -f to load data from a JSON file instead: pypi-to-sqlite pypy.db -f datasette.json Created tables will be packages, versions and releases To create tables called pypi_packages, pypi_versions, pypi_releases use --prefix pypi_: pypi-to-sqlite pypy.db datasette sqlite-utils --prefix pypi_ Options: --version Show the version and exit. -f, --file FILENAME Import JSON from this file -d, --delay FLOAT Wait this many seconds between requests --prefix TEXT Prefix to use for the created database tables --help Show this message and exit. ``` ## Development To contribute to this tool, first checkout the code. Then create a new virtual environment: cd pypi-to-sqlite python -m venv venv source venv/bin/activate Now install the dependencies and test dependencies: pip install -e '.[test]' To run the tests: pytest ",Simon Willison,,text/markdown,https://github.com/simonw/pypi-to-sqlite,,"Apache License, Version 2.0",,,https://pypi.org/project/pypi-to-sqlite/,,https://pypi.org/project/pypi-to-sqlite/,"{""CI"": ""https://github.com/simonw/pypi-to-sqlite/actions"", ""Changelog"": ""https://github.com/simonw/pypi-to-sqlite/releases"", ""Homepage"": ""https://github.com/simonw/pypi-to-sqlite"", ""Issues"": ""https://github.com/simonw/pypi-to-sqlite/issues""}",https://pypi.org/project/pypi-to-sqlite/0.2.2/,"[""click"", ""sqlite-utils"", ""httpx"", ""pytest ; extra == 'test'"", ""pytest-httpx ; extra == 'test'"", ""cogapp ; extra == 'test'""]",>=3.7,0.2.2,0, s3-credentials,A tool for creating credentials for accessing S3 buckets,[],"# s3-credentials [![PyPI](https://img.shields.io/pypi/v/s3-credentials.svg)](https://pypi.org/project/s3-credentials/) [![Changelog](https://img.shields.io/github/v/release/simonw/s3-credentials?include_prereleases&label=changelog)](https://github.com/simonw/s3-credentials/releases) 
[![Tests](https://github.com/simonw/s3-credentials/workflows/Test/badge.svg)](https://github.com/simonw/s3-credentials/actions?query=workflow%3ATest) [![Documentation Status](https://readthedocs.org/projects/s3-credentials/badge/?version=latest)](https://s3-credentials.readthedocs.org/) [![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](https://github.com/simonw/s3-credentials/blob/master/LICENSE) A tool for creating credentials for accessing S3 buckets For project background, see [s3-credentials: a tool for creating credentials for S3 buckets](https://simonwillison.net/2021/Nov/3/s3-credentials/) on my blog. ## Installation pip install s3-credentials ## Basic usage To create a new S3 bucket and output credentials that can be used with only that bucket: ``` % s3-credentials create my-new-s3-bucket --create-bucket Created bucket: my-new-s3-bucket Created user: s3.read-write.my-new-s3-bucket with permissions boundary: arn:aws:iam::aws:policy/AmazonS3FullAccess Attached policy s3.read-write.my-new-s3-bucket to user s3.read-write.my-new-s3-bucket Created access key for user: s3.read-write.my-new-s3-bucket { ""UserName"": ""s3.read-write.my-new-s3-bucket"", ""AccessKeyId"": ""AKIAWXFXAIOZOYLZAEW5"", ""Status"": ""Active"", ""SecretAccessKey"": ""..."", ""CreateDate"": ""2021-11-03 01:38:24+00:00"" } ``` The tool can do a lot more than this. See the [documentation](https://s3-credentials.readthedocs.io/) for details. 
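The bucket-scoped credentials rest on an IAM policy that only references one bucket's ARNs. A rough sketch of what such a policy document looks like - the exact actions s3-credentials grants may differ from this illustration:

```python
import json

def read_write_policy(bucket):
    # Build a bucket-scoped IAM policy document, roughly like the one
    # a credentials tool would attach to its newly created user.
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
            },
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/*"],
            },
        ],
    }

print(json.dumps(read_write_policy("my-new-s3-bucket"), indent=2))
```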
## Documentation - [Full documentation](https://s3-credentials.readthedocs.io/) - [Command help reference](https://s3-credentials.readthedocs.io/en/stable/help.html) - [Release notes](https://github.com/simonw/s3-credentials/releases) ",Simon Willison,,text/markdown,https://github.com/simonw/s3-credentials,,"Apache License, Version 2.0",,,https://pypi.org/project/s3-credentials/,,https://pypi.org/project/s3-credentials/,"{""CI"": ""https://github.com/simonw/s3-credentials/actions"", ""Changelog"": ""https://github.com/simonw/s3-credentials/releases"", ""Homepage"": ""https://github.com/simonw/s3-credentials"", ""Issues"": ""https://github.com/simonw/s3-credentials/issues""}",https://pypi.org/project/s3-credentials/0.14/,"[""click"", ""boto3"", ""pytest ; extra == 'test'"", ""pytest-mock ; extra == 'test'"", ""cogapp ; extra == 'test'"", ""moto[s3] ; extra == 'test'""]",>=3.6,0.14,0, shot-scraper,A command-line utility for taking automated screenshots of websites,[],"# shot-scraper [![PyPI](https://img.shields.io/pypi/v/shot-scraper.svg)](https://pypi.org/project/shot-scraper/) [![Changelog](https://img.shields.io/github/v/release/simonw/shot-scraper?include_prereleases&label=changelog)](https://github.com/simonw/shot-scraper/releases) [![Tests](https://github.com/simonw/shot-scraper/workflows/Test/badge.svg)](https://github.com/simonw/shot-scraper/actions?query=workflow%3ATest) [![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](https://github.com/simonw/shot-scraper/blob/master/LICENSE) A command-line utility for taking automated screenshots of websites For background on this project see [shot-scraper: automated screenshots for documentation, built on Playwright](https://simonwillison.net/2022/Mar/10/shot-scraper/). 
## Documentation - [Full documentation for shot-scraper](https://shot-scraper.datasette.io/) - [Tutorial: Automating screenshots for the Datasette documentation using shot-scraper](https://simonwillison.net/2022/Oct/14/automating-screenshots/) - [Release notes](https://github.com/simonw/shot-scraper/releases) ## Get started with GitHub Actions To get started without installing any software, use the [shot-scraper-template](https://github.com/simonw/shot-scraper-template) template to create your own GitHub repository which takes screenshots of a page using `shot-scraper`. See [Instantly create a GitHub repository to take screenshots of a web page](https://simonwillison.net/2022/Mar/14/shot-scraper-template/) for details. ## Quick installation You can install the `shot-scraper` CLI tool using [pip](https://pip.pypa.io/): pip install shot-scraper # Now install the browser it needs: shot-scraper install ## Taking your first screenshot You can take a screenshot of a web page like this: shot-scraper https://datasette.io/ This will create a screenshot in a file called `datasette-io.png`. Many more options are available, see [Taking a screenshot](https://shot-scraper.datasette.io/en/stable/screenshots.html) for details. ## Examples - The [shot-scraper-demo](https://github.com/simonw/shot-scraper-demo) repository uses this tool to capture recently spotted owls in El Granada, CA according to [this page](https://www.owlsnearme.com/?place=127871), and to generate an annotated screenshot illustrating a Datasette feature as described [in my blog](https://simonwillison.net/2022/Mar/10/shot-scraper/#a-complex-example). 
- The [Datasette Documentation](https://docs.datasette.io/en/latest/) uses screenshots taken by `shot-scraper` running in the [simonw/datasette-screenshots](https://github.com/simonw/datasette-screenshots) GitHub repository, described in detail in [Automating screenshots for the Datasette documentation using shot-scraper](https://simonwillison.net/2022/Oct/14/automating-screenshots/). - Ben Welsh built [@newshomepages](https://twitter.com/newshomepages), a Twitter bot that uses `shot-scraper` and GitHub Actions to take screenshots of news website homepages and publish them to Twitter. The code for that lives in [palewire/news-homepages](https://github.com/palewire/news-homepages). - [scrape-hacker-news-by-domain](https://github.com/simonw/scrape-hacker-news-by-domain) uses `shot-scraper javascript` to scrape a web page. See [Scraping web pages from the command-line with shot-scraper](https://simonwillison.net/2022/Mar/14/scraping-web-pages-shot-scraper/) for details of how this works. ",Simon Willison,,text/markdown,https://github.com/simonw/shot-scraper,,"Apache License, Version 2.0",,,https://pypi.org/project/shot-scraper/,,https://pypi.org/project/shot-scraper/,"{""CI"": ""https://github.com/simonw/shot-scraper/actions"", ""Changelog"": ""https://github.com/simonw/shot-scraper/releases"", ""Homepage"": ""https://github.com/simonw/shot-scraper"", ""Issues"": ""https://github.com/simonw/shot-scraper/issues""}",https://pypi.org/project/shot-scraper/1.0.1/,"[""click"", ""PyYAML"", ""playwright"", ""click-default-group"", ""pytest ; extra == 'test'"", ""cogapp ; extra == 'test'"", ""pytest-mock ; extra == 'test'""]",>=3.7,1.0.1,0, sphinx-to-sqlite,Create a SQLite database from Sphinx documentation,[],"# sphinx-to-sqlite [![PyPI](https://img.shields.io/pypi/v/sphinx-to-sqlite.svg)](https://pypi.org/project/sphinx-to-sqlite/) 
[![Changelog](https://img.shields.io/github/v/release/simonw/sphinx-to-sqlite?include_prereleases&label=changelog)](https://github.com/simonw/sphinx-to-sqlite/releases) [![Tests](https://github.com/simonw/sphinx-to-sqlite/workflows/Test/badge.svg)](https://github.com/simonw/sphinx-to-sqlite/actions?query=workflow%3ATest) [![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](https://github.com/simonw/sphinx-to-sqlite/blob/master/LICENSE) Create a SQLite database from Sphinx documentation. ## Demo You can see the results of running this tool against the [Datasette documentation](https://docs.datasette.io/) at https://latest-docs.datasette.io/docs/sections ## Installation Install this tool using `pip`: $ pip install sphinx-to-sqlite ## Usage First run `sphinx-build` with the `-b xml` option to create XML files in your `_build/` directory. Then run: $ sphinx-to-sqlite docs.db path/to/_build To build the SQLite database. ## Development To contribute to this tool, first checkout the code. 
Then create a new virtual environment: cd sphinx-to-sqlite python -m venv venv source venv/bin/activate Or if you are using `pipenv`: pipenv shell Now install the dependencies and test dependencies: pip install -e '.[test]' To run the tests: pytest ",Simon Willison,,text/markdown,https://github.com/simonw/sphinx-to-sqlite,,"Apache License, Version 2.0",,,https://pypi.org/project/sphinx-to-sqlite/,,https://pypi.org/project/sphinx-to-sqlite/,"{""CI"": ""https://github.com/simonw/sphinx-to-sqlite/actions"", ""Changelog"": ""https://github.com/simonw/sphinx-to-sqlite/releases"", ""Homepage"": ""https://github.com/simonw/sphinx-to-sqlite"", ""Issues"": ""https://github.com/simonw/sphinx-to-sqlite/issues""}",https://pypi.org/project/sphinx-to-sqlite/0.1a1/,"[""click"", ""sqlite-utils"", ""pytest ; extra == 'test'""]",,0.1a1,0, sqlite-comprehend,Tools for running data in a SQLite database through AWS Comprehend,[],"# sqlite-comprehend [![PyPI](https://img.shields.io/pypi/v/sqlite-comprehend.svg)](https://pypi.org/project/sqlite-comprehend/) [![Changelog](https://img.shields.io/github/v/release/simonw/sqlite-comprehend?include_prereleases&label=changelog)](https://github.com/simonw/sqlite-comprehend/releases) [![Tests](https://github.com/simonw/sqlite-comprehend/workflows/Test/badge.svg)](https://github.com/simonw/sqlite-comprehend/actions?query=workflow%3ATest) [![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](https://github.com/simonw/sqlite-comprehend/blob/master/LICENSE) Tools for running data in a SQLite database through [AWS Comprehend](https://aws.amazon.com/comprehend/) See [sqlite-comprehend: run AWS entity extraction against content in a SQLite database](https://simonwillison.net/2022/Jul/11/sqlite-comprehend/) for background on this project. 
## Installation Install this tool using `pip`: pip install sqlite-comprehend ## Demo You can see examples of tables generated using this command here: - [comprehend_entities](https://datasette.simonwillison.net/simonwillisonblog/comprehend_entities) - the extracted entities, classified by type - [blog_entry_comprehend_entities](https://datasette.simonwillison.net/simonwillisonblog/blog_entry_comprehend_entities) - a table relating entities to the entries that they appear in - [comprehend_entity_types](https://datasette.simonwillison.net/simonwillisonblog/comprehend_entity_types) - a small lookup table of entity types ## Configuration You will need AWS credentials with the `comprehend:BatchDetectEntities` [IAM permission](https://docs.aws.amazon.com/comprehend/latest/dg/access-control-managing-permissions.html). You can configure credentials [using these instructions](https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html). You can also save them to a JSON or INI configuration file and pass them to the command using `-a credentials.ini`, or pass them using the `--access-key` and `--secret-key` options. ## Entity extraction The `sqlite-comprehend entities` command runs entity extraction against every row in the specified table and saves the results to your database. Specify the database, the table and one or more columns containing text in that table. The following runs against the `text` column in the `pages` table of the `sfms.db` SQLite database: sqlite-comprehend entities sfms.db pages text Results will be written into a `pages_comprehend_entities` table. Change the name of the output table by passing `-o other_table_name`. You can run against a subset of rows by adding a `--where` clause: sqlite-comprehend entities sfms.db pages text --where 'id < 10' You can also use named parameters in your `--where` clause: sqlite-comprehend entities sfms.db pages text --where 'id < :maxid' -p maxid 10 Only the first 5,000 characters of each row will be considered. 
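The truncation and batching described above can be sketched roughly like this. This is an illustrative sketch, not the tool's actual code; the 25-document batch size is AWS's documented per-request limit for `BatchDetectEntities`.

```python
# Illustrative sketch (not sqlite-comprehend's actual implementation):
# truncate each row's text to 5,000 characters, then group rows into
# batches of up to 25 documents, Comprehend's BatchDetectEntities limit.
MAX_CHARS = 5_000
BATCH_SIZE = 25

def batches(texts, batch_size=BATCH_SIZE):
    truncated = [text[:MAX_CHARS] for text in texts]
    for i in range(0, len(truncated), batch_size):
        yield truncated[i:i + batch_size]

sizes = [len(batch) for batch in batches(['some text'] * 60)]
print(sizes)  # [25, 25, 10]
```

Each batch would then be sent to Comprehend in a single API call.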
Be sure to review [Comprehend's pricing](https://aws.amazon.com/comprehend/pricing/) - which starts at $0.0001 per hundred characters. If your content includes HTML tags, you can strip them out before extracting entities by adding `--strip-tags`: sqlite-comprehend entities sfms.db pages text --strip-tags Rows that have been processed are recorded in the `pages_comprehend_entities_done` table. If you run the command more than once it will only process rows that have been newly added. You can delete records from that `_done` table to run them again. ### sqlite-comprehend entities --help ``` Usage: sqlite-comprehend entities [OPTIONS] DATABASE TABLE COLUMNS... Detect entities in columns in a table To extract entities from columns text1 and text2 in mytable: sqlite-comprehend entities my.db mytable text1 text2 To run against just a subset of the rows in the table, add: --where ""id < :max_id"" -p max_id 50 Results will be written to a table called mytable_comprehend_entities To specify a different output table, use -o custom_table_name Options: --where TEXT WHERE clause to filter table -p, --param ... Named :parameters for SQL query -o, --output TEXT Custom output table -r, --reset Start from scratch, deleting previous results --strip-tags Strip HTML tags before extracting entities --access-key TEXT AWS access key ID --secret-key TEXT AWS secret access key --session-token TEXT AWS session token --endpoint-url TEXT Custom endpoint URL -a, --auth FILENAME Path to JSON/INI file containing credentials --help Show this message and exit. 
``` ## Schema Assuming an input table called `pages` the tables created by this tool will have the following schema: ```sql CREATE TABLE [pages] ( [id] INTEGER PRIMARY KEY, [text] TEXT ); CREATE TABLE [comprehend_entity_types] ( [id] INTEGER PRIMARY KEY, [value] TEXT ); CREATE TABLE [comprehend_entities] ( [id] INTEGER PRIMARY KEY, [name] TEXT, [type] INTEGER REFERENCES [comprehend_entity_types]([id]) ); CREATE TABLE [pages_comprehend_entities] ( [id] INTEGER REFERENCES [pages]([id]), [score] FLOAT, [entity] INTEGER REFERENCES [comprehend_entities]([id]), [begin_offset] INTEGER, [end_offset] INTEGER ); CREATE UNIQUE INDEX [idx_comprehend_entity_types_value] ON [comprehend_entity_types] ([value]); CREATE UNIQUE INDEX [idx_comprehend_entities_type_name] ON [comprehend_entities] ([type], [name]); CREATE TABLE [pages_comprehend_entities_done] ( [id] INTEGER PRIMARY KEY REFERENCES [pages]([id]) ); ``` ## Development To contribute to this tool, first checkout the code. Then create a new virtual environment: cd sqlite-comprehend python -m venv venv source venv/bin/activate Now install the dependencies and test dependencies: pip install -e '.[test]' To run the tests: pytest ",Simon Willison,,text/markdown,https://github.com/simonw/sqlite-comprehend,,"Apache License, Version 2.0",,,https://pypi.org/project/sqlite-comprehend/,,https://pypi.org/project/sqlite-comprehend/,"{""CI"": ""https://github.com/simonw/sqlite-comprehend/actions"", ""Changelog"": ""https://github.com/simonw/sqlite-comprehend/releases"", ""Homepage"": ""https://github.com/simonw/sqlite-comprehend"", ""Issues"": ""https://github.com/simonw/sqlite-comprehend/issues""}",https://pypi.org/project/sqlite-comprehend/0.2.2/,"[""click"", ""boto3"", ""sqlite-utils"", ""pytest ; extra == 'test'"", ""pytest-mock ; extra == 'test'"", ""cogapp ; extra == 'test'""]",>=3.7,0.2.2,0, sqlite-diffable,Tools for dumping/loading a SQLite database to diffable directory structure,"[""Development Status :: 3 - Alpha"", ""Intended 
Audience :: Developers"", ""Intended Audience :: End Users/Desktop"", ""Intended Audience :: Science/Research"", ""License :: OSI Approved :: Apache Software License"", ""Programming Language :: Python :: 3.6"", ""Programming Language :: Python :: 3.7"", ""Topic :: Database""]","# sqlite-diffable [![PyPI](https://img.shields.io/pypi/v/sqlite-diffable.svg)](https://pypi.org/project/sqlite-diffable/) [![Changelog](https://img.shields.io/github/v/release/simonw/sqlite-diffable?include_prereleases&label=changelog)](https://github.com/simonw/sqlite-diffable/releases) [![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](https://github.com/simonw/sqlite-diffable/blob/main/LICENSE) Tools for dumping/loading a SQLite database to diffable directory structure ## Installation pip install sqlite-diffable ## Demo The repository at [simonw/simonwillisonblog-backup](https://github.com/simonw/simonwillisonblog-backup) contains a backup of the database on my blog, https://simonwillison.net/ - created using this tool. ## Dumping a database Given a SQLite database called `fixtures.db` containing a table `facetable`, the following will dump out that table to the `dump/` directory: sqlite-diffable dump fixtures.db dump/ facetable To dump out every table in that database, use `--all`: sqlite-diffable dump fixtures.db dump/ --all ## Loading a database To load a previously dumped database, run the following: sqlite-diffable load restored.db dump/ This will show an error if any of the tables that are being restored already exist in the database file. 
You can replace those tables (dropping them before restoring them) using the `--replace` option: sqlite-diffable load restored.db dump/ --replace ## Converting to JSON objects Table rows are stored in the `.ndjson` files as newline-delimited JSON arrays, like this: ``` [""a"", ""a"", ""a-a"", 63, null, 0.7364712141640124, ""$null""] [""a"", ""b"", ""a-b"", 51, null, 0.6020187290499803, ""$null""] ``` Sometimes it can be more convenient to work with a list of JSON objects. The `sqlite-diffable objects` command can read a `.ndjson` file and its accompanying `.metadata.json` file and output JSON objects to standard output: sqlite-diffable objects fixtures.db dump/sortable.ndjson The output of that command looks something like this: ``` {""pk1"": ""a"", ""pk2"": ""a"", ""content"": ""a-a"", ""sortable"": 63, ""sortable_with_nulls"": null, ""sortable_with_nulls_2"": 0.7364712141640124, ""text"": ""$null""} {""pk1"": ""a"", ""pk2"": ""b"", ""content"": ""a-b"", ""sortable"": 51, ""sortable_with_nulls"": null, ""sortable_with_nulls_2"": 0.6020187290499803, ""text"": ""$null""} ``` Add `-o` to write that output to a file: sqlite-diffable objects fixtures.db dump/sortable.ndjson -o output.txt Add `--array` to output a JSON array of objects, as opposed to a newline-delimited file: sqlite-diffable objects fixtures.db dump/sortable.ndjson --array Output: ``` [ {""pk1"": ""a"", ""pk2"": ""a"", ""content"": ""a-a"", ""sortable"": 63, ""sortable_with_nulls"": null, ""sortable_with_nulls_2"": 0.7364712141640124, ""text"": ""$null""}, {""pk1"": ""a"", ""pk2"": ""b"", ""content"": ""a-b"", ""sortable"": 51, ""sortable_with_nulls"": null, ""sortable_with_nulls_2"": 0.6020187290499803, ""text"": ""$null""} ] ``` ## Storage format Each table is represented as two files. The first, `table_name.metadata.json`, contains metadata describing the structure of the table. 
For a table called `redirects_redirect` that file might look like this: ```json { ""name"": ""redirects_redirect"", ""columns"": [ ""id"", ""domain"", ""path"", ""target"", ""created"" ], ""schema"": ""CREATE TABLE [redirects_redirect] (\n [id] INTEGER PRIMARY KEY,\n [domain] TEXT,\n [path] TEXT,\n [target] TEXT,\n [created] TEXT\n)"" } ``` It is an object with three keys: `name` is the name of the table, `columns` is an array of column strings and `schema` is the SQL schema text used for that table. The second file, `table_name.ndjson`, contains [newline-delimited JSON](http://ndjson.org/) for every row in the table. Each row is represented as a JSON array with items corresponding to each of the columns defined in the metadata. That file, `redirects_redirect.ndjson`, might look like this: ``` [1, ""feeds.simonwillison.net"", ""swn-everything"", ""https://simonwillison.net/atom/everything/"", ""2017-10-01T21:11:36.440537+00:00""] [2, ""feeds.simonwillison.net"", ""swn-entries"", ""https://simonwillison.net/atom/entries/"", ""2017-10-01T21:12:32.478849+00:00""] [3, ""feeds.simonwillison.net"", ""swn-links"", ""https://simonwillison.net/atom/links/"", ""2017-10-01T21:12:54.820729+00:00""] ``` ",Simon Willison,,text/markdown,https://github.com/simonw/sqlite-diffable,,"Apache License, Version 2.0",,,https://pypi.org/project/sqlite-diffable/,,https://pypi.org/project/sqlite-diffable/,"{""CI"": ""https://github.com/simonw/sqlite-diffable/actions"", ""Changelog"": ""https://github.com/simonw/sqlite-diffable/releases"", ""Homepage"": ""https://github.com/simonw/sqlite-diffable"", ""Issues"": ""https://github.com/simonw/sqlite-diffable/issues""}",https://pypi.org/project/sqlite-diffable/0.5/,"[""click"", ""sqlite-utils"", ""pytest ; extra == 'test'"", ""black ; extra == 'test'""]",,0.5,0, sqlite-generate,Tool for generating demo SQLite databases,[],"# sqlite-generate 
[![PyPI](https://img.shields.io/pypi/v/sqlite-generate.svg)](https://pypi.org/project/sqlite-generate/) [![Changelog](https://img.shields.io/github/v/release/simonw/sqlite-generate?label=changelog)](https://github.com/simonw/sqlite-generate/releases) [![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](https://github.com/simonw/sqlite-generate/blob/master/LICENSE) Tool for generating demo SQLite databases ## Installation Install this tool using `pip`: $ pip install sqlite-generate ## Demo You can see a demo of the database generated using this command running in [Datasette](https://github.com/simonw/datasette) at https://sqlite-generate-demo.datasette.io/ The demo is generated using the following command: sqlite-generate demo.db --seed seed --fts --columns=10 --fks=0,3 --pks=0,2 ## Usage To generate a SQLite database file called `data.db` with 10 randomly named tables in it, run the following: sqlite-generate data.db You can use the `--tables` option to generate a different number of tables: sqlite-generate data.db --tables 20 You can run the command against the same database file multiple times to keep adding new tables, using different settings for each batch of generated tables. By default each table will contain a random number of rows between 0 and 200. You can customize this with the `--rows` option: sqlite-generate data.db --rows 20 This will insert 20 rows into each table. sqlite-generate data.db --rows 500,2000 This inserts a random number of rows between 500 and 2000 into each table. Each table will have 5 columns. You can change this using `--columns`: sqlite-generate data.db --columns 10 `--columns` can also accept a range: sqlite-generate data.db --columns 5,15 You can control the random number seed used with the `--seed` option. 
This will result in the exact same database file being created by multiple runs of the tool: sqlite-generate data.db --seed=myseed By default each table will contain between 0 and 2 foreign key columns to other tables. You can control this using the `--fks` option, with either a single number or a range: sqlite-generate data.db --columns=20 --fks=5,15 Each table will have a single primary key column called `id`. You can use the `--pks=` option to change the number of primary key columns on each table. Drop it to 0 to generate [rowid tables](https://www.sqlite.org/rowidtable.html). Increase it above 1 to generate tables with compound primary keys. Or use a range to get a random selection of different primary key layouts: sqlite-generate data.db --pks=0,2 To configure [SQLite full-text search](https://www.sqlite.org/fts5.html) for all columns of type text, use `--fts`: sqlite-generate data.db --fts This will use FTS5 by default. To use [FTS4](https://www.sqlite.org/fts3.html) instead, use `--fts4`. ## Development To contribute to this tool, first checkout the code. 
Then create a new virtual environment: cd sqlite-generate python -m venv venv source venv/bin/activate Or if you are using `pipenv`: pipenv shell Now install the dependencies and test dependencies: pip install -e '.[test]' To run the tests: pytest ",Simon Willison,,text/markdown,https://github.com/simonw/sqlite-generate,,"Apache License, Version 2.0",,,https://pypi.org/project/sqlite-generate/,,https://pypi.org/project/sqlite-generate/,"{""CI"": ""https://github.com/simonw/sqlite-generate/actions"", ""Changelog"": ""https://github.com/simonw/sqlite-generate/releases"", ""Homepage"": ""https://github.com/simonw/sqlite-generate"", ""Issues"": ""https://github.com/simonw/sqlite-generate/issues""}",https://pypi.org/project/sqlite-generate/1.1.1/,"[""click"", ""Faker"", ""sqlite-utils"", ""pytest ; extra == 'test'""]",,1.1.1,0, sqlite-transform,Tool for running transformations on columns in a SQLite database.,[],"# sqlite-transform ![No longer maintained](https://img.shields.io/badge/no%20longer-maintained-red) [![PyPI](https://img.shields.io/pypi/v/sqlite-transform.svg)](https://pypi.org/project/sqlite-transform/) [![Changelog](https://img.shields.io/github/v/release/simonw/sqlite-transform?include_prereleases&label=changelog)](https://github.com/simonw/sqlite-transform/releases) [![Tests](https://github.com/simonw/sqlite-transform/workflows/Test/badge.svg)](https://github.com/simonw/sqlite-transform/actions?query=workflow%3ATest) [![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](https://github.com/dogsheep/sqlite-transform/blob/main/LICENSE) Tool for running transformations on columns in a SQLite database. > **:warning: This tool is no longer maintained** > > I added a new tool to [sqlite-utils](https://sqlite-utils.datasette.io/) called [sqlite-utils convert](https://sqlite-utils.datasette.io/en/stable/cli.html#converting-data-in-columns) which provides a super-set of the functionality originally provided here. 
`sqlite-transform` is no longer maintained, and I recommend switching to using `sqlite-utils convert` instead. ## How to install pip install sqlite-transform ## parsedate and parsedatetime These subcommands will run all values in the specified column through `dateutils.parser.parse()` and replace them with the result, formatted as an ISO timestamp or ISO date. For example, if a row in the database has an `opened` column which contains `10/10/2019 08:10:00 PM`, running the following command: sqlite-transform parsedatetime my.db mytable opened Will result in that value being replaced by `2019-10-10T20:10:00`. Using the `parsedate` subcommand here would result in `2019-10-10` instead. In the case of ambiguous dates such as `03/04/05` these commands both default to assuming American-style `mm/dd/yy` format. You can pass `--dayfirst` to specify that the day should be assumed to be first, or `--yearfirst` for the year. ## jsonsplit The `jsonsplit` subcommand takes columns that contain a comma-separated list, for example a `tags` column containing records like `""trees,park,dogs""` and converts it into a JSON array `[""trees"", ""park"", ""dogs""]`. This is useful for taking advantage of Datasette's [Facet by JSON array](https://docs.datasette.io/en/stable/facets.html#facet-by-json-array) feature. sqlite-transform jsonsplit my.db mytable tags It defaults to splitting on commas, but you can specify a different delimiter character using the `--delimiter` option, for example: sqlite-transform jsonsplit \ my.db mytable tags --delimiter ';' Values within the array will be treated as strings, so a column containing `123,552,775` will be converted into the JSON array `[""123"", ""552"", ""775""]`. You can specify a different type for these values using `--type int` or `--type float`, for example: sqlite-transform jsonsplit \ my.db mytable tags --type int This will result in that column being converted into `[123, 552, 775]`. 
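As a rough sketch (not the tool's actual implementation), the per-value transformation that `jsonsplit` performs looks something like this:

```python
import json

# Rough approximation of the jsonsplit behaviour described above
# (not sqlite-transform's own code): split the value on a delimiter,
# cast each item, and store the result as a JSON array string.
def jsonsplit(value, delimiter=',', cast=str):
    return json.dumps([cast(v) for v in value.split(delimiter)])

print(jsonsplit('trees,park,dogs'))        # [""trees"", ""park"", ""dogs""]
print(jsonsplit('123,552,775', cast=int))  # [123, 552, 775]
```

The JSON array string is what gets written back into the column, which is what makes the Datasette JSON-array faceting work.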
## lambda for executing your own code The `lambda` subcommand lets you specify Python code which will be executed against the column. Here's how to convert a column to uppercase: sqlite-transform lambda my.db mytable mycolumn --code='str(value).upper()' The code you provide will be compiled into a function that takes `value` as a single argument. You can break your function body into multiple lines, provided the last line is a `return` statement: sqlite-transform lambda my.db mytable mycolumn --code='value = str(value) return value.upper()' You can also specify Python modules that should be imported and made available to your code using one or more `--import` options: sqlite-transform lambda my.db mytable mycolumn \ --code='""\n"".join(textwrap.wrap(value, 10))' \ --import=textwrap The `--dry-run` option will output a preview of the transformation against the first ten rows, without modifying the database. ## Saving the result to a separate column Each of these commands accepts optional `--output` and `--output-type` options. These can be used to save the result of the transformation to a separate column, which will be created if the column does not already exist. To save the result of `jsonsplit` to a new column called `json_tags`, use the following: sqlite-transform jsonsplit my.db mytable tags \ --output json_tags The type of the created column defaults to `text`, but a different column type can be specified using `--output-type`. This example will create a new floating point column called `float_id` with a copy of each item's ID increased by 0.5: sqlite-transform lambda my.db mytable id \ --code 'float(value) + 0.5' \ --output float_id \ --output-type float You can drop the original column at the end of the operation by adding `--drop`. ## Splitting a column into multiple columns Sometimes you may wish to convert a single column into multiple derived columns. 
For example, you may have a `location` column containing `latitude,longitude` values which you wish to split out into separate `latitude` and `longitude` columns. You can achieve this using the `--multi` option to `sqlite-transform lambda`. This option expects your `--code` function to return a Python dictionary: new columns will be created and populated for each of the keys in that dictionary. For the `latitude,longitude` example you would use the following: sqlite-transform lambda demo.db places location \ --code 'return { ""latitude"": float(value.split("","")[0]), ""longitude"": float(value.split("","")[1]), }' --multi The type of the returned values will be taken into account when creating the new columns. In this example, the resulting database schema will look like this: ```sql CREATE TABLE [places] ( [location] TEXT, [latitude] FLOAT, [longitude] FLOAT ); ``` The code function can also return `None`, in which case its output will be ignored. You can drop the original column at the end of the operation by adding `--drop`. ## Disabling the progress bar By default each command will show a progress bar. Pass `-s` or `--silent` to hide that progress bar. 
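As a final worked example, the dictionary-returning `--multi` pattern described earlier can be tried out as plain Python before passing it via `--code`. This is a standalone sketch of that pattern, not the tool's internals:

```python
# Standalone sketch of a --multi code function: each key in the returned
# dictionary becomes a new column, with types inferred from the values.
def split_location(value):
    latitude, longitude = value.split(',')
    return {'latitude': float(latitude), 'longitude': float(longitude)}

print(split_location('37.7749,-122.4194'))  # {'latitude': 37.7749, 'longitude': -122.4194}
```

Testing the function like this on a sample value is a quick way to check it before running it against a whole table.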
",Simon Willison,,text/markdown,https://github.com/simonw/sqlite-transform,,"Apache License, Version 2.0",,,https://pypi.org/project/sqlite-transform/,,https://pypi.org/project/sqlite-transform/,"{""Homepage"": ""https://github.com/simonw/sqlite-transform""}",https://pypi.org/project/sqlite-transform/1.2.1/,"[""dateutils"", ""tqdm"", ""click"", ""sqlite-utils"", ""pytest ; extra == 'test'""]",,1.2.1,0, sqlite-utils,CLI tool and Python utility functions for manipulating SQLite databases,"[""Development Status :: 5 - Production/Stable"", ""Intended Audience :: Developers"", ""Intended Audience :: End Users/Desktop"", ""Intended Audience :: Science/Research"", ""License :: OSI Approved :: Apache Software License"", ""Programming Language :: Python :: 3.10"", ""Programming Language :: Python :: 3.6"", ""Programming Language :: Python :: 3.7"", ""Programming Language :: Python :: 3.8"", ""Programming Language :: Python :: 3.9"", ""Topic :: Database""]","# sqlite-utils [![PyPI](https://img.shields.io/pypi/v/sqlite-utils.svg)](https://pypi.org/project/sqlite-utils/) [![Changelog](https://img.shields.io/github/v/release/simonw/sqlite-utils?include_prereleases&label=changelog)](https://sqlite-utils.datasette.io/en/stable/changelog.html) [![Python 3.x](https://img.shields.io/pypi/pyversions/sqlite-utils.svg?logo=python&logoColor=white)](https://pypi.org/project/sqlite-utils/) [![Tests](https://github.com/simonw/sqlite-utils/workflows/Test/badge.svg)](https://github.com/simonw/sqlite-utils/actions?query=workflow%3ATest) [![Documentation Status](https://readthedocs.org/projects/sqlite-utils/badge/?version=stable)](http://sqlite-utils.datasette.io/en/stable/?badge=stable) [![codecov](https://codecov.io/gh/simonw/sqlite-utils/branch/main/graph/badge.svg)](https://codecov.io/gh/simonw/sqlite-utils) [![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](https://github.com/simonw/sqlite-utils/blob/main/LICENSE) 
[![discord](https://img.shields.io/discord/823971286308356157?label=discord)](https://discord.gg/Ass7bCAMDw) Python CLI utility and library for manipulating SQLite databases. ## Some feature highlights - [Pipe JSON](https://sqlite-utils.datasette.io/en/stable/cli.html#inserting-json-data) (or [CSV or TSV](https://sqlite-utils.datasette.io/en/stable/cli.html#inserting-csv-or-tsv-data)) directly into a new SQLite database file, automatically creating a table with the appropriate schema - [Run in-memory SQL queries](https://sqlite-utils.datasette.io/en/stable/cli.html#querying-data-directly-using-an-in-memory-database), including joins, directly against data in CSV, TSV or JSON files and view the results - [Configure SQLite full-text search](https://sqlite-utils.datasette.io/en/stable/cli.html#configuring-full-text-search) against your database tables and run search queries against them, ordered by relevance - Run [transformations against your tables](https://sqlite-utils.datasette.io/en/stable/cli.html#transforming-tables) to make schema changes that SQLite `ALTER TABLE` does not directly support, such as changing the type of a column - [Extract columns](https://sqlite-utils.datasette.io/en/stable/cli.html#extracting-columns-into-a-separate-table) into separate tables to better normalize your existing data Read more on my blog, in this series of posts on [New features in sqlite-utils](https://simonwillison.net/series/sqlite-utils-features/) and other [entries tagged sqliteutils](https://simonwillison.net/tags/sqliteutils/). 
## Installation pip install sqlite-utils Or if you use [Homebrew](https://brew.sh/) for macOS: brew install sqlite-utils ## Using as a CLI tool Now you can do things with the CLI utility like this: $ sqlite-utils memory dogs.csv ""select * from t"" [{""id"": 1, ""age"": 4, ""name"": ""Cleo""}, {""id"": 2, ""age"": 2, ""name"": ""Pancakes""}] $ sqlite-utils insert dogs.db dogs dogs.csv --csv [####################################] 100% $ sqlite-utils tables dogs.db --counts [{""table"": ""dogs"", ""count"": 2}] $ sqlite-utils dogs.db ""select id, name from dogs"" [{""id"": 1, ""name"": ""Cleo""}, {""id"": 2, ""name"": ""Pancakes""}] $ sqlite-utils dogs.db ""select * from dogs"" --csv id,age,name 1,4,Cleo 2,2,Pancakes $ sqlite-utils dogs.db ""select * from dogs"" --table id age name ---- ----- -------- 1 4 Cleo 2 2 Pancakes You can import JSON data into a new database table like this: $ curl https://api.github.com/repos/simonw/sqlite-utils/releases \ | sqlite-utils insert releases.db releases - --pk id Or for data in a CSV file: $ sqlite-utils insert dogs.db dogs dogs.csv --csv `sqlite-utils memory` lets you import CSV or JSON data into an in-memory database and run SQL queries against it in a single command: $ cat dogs.csv | sqlite-utils memory - ""select name, age from stdin"" See the [full CLI documentation](https://sqlite-utils.datasette.io/en/stable/cli.html) for comprehensive coverage of many more commands. ## Using as a library You can also `import sqlite_utils` and use it as a Python library like this: ```python import sqlite_utils db = sqlite_utils.Database(""demo_database.db"") # This line creates a ""dogs"" table if one does not already exist: db[""dogs""].insert_all([ {""id"": 1, ""age"": 4, ""name"": ""Cleo""}, {""id"": 2, ""age"": 2, ""name"": ""Pancakes""} ], pk=""id"") ``` Check out the [full library documentation](https://sqlite-utils.datasette.io/en/stable/python-api.html) for everything else you can do with the Python library. 
## Related projects * [Datasette](https://datasette.io/): A tool for exploring and publishing data * [csvs-to-sqlite](https://github.com/simonw/csvs-to-sqlite): Convert CSV files into a SQLite database * [db-to-sqlite](https://github.com/simonw/db-to-sqlite): CLI tool for exporting a MySQL or PostgreSQL database as a SQLite file * [dogsheep](https://dogsheep.github.io/): A family of tools for personal analytics, built on top of `sqlite-utils` ",Simon Willison,,text/markdown,https://github.com/simonw/sqlite-utils,,"Apache License, Version 2.0",,,https://pypi.org/project/sqlite-utils/,,https://pypi.org/project/sqlite-utils/,"{""CI"": ""https://github.com/simonw/sqlite-utils/actions"", ""Changelog"": ""https://sqlite-utils.datasette.io/en/stable/changelog.html"", ""Documentation"": ""https://sqlite-utils.datasette.io/en/stable/"", ""Homepage"": ""https://github.com/simonw/sqlite-utils"", ""Issues"": ""https://github.com/simonw/sqlite-utils/issues"", ""Source code"": ""https://github.com/simonw/sqlite-utils""}",https://pypi.org/project/sqlite-utils/3.30/,"[""sqlite-fts4"", ""click"", ""click-default-group-wheel"", ""tabulate"", ""python-dateutil"", ""furo ; extra == 'docs'"", ""sphinx-autobuild ; extra == 'docs'"", ""codespell ; extra == 'docs'"", ""sphinx-copybutton ; extra == 'docs'"", ""beanbag-docutils (>=2.0) ; extra == 'docs'"", ""flake8 ; extra == 'flake8'"", ""mypy ; extra == 'mypy'"", ""types-click ; extra == 'mypy'"", ""types-tabulate ; extra == 'mypy'"", ""types-python-dateutil ; extra == 'mypy'"", ""data-science-types ; extra == 'mypy'"", ""pytest ; extra == 'test'"", ""black ; extra == 'test'"", ""hypothesis ; extra == 'test'"", ""cogapp ; extra == 'test'""]",>=3.6,3.30,0, swarm-to-sqlite,Create a SQLite database containing your checkin history from Foursquare Swarm,[],"# swarm-to-sqlite [![PyPI](https://img.shields.io/pypi/v/swarm-to-sqlite.svg)](https://pypi.org/project/swarm-to-sqlite/) 
[![Changelog](https://img.shields.io/github/v/release/dogsheep/swarm-to-sqlite?include_prereleases&label=changelog)](https://github.com/dogsheep/swarm-to-sqlite/releases) [![Tests](https://github.com/dogsheep/swarm-to-sqlite/workflows/Test/badge.svg)](https://github.com/dogsheep/swarm-to-sqlite/actions?query=workflow%3ATest) [![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](https://github.com/dogsheep/swarm-to-sqlite/blob/main/LICENSE) Create a SQLite database containing your checkin history from Foursquare Swarm. ## How to install $ pip install swarm-to-sqlite ## Usage You will first need to obtain a valid OAuth token for your Foursquare account. You can do so using this tool: https://your-foursquare-oauth-token.glitch.me/ The simplest usage is to provide the name of the database file you wish to write to. The tool will prompt you to paste in your token, and will then download your checkins and store them in the specified database file. $ swarm-to-sqlite checkins.db Please provide your Foursquare OAuth token: Importing 3699 checkins [#########-----------------------] 27% 00:02:31 You can also pass the token as a command-line option: $ swarm-to-sqlite checkins.db --token=XXX Or as an environment variable: $ export FOURSQUARE_TOKEN=XXX $ swarm-to-sqlite checkins.db To retrieve just checkins within the past X hours, days or weeks, use the `--since=` option. For example, to pull only checkins that happened within the last 10 days use: $ swarm-to-sqlite checkins.db --token=XXX --since=10d Use `2w` for two weeks, `10h` for ten hours, `3d` for three days. 
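A rough sketch of how a `--since` value such as `10d` maps onto a time window. The unit letters come from the documentation above; the parsing code itself is an assumption for illustration, not the tool's actual implementation:

```python
from datetime import timedelta

# Hypothetical parser for --since values like '10h', '3d', '2w'
# (an assumed sketch, not swarm-to-sqlite's actual code).
UNITS = {'h': 'hours', 'd': 'days', 'w': 'weeks'}

def parse_since(value):
    amount, unit = int(value[:-1]), value[-1]
    return timedelta(**{UNITS[unit]: amount})

print(parse_since('10d'))  # 10 days, 0:00:00
```

The resulting window would then be subtracted from the current time to decide which checkins to request.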
In addition to saving the checkins to a database, you can also write them to a JSON file using the `--save` option: $ swarm-to-sqlite checkins.db --save=checkins.json Having done this, you can re-import checkins directly from that file (rather than making API calls to fetch data from Foursquare) like this: $ swarm-to-sqlite checkins.db --load=checkins.json ## Using with Datasette The SQLite database produced by this tool is designed to be browsed using [Datasette](https://datasette.io/). You can install the [datasette-cluster-map](https://datasette.io/plugins/datasette-cluster-map) plugin to view your checkins on a map. ",Simon Willison,,text/markdown,https://github.com/dogsheep/swarm-to-sqlite,,"Apache License, Version 2.0",,,https://pypi.org/project/swarm-to-sqlite/,,https://pypi.org/project/swarm-to-sqlite/,"{""CI"": ""https://github.com/dogsheep/swarm-to-sqlite/actions"", ""Changelog"": ""https://github.com/dogsheep/swarm-to-sqlite/releases"", ""Homepage"": ""https://github.com/dogsheep/swarm-to-sqlite"", ""Issues"": ""https://github.com/dogsheep/swarm-to-sqlite/issues""}",https://pypi.org/project/swarm-to-sqlite/0.3.3/,"[""sqlite-utils (>=3.3)"", ""click"", ""requests"", ""pytest ; extra == 'test'""]",,0.3.3,0, tableau-to-sqlite,Fetch data from Tableau into a SQLite database,[],"# tableau-to-sqlite [![PyPI](https://img.shields.io/pypi/v/tableau-to-sqlite.svg)](https://pypi.org/project/tableau-to-sqlite/) [![Changelog](https://img.shields.io/github/v/release/simonw/tableau-to-sqlite?include_prereleases&label=changelog)](https://github.com/simonw/tableau-to-sqlite/releases) [![Tests](https://github.com/simonw/tableau-to-sqlite/workflows/Test/badge.svg)](https://github.com/simonw/tableau-to-sqlite/actions?query=workflow%3ATest) [![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](https://github.com/simonw/tableau-to-sqlite/blob/master/LICENSE) Fetch data from Tableau into a SQLite database. 
A wrapper around [TableauScraper](https://github.com/bertrandmartel/tableau-scraping/). ## Installation Install this tool using `pip`: $ pip install tableau-to-sqlite ## Usage If you have the URL to a Tableau dashboard like this: https://results.mo.gov/t/COVID19/views/VaccinationsDashboard/Vaccinations You can pass that directly to the tool: tableau-to-sqlite tableau.db \ https://results.mo.gov/t/COVID19/views/VaccinationsDashboard/Vaccinations This will create a SQLite database called `tableau.db` containing one table for each of the worksheets in that dashboard. If the dashboard is hosted on https://public.tableau.com/ you can instead provide the view name. This will be two strings separated by a `/` symbol - something like this: OregonCOVID-19VaccineProviderEnrollment/COVID-19VaccineProviderEnrollment Now run the tool like this: tableau-to-sqlite tableau.db \ OregonCOVID-19VaccineProviderEnrollment/COVID-19VaccineProviderEnrollment ## Get the data as JSON or CSV If you're building a [git scraper](https://simonwillison.net/2020/Oct/9/git-scraping/) you may want to convert the data gathered by this tool to CSV or JSON to check into your repository. You can do that using [sqlite-utils](https://sqlite-utils.datasette.io/). Install it using `pip`: pip install sqlite-utils You can dump out a table as JSON like so: sqlite-utils rows tableau.db \ 'Admin Site and County Map Site No Info' > tableau.json Or as CSV like this: sqlite-utils rows tableau.db --csv \ 'Admin Site and County Map Site No Info' > tableau.csv ## Development To contribute to this tool, first checkout the code. 
Then create a new virtual environment: cd tableau-to-sqlite python -m venv venv source venv/bin/activate Or if you are using `pipenv`: pipenv shell Now install the dependencies and test dependencies: pip install -e '.[test]' To run the tests: pytest ",Simon Willison,,text/markdown,https://github.com/simonw/tableau-to-sqlite,,"Apache License, Version 2.0",,,https://pypi.org/project/tableau-to-sqlite/,,https://pypi.org/project/tableau-to-sqlite/,"{""CI"": ""https://github.com/simonw/tableau-to-sqlite/actions"", ""Changelog"": ""https://github.com/simonw/tableau-to-sqlite/releases"", ""Homepage"": ""https://github.com/simonw/tableau-to-sqlite"", ""Issues"": ""https://github.com/simonw/tableau-to-sqlite/issues""}",https://pypi.org/project/tableau-to-sqlite/0.2.1/,"[""click"", ""TableauScraper (==0.1.2)"", ""pytest ; extra == 'test'"", ""vcrpy ; extra == 'test'""]",>=3.6,0.2.1,0, yaml-to-sqlite,Utility for converting YAML files to SQLite,"[""Development Status :: 3 - Alpha"", ""Intended Audience :: Developers"", ""Intended Audience :: End Users/Desktop"", ""Intended Audience :: Science/Research"", ""License :: OSI Approved :: Apache Software License"", ""Programming Language :: Python :: 3.6"", ""Programming Language :: Python :: 3.7""]","# yaml-to-sqlite [![PyPI](https://img.shields.io/pypi/v/yaml-to-sqlite.svg)](https://pypi.org/project/yaml-to-sqlite/) [![Changelog](https://img.shields.io/github/v/release/simonw/yaml-to-sqlite?include_prereleases&label=changelog)](https://github.com/simonw/yaml-to-sqlite/releases) [![Tests](https://github.com/simonw/yaml-to-sqlite/workflows/Test/badge.svg)](https://github.com/simonw/yaml-to-sqlite/actions?query=workflow%3ATest) [![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](https://github.com/simonw/yaml-to-sqlite/blob/main/LICENSE) Load the contents of a YAML file into a SQLite database table. 
``` $ yaml-to-sqlite --help Usage: yaml-to-sqlite [OPTIONS] DB_PATH TABLE YAML_FILE Convert YAML files to SQLite Options: --version Show the version and exit. --pk TEXT Column to use as a primary key --single-column TEXT If YAML file is a list of values, populate this column --help Show this message and exit. ``` ## Usage Given a `news.yml` file containing the following: ```yaml - date: 2021-06-05 body: |- [Datasette 0.57](https://docs.datasette.io/en/stable/changelog.html#v0-57) is out with an important security patch. - date: 2021-05-10 body: |- [Django SQL Dashboard](https://simonwillison.net/2021/May/10/django-sql-dashboard/) is a new tool that brings a useful authenticated subset of Datasette to Django projects that are built on top of PostgreSQL. ``` Running this command: ```bash $ yaml-to-sqlite news.db stories news.yml ``` Will create a database file with this schema: ```bash $ sqlite-utils schema news.db CREATE TABLE [stories] ( [date] TEXT, [body] TEXT ); ``` The `--pk` option can be used to set a column as the primary key for the table: ```bash $ yaml-to-sqlite news.db stories news.yml --pk date $ sqlite-utils schema news.db CREATE TABLE [stories] ( [date] TEXT PRIMARY KEY, [body] TEXT ); ``` ## Single column YAML lists The `--single-column` option can be used when the YAML file is a list of values, for example a file called `dogs.yml` containing the following: ```yaml - Cleo - Pancakes - Nixie ``` Running this command: ```bash $ yaml-to-sqlite dogs.db dogs.yml --single-column=name ``` Will create a single `dogs` table with a single `name` column that is the primary key: ```bash $ sqlite-utils schema dogs.db CREATE TABLE [dogs] ( [name] TEXT PRIMARY KEY ); $ sqlite-utils dogs.db 'select * from dogs' -t name -------- Cleo Pancakes Nixie ``` ",Simon Willison,,text/markdown,https://github.com/simonw/yaml-to-sqlite,,"Apache License, Version 2.0",,,https://pypi.org/project/yaml-to-sqlite/,,https://pypi.org/project/yaml-to-sqlite/,"{""Homepage"": 
""https://github.com/simonw/yaml-to-sqlite""}",https://pypi.org/project/yaml-to-sqlite/1.0/,"[""click"", ""PyYAML"", ""sqlite-utils (>=3.9.1)"", ""pytest ; extra == 'test'""]",,1.0,0,
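The `--single-column` schema shown in the yaml-to-sqlite section above can be reproduced with Python's standard-library sqlite3 module, for readers who want to inspect the result without installing the tool; the table and column names follow the `dogs.yml` example, and the rows are inserted inline here rather than via the tool:

```python
# Sketch of the schema yaml-to-sqlite creates for --single-column exports,
# built directly with stdlib sqlite3 so the result can be inspected without
# the tool installed. Table/column names follow the dogs.yml example above.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE [dogs] ([name] TEXT PRIMARY KEY)")
conn.executemany(
    "INSERT INTO dogs (name) VALUES (?)",
    [("Cleo",), ("Pancakes",), ("Nixie",)],
)

# Equivalent of: sqlite-utils dogs.db 'select * from dogs' -t
names = [row[0] for row in conn.execute("SELECT name FROM dogs ORDER BY rowid")]
print(names)  # ['Cleo', 'Pancakes', 'Nixie']
```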