name,summary,classifiers,description,author,author_email,description_content_type,home_page,keywords,license,maintainer,maintainer_email,package_url,platform,project_url,project_urls,release_url,requires_dist,requires_python,version,yanked,yanked_reason download-tiles,Download map tiles and store them in an MBTiles database,[],"# download-tiles [![PyPI](https://img.shields.io/pypi/v/download-tiles.svg)](https://pypi.org/project/download-tiles/) [![Changelog](https://img.shields.io/github/v/release/simonw/download-tiles?include_prereleases&label=changelog)](https://github.com/simonw/download-tiles/releases) [![Tests](https://github.com/simonw/download-tiles/workflows/Test/badge.svg)](https://github.com/simonw/download-tiles/actions?query=workflow%3ATest) [![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](https://github.com/simonw/download-tiles/blob/master/LICENSE) Download map tiles and store them in an MBTiles database ## Installation Install this tool using `pip`: $ pip install download-tiles ## Usage This tool downloads tiles from a specified [TMS (Tile Map Server)](https://wiki.openstreetmap.org/wiki/TMS) server for a specified bounding box and range of zoom levels and stores those tiles in an MBTiles SQLite database. It is a command-line wrapper around the [Landez](https://github.com/makinacorpus/landez) Python library. **Please use this tool responsibly**. Consult the usage policies of the tile servers you are interacting with, for example the [OpenStreetMap Tile Usage Policy](https://operations.osmfoundation.org/policies/tiles/). Running the following will download zoom levels 0-3 of OpenStreetMap, 85 tiles total, and store them in a SQLite database called `world.mbtiles`: download-tiles world.mbtiles You can customize which tiles and zoom levels are downloaded using command options: `--zoom-levels=0-3` or `-z=0-3` The different zoom levels to download. Specify a single number, e.g. `15`, or a range of numbers e.g. `0-4`. Be careful with this setting as you can easily go over the limits requested by the underlying tile server. `--bbox=3.9,-6.3,14.5,10.2` or `-b=3.9,-6.3,14.5,10.2` The bounding box to fetch. Should be specified as `min-lon,min-lat,max-lon,max-lat`. You can use [bboxfinder.com](http://bboxfinder.com/) to find these for different areas. `--city=london` or `--country=madagascar` These options can be used instead of `--bbox`. The city or country specified will be looked up using the [Nominatim API](https://nominatim.org/release-docs/latest/api/Search/) and used to derive a bounding box. `--show-bbox` Use this option to output the bounding box that was retrieved for the `--city` or `--country` without downloading any tiles. `--name=Name` A name for this tile collection, used for the `name` field in the `metadata` table. If not specified a UUID will be used, or if you used `--city` or `--country` the name will be set to the full name of that place. `--attribution=""Attribution string""` Attribution string to bake into the `metadata` table. This will default to `© OpenStreetMap contributors` unless you use `--tiles-url` to specify an alternative tile server, in which case you should specify a custom attribution string. You can use the `--attribution=osm` shortcut to specify the `© OpenStreetMap contributors` value without having to type it out in full. `--tiles-url=https://...` The tile server URL to use. This should include `{z}` and `{x}` and `{y}` specifiers, and can optionally include `{s}` for subdomains. 
The default URL used here is for OpenStreetMap, `http://{s}.tile.openstreetmap.org/{z}/{x}/{y}.png` `--tiles-subdomains=a,b,c` A comma-separated list of subdomains to use for the `{s}` parameter. `--verbose` Use this option to turn on verbose logging. `--cache-dir=/tmp/tiles` Provide a directory to cache downloaded tiles between runs. This can be useful if you are worried you might not have used the correct options for the bounding box or zoom levels. ## Development To contribute to this tool, first checkout the code. Then create a new virtual environment: cd download-tiles python -mvenv venv source venv/bin/activate Or if you are using `pipenv`: pipenv shell Now install the dependencies and tests: pip install -e '.[test]' To run the tests: pytest ",Simon Willison,,text/markdown,https://github.com/simonw/download-tiles,,"Apache License, Version 2.0",,,https://pypi.org/project/download-tiles/,,https://pypi.org/project/download-tiles/,"{""CI"": ""https://github.com/simonw/download-tiles/actions"", ""Changelog"": ""https://github.com/simonw/download-tiles/releases"", ""Homepage"": ""https://github.com/simonw/download-tiles"", ""Issues"": ""https://github.com/simonw/download-tiles/issues""}",https://pypi.org/project/download-tiles/0.4.1/,"[""click"", ""requests"", ""landez (==2.5.0)"", ""pytest ; extra == 'test'"", ""requests-mock ; extra == 'test'""]",>=3.6,0.4.1,0, git-history,Tools for analyzing Git history using SQLite,[],"# git-history [![PyPI](https://img.shields.io/pypi/v/git-history.svg)](https://pypi.org/project/git-history/) [![Changelog](https://img.shields.io/github/v/release/simonw/git-history?include_prereleases&label=changelog)](https://github.com/simonw/git-history/releases) [![Tests](https://github.com/simonw/git-history/workflows/Test/badge.svg)](https://github.com/simonw/git-history/actions?query=workflow%3ATest) [![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](https://github.com/simonw/git-history/blob/master/LICENSE) Tools for analyzing Git history using SQLite For background on this project see [git-history: a tool for analyzing scraped data collected using Git and SQLite](https://simonwillison.net/2021/Dec/7/git-history/) ## Installation Install this tool using `pip`: $ pip install git-history ## Demos [git-history-demos.datasette.io](http://git-history-demos.datasette.io/) hosts three example databases created using this tool: - [pge-outages](https://git-history-demos.datasette.io/pge-outages) shows a history of PG&E (the electricity supplier) [outages](https://pgealerts.alerts.pge.com/outagecenter/), using data collected in [simonw/pge-outages](https://github.com/simonw/pge-outages) converted using [pge-outages.sh](https://github.com/simonw/git-history/blob/main/demos/pge-outages.sh) - [ca-fires](https://git-history-demos.datasette.io/ca-fires) shows a history of fires in California reported on [fire.ca.gov/incidents](https://www.fire.ca.gov/incidents/), from data in [simonw/ca-fires-history](https://github.com/simonw/ca-fires-history) converted using [ca-fires.sh](https://github.com/simonw/git-history/blob/main/demos/ca-fires.sh) - [sf-bay-511](https://git-history-demos.datasette.io/sf-bay-511) has records of San Francisco Bay Area traffic and transit incident data from [511.org](https://511.org/), collected in [dbreunig/511-events-history](https://github.com/dbreunig/511-events-history) converted using [sf-bay-511.sh](https://github.com/simonw/git-history/blob/main/demos/sf-bay-511.sh) The demos are deployed using 
[Datasette](https://datasette.io/) on [Google Cloud Run](https://cloud.google.com/run/) by [this GitHub Actions workflow](https://github.com/simonw/git-history/blob/main/.github/workflows/deploy-demos.yml). ## Usage This tool can be run against a Git repository that holds a file that contains JSON, CSV/TSV or some other format and which has multiple versions tracked in the Git history. Read [Git scraping: track changes over time by scraping to a Git repository](https://simonwillison.net/2020/Oct/9/git-scraping/) to understand how you might create such a repository. The `file` command analyzes the history of an individual file within the repository, and generates a SQLite database table that represents the different versions of that file over time. The file is assumed to contain multiple objects - for example, the results of scraping an electricity outage map or a CSV file full of records. Assuming you have a file called `incidents.json` that is a JSON array of objects, with multiple versions of that file recorded in a repository. Each version of that file might look something like this: ```json [ { ""IncidentID"": ""abc123"", ""Location"": ""Corner of 4th and Vermont"", ""Type"": ""fire"" }, { ""IncidentID"": ""cde448"", ""Location"": ""555 West Example Drive"", ""Type"": ""medical"" } ] ``` Change directory into the GitHub repository in question and run the following: git-history file incidents.db incidents.json This will create a new SQLite database in the `incidents.db` file with three tables: - `commits` containing a row for every commit, with a `hash` column, the `commit_at` date and a foreign key to a `namespace`. - `item` containing a row for every item in every version of the `filename.json` file - with an extra `_commit` column that is a foreign key back to the `commit` table. - `namespaces` containing a single row. This allows you to build multiple tables for different files, using the `--namespace` option described below. The database schema for this example will look like this: ```sql CREATE TABLE [namespaces] ( [id] INTEGER PRIMARY KEY, [name] TEXT ); CREATE UNIQUE INDEX [idx_namespaces_name] ON [namespaces] ([name]); CREATE TABLE [commits] ( [id] INTEGER PRIMARY KEY, [namespace] INTEGER REFERENCES [namespaces]([id]), [hash] TEXT, [commit_at] TEXT ); CREATE UNIQUE INDEX [idx_commits_namespace_hash] ON [commits] ([namespace], [hash]); CREATE TABLE [item] ( [IncidentID] TEXT, [Location] TEXT, [Type] TEXT ); ``` If you have 10 historic versions of the `incidents.json` file and each one contains 30 incidents, you will end up with 10 * 30 = 300 rows in your `item` table. ### Track the history of individual items using IDs If your objects have a unique identifier - or multiple columns that together form a unique identifier - you can use the `--id` option to de-duplicate and track changes to each of those items over time. This provides a much more interesting way to apply this tool. If there is a unique identifier column called `IncidentID` you could run the following: git-history file incidents.db incidents.json --id IncidentID The database schema used here is very different from the one used without the `--id` option. If you have already imported history, the command will skip any commits that it has seen already and just process new ones. This means that even though an initial import could be slow subsequent imports should run a lot faster. This command will create six tables - `commits`, `item`, `item_version`, `columns`, `item_changed` and `namespaces`. 
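If you want to explore those tables programmatically rather than through Datasette, here is a minimal sketch using Python's standard `sqlite3` module - the `incidents.db` filename and `IncidentID` column are taken from the example above, and the other names follow the schema shown next:

```python
import sqlite3

# Hypothetical example: incidents.db and IncidentID come from the sample data
# above; the table and column names follow the documented schema.
conn = sqlite3.connect('incidents.db')
conn.row_factory = sqlite3.Row

# Count how many distinct versions were captured for each item
rows = conn.execute('''
    select item.IncidentID, count(item_version._id) as versions
    from item
    join item_version on item_version._item = item._id
    group by item._id
    order by versions desc
''').fetchall()

for row in rows:
    print(row['IncidentID'], row['versions'])
```
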
Here's the full schema: ```sql CREATE TABLE [namespaces] ( [id] INTEGER PRIMARY KEY, [name] TEXT ); CREATE UNIQUE INDEX [idx_namespaces_name] ON [namespaces] ([name]); CREATE TABLE [commits] ( [id] INTEGER PRIMARY KEY, [namespace] INTEGER REFERENCES [namespaces]([id]), [hash] TEXT, [commit_at] TEXT ); CREATE UNIQUE INDEX [idx_commits_namespace_hash] ON [commits] ([namespace], [hash]); CREATE TABLE [item] ( [_id] INTEGER PRIMARY KEY, [_item_id] TEXT , [IncidentID] TEXT, [Location] TEXT, [Type] TEXT, [_commit] INTEGER); CREATE UNIQUE INDEX [idx_item__item_id] ON [item] ([_item_id]); CREATE TABLE [item_version] ( [_id] INTEGER PRIMARY KEY, [_item] INTEGER REFERENCES [item]([_id]), [_version] INTEGER, [_commit] INTEGER REFERENCES [commits]([id]), [IncidentID] TEXT, [Location] TEXT, [Type] TEXT, [_item_full_hash] TEXT ); CREATE TABLE [columns] ( [id] INTEGER PRIMARY KEY, [namespace] INTEGER REFERENCES [namespaces]([id]), [name] TEXT ); CREATE UNIQUE INDEX [idx_columns_namespace_name] ON [columns] ([namespace], [name]); CREATE TABLE [item_changed] ( [item_version] INTEGER REFERENCES [item_version]([_id]), [column] INTEGER REFERENCES [columns]([id]), PRIMARY KEY ([item_version], [column]) ); CREATE VIEW item_version_detail AS select commits.commit_at as _commit_at, commits.hash as _commit_hash, item_version.*, ( select json_group_array(name) from columns where id in ( select column from item_changed where item_version = item_version._id ) ) as _changed_columns from item_version join commits on commits.id = item_version._commit; CREATE INDEX [idx_item_version__item] ON [item_version] ([_item]); ``` #### item table The `item` table will contain the most recent version of each row, de-duplicated by ID, plus the following additional columns: - `_id` - an integer primary key, used as a foreign key from the `item_version` table. - `_item_id` - a hash of the values of the columns specified using the `--id` option to the command. This is used for de-duplication when processing new versions. - `_commit` - a foreign key to the `commit` table, representing the most recent commit to modify this item. #### item_version table The `item_version` table will contain a row for each captured differing version of that item, plus the following columns: - `_id` - a numeric ID for the item version record. - `_item` - a foreign key to the `item` table. - `_version` - the numeric version number, starting at 1 and incrementing for each captured version. - `_commit` - a foreign key to the `commit` table. - `_item_full_hash` - a hash of this version of the item. This is used internally by the tool to identify items that have changed between commits. The other columns in this table represent columns in the original data that have changed since the previous version. If the value has not changed, it will be represented by a `null`. If a value was previously set but has been changed back to `null` it will still be represented as `null` in the `item_version` row. You can identify these using the `item_changed` many-to-many table described below. You can use the `--full-versions` option to store full copies of the item at each version, rather than just storing the columns that have changed. #### item_version_detail view This SQL view joins `item_version` against `commits` to add three further columns: `_commit_at` with the date of the commit, `_commit_hash` with the Git commit hash, and `_changed_columns` with a JSON array of the names of the columns that changed in that version. #### item_changed This many-to-many table indicates exactly which columns were changed in an `item_version`. 
- `item_version` is a foreign key to a row in the `item_version` table. - `column` is a foreign key to a row in the `columns` table. This table will have the largest number of rows, which is why it stores just two integers in order to save space. #### columns The `columns` table stores column names. It is referenced by `item_changed`. - `id` - an integer ID. - `name` - the name of the column. - `namespace` - a foreign key to `namespaces`, used when multiple file histories share the same database. #### Reserved column names Note that `_id`, `_item_full_hash`, `_item`, `_item_id`, `_version`, `_commit`, `_commit_at`, `_commit_hash`, `_changed_columns`, `rowid` are considered reserved column names for the purposes of this tool. If your data contains any of these they will be renamed to add a trailing underscore, for example `_id_`, `_item_`, `_version_`, to avoid clashing with the reserved columns. If you have a column with a name such as `_commit_` it will be renamed too, adding an additional trailing underscore, so `_commit_` becomes `_commit__` and `_commit__` becomes `_commit___`. ### Additional options - `--repo DIRECTORY` - the path to the Git repository, if it is not the current working directory. - `--branch TEXT` - the Git branch to analyze - defaults to `main`. - `--id TEXT` - as described above: pass one or more columns that uniquely identify a record, so that changes to that record can be calculated over time. - `--full-versions` - instead of recording just the columns that have changed in the `item_version` table, record a full copy of each version of the item. - `--ignore TEXT` - one or more columns to ignore - they will not be included in the resulting database. - `--csv` - treat the data as CSV or TSV rather than JSON, and attempt to guess the correct dialect. - `--dialect` - use a specific CSV dialect. Options are `excel`, `excel-tab` and `unix` - see [the Python CSV documentation](https://docs.python.org/3/library/csv.html#csv.excel) for details. - `--skip TEXT` - one or more full Git commit hashes that should be skipped. You can use this if some of the data in your revision history is corrupted in a way that prevents this tool from working. - `--start-at TEXT` - skip commits prior to the specified commit hash. - `--start-after TEXT` - skip commits up to and including the specified commit hash, then start processing from the following commit. - `--convert TEXT` - custom Python code for a conversion, described below. - `--import TEXT` - additional Python modules to import for `--convert`. - `--ignore-duplicate-ids` - if a single version of a file has the same ID in it more than once, the tool will exit with an error. Use this option to ignore this and instead pick just the first of the two duplicates. - `--namespace TEXT` - use this if you wish to include the history of multiple different files in the same database. The default is `item` but you can set it to something else, which will produce tables with names like `yournamespace` and `yournamespace_version`. - `--wal` - enable WAL mode on the created database file. Use this if you plan to run queries against the database while `git-history` is creating it. - `--silent` - don't show the progress bar. ### CSV and TSV data If the data in your repository is a CSV or TSV file you can process it by adding the `--csv` option. This will attempt to detect which delimiter is used by the file, so the same option works for both comma- and tab-separated values. 
git-history file trees.db trees.csv --id TreeID You can also specify the CSV dialect using the `--dialect` option. ### Custom conversions using --convert If your data is not already either CSV/TSV or a flat JSON array, you can reshape it using the `--convert` option. The format needed by this tool is an array of dictionaries, as demonstrated by the `incidents.json` example above. If your data does not fit this shape, you can provide a snippet of Python code that converts the on-disk content of each stored file into a Python list of dictionaries. For example, if your stored files each look like this: ```json { ""incidents"": [ { ""id"": ""552"", ""name"": ""Hawthorne Fire"", ""engines"": 3 }, { ""id"": ""556"", ""name"": ""Merlin Fire"", ""engines"": 1 } ] } ``` You could use the following Python snippet to convert them to the required format: ```python json.loads(content)[""incidents""] ``` (The `json` module is exposed to your custom function by default.) You would then run the tool like this: git-history file database.db incidents.json \ --id id \ --convert 'json.loads(content)[""incidents""]' The `content` variable is always a `bytes` object representing the content of the file at a specific moment in the repository's history. You can import additional modules using `--import`. This example shows how you could read a CSV file that uses `;` as the delimiter: git-history file trees.db ../sf-tree-history/Street_Tree_List.csv \ --repo ../sf-tree-history \ --import csv \ --import io \ --convert ' fp = io.StringIO(content.decode(""utf-8"")) return list(csv.DictReader(fp, delimiter="";"")) ' \ --id TreeID You can import nested modules such as [ElementTree](https://docs.python.org/3/library/xml.etree.elementtree.html) using `--import xml.etree.ElementTree`, then refer to them in your function body as `xml.etree.ElementTree`. For example, if your tracked data was in an `items.xml` file that looked like this: ```xml <items> <item id=""1"" name=""Gin"" /> <item id=""2"" name=""Tonic"" /> </items> ``` You could load it using the following `--convert` script: ``` git-history file items.db items.xml --convert ' tree = xml.etree.ElementTree.fromstring(content) return [el.attrib for el in tree.iter(""item"")] ' --import xml.etree.ElementTree --id id ``` If your Python code spans more than one line it needs to include a `return` statement. You can also use Python generators in your `--convert` code, for example: git-history file stats.db package-stats/stats.json \ --repo package-stats \ --convert ' data = json.loads(content) for key, counts in data.items(): for date, count in counts.items(): yield { ""package"": key, ""date"": date, ""count"": count } ' --id package --id date This conversion function expects data that looks like this: ```json { ""airtable-export"": { ""2021-05-18"": 66, ""2021-05-19"": 60, ""2021-05-20"": 87 } } ``` ## Development To contribute to this tool, first checkout the code. 
Then create a new virtual environment: cd git-history python -m venv venv source venv/bin/activate Or if you are using `pipenv`: pipenv shell Now install the dependencies and test dependencies: pip install -e '.[test]' To run the tests: pytest To update the schema examples in this README file: cog -r README.md ",Simon Willison,,text/markdown,https://github.com/simonw/git-history,,"Apache License, Version 2.0",,,https://pypi.org/project/git-history/,,https://pypi.org/project/git-history/,"{""CI"": ""https://github.com/simonw/git-history/actions"", ""Changelog"": ""https://github.com/simonw/git-history/releases"", ""Homepage"": ""https://github.com/simonw/git-history"", ""Issues"": ""https://github.com/simonw/git-history/issues""}",https://pypi.org/project/git-history/0.6.1/,"[""click"", ""GitPython"", ""sqlite-utils (>=3.19)"", ""pytest ; extra == 'test'"", ""cogapp ; extra == 'test'""]",>=3.6,0.6.1,0, google-drive-to-sqlite,Create a SQLite database containing metadata from Google Drive,[],"# google-drive-to-sqlite [![PyPI](https://img.shields.io/pypi/v/google-drive-to-sqlite.svg)](https://pypi.org/project/google-drive-to-sqlite/) [![Changelog](https://img.shields.io/github/v/release/simonw/google-drive-to-sqlite?include_prereleases&label=changelog)](https://github.com/simonw/google-drive-to-sqlite/releases) [![Tests](https://github.com/simonw/google-drive-to-sqlite/workflows/Test/badge.svg)](https://github.com/simonw/google-drive-to-sqlite/actions?query=workflow%3ATest) [![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](https://github.com/simonw/google-drive-to-sqlite/blob/master/LICENSE) Create a SQLite database containing metadata from [Google Drive](https://www.google.com/drive) If you use Google Drive, and especially if you have shared drives with other people there's a good chance you have hundreds or even thousands of files that you may not be fully aware of. This tool can download metadata about those files - their names, sizes, folders, content types, permissions, creation dates and more - and store them in a SQLite database. This lets you use SQL to analyze your Google Drive contents, using [Datasette](https://datasette.io/) or the SQLite command-line tool or any other SQLite database browsing software. ## Installation Install this tool using `pip`: pip install google-drive-to-sqlite ## Quickstart Authenticate with Google Drive by running: google-drive-to-sqlite auth Now create a SQLite database with metadata about all of the files you have starred using: google-drive-to-sqlite files starred.db --starred You can explore the resulting database using [Datasette](https://datasette.io/): $ pip install datasette $ datasette starred.db INFO: Started server process [24661] INFO: Uvicorn running on http://127.0.0.1:8001 ## Authentication > :warning: **This application has not yet been verified by Google** - you may find you are unable to authenticate until that verification is complete. [#10](https://github.com/simonw/google-drive-to-sqlite/issues/10) > > You can work around this issue by [creating your own OAuth client ID key](https://til.simonwillison.net/googlecloud/google-oauth-cli-application) and passing it to the `auth` command using `--google-client-id` and `--google-client-secret`. First, authenticate with Google Drive using the `auth` command: $ google-drive-to-sqlite auth Visit the following URL to authenticate with Google Drive https://accounts.google.com/o/oauth2/v2/auth?... 
Then return here and paste in the resulting code: Paste code here: Follow the link, sign in with Google Drive and then copy and paste the resulting code back into the tool. This will save an authentication token to a file called `auth.json` in the current directory. To specify a different location for that file, use the `--auth` option: google-drive-to-sqlite auth --auth ~/google-drive-auth.json The `auth` command also provides options for using a different scope, Google client ID and Google client secret. You can use these to create your own custom authentication tokens that can work with other Google APIs - see [issue #5](https://github.com/simonw/google-drive-to-sqlite/issues/5) for details. Full `--help`: ``` Usage: google-drive-to-sqlite auth [OPTIONS] Authenticate user and save credentials Options: -a, --auth FILE Path to save token, defaults to auth.json --google-client-id TEXT Custom Google client ID --google-client-secret TEXT Custom Google client secret --scope TEXT Custom token scope --help Show this message and exit. ``` To revoke the token that is stored in `auth.json`, such that it cannot be used to access Google Drive in the future, run the `revoke` command: google-drive-to-sqlite revoke Or if your token is stored in another location: google-drive-to-sqlite revoke -a ~/google-drive-auth.json You will need to obtain a fresh token using the `auth` command in order to continue using this tool. ## google-drive-to-sqlite files To retrieve metadata about the files in your Google Drive, or a folder or search within it, use the `google-drive-to-sqlite files` command. This will default to writing details about every file in your Google Drive to a SQLite database: google-drive-to-sqlite files files.db Files and folders will be written to database tables, which will be created if they do not yet exist. The database schema is [shown below](#database-schema). If a file or folder already exists, based on a matching `id`, it will be replaced with fresh data. 
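Once `files.db` exists you can query it with any SQLite client. As a rough sketch - assuming the `drive_files` table and `quotaBytesUsed` column from the database schema documented later in this README - this is how you might list your ten largest files from Python:

```python
import sqlite3

# Hypothetical sketch: files.db is the database created above; drive_files and
# quotaBytesUsed come from the schema shown later in this README.
conn = sqlite3.connect('files.db')
conn.row_factory = sqlite3.Row

# quotaBytesUsed is stored as text, so cast it to an integer for sorting
rows = conn.execute('''
    select name, mimeType, cast(quotaBytesUsed as integer) as bytes
    from drive_files
    order by bytes desc
    limit 10
''').fetchall()

for row in rows:
    print(row['bytes'], row['name'], row['mimeType'])
```
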
Instead of writing to SQLite you can use `--json` to output as JSON, or `--nl` to output as newline-delimited JSON: google-drive-to-sqlite files --nl Use `--folder ID` to retrieve everything in a specified folder and its sub-folders: google-drive-to-sqlite files files.db --folder 1E6Zg2X2bjjtPzVfX8YqdXZDCoB3AVA7i Use `--q QUERY` to use a [custom search query](https://developers.google.com/drive/api/v3/reference/query-ref): google-drive-to-sqlite files files.db -q ""viewedByMeTime > '2022-01-01'"" The following shortcut options help build queries: - `--full-text TEXT` to search for files where the full text matches a search term - `--starred` for files and folders you have starred - `--trashed` for files and folders in the trash - `--shared-with-me` for files and folders that have been shared with you - `--apps` for Google Apps documents, spreadsheets, presentations and drawings (equivalent to setting all of the next four options) - `--docs` for Google Apps documents - `--sheets` for Google Apps spreadsheets - `--presentations` for Google Apps presentations - `--drawings` for Google Apps drawings You can combine these - for example, this returns all files that you have starred and that were shared with you: google-drive-to-sqlite files highlights.db \ --starred --shared-with-me Multiple options are treated as AND, with the exception of the Google Apps options which are treated as OR - so the following would retrieve all spreadsheets and presentations that have also been starred: google-drive-to-sqlite files highlights.db \ --starred --sheets --presentations You can use `--stop-after X` to stop after retrieving X files, useful for trying out a new search pattern and seeing results straight away. The `--import-json` and `--import-nl` options are mainly useful for testing and developing this tool. They allow you to replay the JSON or newline-delimited JSON that was previously fetched using `--json` or `--nl` and use it to create a fresh SQLite database, without needing to make any outbound API calls: # Fetch all starred files from the API, write to starred.json google-drive-to-sqlite files -q 'starred = true' --json > starred.json # Now import that data into a new SQLite database file google-drive-to-sqlite files starred.db --import-json starred.json Full `--help`: ``` Usage: google-drive-to-sqlite files [OPTIONS] [DATABASE] Retrieve metadata for files in Google Drive, and write to a SQLite database or output as JSON. 
google-drive-to-sqlite files files.db Use --json to output JSON, --nl for newline-delimited JSON: google-drive-to-sqlite files files.db --json Use a folder ID to recursively fetch every file in that folder and its sub- folders: google-drive-to-sqlite files files.db --folder 1E6Zg2X2bjjtPzVfX8YqdXZDCoB3AVA7i Fetch files you have starred: google-drive-to-sqlite files starred.db --starred Options: -a, --auth FILE Path to auth.json token file --folder TEXT Files in this folder ID and its sub-folders -q TEXT Files matching this query --full-text TEXT Search for files with text match --starred Files you have starred --trashed Files in the trash --shared-with-me Files that have been shared with you --apps Google Apps docs, spreadsheets, presentations and drawings --docs Google Apps docs --sheets Google Apps spreadsheets --presentations Google Apps presentations --drawings Google Apps drawings --json Output JSON rather than write to DB --nl Output newline-delimited JSON rather than write to DB --stop-after INTEGER Stop paginating after X results --import-json FILE Import from this JSON file instead of the API --import-nl FILE Import from this newline-delimited JSON file -v, --verbose Send verbose output to stderr --help Show this message and exit. ``` ## google-drive-to-sqlite download FILE_ID The `download` command can be used to download files from Google Drive. You'll need one or more file IDs, which look something like `0B32uDVNZfiEKLUtIT1gzYWN2NDI4SzVQYTFWWWxCWUtvVGNB`. To download the file, run this: google-drive-to-sqlite download 0B32uDVNZfiEKLUtIT1gzYWN2NDI4SzVQYTFWWWxCWUtvVGNB This will detect the content type of the file and use that as the extension - so if this file is a JPEG the file would be downloaded as: 0B32uDVNZfiEKLUtIT1gzYWN2NDI4SzVQYTFWWWxCWUtvVGNB.jpeg You can pass multiple file IDs to the command at once. To hide the progress bar and filename output, use `-s` or `--silent`. If you are downloading a single file you can use the `-o` output to specify a filename and location: google-drive-to-sqlite download 0B32uDVNZfiEKLUtIT1gzYWN2NDI4SzVQYTFWWWxCWUtvVGNB \ -o my-image.jpeg Use `-o -` to write the file contents to standard output: google-drive-to-sqlite download 0B32uDVNZfiEKLUtIT1gzYWN2NDI4SzVQYTFWWWxCWUtvVGNB \ -o - > my-image.jpeg Full `--help`: ``` Usage: google-drive-to-sqlite download [OPTIONS] FILE_IDS... Download one or more files to disk, based on their file IDs. The file content will be saved to a file with the name: FILE_ID.ext Where the extension is automatically picked based on the type of file. If you are downloading a single file you can specify a filename with -o: google-drive-to-sqlite download MY_FILE_ID -o myfile.txt Options: -a, --auth FILE Path to auth.json token file -o, --output FILE File to write to, or - for standard output -s, --silent Hide progress bar and filename --help Show this message and exit. ``` ## google-drive-to-sqlite export FORMAT FILE_ID The `export` command can be used to export Google Docs documents, spreadsheets and presentations in a number of different formats. You'll need one or more document IDs, which look something like `10BOHGDUYa7lBjUSo26YFCHTpgEmtXabdVFaopCTh1vU`. You can find these by looking at the URL of your document on the Google Docs site. To export that document as PDF, run this: google-drive-to-sqlite export pdf 10BOHGDUYa7lBjUSo26YFCHTpgEmtXabdVFaopCTh1vU The file will be exported as: 10BOHGDUYa7lBjUSo26YFCHTpgEmtXabdVFaopCTh1vU-export.pdf You can pass multiple file IDs to the command at once. 
For the `FORMAT` option you can use any of the mime type options listed [on this page](https://developers.google.com/drive/api/v3/ref-export-formats) - for example, to export as an Open Office document you could use: google-drive-to-sqlite export \ application/vnd.oasis.opendocument.text \ 10BOHGDUYa7lBjUSo26YFCHTpgEmtXabdVFaopCTh1vU For convenience the following shortcuts for common file formats are provided: - Google Docs: `html`, `txt`, `rtf`, `pdf`, `doc`, `zip`, `epub` - Google Sheets: `xls`, `pdf`, `csv`, `tsv`, `zip` - Presentations: `ppt`, `pdf`, `txt` - Drawings: `jpeg`, `png`, `svg` The `zip` option returns a zip file of HTML. `txt` returns plain text. The others should be self-evident. To hide the filename output, use `-s` or `--silent`. If you are exporting a single file you can use the `-o` output to specify a filename and location: google-drive-to-sqlite export pdf 10BOHGDUYa7lBjUSo26YFCHTpgEmtXabdVFaopCTh1vU \ -o my-document.pdf Use `-o -` to write the file contents to standard output: google-drive-to-sqlite export pdf 10BOHGDUYa7lBjUSo26YFCHTpgEmtXabdVFaopCTh1vU \ -o - > my-document.pdf Full `--help`: ``` Usage: google-drive-to-sqlite export [OPTIONS] FORMAT FILE_IDS... Export one or more files to the specified format. Usage: google-drive-to-sqlite export pdf FILE_ID_1 FILE_ID_2 The file content will be saved to a file with the name: FILE_ID-export.ext Where the extension is based on the format you specified. Available export formats can be seen here: https://developers.google.com/drive/api/v3/ref-export-formats Or you can use one of the following shortcuts: - Google Docs: html, txt, rtf, pdf, doc, zip, epub - Google Sheets: xls, pdf, csv, tsv, zip - Presentations: ppt, pdf, txt - Drawings: jpeg, png, svg ""zip"" returns a zip file of HTML. If you are exporting a single file you can specify a filename with -o: google-drive-to-sqlite export zip MY_FILE_ID -o myfile.zip Options: -a, --auth FILE Path to auth.json token file -o, --output FILE File to write to, or - for standard output -s, --silent Hide progress bar and filename --help Show this message and exit. ``` ## google-drive-to-sqlite get URL The `get` command makes authenticated requests to the specified URL, using credentials derived from the `auth.json` file. For example: $ google-drive-to-sqlite get 'https://www.googleapis.com/drive/v3/about?fields=*' { ""kind"": ""drive#about"", ""user"": { ""kind"": ""drive#user"", ""displayName"": ""Simon Willison"", # ... If the resource you are fetching supports pagination you can use `--paginate key` to paginate through all of the rows in a specified key. For example, the following API has a `nextPageToken` key and a `files` list, suggesting it supports pagination: $ google-drive-to-sqlite get https://www.googleapis.com/drive/v3/files { ""kind"": ""drive#fileList"", ""nextPageToken"": ""~!!~AI9...wogHHYlc="", ""incompleteSearch"": false, ""files"": [ { ""kind"": ""drive#file"", ""id"": ""1YEsITp_X8PtDUJWHGM0osT-TXAU1nr0e7RSWRM2Jpyg"", ""name"": ""Title of a spreadsheet"", ""mimeType"": ""application/vnd.google-apps.spreadsheet"" }, To paginate through everything in the `files` list you would use `--paginate files` like this: $ google-drive-to-sqlite get https://www.googleapis.com/drive/v3/files --paginate files [ { ""kind"": ""drive#file"", ""id"": ""1YEsITp_X8PtDUJWHGM0osT-TXAU1nr0e7RSWRM2Jpyg"", ""name"": ""Title of a spreadsheet"", ""mimeType"": ""application/vnd.google-apps.spreadsheet"" }, # ... 
Add `--nl` to stream paginated data as newline-delimited JSON: $ google-drive-to-sqlite get https://www.googleapis.com/drive/v3/files --paginate files --nl {""kind"": ""drive#file"", ""id"": ""1YEsITp_X8PtDUJWHGM0osT-TXAU1nr0e7RSWRM2Jpyg"", ""name"": ""Title of a spreadsheet"", ""mimeType"": ""application/vnd.google-apps.spreadsheet""} {""kind"": ""drive#file"", ""id"": ""1E6Zg2X2bjjtPzVfX8YqdXZDCoB3AVA7i"", ""name"": ""Subfolder"", ""mimeType"": ""application/vnd.google-apps.folder""} Add `--stop-after 5` to stop after 5 records - useful for testing. Full `--help`: ``` Usage: google-drive-to-sqlite get [OPTIONS] URL Make an authenticated HTTP GET to the specified URL Options: -a, --auth FILE Path to auth.json token file --paginate TEXT Paginate through all results in this key --nl Output paginated data as newline-delimited JSON --stop-after INTEGER Stop paginating after X results -v, --verbose Send verbose output to stderr --help Show this message and exit. ``` ## Database schema The database created by this tool has the following schema: ```sql CREATE TABLE [drive_users] ( [permissionId] TEXT PRIMARY KEY, [kind] TEXT, [displayName] TEXT, [photoLink] TEXT, [me] INTEGER, [emailAddress] TEXT ); CREATE TABLE [drive_folders] ( [id] TEXT PRIMARY KEY, [_parent] TEXT, [_owner] TEXT, [lastModifyingUser] TEXT, [kind] TEXT, [name] TEXT, [mimeType] TEXT, [starred] INTEGER, [trashed] INTEGER, [explicitlyTrashed] INTEGER, [parents] TEXT, [spaces] TEXT, [version] TEXT, [webViewLink] TEXT, [iconLink] TEXT, [hasThumbnail] INTEGER, [thumbnailVersion] TEXT, [viewedByMe] INTEGER, [createdTime] TEXT, [modifiedTime] TEXT, [modifiedByMe] INTEGER, [shared] INTEGER, [ownedByMe] INTEGER, [viewersCanCopyContent] INTEGER, [copyRequiresWriterPermission] INTEGER, [writersCanShare] INTEGER, [folderColorRgb] TEXT, [quotaBytesUsed] TEXT, [isAppAuthorized] INTEGER, [linkShareMetadata] TEXT, FOREIGN KEY([_parent]) REFERENCES [drive_folders]([id]), FOREIGN KEY([_owner]) REFERENCES [drive_users]([permissionId]), FOREIGN KEY([lastModifyingUser]) REFERENCES [drive_users]([permissionId]) ); CREATE TABLE [drive_files] ( [id] TEXT PRIMARY KEY, [_parent] TEXT, [_owner] TEXT, [lastModifyingUser] TEXT, [kind] TEXT, [name] TEXT, [mimeType] TEXT, [starred] INTEGER, [trashed] INTEGER, [explicitlyTrashed] INTEGER, [parents] TEXT, [spaces] TEXT, [version] TEXT, [webViewLink] TEXT, [iconLink] TEXT, [hasThumbnail] INTEGER, [thumbnailVersion] TEXT, [viewedByMe] INTEGER, [createdTime] TEXT, [modifiedTime] TEXT, [modifiedByMe] INTEGER, [shared] INTEGER, [ownedByMe] INTEGER, [viewersCanCopyContent] INTEGER, [copyRequiresWriterPermission] INTEGER, [writersCanShare] INTEGER, [quotaBytesUsed] TEXT, [isAppAuthorized] INTEGER, [linkShareMetadata] TEXT, FOREIGN KEY([_parent]) REFERENCES [drive_folders]([id]), FOREIGN KEY([_owner]) REFERENCES [drive_users]([permissionId]), FOREIGN KEY([lastModifyingUser]) REFERENCES [drive_users]([permissionId]) ); ``` ## Thumbnails You can construct a thumbnail image for a known file ID using the following URL: https://drive.google.com/thumbnail?sz=w800-h800&id=FILE_ID Users who are signed into Google Drive and have permission to view a file will be redirected to a thumbnail version of that file. You can tweak the `w800` and `h800` parameters to request different thumbnail sizes. ## Privacy policy This tool requests access to your Google Drive account in order to retrieve metadata about your files there. It also offers a feature that can download the content of those files. 
The credentials used to access your account are stored in the auth.json file on your computer. The metadata and content retrieved from Google Drive are also stored only on your own personal computer. At no point do the developers of this tool gain access to any of your data. ## Development To contribute to this tool, first checkout the code. Then create a new virtual environment: cd google-drive-to-sqlite python -m venv venv source venv/bin/activate Or if you are using `pipenv`: pipenv shell Now install the dependencies and test dependencies: pip install -e '.[test]' To run the tests: pytest ",Simon Willison,,text/markdown,https://github.com/simonw/google-drive-to-sqlite,,"Apache License, Version 2.0",,,https://pypi.org/project/google-drive-to-sqlite/,,https://pypi.org/project/google-drive-to-sqlite/,"{""CI"": ""https://github.com/simonw/google-drive-to-sqlite/actions"", ""Changelog"": ""https://github.com/simonw/google-drive-to-sqlite/releases"", ""Homepage"": ""https://github.com/simonw/google-drive-to-sqlite"", ""Issues"": ""https://github.com/simonw/google-drive-to-sqlite/issues""}",https://pypi.org/project/google-drive-to-sqlite/0.4/,"[""click"", ""httpx"", ""sqlite-utils"", ""pytest ; extra == 'test'"", ""pytest-httpx ; extra == 'test'"", ""pytest-mock ; extra == 'test'"", ""cogapp ; extra == 'test'""]",>=3.6,0.4,0, markdown-to-sqlite,CLI tool for loading markdown files into a SQLite database,"[""Intended Audience :: Developers"", ""Intended Audience :: End Users/Desktop"", ""Intended Audience :: Science/Research"", ""License :: OSI Approved :: Apache Software License"", ""Programming Language :: Python :: 3.6"", ""Programming Language :: Python :: 3.7"", ""Topic :: Database""]","# markdown-to-sqlite [![PyPI](https://img.shields.io/pypi/v/markdown-to-sqlite.svg)](https://pypi.python.org/pypi/markdown-to-sqlite) [![Changelog](https://img.shields.io/github/v/release/simonw/markdown-to-sqlite?include_prereleases&label=changelog)](https://github.com/simonw/markdown-to-sqlite/releases) [![Tests](https://github.com/simonw/markdown-to-sqlite/workflows/Test/badge.svg)](https://github.com/simonw/markdown-to-sqlite/actions?query=workflow%3ATest) [![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](https://github.com/simonw/markdown-to-sqlite/blob/main/LICENSE) CLI tool for loading markdown files into a SQLite database. YAML embedded in the markdown files will be used to populate additional columns. Usage: markdown-to-sqlite [OPTIONS] DBNAME TABLE PATHS... For example: $ markdown-to-sqlite docs.db documents file1.md file2.md ## Breaking change Prior to version 1.0 this argument order was different - markdown files were listed before the database and table. 
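To see which columns your YAML front matter produced, you can inspect the resulting table directly. A minimal sketch using Python's standard `sqlite3` module, assuming the `docs.db` database and `documents` table from the example above:

```python
import sqlite3

# Hypothetical sketch: docs.db and documents come from the example above
conn = sqlite3.connect('docs.db')

# List the columns that were created - YAML keys from the front matter show up here
print([row[1] for row in conn.execute('PRAGMA table_info(documents)')])

# Then dump every row
for row in conn.execute('select * from documents'):
    print(row)
```
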
",Simon Willison,,text/markdown,https://github.com/simonw/markdown-to-sqlite,,"Apache License, Version 2.0",,,https://pypi.org/project/markdown-to-sqlite/,,https://pypi.org/project/markdown-to-sqlite/,"{""CI"": ""https://github.com/simonw/markdown-to-sqlite/actions"", ""Changelog"": ""https://github.com/simonw/markdown-to-sqlite/releases"", ""Homepage"": ""https://github.com/simonw/markdown-to-sqlite"", ""Issues"": ""https://github.com/simonw/markdown-to-sqlite/issues""}",https://pypi.org/project/markdown-to-sqlite/1.0/,"[""yamldown"", ""markdown"", ""sqlite-utils"", ""click"", ""pytest ; extra == 'test'""]",>=3.6,1.0,0, s3-credentials,A tool for creating credentials for accessing S3 buckets,[],"# s3-credentials [![PyPI](https://img.shields.io/pypi/v/s3-credentials.svg)](https://pypi.org/project/s3-credentials/) [![Changelog](https://img.shields.io/github/v/release/simonw/s3-credentials?include_prereleases&label=changelog)](https://github.com/simonw/s3-credentials/releases) [![Tests](https://github.com/simonw/s3-credentials/workflows/Test/badge.svg)](https://github.com/simonw/s3-credentials/actions?query=workflow%3ATest) [![Documentation Status](https://readthedocs.org/projects/s3-credentials/badge/?version=latest)](https://s3-credentials.readthedocs.org/) [![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](https://github.com/simonw/s3-credentials/blob/master/LICENSE) A tool for creating credentials for accessing S3 buckets For project background, see [s3-credentials: a tool for creating credentials for S3 buckets](https://simonwillison.net/2021/Nov/3/s3-credentials/) on my blog. ## Installation pip install s3-credentials ## Basic usage To create a new S3 bucket and output credentials that can be used with only that bucket: ``` % s3-credentials create my-new-s3-bucket --create-bucket Created bucket: my-new-s3-bucket Created user: s3.read-write.my-new-s3-bucket with permissions boundary: arn:aws:iam::aws:policy/AmazonS3FullAccess Attached policy s3.read-write.my-new-s3-bucket to user s3.read-write.my-new-s3-bucket Created access key for user: s3.read-write.my-new-s3-bucket { ""UserName"": ""s3.read-write.my-new-s3-bucket"", ""AccessKeyId"": ""AKIAWXFXAIOZOYLZAEW5"", ""Status"": ""Active"", ""SecretAccessKey"": ""..."", ""CreateDate"": ""2021-11-03 01:38:24+00:00"" } ``` The tool can do a lot more than this. See the [documentation](https://s3-credentials.readthedocs.io/) for details. 
## Documentation - [Full documentation](https://s3-credentials.readthedocs.io/) - [Command help reference](https://s3-credentials.readthedocs.io/en/stable/help.html) - [Release notes](https://github.com/simonw/s3-credentials/releases) ",Simon Willison,,text/markdown,https://github.com/simonw/s3-credentials,,"Apache License, Version 2.0",,,https://pypi.org/project/s3-credentials/,,https://pypi.org/project/s3-credentials/,"{""CI"": ""https://github.com/simonw/s3-credentials/actions"", ""Changelog"": ""https://github.com/simonw/s3-credentials/releases"", ""Homepage"": ""https://github.com/simonw/s3-credentials"", ""Issues"": ""https://github.com/simonw/s3-credentials/issues""}",https://pypi.org/project/s3-credentials/0.14/,"[""click"", ""boto3"", ""pytest ; extra == 'test'"", ""pytest-mock ; extra == 'test'"", ""cogapp ; extra == 'test'"", ""moto[s3] ; extra == 'test'""]",>=3.6,0.14,0, sqlite-utils,CLI tool and Python utility functions for manipulating SQLite databases,"[""Development Status :: 5 - Production/Stable"", ""Intended Audience :: Developers"", ""Intended Audience :: End Users/Desktop"", ""Intended Audience :: Science/Research"", ""License :: OSI Approved :: Apache Software License"", ""Programming Language :: Python :: 3.10"", ""Programming Language :: Python :: 3.6"", ""Programming Language :: Python :: 3.7"", ""Programming Language :: Python :: 3.8"", ""Programming Language :: Python :: 3.9"", ""Topic :: Database""]","# sqlite-utils [![PyPI](https://img.shields.io/pypi/v/sqlite-utils.svg)](https://pypi.org/project/sqlite-utils/) [![Changelog](https://img.shields.io/github/v/release/simonw/sqlite-utils?include_prereleases&label=changelog)](https://sqlite-utils.datasette.io/en/stable/changelog.html) [![Python 3.x](https://img.shields.io/pypi/pyversions/sqlite-utils.svg?logo=python&logoColor=white)](https://pypi.org/project/sqlite-utils/) [![Tests](https://github.com/simonw/sqlite-utils/workflows/Test/badge.svg)](https://github.com/simonw/sqlite-utils/actions?query=workflow%3ATest) [![Documentation Status](https://readthedocs.org/projects/sqlite-utils/badge/?version=stable)](http://sqlite-utils.datasette.io/en/stable/?badge=stable) [![codecov](https://codecov.io/gh/simonw/sqlite-utils/branch/main/graph/badge.svg)](https://codecov.io/gh/simonw/sqlite-utils) [![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](https://github.com/simonw/sqlite-utils/blob/main/LICENSE) [![discord](https://img.shields.io/discord/823971286308356157?label=discord)](https://discord.gg/Ass7bCAMDw) Python CLI utility and library for manipulating SQLite databases. 
## Some feature highlights - [Pipe JSON](https://sqlite-utils.datasette.io/en/stable/cli.html#inserting-json-data) (or [CSV or TSV](https://sqlite-utils.datasette.io/en/stable/cli.html#inserting-csv-or-tsv-data)) directly into a new SQLite database file, automatically creating a table with the appropriate schema - [Run in-memory SQL queries](https://sqlite-utils.datasette.io/en/stable/cli.html#querying-data-directly-using-an-in-memory-database), including joins, directly against data in CSV, TSV or JSON files and view the results - [Configure SQLite full-text search](https://sqlite-utils.datasette.io/en/stable/cli.html#configuring-full-text-search) against your database tables and run search queries against them, ordered by relevance - Run [transformations against your tables](https://sqlite-utils.datasette.io/en/stable/cli.html#transforming-tables) to make schema changes that SQLite `ALTER TABLE` does not directly support, such as changing the type of a column - [Extract columns](https://sqlite-utils.datasette.io/en/stable/cli.html#extracting-columns-into-a-separate-table) into separate tables to better normalize your existing data Read more on my blog, in this series of posts on [New features in sqlite-utils](https://simonwillison.net/series/sqlite-utils-features/) and other [entries tagged sqliteutils](https://simonwillison.net/tags/sqliteutils/). ## Installation pip install sqlite-utils Or if you use [Homebrew](https://brew.sh/) for macOS: brew install sqlite-utils ## Using as a CLI tool Now you can do things with the CLI utility like this: $ sqlite-utils memory dogs.csv ""select * from t"" [{""id"": 1, ""age"": 4, ""name"": ""Cleo""}, {""id"": 2, ""age"": 2, ""name"": ""Pancakes""}] $ sqlite-utils insert dogs.db dogs dogs.csv --csv [####################################] 100% $ sqlite-utils tables dogs.db --counts [{""table"": ""dogs"", ""count"": 2}] $ sqlite-utils dogs.db ""select id, name from dogs"" [{""id"": 1, ""name"": ""Cleo""}, {""id"": 2, ""name"": ""Pancakes""}] $ sqlite-utils dogs.db ""select * from dogs"" --csv id,age,name 1,4,Cleo 2,2,Pancakes $ sqlite-utils dogs.db ""select * from dogs"" --table id age name ---- ----- -------- 1 4 Cleo 2 2 Pancakes You can import JSON data into a new database table like this: $ curl https://api.github.com/repos/simonw/sqlite-utils/releases \ | sqlite-utils insert releases.db releases - --pk id Or for data in a CSV file: $ sqlite-utils insert dogs.db dogs dogs.csv --csv `sqlite-utils memory` lets you import CSV or JSON data into an in-memory database and run SQL queries against it in a single command: $ cat dogs.csv | sqlite-utils memory - ""select name, age from stdin"" See the [full CLI documentation](https://sqlite-utils.datasette.io/en/stable/cli.html) for comprehensive coverage of many more commands. ## Using as a library You can also `import sqlite_utils` and use it as a Python library like this: ```python import sqlite_utils db = sqlite_utils.Database(""demo_database.db"") # This line creates a ""dogs"" table if one does not already exist: db[""dogs""].insert_all([ {""id"": 1, ""age"": 4, ""name"": ""Cleo""}, {""id"": 2, ""age"": 2, ""name"": ""Pancakes""} ], pk=""id"") ``` Check out the [full library documentation](https://sqlite-utils.datasette.io/en/stable/python-api.html) for everything else you can do with the Python library. 
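Reading the data back out is just as short. Here is a minimal follow-on sketch, reusing the `demo_database.db` file created above together with the `rows` and `query` methods from the library documentation:

```python
import sqlite_utils

db = sqlite_utils.Database('demo_database.db')

# Iterate over every row in the dogs table as a dictionary
for row in db['dogs'].rows:
    print(row)

# Or run an arbitrary SQL query with parameters
print(list(db.query('select name from dogs where age > ?', [3])))
```
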
## Related projects * [Datasette](https://datasette.io/): A tool for exploring and publishing data * [csvs-to-sqlite](https://github.com/simonw/csvs-to-sqlite): Convert CSV files into a SQLite database * [db-to-sqlite](https://github.com/simonw/db-to-sqlite): CLI tool for exporting a MySQL or PostgreSQL database as a SQLite file * [dogsheep](https://dogsheep.github.io/): A family of tools for personal analytics, built on top of `sqlite-utils` ",Simon Willison,,text/markdown,https://github.com/simonw/sqlite-utils,,"Apache License, Version 2.0",,,https://pypi.org/project/sqlite-utils/,,https://pypi.org/project/sqlite-utils/,"{""CI"": ""https://github.com/simonw/sqlite-utils/actions"", ""Changelog"": ""https://sqlite-utils.datasette.io/en/stable/changelog.html"", ""Documentation"": ""https://sqlite-utils.datasette.io/en/stable/"", ""Homepage"": ""https://github.com/simonw/sqlite-utils"", ""Issues"": ""https://github.com/simonw/sqlite-utils/issues"", ""Source code"": ""https://github.com/simonw/sqlite-utils""}",https://pypi.org/project/sqlite-utils/3.30/,"[""sqlite-fts4"", ""click"", ""click-default-group-wheel"", ""tabulate"", ""python-dateutil"", ""furo ; extra == 'docs'"", ""sphinx-autobuild ; extra == 'docs'"", ""codespell ; extra == 'docs'"", ""sphinx-copybutton ; extra == 'docs'"", ""beanbag-docutils (>=2.0) ; extra == 'docs'"", ""flake8 ; extra == 'flake8'"", ""mypy ; extra == 'mypy'"", ""types-click ; extra == 'mypy'"", ""types-tabulate ; extra == 'mypy'"", ""types-python-dateutil ; extra == 'mypy'"", ""data-science-types ; extra == 'mypy'"", ""pytest ; extra == 'test'"", ""black ; extra == 'test'"", ""hypothesis ; extra == 'test'"", ""cogapp ; extra == 'test'""]",>=3.6,3.30,0, tableau-to-sqlite,Fetch data from Tableau into a SQLite database,[],"# tableau-to-sqlite [![PyPI](https://img.shields.io/pypi/v/tableau-to-sqlite.svg)](https://pypi.org/project/tableau-to-sqlite/) [![Changelog](https://img.shields.io/github/v/release/simonw/tableau-to-sqlite?include_prereleases&label=changelog)](https://github.com/simonw/tableau-to-sqlite/releases) [![Tests](https://github.com/simonw/tableau-to-sqlite/workflows/Test/badge.svg)](https://github.com/simonw/tableau-to-sqlite/actions?query=workflow%3ATest) [![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](https://github.com/simonw/tableau-to-sqlite/blob/master/LICENSE) Fetch data from Tableau into a SQLite database. A wrapper around [TableauScraper](https://github.com/bertrandmartel/tableau-scraping/). ## Installation Install this tool using `pip`: $ pip install tableau-to-sqlite ## Usage If you have the URL to a Tableau dashboard like this: https://results.mo.gov/t/COVID19/views/VaccinationsDashboard/Vaccinations You can pass that directly to the tool: tableau-to-sqlite tableau.db \ https://results.mo.gov/t/COVID19/views/VaccinationsDashboard/Vaccinations This will create a SQLite database called `tableau.db` containing one table for each of the worksheets in that dashboard. If the dashboard is hosted on https://public.tableau.com/ you can instead provide the view name. 
This will be two strings separated by a `/` symbol - something like this: OregonCOVID-19VaccineProviderEnrollment/COVID-19VaccineProviderEnrollment Now run the tool like this: tableau-to-sqlite tableau.db \ OregonCOVID-19VaccineProviderEnrollment/COVID-19VaccineProviderEnrollment ## Get the data as JSON or CSV If you're building a [git scraper](https://simonwillison.net/2020/Oct/9/git-scraping/) you may want to convert the data gathered by this tool to CSV or JSON to check into your repository. You can do that using [sqlite-utils](https://sqlite-utils.datasette.io/). Install it using `pip`: pip install sqlite-utils You can dump out a table as JSON like so: sqlite-utils rows tableau.db \ 'Admin Site and County Map Site No Info' > tableau.json Or as CSV like this: sqlite-utils rows tableau.db --csv \ 'Admin Site and County Map Site No Info' > tableau.csv ## Development To contribute to this tool, first checkout the code. Then create a new virtual environment: cd tableau-to-sqlite python -mvenv venv source venv/bin/activate Or if you are using `pipenv`: pipenv shell Now install the dependencies and tests: pip install -e '.[test]' To run the tests: pytest ",Simon Willison,,text/markdown,https://github.com/simonw/tableau-to-sqlite,,"Apache License, Version 2.0",,,https://pypi.org/project/tableau-to-sqlite/,,https://pypi.org/project/tableau-to-sqlite/,"{""CI"": ""https://github.com/simonw/tableau-to-sqlite/actions"", ""Changelog"": ""https://github.com/simonw/tableau-to-sqlite/releases"", ""Homepage"": ""https://github.com/simonw/tableau-to-sqlite"", ""Issues"": ""https://github.com/simonw/tableau-to-sqlite/issues""}",https://pypi.org/project/tableau-to-sqlite/0.2.1/,"[""click"", ""TableauScraper (==0.1.2)"", ""pytest ; extra == 'test'"", ""vcrpy ; extra == 'test'""]",>=3.6,0.2.1,0,