id,node_id,name,full_name,private,owner,html_url,description,fork,created_at,updated_at,pushed_at,homepage,size,stargazers_count,watchers_count,language,has_issues,has_projects,has_downloads,has_wiki,has_pages,forks_count,archived,disabled,open_issues_count,license,topics,forks,open_issues,watchers,default_branch,permissions,temp_clone_token,organization,network_count,subscribers_count,readme,readme_html,allow_forking,visibility,is_template,template_repository,web_commit_signoff_required,has_discussions
197431109,MDEwOlJlcG9zaXRvcnkxOTc0MzExMDk=,dogsheep-beta,dogsheep/dogsheep-beta,0,53015001,https://github.com/dogsheep/dogsheep-beta,Build a search index across content from multiple SQLite database tables and run faceted searches against it using Datasette,0,2019-07-17T17:07:26Z,2021-06-13T14:39:01Z,2021-06-13T14:38:59Z,https://dogsheep.github.io/,61,78,78,Python,1,0,1,0,0,0,0,0,11,,"[""search"", ""datasette"", ""datasette-plugin"", ""dogsheep"", ""datasette-io"", ""datasette-tool""]",0,11,78,main,"{""admin"": false, ""push"": false, ""pull"": false}",,53015001,0,4,"# dogsheep-beta
[PyPI](https://pypi.org/project/dogsheep-beta/)
[Changelog](https://github.com/dogsheep/beta/releases)
[Tests](https://github.com/dogsheep/beta/actions?query=workflow%3ATest)
[License](https://github.com/dogsheep/beta/blob/main/LICENSE)
Build a search index across content from multiple SQLite database tables and run faceted searches against it using Datasette
## Example
A live example of this plugin is running at https://datasette.io/-/beta - configured using [this YAML file](https://github.com/simonw/datasette.io/blob/main/templates/dogsheep-beta.yml).
Read more about how this example works in [Building a search engine for datasette.io](https://simonwillison.net/2020/Dec/19/dogsheep-beta/).
## Installation
Install this tool like so:
$ pip install dogsheep-beta
## Usage
Run the indexer using the `dogsheep-beta` command-line tool:
$ dogsheep-beta index dogsheep.db config.yml
The `config.yml` file contains details of the databases and document types that should be indexed:
```yaml
twitter.db:
    tweets:
        sql: |-
            select
                tweets.id as key,
                'Tweet by @' || users.screen_name as title,
                tweets.created_at as timestamp,
                tweets.full_text as search_1
            from tweets join users on tweets.user = users.id
    users:
        sql: |-
            select
                id as key,
                name || ' @' || screen_name as title,
                created_at as timestamp,
                description as search_1
            from users
```
This will create a `search_index` table in the `dogsheep.db` database populated by data from those SQL queries.
By default the search index that this tool creates will be configured for Porter stemming. This means that searches for words like `run` will match documents containing `runs` or `running`.
If you don't want to use Porter stemming, use the `--tokenize none` option:
$ dogsheep-beta index dogsheep.db config.yml --tokenize none
You can pass other SQLite tokenize arguments here; see [the SQLite FTS tokenizers documentation](https://www.sqlite.org/fts5.html#tokenizers).
## Columns
The columns that can be returned by these queries are:
- `key` - a unique (within that type) primary key
- `title` - the title for the item
- `timestamp` - an ISO8601 timestamp, e.g. `2020-09-02T21:00:21`
- `search_1` - a larger chunk of text to be included in the search index
- `category` - an integer category ID, see below
- `is_public` - an integer (0 or 1, defaults to 0 if not set) specifying if this is public or not
Public records are things like your public tweets, blog posts and GitHub commits.
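For example, a type's query can populate the `category` and `is_public` columns alongside the required ones. This is a minimal sketch only: the `blog.db` database and `entries` table are hypothetical, and the literal values mark every row from this query as category `1` (see the Categories section below) and public:
```yaml
blog.db:
    entries:
        sql: |-
            select
                id as key,
                title,
                created as timestamp,
                body as search_1,
                1 as category,
                1 as is_public
            from entries
```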
## Categories
Indexed items can be assigned a category. Categories are integers that correspond to records in the `categories` table, which defaults to containing the following:
| id | name |
|------|------------|
| 1 | created |
| 2 | saved |
| 3 | received |
`created` is for items that have been created by the Dogsheep instance owner.
`saved` is for items that they have saved, liked or favourited.
`received` is for items that have been specifically sent to them by other people - incoming emails or direct messages for example.
## Datasette plugin
Run `datasette install dogsheep-beta` (or use `pip install dogsheep-beta` in the same environment as Datasette) to install the Dogsheep Beta Datasette plugin.
Once installed, a custom search interface will be made available at `/-/beta`. You can use this interface to execute searches.
The Datasette plugin has some configuration options. You can set these by adding the following to your `metadata.json` configuration file:
```json
{
    ""plugins"": {
        ""dogsheep-beta"": {
            ""database"": ""beta"",
            ""config_file"": ""dogsheep-beta.yml"",
            ""template_debug"": true
        }
    }
}
```
The configuration settings for the plugin are:
- `database` - the database file that contains your search index. If the file is `beta.db` you should set `database` to `beta`.
- `config_file` - the YAML file containing your Dogsheep Beta configuration.
- `template_debug` - set this to `true` to enable debugging output if errors occur in your custom templates, see below.
## Custom results display
Each indexed item type can define custom display HTML as part of the `config.yml` file. It can do this using a `display` key containing a fragment of Jinja template, and optionally a `display_sql` key with extra SQL to execute to fetch the data to display.
Here's how to define a custom display template for a tweet:
```yaml
twitter.db:
    tweets:
        sql: |-
            select
                tweets.id as key,
                'Tweet by @' || users.screen_name as title,
                tweets.created_at as timestamp,
                tweets.full_text as search_1
            from tweets join users on tweets.user = users.id
        display: |-
            <p>{{ title }} - tweeted at {{ timestamp }}</p>
            <blockquote>{{ search_1 }}</blockquote>
```
This example reuses the values that were stored in the `search_index` table when the indexing query was run.
To load in extra values to display in the template, use a `display_sql` query like this:
```yaml
twitter.db:
    tweets:
        sql: |-
            select
                tweets.id as key,
                'Tweet by @' || users.screen_name as title,
                tweets.created_at as timestamp,
                tweets.full_text as search_1
            from tweets join users on tweets.user = users.id
        display_sql: |-
            select
                users.screen_name,
                tweets.full_text,
                tweets.created_at
            from
                tweets join users on tweets.user = users.id
            where
                tweets.id = :key
        display: |-
            <p>{{ display.screen_name }} - tweeted at {{ display.created_at }}</p>
            <blockquote>{{ display.full_text }}</blockquote>
```
The `display_sql` query will be executed for every search result, passing the key value from the `search_index` table as the `:key` parameter and the user's search term as the `:q` parameter.
This performs well because [many small queries are efficient in SQLite](https://www.sqlite.org/np1queryprob.html).
If an error occurs while rendering one of your templates the search results page will return a 500 error. You can use the `template_debug` configuration setting described above to instead output debugging information for the search results item that experienced the error.
## Displaying maps
This plugin will eventually include a number of useful shortcuts for rendering interesting content.
The first available shortcut is for displaying maps. Make your custom content output something like this:
```html
<!-- Illustrative only: an element carrying the data-map-* attributes described below.
     The {{ display.latitude }} / {{ display.longitude }} values assume a display_sql
     query that returns latitude and longitude columns -->
<div
    data-map-latitude=""{{ display.latitude }}""
    data-map-longitude=""{{ display.longitude }}""
></div>
```
JavaScript on the page will look for any elements with `data-map-latitude` and `data-map-longitude` and, if it finds any, will load Leaflet and convert those elements into maps centered on that location. The default zoom level will be 12, or you can set a `data-map-zoom` attribute to customize this.
## Development
To set up this plugin locally, first check out the code. Then create a new virtual environment:
cd dogsheep-beta
python3 -mvenv venv
source venv/bin/activate
Or if you are using `pipenv`:
pipenv shell
Now install the dependencies and test dependencies:
pip install -e '.[test]'
To run the tests:
pytest
","
",,,,,,
197882382,MDEwOlJlcG9zaXRvcnkxOTc4ODIzODI=,healthkit-to-sqlite,dogsheep/healthkit-to-sqlite,0,53015001,https://github.com/dogsheep/healthkit-to-sqlite,Convert an Apple Healthkit export zip to a SQLite database,0,2019-07-20T05:03:12Z,2021-08-20T00:55:34Z,2021-08-20T00:56:17Z,https://datasette.io/tools/healthkit-to-sqlite,29,91,91,Python,1,1,1,1,0,4,0,0,8,apache-2.0,"[""datasette"", ""datasette-io"", ""datasette-tool"", ""dogsheep"", ""healthkit"", ""sqlite""]",4,8,91,main,"{""admin"": false, ""maintain"": false, ""push"": false, ""triage"": false, ""pull"": false}",,53015001,4,3,"# healthkit-to-sqlite
[PyPI](https://pypi.org/project/healthkit-to-sqlite/)
[Changelog](https://github.com/dogsheep/healthkit-to-sqlite/releases)
[Tests](https://github.com/dogsheep/healthkit-to-sqlite/actions?query=workflow%3ATest)
[License](https://github.com/dogsheep/healthkit-to-sqlite/blob/main/LICENSE)
Convert an Apple Healthkit export zip to a SQLite database
## How to install
$ pip install healthkit-to-sqlite
## How to use
First you need to export your Apple HealthKit data.
1. On your iPhone, open the ""Health"" app
2. Click the profile icon in the top right
3. Click ""Export Health Data"" at the bottom of that page
4. Save the resulting file somewhere you can access it, or AirDrop it directly to your laptop.
Now you can convert the resulting `export.zip` file to SQLite like so:
$ healthkit-to-sqlite export.zip healthkit.db
A progress bar will be displayed. You can disable this using `--silent`.
```
Importing from HealthKit [#-------------] 5% 00:01:33
```
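To suppress the progress bar entirely, pass the `--silent` flag mentioned above:
$ healthkit-to-sqlite export.zip healthkit.db --silent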
You can explore the resulting data using [Datasette](https://datasette.readthedocs.io/) like this:
$ datasette healthkit.db
","
",,,,,,
205429375,MDEwOlJlcG9zaXRvcnkyMDU0MjkzNzU=,swarm-to-sqlite,dogsheep/swarm-to-sqlite,0,53015001,https://github.com/dogsheep/swarm-to-sqlite,Create a SQLite database containing your checkin history from Foursquare Swarm,0,2019-08-30T17:37:29Z,2021-02-22T07:58:39Z,2021-01-18T04:36:03Z,,49,37,37,Python,1,1,1,1,0,1,0,0,1,apache-2.0,"[""sqlite"", ""foursquare"", ""swarm"", ""foursquare-api"", ""datasette"", ""dogsheep"", ""datasette-io"", ""datasette-tool""]",1,1,37,main,"{""admin"": false, ""push"": false, ""pull"": false}",,53015001,1,3,"# swarm-to-sqlite
[PyPI](https://pypi.org/project/swarm-to-sqlite/)
[Changelog](https://github.com/dogsheep/swarm-to-sqlite/releases)
[Tests](https://github.com/dogsheep/swarm-to-sqlite/actions?query=workflow%3ATest)
[License](https://github.com/dogsheep/swarm-to-sqlite/blob/main/LICENSE)
Create a SQLite database containing your checkin history from Foursquare Swarm.
## How to install
$ pip install swarm-to-sqlite
## Usage
You will need to first obtain a valid OAuth token for your Foursquare account. You can do so using this tool: https://your-foursquare-oauth-token.glitch.me/
The simplest usage is to provide the name of the database file you wish to write to. The tool will prompt you to paste in your token, and will then download your checkins and store them in the specified database file.
$ swarm-to-sqlite checkins.db
Please provide your Foursquare OAuth token:
Importing 3699 checkins [#########-----------------------] 27% 00:02:31
You can also pass the token as a command-line option:
$ swarm-to-sqlite checkins.db --token=XXX
Or as an environment variable:
$ export FOURSQUARE_TOKEN=XXX
$ swarm-to-sqlite checkins.db
To retrieve just checkins within the past X hours, days or weeks, use the `--since=` option. For example, to pull only checkins that happened within the last 10 days use:
$ swarm-to-sqlite checkins.db --token=XXX --since=10d
Use `2w` for two weeks, `10h` for ten hours, `3d` for three days.
In addition to saving the checkins to a database, you can also write them to a JSON file using the `--save` option:
$ swarm-to-sqlite checkins.db --save=checkins.json
Having done this, you can re-import checkins directly from that file (rather than making API calls to fetch data from Foursquare) like this:
$ swarm-to-sqlite checkins.db --load=checkins.json
## Using with Datasette
The SQLite database produced by this tool is designed to be browsed using [Datasette](https://datasette.io/).
You can install the [datasette-cluster-map](https://datasette.io/plugins/datasette-cluster-map) plugin to view your checkins on a map.
","
",,,,,,
206156866,MDEwOlJlcG9zaXRvcnkyMDYxNTY4NjY=,twitter-to-sqlite,dogsheep/twitter-to-sqlite,0,53015001,https://github.com/dogsheep/twitter-to-sqlite,Save data from Twitter to a SQLite database,0,2019-09-03T19:30:08Z,2021-12-26T18:08:43Z,2021-12-26T18:08:40Z,,298,269,269,Python,1,1,1,1,0,13,0,0,10,apache-2.0,"[""datasette"", ""datasette-io"", ""datasette-tool"", ""dogsheep"", ""sqlite"", ""twitter"", ""twitter-api""]",13,10,269,main,"{""admin"": false, ""maintain"": false, ""push"": false, ""triage"": false, ""pull"": false}",,53015001,13,5,"# twitter-to-sqlite
[PyPI](https://pypi.org/project/twitter-to-sqlite/)
[Changelog](https://github.com/dogsheep/twitter-to-sqlite/releases)
[Tests](https://github.com/dogsheep/twitter-to-sqlite/actions?query=workflow%3ATest)
[License](https://github.com/dogsheep/twitter-to-sqlite/blob/main/LICENSE)
Save data from Twitter to a SQLite database.
**This tool currently uses Twitter API v1**. You may be unable to use it if you do not have an API key for that version of the API.
- [How to install](#how-to-install)
- [Authentication](#authentication)
- [Retrieving tweets by specific accounts](#retrieving-tweets-by-specific-accounts)
- [Retrieve user profiles in bulk](#retrieve-user-profiles-in-bulk)
- [Retrieve tweets in bulk](#retrieve-tweets-in-bulk)
- [Retrieving Twitter followers](#retrieving-twitter-followers)
- [Retrieving friends](#retrieving-friends)
- [Retrieving favorited tweets](#retrieving-favorited-tweets)
- [Retrieving Twitter lists](#retrieving-twitter-lists)
- [Retrieving Twitter list memberships](#retrieving-twitter-list-memberships)
- [Retrieving just follower and friend IDs](#retrieving-just-follower-and-friend-ids)
- [Retrieving tweets from your home timeline](#retrieving-tweets-from-your-home-timeline)
- [Retrieving your mentions](#retrieving-your-mentions)
- [Providing input from a SQL query with --sql and --attach](#providing-input-from-a-sql-query-with---sql-and---attach)
- [Running searches](#running-searches)
- [Capturing tweets in real-time with track and follow](#capturing-tweets-in-real-time-with-track-and-follow)
* [track](#track)
* [follow](#follow)
- [Importing data from your Twitter archive](#importing-data-from-your-twitter-archive)
- [Design notes](#design-notes)
## How to install
$ pip install twitter-to-sqlite
## Authentication
First, you will need to create a Twitter application at https://developer.twitter.com/en/apps. You may need to apply for a Twitter developer account - if so, you may find this [example application email](https://raw.githubusercontent.com/dogsheep/twitter-to-sqlite/main/email.png), which was approved in the past, useful.
Once you have created your application, navigate to the ""Keys and tokens"" page and make note of the following:
* Your API key
* Your API secret key
* Your access token
* Your access token secret
You will need to save all four of these values to a JSON file in order to use this tool.
You can create that JSON file by running the following command and pasting in the values at the prompts:
$ twitter-to-sqlite auth
Create an app here: https://developer.twitter.com/en/apps
Then navigate to 'Keys and tokens' and paste in the following:
API key: xxx
API secret key: xxx
Access token: xxx
Access token secret: xxx
This will create a file called `auth.json` in your current directory containing the required values. To save the file at a different path or filename, use the `--auth=myauth.json` option.
## Retrieving tweets by specific accounts
The `user-timeline` command retrieves all of the tweets posted by the specified user accounts. It defaults to the account belonging to the authenticated user:
$ twitter-to-sqlite user-timeline twitter.db
Importing tweets [#####-------------------------------] 2799/17780 00:01:39
All of these commands assume that there is an `auth.json` file in the current directory. You can provide the path to your `auth.json` file using `-a`:
$ twitter-to-sqlite user-timeline twitter.db -a /path/to/auth.json
To load tweets for other users, pass their screen names as arguments:
$ twitter-to-sqlite user-timeline twitter.db cleopaws nichemuseums
Twitter's API only returns up to around 3,200 tweets for most user accounts, but you may find that it returns all available tweets for your own user account.
You can pass numeric Twitter user IDs instead of screen names using the `--ids` parameter.
You can use `--since` to retrieve every tweet since the last time you imported for that user, or `--since_id=xxx` to retrieve every tweet since a specific tweet ID.
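For example, to fetch only tweets posted since the last time you ran this command for the authenticated user:
$ twitter-to-sqlite user-timeline twitter.db --since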
This command also accepts `--sql` and `--attach` options, documented below.
## Retrieve user profiles in bulk
If you have a list of Twitter screen names (or user IDs) you can bulk fetch their fully inflated Twitter profiles using the `users-lookup` command:
$ twitter-to-sqlite users-lookup users.db simonw cleopaws
You can pass user IDs instead using the `--ids` option:
$ twitter-to-sqlite users-lookup users.db 12497 3166449535 --ids
This command also accepts `--sql` and `--attach` options, documented below.
## Retrieve tweets in bulk
If you have a list of tweet IDS you can bulk fetch them using the `statuses-lookup` command:
$ twitter-to-sqlite statuses-lookup tweets.db 1122154819815239680 1122154178493575169
The `--sql` and `--attach` options are supported.
Here's a recipe to retrieve any tweets that existing tweets are in-reply-to which have not yet been stored in your database:
$ twitter-to-sqlite statuses-lookup tweets.db \
--sql='
select in_reply_to_status_id
from tweets
where in_reply_to_status_id is not null' \
--skip-existing
The `--skip-existing` option means that tweets that have already been stored in the database will not be fetched again.
## Retrieving Twitter followers
The `followers` command retrieves details of every follower of the specified accounts. You can use it to retrieve your own followers, or you can pass one or more screen names to pull the followers for other accounts.
The following command pulls your followers and saves them in a SQLite database file called `twitter.db`:
$ twitter-to-sqlite followers twitter.db
This command is **extremely slow**, because Twitter impose a rate limit of no more than one request per minute to this endpoint! If you are running it against an account with thousands of followers you should expect this to take several hours.
To retrieve followers for another account, use:
$ twitter-to-sqlite followers twitter.db cleopaws
This command also accepts the `--ids`, `--sql` and `--attach` options.
See [Analyzing my Twitter followers with Datasette](https://simonwillison.net/2018/Jan/28/analyzing-my-twitter-followers/) for the original inspiration for this command.
## Retrieving friends
The `friends` command works like the `followers` command, but retrieves the specified (or currently authenticated) user's friends - defined as accounts that the user is following.
$ twitter-to-sqlite friends twitter.db
It takes the same options as the `followers` command.
## Retrieving favorited tweets
The `favorites` command retrieves tweets that have been favorited by a specified user. Called without any extra arguments it retrieves tweets favorited by the currently authenticated user:
$ twitter-to-sqlite favorites faves.db
You can also use the `--screen_name` or `--user_id` arguments to retrieve favorite tweets for another user:
$ twitter-to-sqlite favorites faves-obama.db --screen_name=BarackObama
Use the `--stop_after=xxx` argument to retrieve only the most recent number of favorites, e.g. to get the authenticated user's 50 most recent favorites:
$ twitter-to-sqlite favorites faves.db --stop_after=50
## Retrieving Twitter lists
The `lists` command retrieves all of the lists belonging to one or more users.
$ twitter-to-sqlite lists lists.db simonw dogsheep
This command also accepts the `--sql` and `--attach` and `--ids` options.
To additionally fetch the list of members for each list, use `--members`.
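For example, to fetch those users' lists along with the members of each list:
$ twitter-to-sqlite lists lists.db simonw dogsheep --members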
## Retrieving Twitter list memberships
The `list-members` command can be used to retrieve details of one or more Twitter lists, including all of their members.
$ twitter-to-sqlite list-members members.db simonw/the-good-place
You can pass multiple `screen_name/list_slug` identifiers.
If you know the numeric IDs of the lists instead, you can use `--ids`:
$ twitter-to-sqlite list-members members.db 927913322841653248 --ids
## Retrieving just follower and friend IDs
It's also possible to retrieve just the numeric Twitter IDs of the accounts that specific users are following (""friends"" in Twitter's API terminology) or followed-by:
$ twitter-to-sqlite followers-ids members.db simonw cleopaws
This will populate the `following` table with `followed_id`/`follower_id` pairs for the two specified accounts, listing every account ID that is following either of those two accounts.
$ twitter-to-sqlite friends-ids members.db simonw cleopaws
This will do the same thing but pull the IDs that those accounts are following.
Both of these commands also support `--sql` and `--attach` as an alternative to passing screen names as direct command-line arguments. You can use `--ids` to process the inputs as user IDs rather than screen names.
The underlying Twitter APIs have a rate limit of 15 requests every 15 minutes - though they do return up to 5,000 IDs in each call. By default both of these subcommands will wait for 61 seconds between API calls in order to stay within the rate limit - if you know you will not be making many calls, you can reduce this delay to just one second using `--sleep=1`.
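For example, to collect friend IDs with the shorter one second delay:
$ twitter-to-sqlite friends-ids members.db simonw cleopaws --sleep=1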
## Retrieving tweets from your home timeline
The `home-timeline` command retrieves up to 800 tweets from the home timeline of the authenticated user - generally this means tweets from people you follow.
$ twitter-to-sqlite home-timeline twitter.db
Importing timeline [#################--------] 591/800 00:01:14
The tweets are stored in the `tweets` table, and a record is added to the `timeline_tweets` table noting that each tweet was seen in your user's home timeline.
You can use `--since` to retrieve just tweets that have been posted since the last time this command was run, or `--since_id=xxx` to explicitly pass in a tweet ID to use as the last position.
You can then view your timeline in Datasette using the following URL:
`/tweets/tweets?_where=id+in+(select+tweet+from+[timeline_tweets])&_sort_desc=id&_facet=user`
This will filter your tweets table to just tweets that appear in your timeline, ordered with the most recent first, using faceting to show which users are responsible for the most tweets.
## Retrieving your mentions
The `mentions-timeline` command works like `home-timeline` except it retrieves tweets that mention the authenticated user's account. It records the user account that was mentioned in a `mentions_tweets` table.
It supports `--since` and `--since_id` in the same way as `home-timeline` does.
## Providing input from a SQL query with --sql and --attach
This option is available for some subcommands - run `twitter-to-sqlite command-name --help` to check.
You can provide Twitter screen names (or user IDs or tweet IDs) directly as command-line arguments, or you can provide those screen names or IDs by executing a SQL query.
For example: consider a SQLite database with an `attendees` table listing names and Twitter accounts - something like this:
| First | Last | Twitter |
|---------|------------|--------------|
| Simon | Willison | simonw |
| Avril | Lavigne | AvrilLavigne |
You can run the `users-lookup` command to pull the Twitter profile of every user listed in that database by loading the screen names using a `--sql` query:
$ twitter-to-sqlite users-lookup my.db --sql=""select Twitter from attendees""
If your database table contains Twitter IDs, you can select those IDs and pass the `--ids` argument. For example, to fetch the profiles of users who have had their user IDs inserted into the `following` table using the `twitter-to-sqlite friends-ids` command:
$ twitter-to-sqlite users-lookup my.db --sql=""select follower_id from following"" --ids
Or to avoid re-fetching users that have already been fetched:
$ twitter-to-sqlite users-lookup my.db \
--sql=""select followed_id from following where followed_id not in (
select id from users)"" --ids
If your data lives in a separate database file you can attach it using `--attach`. For example, consider the attendees example above but the data lives in an `attendees.db` file, and you want to fetch the user profiles into a `tweets.db` file. You could do that like this:
$ twitter-to-sqlite users-lookup tweets.db \
--attach=attendees.db \
--sql=""select Twitter from attendees.attendees""
The filename (without the extension) will be used as the database alias within SQLite. If you want a different alias for some reason you can specify that with a colon like this:
$ twitter-to-sqlite users-lookup tweets.db \
--attach=foo:attendees.db \
--sql=""select Twitter from foo.attendees""
## Running searches
The `search` command runs a search against the Twitter [standard search API](https://developer.twitter.com/en/docs/tweets/search/api-reference/get-search-tweets).
$ twitter-to-sqlite search tweets.db ""dogsheep""
This will import up to around 320 tweets that match that search term into the `tweets` table. It will also create a record in the `search_runs` table recording that the search took place, and many-to-many records in the `search_runs_tweets` table recording which tweets were seen for that search at that time.
You can use the `--since` parameter to check for previous search runs with the same arguments and only retrieve tweets that were posted since the last retrieved matching tweet.
The following additional options for `search` are supported (a combined example follows this list):
* `--geocode`: `latitude,longitude,radius` where radius is a number followed by mi or km
* `--lang`: ISO 639-1 language code e.g. `en` or `es`
* `--locale`: Locale: only `ja` is currently effective
* `--result_type`: `mixed`, `recent` or `popular`. Defaults to `mixed`
* `--count`: Number of results per page, defaults to the maximum of 100
* `--stop_after`: Stop after this many results
* `--since_id`: Pull tweets since this Tweet ID. You probably want to use `--since` instead of this.
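Here is an illustrative combination of several of these options (the search term and option values are examples only):
$ twitter-to-sqlite search tweets.db ""datasette"" --lang=en --result_type=recent --stop_after=50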
## Capturing tweets in real-time with track and follow
This functionality is **experimental**. Please [file bug reports](https://github.com/dogsheep/twitter-to-sqlite/issues) if you find any!
Twitter provides a real-time API which can be used to subscribe to tweets as they happen. `twitter-to-sqlite` can use this API to continually update a SQLite database with tweets matching certain keywords, or referencing specific users.
### track
To track keywords, use the `track` command:
$ twitter-to-sqlite track tweets.db kakapo
This command will continue to run until you hit Ctrl+C. It will capture any tweets mentioning the keyword [kakapo](https://en.wikipedia.org/wiki/Kakapo) and store them in the `tweets.db` database file.
You can pass multiple keywords as a space separated list. This will capture tweets matching either of those keywords:
$ twitter-to-sqlite track tweets.db kakapo raccoon
You can enclose phrases in quotes to search for tweets matching both of those keywords:
$ twitter-to-sqlite track tweets.db 'trash panda'
See [the Twitter track documentation](https://developer.twitter.com/en/docs/tweets/filter-realtime/guides/basic-stream-parameters#track) for advanced tips on using this command.
Add the `--verbose` option to see matching tweets (in their verbose JSON form) displayed to the terminal as they are captured:
$ twitter-to-sqlite track tweets.db raccoon --verbose
### follow
The `follow` command will capture all tweets that are relevant to one or more specific Twitter users.
$ twitter-to-sqlite follow tweets.db nytimes
This includes tweets by those users, tweets that reply to or quote those users, and retweets by those users. See [the Twitter follow documentation](https://developer.twitter.com/en/docs/tweets/filter-realtime/guides/basic-stream-parameters#follow) for full details.
The command accepts one or more screen names.
You can feed it numeric Twitter user IDs instead of screen names by using the `--ids` flag.
The command also supports the `--sql` and `--attach` options, and the `--verbose` option for displaying tweets as they are captured.
Here's how to start following tweets from every user ID currently represented as being followed in the `following` table (populated using the `friends-ids` command):
$ twitter-to-sqlite follow tweets.db \
--sql=""select distinct followed_id from following"" \
--ids
## Importing data from your Twitter archive
You can request an archive of your Twitter data by [following these instructions](https://help.twitter.com/en/managing-your-account/how-to-download-your-twitter-archive).
Twitter will send you a link to download a `.zip` file. You can import the contents of that file into a set of tables in a new database file called `archive.db` (each table beginning with the `archive_` prefix) using the `import` command:
$ twitter-to-sqlite import archive.db ~/Downloads/twitter-2019-06-25-b31f2.zip
This command does not populate any of the regular tables, since Twitter's export data does not exactly match the schema returned by the Twitter API.
It will delete and recreate the corresponding `archive_*` tables every time you run it. If this is not what you want, run the command against a new SQLite database file name rather than running it against one that already exists.
If you have already decompressed your archive, you can run this against the directory that you decompressed it to:
$ twitter-to-sqlite import archive.db ~/Downloads/twitter-2019-06-25-b31f2/
You can also run it against one or more specific files within that folder. For example, to import just the follower.js and following.js files:
$ twitter-to-sqlite import archive.db \
~/Downloads/twitter-2019-06-25-b31f2/follower.js \
~/Downloads/twitter-2019-06-25-b31f2/following.js
You may want to use other commands to populate tables based on data from the archive. For example, to retrieve full API versions of each of the tweets you have favourited in your archive, you could run the following:
$ twitter-to-sqlite statuses-lookup archive.db \
--sql='select tweetId from archive_like' \
--skip-existing
If you want these imported tweets to then be reflected in the `favorited_by` table, you can do so by applying the following SQL query:
$ sqlite3 archive.db
SQLite version 3.22.0 2018-01-22 18:45:57
Enter "".help"" for usage hints.
sqlite> INSERT OR IGNORE INTO favorited_by (tweet, user)
...> SELECT tweetId, 'YOUR_TWITTER_ID' FROM archive_like;
Replace YOUR_TWITTER_ID with your numeric Twitter ID. If you don't know that ID you can find it out by running the following:
$ twitter-to-sqlite fetch \
""https://api.twitter.com/1.1/account/verify_credentials.json"" \
| grep '""id""' | head -n 1
## Design notes
* Tweet IDs are stored as integers, to afford sorting by ID in a sensible way
* While we configure foreign key relationships between tables, we do not ask SQLite to enforce them. The `following` table relies on this: the `followers-ids` and `friends-ids` commands can populate it with user IDs even if the corresponding accounts are not yet present in the `users` table.
","
",1,public,0,,,
206202864,MDEwOlJlcG9zaXRvcnkyMDYyMDI4NjQ=,inaturalist-to-sqlite,dogsheep/inaturalist-to-sqlite,0,53015001,https://github.com/dogsheep/inaturalist-to-sqlite,Create a SQLite database containing your observation history from iNaturalist,0,2019-09-04T01:21:21Z,2020-12-19T05:18:38Z,2020-10-22T00:08:58Z,,17,2,2,Python,1,1,1,1,0,0,0,0,0,apache-2.0,"[""sqlite"", ""inaturalist"", ""datasette"", ""dogsheep"", ""datasette-io"", ""datasette-tool""]",0,0,2,master,"{""admin"": false, ""push"": false, ""pull"": false}",,53015001,0,1,"# inaturalist-to-sqlite
[PyPI](https://pypi.org/project/inaturalist-to-sqlite/)
[CircleCI](https://circleci.com/gh/dogsheep/inaturalist-to-sqlite)
[License](https://github.com/dogsheep/inaturalist-to-sqlite/blob/master/LICENSE)
Create a SQLite database containing your observation history from [iNaturalist](https://www.inaturalist.org/).
## How to install
$ pip install inaturalist-to-sqlite
## Usage
$ inaturalist-to-sqlite inaturalist.db yourusername
(Or try `simonw` if you don't yet have an iNaturalist account)
This will import all of your iNaturalist observations into a SQLite database called `inaturalist.db`.","
",,,,,,
206649770,MDEwOlJlcG9zaXRvcnkyMDY2NDk3NzA=,google-takeout-to-sqlite,dogsheep/google-takeout-to-sqlite,0,53015001,https://github.com/dogsheep/google-takeout-to-sqlite,Save data from Google Takeout to a SQLite database,0,2019-09-05T20:15:15Z,2021-06-08T15:31:47Z,2021-02-24T00:34:55Z,,14,51,51,Python,1,1,1,1,0,4,0,0,6,apache-2.0,"[""google"", ""sqlite"", ""datasette"", ""dogsheep"", ""datasette-io"", ""datasette-tool""]",4,6,51,master,"{""admin"": false, ""push"": false, ""pull"": false}",,53015001,4,3,"# google-takeout-to-sqlite
[PyPI](https://pypi.org/project/google-takeout-to-sqlite/)
[CircleCI](https://circleci.com/gh/dogsheep/google-takeout-to-sqlite)
[License](https://github.com/dogsheep/google-takeout-to-sqlite/blob/master/LICENSE)
Save data from Google Takeout to a SQLite database.
## How to install
$ pip install google-takeout-to-sqlite
Request your Google data from https://takeout.google.com/ - wait for the email and download the zip file.
This tool only supports a subset of the available options. More will be added over time.
## My Activity
You can request the ""My Activity"" export and then import it with the following command:
$ google-takeout-to-sqlite my-activity takeout.db ~/Downloads/takeout-20190530.zip
This will create a database file called `takeout.db` if one does not already exist.
## Location History
Your location history records the latitude, longitude and timestamp for where Google has tracked your location. You can import it using this command:
$ google-takeout-to-sqlite location-history takeout.db ~/Downloads/takeout-20190530.zip
## Browsing your data with Datasette
Once you have imported Google data into a SQLite database file you can browse your data using [Datasette](https://github.com/simonw/datasette). Install Datasette like so:
$ pip install datasette
Now browse your data by running this and then visiting `http://localhost:8001/`
$ datasette takeout.db
Install the [datasette-cluster-map](https://github.com/simonw/datasette-cluster-map) plugin to see your location history on a map:
$ pip install datasette-cluster-map
","
",,,,,,
207052882,MDEwOlJlcG9zaXRvcnkyMDcwNTI4ODI=,github-to-sqlite,dogsheep/github-to-sqlite,0,53015001,https://github.com/dogsheep/github-to-sqlite,Save data from GitHub to a SQLite database,0,2019-09-08T02:50:28Z,2022-09-20T04:36:37Z,2022-09-28T21:07:54Z,https://github-to-sqlite.dogsheep.net/,143,235,235,Python,1,1,1,1,0,32,0,0,20,apache-2.0,"[""datasette"", ""datasette-io"", ""datasette-tool"", ""dogsheep"", ""github-api"", ""sqlite""]",32,20,235,main,"{""admin"": false, ""maintain"": false, ""push"": false, ""triage"": false, ""pull"": false}",,53015001,32,6,"# github-to-sqlite
[PyPI](https://pypi.org/project/github-to-sqlite/)
[Changelog](https://github.com/dogsheep/github-to-sqlite/releases)
[Tests](https://github.com/dogsheep/github-to-sqlite/actions?query=workflow%3ATest)
[License](https://github.com/dogsheep/github-to-sqlite/blob/main/LICENSE)
Save data from GitHub to a SQLite database.
- [Demo](#demo)
- [How to install](#how-to-install)
- [Authentication](#authentication)
- [Fetching issues for a repository](#fetching-issues-for-a-repository)
- [Fetching pull requests for a repository](#fetching-pull-requests-for-a-repository)
- [Fetching issue comments for a repository](#fetching-issue-comments-for-a-repository)
- [Fetching commits for a repository](#fetching-commits-for-a-repository)
- [Fetching releases for a repository](#fetching-releases-for-a-repository)
- [Fetching tags for a repository](#fetching-tags-for-a-repository)
- [Fetching contributors to a repository](#fetching-contributors-to-a-repository)
- [Fetching repos belonging to a user or organization](#fetching-repos-belonging-to-a-user-or-organization)
- [Fetching specific repositories](#fetching-specific-repositories)
- [Fetching repos that have been starred by a user](#fetching-repos-that-have-been-starred-by-a-user)
- [Fetching users that have starred specific repos](#fetching-users-that-have-starred-specific-repos)
- [Fetching GitHub Actions workflows](#fetching-github-actions-workflows)
- [Scraping dependents for a repository](#scraping-dependents-for-a-repository)
- [Fetching emojis](#fetching-emojis)
- [Making authenticated API calls](#making-authenticated-api-calls)
## Demo
https://github-to-sqlite.dogsheep.net/ hosts a [Datasette](https://datasette.io/) demo of a database created by [running this tool](https://github.com/dogsheep/github-to-sqlite/blob/main/.github/workflows/deploy-demo.yml#L40-L60) against all of the repositories in the [Dogsheep GitHub organization](https://github.com/dogsheep), plus the [datasette](https://github.com/simonw/datasette) and [sqlite-utils](https://github.com/simonw/sqlite-utils) repositories.
## How to install
$ pip install github-to-sqlite
## Authentication
Create a GitHub personal access token: https://github.com/settings/tokens
Run this command and paste in your new token:
$ github-to-sqlite auth
This will create a file called `auth.json` in your current directory containing the required value. To save the file at a different path or filename, use the `--auth=myauth.json` option.
As an alternative to using an `auth.json` file you can add your access token to an environment variable called `GITHUB_TOKEN`.
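For example, using the environment variable (the token value here is a placeholder):
$ export GITHUB_TOKEN=xxx
$ github-to-sqlite issues github.db simonw/datasette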
## Fetching issues for a repository
The `issues` command retrieves all of the issues belonging to a specified repository.
$ github-to-sqlite issues github.db simonw/datasette
If an `auth.json` file is present it will use the token from that file. It works without authentication for public repositories but you should be aware that GitHub have strict IP-based rate limits for unauthenticated requests.
You can point to a different location of `auth.json` using `-a`:
$ github-to-sqlite issues github.db simonw/datasette -a /path/to/auth.json
You can use the `--issue` option one or more times to load specific issues:
$ github-to-sqlite issues github.db simonw/datasette --issue=1
Example: [issues table](https://github-to-sqlite.dogsheep.net/github/issues)
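For a quick sanity check after importing, you can query the resulting `issues` table directly - for example, counting issues by state. The `state` column name here is an assumption based on the GitHub API payload, so confirm it against your own schema first:
```sql
-- Count issues by state (the state column is assumed from the GitHub API payload)
select state, count(*) as total
from issues
group by state
order by total desc;
```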
## Fetching pull requests for a repository
While pull requests are a type of issue, you will get more information on pull requests by pulling them separately. For example, whether a pull request has been merged and when.
Following the API of issues, the `pull-requests` command retrieves all of the pull requests belonging to a specified repository.
$ github-to-sqlite pull-requests github.db simonw/datasette
You can use the `--pull-request` option one or more times to load specific pull requests:
$ github-to-sqlite pull-requests github.db simonw/datasette --pull-request=81
Note that the `merged_by` column on the `pull_requests` table will only be populated for pull requests that are loaded using the `--pull-request` option - the GitHub API does not return this field for pull requests that are loaded in bulk.
Example: [pull_requests table](https://github-to-sqlite.dogsheep.net/github/pull_requests)
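As a rough example, a query like the following lists recently merged pull requests; the `merged_at` and `title` column names are assumptions taken from the GitHub API response, so adjust them to match your schema:
```sql
-- Ten most recently merged pull requests (merged_at and title columns assumed)
select title, merged_at
from pull_requests
where merged_at is not null
order by merged_at desc
limit 10;
```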
## Fetching issue comments for a repository
The `issue-comments` command retrieves all of the comments on all of the issues in a repository.
It is recommended you run `issues` first, so that each imported comment can have a foreign key pointing to its issue.
$ github-to-sqlite issues github.db simonw/datasette
$ github-to-sqlite issue-comments github.db simonw/datasette
You can use the `--issue` option to only load comments for a specific issue within that repository, for example:
$ github-to-sqlite issue-comments github.db simonw/datasette --issue=1
Example: [issue_comments table](https://github-to-sqlite.dogsheep.net/github/issue_comments)
## Fetching commits for a repository
The `commits` command retrieves details of all of the commits for one or more repositories. It currently fetches the sha, commit message, and author and committer details - it does not retrieve the full commit body.
$ github-to-sqlite commits github.db simonw/datasette simonw/sqlite-utils
The command accepts one or more repositories.
By default it will stop as soon as it sees a commit that has previously been retrieved. You can force it to retrieve all commits (including those that have been previously inserted) using `--all`.
Example: [commits table](https://github-to-sqlite.dogsheep.net/github/commits)
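Once imported, a query along these lines shows the latest commit messages; `message`, `committer_date` and `repo` are assumed column names, so check them against the actual table:
```sql
-- Most recent commits (message, committer_date and repo column names are assumptions)
select repo, committer_date, message
from commits
order by committer_date desc
limit 10;
```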
## Fetching releases for a repository
The `releases` command retrieves the releases for one or more repositories.
$ github-to-sqlite releases github.db simonw/datasette simonw/sqlite-utils
The command accepts one or more repositories.
Example: [releases table](https://github-to-sqlite.dogsheep.net/github/releases)
## Fetching tags for a repository
The `tags` command retrieves all of the tags for one or more repositories.
$ github-to-sqlite tags github.db simonw/datasette simonw/sqlite-utils
Example: [tags table](https://github-to-sqlite.dogsheep.net/github/tags)
## Fetching contributors to a repository
The `contributors` command retrieves details of all of the contributors for one or more repositories.
$ github-to-sqlite contributors github.db simonw/datasette simonw/sqlite-utils
The command accepts one or more repositories. It populates a `contributors` table with foreign keys to `repos` and `users`, plus a `contributions` count recording the number of commits each contributor has made to that repository.
Example: [contributors table](https://github-to-sqlite.dogsheep.net/github/contributors)
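For example, you could rank contributors by commit count with something like the query below; it assumes the `contributors` table stores `user` and `contributions` columns and that `users` has a `login` column, which may differ in your database:
```sql
-- Rank contributors by commit count (user, contributions and login columns are assumptions)
select users.login, contributors.contributions
from contributors
join users on users.id = contributors.user
order by contributors.contributions desc
limit 10;
```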
## Fetching repos belonging to a user or organization
The `repos` command fetches repos belonging to a user or organization.
Without any other arguments, this command will fetch all repos that the currently authenticated user owns, collaborates on or can access via one of their organizations:
$ github-to-sqlite repos github.db
To fetch repos belonging to a specific user or organization, provide their username as an argument:
$ github-to-sqlite repos github.db dogsheep # organization
$ github-to-sqlite repos github.db simonw # user
You can pass more than one username to fetch for multiple users or organizations at once:
$ github-to-sqlite repos github.db simonw dogsheep
Add the `--readme` option to save the README for the repo in a column called `readme`. Add `--readme-html` to save the HTML rendered version of the README into a column called `readme_html`.
Example: [repos table](https://github-to-sqlite.dogsheep.net/github/repos)
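As a hedged example, you could list your most-starred repositories like this; the `full_name` and `stargazers_count` columns are assumed from the GitHub API and may need adjusting:
```sql
-- Repositories ordered by star count (full_name and stargazers_count assumed)
select full_name, stargazers_count
from repos
order by stargazers_count desc
limit 10;
```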
## Fetching specific repositories
You can use `-r` with the `repos` command one or more times to fetch just specific repositories.
$ github-to-sqlite repos github.db -r simonw/datasette -r dogsheep/github-to-sqlite
## Fetching repos that have been starred by a user
The `starred` command fetches the repos that have been starred by a user.
$ github-to-sqlite starred github.db simonw
If you are using an `auth.json` file you can omit the username to retrieve the starred repos for the authenticated user.
Example: [stars table](https://github-to-sqlite.dogsheep.net/github/stars)
## Fetching users that have starred specific repos
The `stargazers` command fetches the users that have starred the specified repos.
$ github-to-sqlite stargazers github.db simonw/datasette dogsheep/github-to-sqlite
You can specify one or more repositories using `owner/repo` syntax.
Users fetched using this command will be inserted into the `users` table. Many-to-many records showing which repository they starred will be added to the `stars` table.
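A query along these lines joins those tables to show who starred what; the `starred_at` column and the foreign key names are assumptions, so verify them against the schema:
```sql
-- Recent stars joined to users and repos (starred_at and foreign key names assumed)
select users.login, repos.full_name, stars.starred_at
from stars
join users on users.id = stars.user
join repos on repos.id = stars.repo
order by stars.starred_at desc
limit 20;
```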
## Fetching GitHub Actions workflows
The `workflows` command fetches the YAML workflow configurations from each repository's `.github/workflows` directory and parses them to populate `workflows`, `jobs` and `steps` tables.
$ github-to-sqlite workflows github.db simonw/datasette dogsheep/github-to-sqlite
You can specify one or more repositories using `owner/repo` syntax.
Example: [workflows table](https://github-to-sqlite.dogsheep.net/github/workflows), [jobs table](https://github-to-sqlite.dogsheep.net/github/jobs), [steps table](https://github-to-sqlite.dogsheep.net/github/steps)
## Scraping dependents for a repository
The GitHub dependency graph can show other GitHub projects that depend on a specific repo, for example [simonw/datasette/network/dependents](https://github.com/simonw/datasette/network/dependents).
This data is not yet available through the GitHub API. The `scrape-dependents` command scrapes those pages and uses the GitHub API to load full versions of the dependent repositories.
$ github-to-sqlite scrape-dependents github.db simonw/datasette
The command accepts one or more repositories.
Add `-v` for verbose output.
Example: [dependents table](https://github-to-sqlite.dogsheep.net/github/dependents?_sort_desc=first_seen_utc)
## Fetching emojis
You can fetch a list of every emoji supported by GitHub using the `emojis` command:
$ github-to-sqlite emojis github.db
This will create a table called `emojis` with a primary key `name` and a `url` column.
If you add the `--fetch` option the command will also fetch the binary content of the images and place them in an `image` column:
$ github-to-sqlite emojis emojis.db -f
[########----------------------------] 397/1799 22% 00:03:43
You can then use the [datasette-render-images](https://github.com/simonw/datasette-render-images) plugin to browse them visually.
Example: [emojis table](https://github-to-sqlite.dogsheep.net/github/emojis)
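Since the table has just the `name` and `url` columns described above, looking up emoji is a simple query, for example:
```sql
-- Find emoji whose names mention 'cat'
select name, url
from emojis
where name like '%cat%';
```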
## Making authenticated API calls
The `github-to-sqlite get` command provides a convenient shortcut for making authenticated calls to the API. Once you have created your `auth.json` file (or set a `GITHUB_TOKEN` environment variable) you can use it like this:
$ github-to-sqlite get https://api.github.com/gists
This will make an authenticated call to the URL you provide and pretty-print the resulting JSON to the console.
You can omit the `https://api.github.com/` prefix, for example:
$ github-to-sqlite get /gists
Many GitHub APIs are [paginated using the HTTP Link header](https://docs.github.com/en/rest/guides/traversing-with-pagination). You can follow this pagination and output a list of all of the resulting items using `--paginate`:
$ github-to-sqlite get /users/simonw/repos --paginate
You can output newline-delimited JSON for each item using `--nl`. This can be useful for streaming items into another tool.
$ github-to-sqlite get /users/simonw/repos --nl
","
",1,public,0,,0,
209590345,MDEwOlJlcG9zaXRvcnkyMDk1OTAzNDU=,genome-to-sqlite,dogsheep/genome-to-sqlite,0,53015001,https://github.com/dogsheep/genome-to-sqlite,Import your genome into a SQLite database,0,2019-09-19T15:38:39Z,2021-01-18T19:39:48Z,2019-09-19T15:41:17Z,,9,13,13,Python,1,1,1,1,0,0,0,0,2,apache-2.0,"[""genetics"", ""sqlite"", ""23andme"", ""personal-analytics"", ""datasette"", ""dogsheep"", ""datasette-io"", ""datasette-tool""]",0,2,13,master,"{""admin"": false, ""push"": false, ""pull"": false}",,53015001,0,2,"# genome-to-sqlite
[](https://pypi.org/project/genome-to-sqlite/)
[](https://circleci.com/gh/dogsheep/genome-to-sqlite)
[](https://github.com/dogsheep/genome-to-sqlite/blob/master/LICENSE)
Import your genome into a SQLite database.
## How to install
$ pip install genome-to-sqlite
## How to use
First, export your genome. This tool has only been tested against 23andMe so far. You can request an export of your genome from https://you.23andme.com/tools/data/download/
Now you can convert the resulting `export.zip` file to SQLite like so:
$ genome-to-sqlite export.zip genome.db
A progress bar will be displayed. You can disable this using `--silent`.
```
Importing genome [#----------------] 5% 00:01:33
```
You can explore the resulting data using [Datasette](https://datasette.readthedocs.io/) like this:
$ datasette genome.db --config facet_time_limit_ms:1000
Bumping up the facet time limit is useful in order to enable faceting by chromosome:
http://127.0.0.1:8001/genome/genome?_facet=chromosome&_sort=position
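The Datasette URL above implies a `genome` table with `chromosome` and `position` columns, so a query like the following summarises the data per chromosome (any other column names would be assumptions about the schema):
```sql
-- Count records per chromosome (table and chromosome column taken from the URL above)
select chromosome, count(*) as records
from genome
group by chromosome
order by records desc;
```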
","
",,,,,,
213286752,MDEwOlJlcG9zaXRvcnkyMTMyODY3NTI=,pocket-to-sqlite,dogsheep/pocket-to-sqlite,0,53015001,https://github.com/dogsheep/pocket-to-sqlite,Create a SQLite database containing data from your Pocket account,0,2019-10-07T03:24:14Z,2022-08-21T21:11:59Z,2022-08-22T16:21:34Z,,20,63,63,Python,1,1,1,1,0,3,0,0,5,apache-2.0,"[""datasette"", ""datasette-io"", ""datasette-tool"", ""dogsheep"", ""pocket"", ""pocket-api"", ""sqlite""]",3,5,63,main,"{""admin"": false, ""maintain"": false, ""push"": false, ""triage"": false, ""pull"": false}",,53015001,3,4,"# pocket-to-sqlite
[](https://pypi.org/project/pocket-to-sqlite/)
[](https://github.com/dogsheep/pocket-to-sqlite/releases)
[](https://github.com/dogsheep/pocket-to-sqlite/actions?query=workflow%3ATest)
[](https://github.com/dogsheep/pocket-to-sqlite/blob/main/LICENSE)
Create a SQLite database containing data from your [Pocket](https://getpocket.com/) account.
## How to install
$ pip install pocket-to-sqlite
## Usage
You will need to first obtain a valid OAuth token for your Pocket account. You can do this by running the `auth` command and following the prompts:
$ pocket-to-sqlite auth
Visit this page and sign in with your Pocket account:
https://getpocket.com/auth/author...
Once you have signed in there, hit <enter> to continue
Authentication tokens written to auth.json
Now you can fetch all of your items from Pocket like this:
$ pocket-to-sqlite fetch pocket.db
The first time you run this command it will fetch all of your items, and display a progress bar while it does it.
On subsequent runs it will only fetch new items.
You can force it to fetch everything from the beginning again using `--all`. Use `--silent` to disable the progress bar.
## Using with Datasette
The SQLite database produced by this tool is designed to be browsed using [Datasette](https://datasette.readthedocs.io/). Use the [datasette-render-timestamps](https://github.com/simonw/datasette-render-timestamps) plugin to improve the display of the timestamp values.
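If you prefer raw SQL, a query such as the following lists recently saved items; the `items` table name and its `resolved_title`, `resolved_url` and `time_added` columns are assumptions based on the Pocket API, so inspect the schema first:
```sql
-- Recently saved Pocket items (table and column names are assumptions)
select resolved_title, resolved_url, time_added
from items
order by time_added desc
limit 10;
```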
","
",1,public,0,,0,
248903544,MDEwOlJlcG9zaXRvcnkyNDg5MDM1NDQ=,hacker-news-to-sqlite,dogsheep/hacker-news-to-sqlite,0,53015001,https://github.com/dogsheep/hacker-news-to-sqlite,Create a SQLite database containing data pulled from Hacker News,0,2020-03-21T04:02:05Z,2021-06-06T22:42:00Z,2021-03-13T19:15:06Z,,19,25,25,Python,1,1,1,1,0,2,0,0,0,apache-2.0,"[""hacker-news"", ""datasette"", ""dogsheep"", ""datasette-io"", ""datasette-tool""]",2,0,25,main,"{""admin"": false, ""push"": false, ""pull"": false}",,53015001,2,1,"# hacker-news-to-sqlite
[](https://pypi.org/project/hacker-news-to-sqlite/)
[](https://github.com/dogsheep/hacker-news-to-sqlite/releases)
[](https://github.com/dogsheep/hacker-news-to-sqlite/actions?query=workflow%3ATest)
[](https://github.com/simonw/hacker-news-to-sqlite/blob/main/LICENSE)
Create a SQLite database containing data fetched from [Hacker News](https://news.ycombinator.com/).
## How to install
$ pip install hacker-news-to-sqlite
## Usage
$ hacker-news-to-sqlite user hacker-news.db your-username
Importing items: 37%|███████████ | 845/2297 [05:09<11:02, 2.19it/s]
Imports all of your Hacker News submissions and comments into a SQLite database called `hacker-news.db`.
$ hacker-news-to-sqlite trees hacker-news.db 22640038 22643218
Fetches the entire comments tree in which any of those content IDs appears.
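The `metadata.json` example below references an `items` table with `text` and `time` columns; assuming the Hacker News API's `type` field is stored too, a query like this pulls out your most recent comments:
```sql
-- Your most recent comments (the type column is an assumption from the Hacker News API)
select time, substr(text, 1, 80) as snippet
from items
where type = 'comment'
order by time desc
limit 10;
```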
## Browsing your data with Datasette
You can use [Datasette](https://datasette.readthedocs.org/) to browse your data. Install Datasette like this:
$ pip install datasette
Now run it against your `hacker-news.db` file like so:
$ datasette hacker-news.db
Visit `http://localhost:8001/` to search and explore your data.
You can improve the display of your data using the [datasette-render-timestamps](https://github.com/simonw/datasette-render-timestamps) and [datasette-render-html](https://github.com/simonw/datasette-render-html) plugins. Install them like this:
$ pip install datasette-render-timestamps datasette-render-html
Now save the following configuration in a file called `metadata.json`:
```json
{
""databases"": {
""hacker-news"": {
""tables"": {
""items"": {
""plugins"": {
""datasette-render-html"": {
""columns"": [
""text""
]
},
""datasette-render-timestamps"": {
""columns"": [
""time""
]
}
}
},
""users"": {
""plugins"": {
""datasette-render-timestamps"": {
""columns"": [
""created""
]
}
}
}
}
}
}
}
```
Run Datasette like this:
$ datasette -m metadata.json hacker-news.db
The timestamp columns will now be rendered as human-readable dates, and any HTML in your posts will be displayed as rendered HTML.
","
",,,,,,
256834907,MDEwOlJlcG9zaXRvcnkyNTY4MzQ5MDc=,dogsheep-photos,dogsheep/dogsheep-photos,0,53015001,https://github.com/dogsheep/dogsheep-photos,Upload your photos to S3 and import metadata about them into a SQLite database,0,2020-04-18T19:22:13Z,2021-11-04T20:45:03Z,2021-11-04T20:45:00Z,,68,124,124,Python,1,1,1,1,0,7,0,0,19,apache-2.0,"[""datasette"", ""datasette-io"", ""datasette-tool"", ""dogsheep"", ""sqlite""]",7,19,124,master,"{""admin"": false, ""maintain"": false, ""push"": false, ""triage"": false, ""pull"": false}",,53015001,7,10,"# dogsheep-photos
[](https://pypi.org/project/dogsheep-photos/)
[](https://github.com/dogsheep/dogsheep-photos/releases)
[](https://circleci.com/gh/dogsheep/dogsheep-photos)
[](https://github.com/dogsheep/dogsheep-photos/blob/master/LICENSE)
Save details of your photos to a SQLite database and upload them to S3.
See [Using SQL to find my best photo of a pelican according to Apple Photos](https://simonwillison.net/2020/May/21/apple-photos-sqlite/) for background information on this project.
## What these tools do
These tools are a work-in-progress mechanism for taking full ownership of your photos. The core idea is to help implement the following:
* Every photo you have taken lives in a single, private Amazon S3 bucket
* You have a single SQLite database file which stores metadata about those photos - potentially pulled from multiple different places. This may include EXIF data, Apple Photos, the results of running machine learning APIs against photos and much more besides.
* You can then use [Datasette](https://github.com/simonw/datasette) to explore your own photos.
I'm a heavy user of Apple Photos so the initial releases of this tool will have a bias towards that, but ideally I would like a subset of these tools to be useful to people no matter which core photo solution they are using.
## Installation
$ pip install dogsheep-photos
## Authentication (if using S3)
If you want to use S3 to store your photos, you will need to first create S3 credentials for a new, dedicated bucket.
You may find the [s3-credentials tool](https://github.com/simonw/s3-credentials) useful for this.
Run this command and paste in your credentials. You will need three values: the name of your S3 bucket, your Access key ID and your Secret access key.
$ dogsheep-photos s3-auth
This will create a file called `auth.json` in your current directory containing the required values. To save the file at a different path or filename, use the `--auth=myauth.json` option.
## Uploading photos
Run this command to upload every photo in a specific directory to your S3 bucket:
$ dogsheep-photos upload photos.db \
~/Pictures/Photos\ Library.photoslibrary/original
The command will only upload photos that have not yet been uploaded, based on their sha256 hash.
`photos.db` will be created with an `uploads` table containing details of which files were uploaded.
To see what the command would do without uploading any files, use the `--dry-run` option.
The sha256 hash of the photo contents will be used as the name of the file in the bucket, with an extension matching the type of file. This is an implementation of the [Content addressable storage](https://en.wikipedia.org/wiki/Content-addressable_storage) pattern.
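To see what has been uploaded so far, you can query that `uploads` table; aside from the `sha256` column described above, any other column names (such as `ext`) are assumptions about the schema:
```sql
-- Count uploaded photos by file extension (the ext column is an assumption)
select ext, count(*) as uploaded
from uploads
group by ext
order by uploaded desc;
```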
## Importing Apple Photos metadata
The `apple-photos` command imports metadata from your Apple Photos library.
$ dogsheep-photos apple-photos photos.db
Imported metadata includes places, people, albums, quality scores and machine learning labels for the photo contents.
## Creating a subset database
You can create a new, subset database of photos using the `create-subset` command.
This is useful for creating a shareable SQLite database that only contains metadata for a selected set of photos.
Since photo metadata contains latitude and longitude you may not want to share a database that includes photos taken at your home address.
`create-subset` takes three arguments: an existing database file created using the `apple-photos` command, the name of the new, shareable database file you would like to create and a SQL query that returns the `sha256` hash values of the photos you would like to include in that database.
For example, here's how to create a shareable database of just the photos that have been added to albums containing the word ""Public"":
$ dogsheep-photos create-subset \
photos.db \
public.db \
""select sha256 from apple_photos where albums like '%Public%'""
## Serving photos locally with datasette-media
If you don't want to upload your photos to S3 but you still want to browse them using Datasette you can do so using the [datasette-media](https://github.com/simonw/datasette-media) plugin. This plugin adds the ability to serve images and other static files directly from disk, configured using a SQL query.
To use it, first install Datasette and the plugin:
$ pip install datasette datasette-media
If any of your photos are `.HEIC` images taken by an iPhone you should also install the optional `pyheif` dependency:
$ pip install pyheif
Now create a `metadata.yaml` file configuring the plugin:
```yaml
plugins:
datasette-media:
thumbnail:
sql: |-
select path as filepath, 200 as resize_height from apple_photos where uuid = :key
large:
sql: |-
select path as filepath, 1024 as resize_height from apple_photos where uuid = :key
```
This will configure two URL endpoints - one for 200 pixel high thumbnails and one for 1024 pixel high larger images.
Create your `photos.db` database using the `apple-photos` command, then run Datasette like this:
$ datasette -m metadata.yaml
Your photos will be served on URLs that look like this:
http://127.0.0.1:8001/-/media/thumbnail/F4469918-13F3-43D8-9EC1-734C0E6B60AD
http://127.0.0.1:8001/-/media/large/F4469918-13F3-43D8-9EC1-734C0E6B60AD
You can find the UUIDs for use in these URLs by running `select uuid from photos_with_apple_metadata`.
### Displaying images using datasette-json-html
If you are using `datasette-media` to serve photos you can include images directly in Datasette query results using the [datasette-json-html](https://github.com/simonw/datasette-json-html) plugin.
Run `pip install datasette-json-html` to install the plugin, then use queries like this to view your images:
```sql
select
json_object(
'img_src',
'/-/media/thumbnail/' || uuid
) as photo,
uuid,
date
from
apple_photos
order by
date desc
limit 10;
```
The `photo` column returned by this query should render as image tags that display the correct images.
### Displaying images using custom template pages
Datasette's [custom pages](https://datasette.readthedocs.io/en/stable/custom_templates.html#custom-pages) feature lets you create custom pages for a Datasette instance by dropping HTML templates into a `templates/pages` directory and then running Datasette using `datasette --template-dir=templates/`.
You can combine that ability with the [datasette-template-sql](https://github.com/simonw/datasette-template-sql) plugin to create custom template pages that directly display photos served by `datasette-media`.
Install the plugin using `pip install datasette-template-sql`.
Create a `templates/pages` folder and add the following files:
`recent-photos.html`
```html+jinja
<h1>Recent photos</h1>
<div>
{% for photo in sql(""select * from apple_photos order by date desc limit 20"") %}
    <img src=""/-/media/photo/{{ photo['uuid'] }}"">
{% endfor %}
</div>
```
`random-photos.html`
```html+jinja
<h1>Random photos</h1>
<div>
{% for photo in sql(""with foo as (select * from apple_photos order by date desc limit 5000) select * from foo order by random() limit 20"") %}
    <img src=""/-/media/photo/{{ photo['uuid'] }}"">
{% endfor %}
</div>
```
Now run Datasette like this:
$ datasette photos.db -m metadata.yaml --template-dir=templates/
Visiting `http://localhost:8001/recent-photos` will display 20 recent photos. Visiting `http://localhost:8001/random-photos` will display 20 photos randomly selected from your 5,000 most recent.
","
",1,public,0,,,
303218369,MDEwOlJlcG9zaXRvcnkzMDMyMTgzNjk=,evernote-to-sqlite,dogsheep/evernote-to-sqlite,0,53015001,https://github.com/dogsheep/evernote-to-sqlite,Tools for converting Evernote content to SQLite,0,2020-10-11T21:45:49Z,2021-08-26T19:01:54Z,2021-08-26T19:02:47Z,,51,24,24,Python,1,1,1,1,0,4,0,0,3,apache-2.0,"[""datasette-io"", ""datasette-tool"", ""dogsheep"", ""evernote"", ""sqlite""]",4,3,24,main,"{""admin"": false, ""maintain"": false, ""push"": false, ""triage"": false, ""pull"": false}",,53015001,4,4,"# evernote-to-sqlite
[](https://pypi.org/project/evernote-to-sqlite/)
[](https://github.com/dogsheep/evernote-to-sqlite/releases)
[](https://github.com/dogsheep/evernote-to-sqlite/actions?query=workflow%3ATest)
[](https://github.com/dogsheep/evernote-to-sqlite/blob/master/LICENSE)
Tools for converting Evernote content to SQLite. See [Building an Evernote to SQLite exporter](https://simonwillison.net/2020/Oct/16/building-evernote-sqlite-exporter/) for background on this project.
## Installation
Install this tool using `pip`:
$ pip install evernote-to-sqlite
## Usage
Currently the only available command is `evernote-to-sqlite enex`, which converts Evernote's ENEX export files into a SQLite database.
You can create [an ENEX export](https://help.evernote.com/hc/en-us/articles/209005557-Export-notes-and-notebooks-as-ENEX-or-HTML) in the Evernote desktop application by selecting some notes (or all of your notes) and using the `File -> Export Notes...` menu option.
This used to be able to export everything in one go, but it looks like more recent Evernote versions only allow exporting up to fifty notes at a time, or let you export an entire notebook by right-clicking on the notebook and selecting ""Export notebook..."".
You can convert that file to SQLite like so:
$ evernote-to-sqlite enex evernote.db MyNotes.enex
This will display a progress bar and create a SQLite database file called `evernote.db`.
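Once the import finishes you can poke around with SQL; the `notes` table name and its `title`, `created` and `updated` columns are assumptions about the schema this tool produces, so check with `.schema` first:
```sql
-- Most recently updated notes (table and column names are assumptions)
select title, created, updated
from notes
order by updated desc
limit 10;
```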
### Limitations
Unfortunately the ENEX export format does not include a unique identifier for each note. This means you cannot use this tool to re-import notes after they have been updated - you should consider this tool to be a one-time transformation of an ENEX file into an equivalent SQLite database.
ENEX exports also do not include details of which notebook a note belongs to.
## Development
To contribute to this tool, first check out the code. Then create a new virtual environment:
cd evernote-to-sqlite
python -mvenv venv
source venv/bin/activate
Or if you are using `pipenv`:
pipenv shell
Now install the dependencies and test dependencies:
pip install -e '.[test]'
To run the tests:
pytest
","
Currently the only available command is evernote-to-sqlite enex, which converts Evernote's ENEX export files into a SQLite database.
You can create an ENEX export in the Evernote desktop application by selecting some notes (or all of your notes) and using the File -> Export Notes... menu option.
This used to be able to export everything in one go, but it looks like more recent Evernote versions only allow exporting up to fifty notes at a time, or let you export an entire notebook by right-clicking on the notebook and selecting ""Export notebook..."".
This will display a progress bar and create a SQLite database file called evernote.db.
Limitations
Unfortunately the ENEX export format does not include a unique identifier for each note. This means you cannot use this tool to re-import notes after they have been updated - you should consider this tool to be a one-time transformation of an ENEX file into an equivalent SQLite database.
ENEX exports also do not include details of which notebook a note belongs to.
Development
To contribute to this tool, first checkout the code. Then create a new virtual environment:
cd evernote-to-sqlite
python -mvenv venv
source venv/bin/activate