datasette-atom

Datasette plugin that adds support for generating Atom feeds with the results of a SQL query.

Installation

Install this plugin in the same environment as Datasette to enable the .atom output extension.

$ pip install datasette-atom

Usage

To create an Atom feed you need to define a custom SQL query that returns a required set of columns:

atom_id - a unique ID for each row. This article has suggestions about ways to create these IDs.
atom_title - a title for that row.
atom_updated - an RFC 3339 timestamp representing the last time the entry was modified in a significant way. This can usually be the time that the row was created.
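Here is a sketch of how the three required columns can be derived from an ordinary table. The Python snippet below runs a query against an in-memory SQLite database. The posts table, its sample row and the example.com tag authority are all hypothetical. The sketch assumes that created holds UTC timestamps in SQLite's "YYYY-MM-DD HH:MM:SS" form. SQLite's strftime() turns that into an RFC 3339 value with a Z suffix, and date() supplies the date part of a tag: style atom_id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical source table; created is assumed to be stored in UTC.
conn.execute("create table posts (id integer primary key, title text, created text)")
conn.execute("insert into posts values (1, 'Hello', '2019-10-23 21:05:55')")

row = conn.execute("""
select
  'tag:example.com,' || date(created) || ':' || id as atom_id,
  title as atom_title,
  strftime('%Y-%m-%dT%H:%M:%SZ', created) as atom_updated
from posts
""").fetchone()

print(row)
# ('tag:example.com,2019-10-23:1', 'Hello', '2019-10-23T21:05:55Z')
```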

The following columns are optional:

atom_content - content that should be shown in the feed. This will be treated as a regular string, so any embedded HTML tags will be escaped when they are displayed.
atom_content_html - content that should be shown in the feed. This will be treated as an HTML string, and will be sanitized using Bleach to ensure it does not have any malicious code in it before being returned as part of a <content type="html"> Atom element. If both columns are provided, atom_content_html will be used in place of atom_content.
atom_link - a URL that should be used as the link that the feed entry points to.
atom_author_name - the name of the author of the entry. If you provide this you can also provide atom_author_uri and atom_author_email with a URL and e-mail address for that author.
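Python's standard xml.etree.ElementTree module can illustrate the difference between atom_content and atom_content_html. This sketch shows how Atom itself represents the two kinds of content. It is not the plugin's own serialization code. In both cases the markup is entity-escaped inside the XML. A feed reader displays type="text" content literally, but it decodes type="html" content and renders it as HTML. The sample value is made up. The use of type="text" for plain content is an assumption about what the plugin emits.

```python
import xml.etree.ElementTree as ET

value = "<b>Bold</b> & plain"

# Plain text content: readers show the angle brackets literally.
text_content = ET.Element("content", {"type": "text"})
text_content.text = value

# HTML content: also escaped in the XML, but readers decode and render it as HTML.
html_content = ET.Element("content", {"type": "html"})
html_content.text = value

print(ET.tostring(text_content, encoding="unicode"))
print(ET.tostring(html_content, encoding="unicode"))
```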

A query that returns these columns can then be returned as an Atom feed by adding the .atom extension.
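Datasette serves arbitrary SQL from a database page using a sql= query string parameter. A feed URL can therefore be built by putting .atom after the database name. The sketch below assumes a Datasette instance at the hypothetical https://example.com and a database called browse. The atom_feed_url function is a helper written only for this example.

```python
from urllib.parse import urlencode


def atom_feed_url(base_url, database, sql):
    """Build a Datasette URL that renders the results of sql as an Atom feed."""
    query = urlencode({"sql": sql})  # spaces become '+', quotes are percent-encoded
    return f"{base_url.rstrip('/')}/{database}.atom?{query}"


print(atom_feed_url("https://example.com", "browse", "select 1 as atom_id"))
# https://example.com/browse.atom?sql=select+1+as+atom_id
```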

Example

Here is an example SQL query which generates an Atom feed for new entries on www.niche-museums.com:

select
  'tag:niche-museums.com,' || substr(created, 0, 11) || ':' || id as atom_id,
  name as atom_title,
  created as atom_updated,
  'https://www.niche-museums.com/browse/museums/' || id as atom_link,
  coalesce(
    '<img src="' || photo_url || '?w=800&amp;h=400&amp;fit=crop&amp;auto=compress">',
    ''
  ) || '<p>' || description || '</p>' as atom_content_html
from
  museums
order by
  created desc
limit
  15
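You can check what the query above produces without a live Datasette instance by running it against a throwaway SQLite database from Python. The museums table below only has the columns the query reads, and both sample rows are invented for this example. The first row has no photo_url. It shows how coalesce() falls back to an empty string, so that entry's content is only the description paragraph.

```python
import sqlite3

# The example query from this README, unchanged.
FEED_SQL = """
select
  'tag:niche-museums.com,' || substr(created, 0, 11) || ':' || id as atom_id,
  name as atom_title,
  created as atom_updated,
  'https://www.niche-museums.com/browse/museums/' || id as atom_link,
  coalesce(
    '<img src="' || photo_url || '?w=800&amp;h=400&amp;fit=crop&amp;auto=compress">',
    ''
  ) || '<p>' || description || '</p>' as atom_content_html
from
  museums
order by
  created desc
limit
  15
"""

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.execute(
    "create table museums (id integer primary key, name text, created text,"
    " photo_url text, description text)"
)
# Invented sample rows: one without a photo, one with.
conn.executemany(
    "insert into museums values (?, ?, ?, ?, ?)",
    [
        (1, "Example Museum", "2019-10-23 21:05:55", None, "No photo yet."),
        (2, "Another Museum", "2019-11-02 08:00:00", "https://example.com/p.jpg", "Has a photo."),
    ],
)

rows = [dict(r) for r in conn.execute(FEED_SQL)]
for row in rows:
    print(row["atom_id"], row["atom_content_html"])
```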

You can try this query by pasting it in here - then click the .atom link to see it as an Atom feed.

Using a canned query

Datasette's canned query mechanism is a useful way to configure feeds. If a canned query definition has a title, that title will be used as the title of the Atom feed.

Here's an example, defined using a metadata.yaml file:

databases:
  browse:
    queries:
      feed:
        title: Niche Museums
        sql: |-
          select
            'tag:niche-museums.com,' || substr(created, 0, 11) || ':' || id as atom_id,
            name as atom_title,
            created as atom_updated,
            'https://www.niche-museums.com/browse/museums/' || id as atom_link,
            coalesce(
              '<img src="' || photo_url || '?w=800&amp;h=400&amp;fit=crop&amp;auto=compress">',
              ''
            ) || '<p>' || description || '</p>' as atom_content_html
          from
            museums
          order by
            created desc
          limit
            15

Disabling HTML filtering

The HTML allow-list used by Bleach for the atom_content_html column can be found in the clean(html) function at the bottom of datasette_atom/__init__.py.
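The plugin's actual allow-list lives in that clean(html) function and is not reproduced here. As a rough illustration of allow-list sanitization, here is a minimal sketch built on Python's standard html.parser module instead of Bleach. ALLOWED_TAGS, ALLOWED_ATTRS and sanitize_html are hypothetical, and this code is not a substitute for Bleach. It works as follows:

- Tags outside the list are dropped, but their text is kept.
- The contents of script and style elements are discarded.
- href and src values must start with http:// or https://.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical allow-list for illustration; NOT the list used by datasette-atom.
ALLOWED_TAGS = {"a", "p", "em", "strong", "img"}
ALLOWED_ATTRS = {"a": {"href"}, "img": {"src", "alt"}}
URL_ATTRS = {"href", "src"}
DROP_CONTENT_TAGS = {"script", "style"}


class AllowListSanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT_TAGS:
            self.skip_depth += 1
            return
        if tag not in ALLOWED_TAGS:
            return  # drop the tag itself; its text still arrives via handle_data
        kept = []
        for name, value in attrs:
            value = value or ""
            if name not in ALLOWED_ATTRS.get(tag, set()):
                continue
            if name in URL_ATTRS and not value.strip().lower().startswith(("http://", "https://")):
                continue  # reject javascript:, data: and relative URLs
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_startendtag(self, tag, attrs):
        # Treat <img ... /> like <img ...> so no stray closing tag is emitted.
        if tag not in DROP_CONTENT_TAGS:
            self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in ALLOWED_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))


def sanitize_html(html):
    parser = AllowListSanitizer()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


print(sanitize_html('<p onclick="x()">Hi <script>alert(1)</script><b>there</b></p>'))
```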

You can disable Bleach entirely for Atom feeds generated using a canned query. You should only do this if you are certain that no user-provided HTML could be included in that value.

Here's how to do that in metadata.json:

{
  "plugins": {
    "datasette-atom": {
      "allow_unsafe_html_in_canned_queries": true
    }
  }
}

Setting this to true will disable Bleach filtering for all canned queries across all databases.

You can disable Bleach filtering just for a specific list of canned queries like so:

{
  "plugins": {
    "datasette-atom": {
      "allow_unsafe_html_in_canned_queries": {
        "museums": ["latest", "moderation"]
      }
    }
  }
}

This will disable Bleach just for the canned queries called latest and moderation in the museums.db database.
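The two forms of the allow_unsafe_html_in_canned_queries setting can be summarised as a small decision function. This is a hypothetical sketch of the rules described in this section, not code taken from the plugin. A value of true applies to every canned query. A dictionary maps database names to lists of canned query names. Arbitrary (non-canned) SQL is always filtered.

```python
def unsafe_html_allowed(plugin_config, database, canned_query=None):
    """Return True if Bleach filtering should be skipped (hypothetical sketch).

    plugin_config is the "datasette-atom" block from metadata.json;
    canned_query is None for arbitrary SQL queries.
    """
    if canned_query is None:
        return False  # the setting only ever applies to canned queries
    setting = plugin_config.get("allow_unsafe_html_in_canned_queries", False)
    if setting is True:
        return True
    if isinstance(setting, dict):
        return canned_query in setting.get(database, [])
    return False


config = {"allow_unsafe_html_in_canned_queries": {"museums": ["latest", "moderation"]}}
print(unsafe_html_allowed(config, "museums", "latest"))  # True
print(unsafe_html_allowed(config, "museums", "feed"))    # False
```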

shapefile-to-sqlite

Load shapefiles into a SQLite (optionally SpatiaLite) database.

Project background: Things I learned about shapefiles building shapefile-to-sqlite

How to install

$ pip install shapefile-to-sqlite

How to use

You can run this tool against a shapefile like so:

$ shapefile-to-sqlite my.db features.shp

This will load the geometries as GeoJSON in a text column.
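Because the geometries are stored as GeoJSON text, you can read them back using only the sqlite3 and json modules from the Python standard library. This sketch assumes the geometry column is named geometry and the table is named features, which is the default described below for features.shp. Check your database's actual schema before relying on either name. read_geometries is a hypothetical helper.

```python
import json
import sqlite3


def read_geometries(conn: sqlite3.Connection, table="features", column="geometry"):
    """Parse the GeoJSON text stored in column of table (both names are assumptions)."""
    sql = f'select "{column}" from "{table}" where "{column}" is not null'
    return [json.loads(text) for (text,) in conn.execute(sql)]


# Usage against a database produced by the command above:
# conn = sqlite3.connect("my.db")
# for geometry in read_geometries(conn):
#     print(geometry["type"])
```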

Using with SpatiaLite

If you have SpatiaLite available, you can load the geometries as SpatiaLite geometries like this:

$ shapefile-to-sqlite my.db features.shp --spatialite

The data will be loaded into a table called features - based on the name of the shapefile. You can specify an alternative table name using --table:

$ shapefile-to-sqlite my.db features.shp --table=places --spatialite

The tool will search for the SpatiaLite module in the following locations:

/usr/lib/x86_64-linux-gnu/mod_spatialite.so
/usr/local/lib/mod_spatialite.dylib

If you have installed the module in another location, you can use the --spatialite_mod=xxx option to specify where:

$ shapefile-to-sqlite my.db features.shp \
    --spatialite_mod=/usr/lib/mod_spatialite.dylib
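The search-then-override behaviour described above can be sketched in Python. The candidate paths are the two listed in this README. find_spatialite and connect_with_spatialite are hypothetical helpers, not functions exported by shapefile-to-sqlite. Loading extensions requires a Python build whose sqlite3 module supports enable_load_extension, and not every distribution provides one.

```python
import os
import sqlite3

# The two locations listed in this README, checked in order.
SPATIALITE_PATHS = [
    "/usr/lib/x86_64-linux-gnu/mod_spatialite.so",
    "/usr/local/lib/mod_spatialite.dylib",
]


def find_spatialite(override=None, exists=os.path.exists):
    """Return an explicit override path, else the first candidate that exists, else None."""
    if override:
        return override
    for path in SPATIALITE_PATHS:
        if exists(path):
            return path
    return None


def connect_with_spatialite(db_path, override=None):
    """Open db_path and load SpatiaLite into the connection (hypothetical helper)."""
    path = find_spatialite(override)
    if path is None:
        raise RuntimeError("mod_spatialite not found; pass its path explicitly")
    conn = sqlite3.connect(db_path)
    conn.enable_load_extension(True)
    try:
        conn.load_extension(path)
    finally:
        conn.enable_load_extension(False)  # re-disable extension loading afterwards
    return conn
```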

You can use the --spatial-index option to create a spatial index on the geometry column:

$ shapefile-to-sqlite my.db features.shp --spatial-index

You can omit --spatialite if you use either --spatialite_mod or --spatial-index.

Projections

By default, this tool will attempt to convert geometries in the shapefile to the WGS 84 projection, for best conformance with the GeoJSON specification.
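GeoJSON (RFC 7946) expects positions as longitude/latitude in WGS 84. After the default conversion, every longitude should therefore fall within -180 to 180 and every latitude within -90 to 90. The function below is a hypothetical sanity check over a parsed GeoJSON geometry. It is only a heuristic, because projected coordinates that happen to be small would also pass. It also does not handle GeometryCollection objects, which have no coordinates member.

```python
def iter_positions(coords):
    """Yield every [lon, lat, ...] position from nested GeoJSON coordinates."""
    if coords and isinstance(coords[0], (int, float)):
        yield coords
    else:
        for item in coords:
            yield from iter_positions(item)


def looks_like_wgs84(geometry):
    """Heuristic: True if all positions are within longitude/latitude bounds."""
    return all(
        -180 <= lon <= 180 and -90 <= lat <= 90
        for lon, lat, *_ in iter_positions(geometry["coordinates"])
    )
```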

If you want it to leave the data in whatever projection was used by the shapefile, use the --crs=keep option.

You can convert the data to another output projection by passing its identifier to the --crs option. For example, to convert to EPSG:2227 (California zone 3) use --crs=epsg:2227.

The full list of formats accepted by the --crs option is documented here.

Extracting columns

If your data contains columns with a small number of heavily duplicated values - the names of specific agencies responsible for parcels of land for example - you can extract those columns into separate lookup tables referenced by foreign keys using the -c option:

$ shapefile-to-sqlite my.db features.shp -c agency

This will create an agency table with id and name columns, and will create the agency column in your main table as an integer foreign key reference to that table.

The -c option can be used multiple times.
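The lookup-table layout described above can be reproduced by hand in SQLite to show how a join recovers the original values. The agency names and feature ids are invented, and the features table is trimmed to the columns this example needs. The real tables created by -c may contain more columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Lookup table: one row per distinct agency name.
create table agency (id integer primary key, name text);
-- Main table: agency is an integer foreign key into the lookup table.
create table features (
    id integer primary key,
    agency integer references agency(id)
);
insert into agency (id, name) values (1, 'Parks Department'), (2, 'Water Board');
insert into features (id, agency) values (10, 1), (11, 1), (12, 2);
""")

# Join back to the lookup table to get the agency name for each feature.
rows = conn.execute("""
select features.id, agency.name as agency_name
from features join agency on features.agency = agency.id
order by features.id
""").fetchall()
print(rows)
```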

CPAD_2020a_Units is an example of a table created using the -c option.