releases

9 rows where repo = 508461227

| id | html_url | author | node_id | tag_name | target_commitish | name | draft | prerelease | created_at | published_at | body | repo | reactions | mentions_count |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 70819259 | https://github.com/simonw/s3-ocr/releases/tag/0.1a0 | simonw 9599 | RE_kwDOHk6Aq84EOJ27 | 0.1a0 | main | 0.1a0 | 0 | 1 | 2022-06-29T02:50:39Z | 2022-06-29T02:53:10Z | - `s3-ocr start <bucket>` command for triggering OCR runs using [Textract](https://aws.amazon.com/textract/) for every PDF file in a bucket. [#1](https://github.com/simonw/s3-ocr/issues/1) - `s3-ocr status <bucket>` command for checking on the status of the ongoing OCR tasks. | s3-ocr 508461227 | | |
| 70901710 | https://github.com/simonw/s3-ocr/releases/tag/0.2a0 | simonw 9599 | RE_kwDOHk6Aq84EOd_O | 0.2a0 | main | 0.2a0 | 0 | 1 | 2022-06-29T19:34:38Z | 2022-06-29T19:35:21Z | - New `s3-ocr index database.db name-of-bucket` command for creating a SQLite database containing the OCR results that have been written to the bucket. [#2](https://github.com/simonw/s3-ocr/issues/2) | s3-ocr 508461227 | | |
| 70924560 | https://github.com/simonw/s3-ocr/releases/tag/0.3 | simonw 9599 | RE_kwDOHk6Aq84EOjkQ | 0.3 | main | 0.3 | 0 | 0 | 2022-06-30T00:43:10Z | 2022-06-30T00:44:10Z | First non-alpha release. - Breaking change: the order of arguments for `s3-ocr index <bucket> <database_file>` has been swapped, for consistency with other commands. [#9](https://github.com/simonw/s3-ocr/issues/9) - Breaking change: the `start` command no longer defaults to processing every `.pdf` file in the bucket. It now accepts a list of keys, or use the `--all` option to process every PDF file. [#10](https://github.com/simonw/s3-ocr/issues/10) - New `s3-ocr fetch <bucket> <path>` command for fetching the raw OCR JSON data for that file. [#7](https://github.com/simonw/s3-ocr/issues/7) - New `s3-ocr text <bucket> <path>` command for outputting just the extracted OCR text for a specified file. [#8](https://github.com/simonw/s3-ocr/issues/8) | s3-ocr 508461227 | | |
| 71020693 | https://github.com/simonw/s3-ocr/releases/tag/0.4 | simonw 9599 | RE_kwDOHk6Aq84EO7CV | 0.4 | main | 0.4 | 0 | 0 | 2022-06-30T21:01:50Z | 2022-06-30T21:03:44Z | - New command: `s3-ocr inspect-job <job_id>` returns information about the status of a specific job. [#15](https://github.com/simonw/s3-ocr/issues/15) - Added a live demo at [s3-ocr-demo.datasette.io](https://s3-ocr-demo.datasette.io/). [#16](https://github.com/simonw/s3-ocr/issues/16) | s3-ocr 508461227 | | |
| 72282612 | https://github.com/simonw/s3-ocr/releases/tag/0.5 | simonw 9599 | RE_kwDOHk6Aq84ETvH0 | 0.5 | main | 0.5 | 0 | 0 | 2022-07-19T02:31:21Z | 2022-07-19T02:35:36Z | - Ability to run OCR against just the PDF files contained within a specific folder in the S3 bucket, using `s3-ocr start my-bucket --prefix my-prefix/`. [#20](https://github.com/simonw/s3-ocr/issues/20) - New command: `s3-ocr dedupe my-bucket` - scans the bucket for any new files that are duplicates of files that have already been OCRd and writes out job results to reuse existing OCR results and avoid processing them a second time in the future. [#19](https://github.com/simonw/s3-ocr/issues/19) | s3-ocr 508461227 | | |
| 73837981 | https://github.com/simonw/s3-ocr/releases/tag/0.6 | simonw 9599 | RE_kwDOHk6Aq84EZq2d | 0.6 | main | 0.6 | 0 | 0 | 2022-08-07T17:38:41Z | 2022-08-07T17:42:07Z | - `s3-ocr start` now automatically pauses and then retries if Textract complains that there are too many jobs running. This can be turned into an early exit with an error message using the new `--no-retry` option. [#21](https://github.com/simonw/s3-ocr/issues/21) - New `s3-ocr start --dry-run` option for displaying what would happen without starting the OCR process. [#22](https://github.com/simonw/s3-ocr/issues/22) - Textract now runs in the same region as the S3 bucket it is writing to, avoiding an error. [#24](https://github.com/simonw/s3-ocr/issues/24) | s3-ocr 508461227 | | |
| 74032562 | https://github.com/simonw/s3-ocr/releases/tag/0.6.1 | simonw 9599 | RE_kwDOHk6Aq84EaaWy | 0.6.1 | main | 0.6.1 | 0 | 0 | 2022-08-09T19:39:05Z | 2022-08-09T19:40:19Z | - Now pins to `click>=8.0`, which should avoid a bug where installing this on a machine with an older version of Click present would lead to the commands failing to register. [#25](https://github.com/simonw/s3-ocr/issues/25) - `s3-ocr --help` now includes links to the documentation and changelog. | s3-ocr 508461227 | | |
| 74037307 | https://github.com/simonw/s3-ocr/releases/tag/0.6.2 | simonw 9599 | RE_kwDOHk6Aq84Eabg7 | 0.6.2 | main | 0.6.2 | 0 | 0 | 2022-08-09T20:34:49Z | 2022-08-09T20:35:32Z | - Fixed bug where commands were sometimes not properly registered. [#26](https://github.com/simonw/s3-ocr/issues/26) | s3-ocr 508461227 | | |
| 74060956 | https://github.com/simonw/s3-ocr/releases/tag/0.6.3 | simonw 9599 | RE_kwDOHk6Aq84EahSc | 0.6.3 | main | 0.6.3 | 0 | 0 | 2022-08-10T04:42:11Z | 2022-08-10T04:43:17Z | - Pages with no OCR text on them are now recorded as rows with empty strings, instead of being skipped entirely. [#23](https://github.com/simonw/s3-ocr/issues/23) | s3-ocr 508461227 | | |

CREATE TABLE [releases] (
   [html_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [author] INTEGER REFERENCES [users]([id]),
   [node_id] TEXT,
   [tag_name] TEXT,
   [target_commitish] TEXT,
   [name] TEXT,
   [draft] INTEGER,
   [prerelease] INTEGER,
   [created_at] TEXT,
   [published_at] TEXT,
   [body] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [reactions] TEXT,
   [mentions_count] INTEGER
);
CREATE INDEX [idx_releases_repo]
    ON [releases] ([repo]);
CREATE INDEX [idx_releases_author]
    ON [releases] ([author]);
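
The filter behind this page (rows where repo = 508461227) can be reproduced against the schema above. What follows is a minimal sketch, assuming Python's standard `sqlite3` module and an in-memory database. The helper names `build_demo_db` and `releases_for_repo` are hypothetical, only two rows are copied from the table, and the third row is invented purely to show that the filter excludes other repos.

```python
import sqlite3

# Schema copied from the page; the referenced [users] and [repos] tables are
# not created here, which SQLite tolerates while foreign keys are not enforced.
SCHEMA = """
CREATE TABLE [releases] (
   [html_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [author] INTEGER REFERENCES [users]([id]),
   [node_id] TEXT,
   [tag_name] TEXT,
   [target_commitish] TEXT,
   [name] TEXT,
   [draft] INTEGER,
   [prerelease] INTEGER,
   [created_at] TEXT,
   [published_at] TEXT,
   [body] TEXT,
   [repo] INTEGER REFERENCES [repos]([id])
, [reactions] TEXT, [mentions_count] INTEGER);
CREATE INDEX [idx_releases_repo]
    ON [releases] ([repo]);
CREATE INDEX [idx_releases_author]
    ON [releases] ([author]);
"""


def build_demo_db():
    """Create an in-memory database with a few sample rows (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    rows = [
        # (id, tag_name, prerelease, published_at, repo)
        (70819259, "0.1a0", 1, "2022-06-29T02:53:10Z", 508461227),
        (70924560, "0.3", 0, "2022-06-30T00:44:10Z", 508461227),
        # Invented row for a different repo, only to exercise the filter.
        (1, "hypothetical", 0, "2000-01-01T00:00:00Z", 1),
    ]
    conn.executemany(
        "INSERT INTO releases (id, tag_name, prerelease, published_at, repo) "
        "VALUES (?, ?, ?, ?, ?)",
        rows,
    )
    return conn


def releases_for_repo(conn, repo_id):
    """Return (tag_name, prerelease) pairs for one repo, ordered by release id."""
    return conn.execute(
        "SELECT tag_name, prerelease FROM releases WHERE repo = ? ORDER BY id",
        (repo_id,),
    ).fetchall()


if __name__ == "__main__":
    for tag, prerelease in releases_for_repo(build_demo_db(), 508461227):
        print(tag, "prerelease" if prerelease else "stable")
```

The `WHERE repo = ?` lookup is the kind of query the `idx_releases_repo` index is there to serve; the parameter is bound rather than interpolated so the repo id never becomes part of the SQL text.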