name,summary,classifiers,description,author,author_email,description_content_type,home_page,keywords,license,maintainer,maintainer_email,package_url,platform,project_url,project_urls,release_url,requires_dist,requires_python,version,yanked,yanked_reason csv-diff,Python CLI tool and library for diffing CSV and JSON files,"[""Development Status :: 4 - Beta"", ""Intended Audience :: Developers"", ""Intended Audience :: End Users/Desktop"", ""Intended Audience :: Science/Research"", ""License :: OSI Approved :: Apache Software License"", ""Programming Language :: Python :: 3.6"", ""Programming Language :: Python :: 3.7""]","# csv-diff [![PyPI](https://img.shields.io/pypi/v/csv-diff.svg)](https://pypi.org/project/csv-diff/) [![Changelog](https://img.shields.io/github/v/release/simonw/csv-diff?include_prereleases&label=changelog)](https://github.com/simonw/csv-diff/releases) [![Tests](https://github.com/simonw/csv-diff/workflows/Test/badge.svg)](https://github.com/simonw/csv-diff/actions?query=workflow%3ATest) [![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](https://github.com/simonw/csv-diff/blob/main/LICENSE) Tool for viewing the difference between two CSV, TSV or JSON files. See [Generating a commit log for San Francisco’s official list of trees](https://simonwillison.net/2019/Mar/13/tree-history/) (and the [sf-tree-history repo commit log](https://github.com/simonw/sf-tree-history/commits)) for background information on this project. ## Installation pip install csv-diff ## Usage Consider two CSV files: `one.csv` id,name,age 1,Cleo,4 2,Pancakes,2 `two.csv` id,name,age 1,Cleo,5 3,Bailey,1 `csv-diff` can show a human-readable summary of differences between the files: $ csv-diff one.csv two.csv --key=id 1 row changed, 1 row added, 1 row removed 1 row changed Row 1 age: ""4"" => ""5"" 1 row added id: 3 name: Bailey age: 1 1 row removed id: 2 name: Pancakes age: 2 The `--key=id` option means that the `id` column should be treated as the unique key, to identify which records have changed. The tool will automatically detect if your files are comma- or tab-separated. You can over-ride this automatic detection and force the tool to use a specific format using `--format=tsv` or `--format=csv`. You can also feed it JSON files, provided they are a JSON array of objects where each object has the same keys. Use `--format=json` if your input files are JSON. Use `--show-unchanged` to include full details of the unchanged values for rows with at least one change in the diff output: % csv-diff one.csv two.csv --key=id --show-unchanged 1 row changed id: 1 age: ""4"" => ""5"" Unchanged: name: ""Cleo"" You can use the `--json` option to get a machine-readable difference: $ csv-diff one.csv two.csv --key=id --json { ""added"": [ { ""id"": ""3"", ""name"": ""Bailey"", ""age"": ""1"" } ], ""removed"": [ { ""id"": ""2"", ""name"": ""Pancakes"", ""age"": ""2"" } ], ""changed"": [ { ""key"": ""1"", ""changes"": { ""age"": [ ""4"", ""5"" ] } } ], ""columns_added"": [], ""columns_removed"": [] } ## As a Python library You can also import the Python library into your own code like so: from csv_diff import load_csv, compare diff = compare( load_csv(open(""one.csv""), key=""id""), load_csv(open(""two.csv""), key=""id"") ) `diff` will now contain the same data structure as the output in the `--json` example above. If the columns in the CSV have changed, those added or removed columns will be ignored when calculating changes made to specific rows. ",Simon Willison,,text/markdown,https://github.com/simonw/csv-diff,,"Apache License, Version 2.0",,,https://pypi.org/project/csv-diff/,,https://pypi.org/project/csv-diff/,"{""Homepage"": ""https://github.com/simonw/csv-diff""}",https://pypi.org/project/csv-diff/1.1/,"[""click"", ""dictdiffer"", ""pytest ; extra == 'test'""]",,1.1,0, datasette,An open source multi-tool for exploring and publishing data,"[""Development Status :: 4 - Beta"", ""Framework :: Datasette"", ""Intended Audience :: Developers"", ""Intended Audience :: End Users/Desktop"", ""Intended Audience :: Science/Research"", ""License :: OSI Approved :: Apache Software License"", ""Programming Language :: Python :: 3.10"", ""Programming Language :: Python :: 3.7"", ""Programming Language :: Python :: 3.8"", ""Programming Language :: Python :: 3.9"", ""Topic :: Database""]"," [![PyPI](https://img.shields.io/pypi/v/datasette.svg)](https://pypi.org/project/datasette/) [![Changelog](https://img.shields.io/github/v/release/simonw/datasette?label=changelog)](https://docs.datasette.io/en/stable/changelog.html) [![Python 3.x](https://img.shields.io/pypi/pyversions/datasette.svg?logo=python&logoColor=white)](https://pypi.org/project/datasette/) [![Tests](https://github.com/simonw/datasette/workflows/Test/badge.svg)](https://github.com/simonw/datasette/actions?query=workflow%3ATest) [![Documentation Status](https://readthedocs.org/projects/datasette/badge/?version=latest)](https://docs.datasette.io/en/latest/?badge=latest) [![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](https://github.com/simonw/datasette/blob/main/LICENSE) [![docker: datasette](https://img.shields.io/badge/docker-datasette-blue)](https://hub.docker.com/r/datasetteproject/datasette) [![discord](https://img.shields.io/discord/823971286308356157?label=discord)](https://discord.gg/ktd74dm5mw) *An open source multi-tool for exploring and publishing data* Datasette is a tool for exploring and publishing data. It helps people take data of any shape or size and publish that as an interactive, explorable website and accompanying API. Datasette is aimed at data journalists, museum curators, archivists, local governments, scientists, researchers and anyone else who has data that they wish to share with the world. [Explore a demo](https://global-power-plants.datasettes.com/global-power-plants/global-power-plants), watch [a video about the project](https://simonwillison.net/2021/Feb/7/video/) or try it out by [uploading and publishing your own CSV data](https://docs.datasette.io/en/stable/getting_started.html#try-datasette-without-installing-anything-using-glitch). * [datasette.io](https://datasette.io/) is the official project website * Latest [Datasette News](https://datasette.io/news) * Comprehensive documentation: https://docs.datasette.io/ * Examples: https://datasette.io/examples * Live demo of current `main` branch: https://latest.datasette.io/ * Questions, feedback or want to talk about the project? Join our [Discord](https://discord.gg/ktd74dm5mw) Want to stay up-to-date with the project? Subscribe to the [Datasette newsletter](https://datasette.substack.com/) for tips, tricks and news on what's new in the Datasette ecosystem. ## Installation If you are on a Mac, [Homebrew](https://brew.sh/) is the easiest way to install Datasette: brew install datasette You can also install it using `pip` or `pipx`: pip install datasette Datasette requires Python 3.7 or higher. We also have [detailed installation instructions](https://docs.datasette.io/en/stable/installation.html) covering other options such as Docker. ## Basic usage datasette serve path/to/database.db This will start a web server on port 8001 - visit http://localhost:8001/ to access the web interface. `serve` is the default subcommand, you can omit it if you like. Use Chrome on OS X? You can run datasette against your browser history like so: datasette ~/Library/Application\ Support/Google/Chrome/Default/History --nolock Now visiting http://localhost:8001/History/downloads will show you a web interface to browse your downloads data: ![Downloads table rendered by datasette](https://static.simonwillison.net/static/2017/datasette-downloads.png) ## metadata.json If you want to include licensing and source information in the generated datasette website you can do so using a JSON file that looks something like this: { ""title"": ""Five Thirty Eight"", ""license"": ""CC Attribution 4.0 License"", ""license_url"": ""http://creativecommons.org/licenses/by/4.0/"", ""source"": ""fivethirtyeight/data on GitHub"", ""source_url"": ""https://github.com/fivethirtyeight/data"" } Save this in `metadata.json` and run Datasette like so: datasette serve fivethirtyeight.db -m metadata.json The license and source information will be displayed on the index page and in the footer. They will also be included in the JSON produced by the API. ## datasette publish If you have [Heroku](https://heroku.com/) or [Google Cloud Run](https://cloud.google.com/run/) configured, Datasette can deploy one or more SQLite databases to the internet with a single command: datasette publish heroku database.db Or: datasette publish cloudrun database.db This will create a docker image containing both the datasette application and the specified SQLite database files. It will then deploy that image to Heroku or Cloud Run and give you a URL to access the resulting website and API. See [Publishing data](https://docs.datasette.io/en/stable/publish.html) in the documentation for more details. ## Datasette Lite [Datasette Lite](https://lite.datasette.io/) is Datasette packaged using WebAssembly so that it runs entirely in your browser, no Python web application server required. Read more about that in the [Datasette Lite documentation](https://github.com/simonw/datasette-lite/blob/main/README.md). ",Simon Willison,,text/markdown,https://datasette.io/,,"Apache License, Version 2.0",,,https://pypi.org/project/datasette/,,https://pypi.org/project/datasette/,"{""CI"": ""https://github.com/simonw/datasette/actions?query=workflow%3ATest"", ""Changelog"": ""https://docs.datasette.io/en/stable/changelog.html"", ""Documentation"": ""https://docs.datasette.io/en/stable/"", ""Homepage"": ""https://datasette.io/"", ""Issues"": ""https://github.com/simonw/datasette/issues"", ""Live demo"": ""https://latest.datasette.io/"", ""Source code"": ""https://github.com/simonw/datasette""}",https://pypi.org/project/datasette/0.63.1/,"[""asgiref (>=3.2.10)"", ""click (>=7.1.1)"", ""click-default-group-wheel (>=1.2.2)"", ""Jinja2 (>=2.10.3)"", ""hupper (>=1.9)"", ""httpx (>=0.20)"", ""pint (>=0.9)"", ""pluggy (>=1.0)"", ""uvicorn (>=0.11)"", ""aiofiles (>=0.4)"", ""janus (>=0.6.2)"", ""asgi-csrf (>=0.9)"", ""PyYAML (>=5.3)"", ""mergedeep (>=1.1.1)"", ""itsdangerous (>=1.1)"", ""furo (==2022.9.29) ; extra == 'docs'"", ""sphinx-autobuild ; extra == 'docs'"", ""codespell ; extra == 'docs'"", ""blacken-docs ; extra == 'docs'"", ""sphinx-copybutton ; extra == 'docs'"", ""rich ; extra == 'rich'"", ""pytest (>=5.2.2) ; extra == 'test'"", ""pytest-xdist (>=2.2.1) ; extra == 'test'"", ""pytest-asyncio (>=0.17) ; extra == 'test'"", ""beautifulsoup4 (>=4.8.1) ; extra == 'test'"", ""black (==22.10.0) ; extra == 'test'"", ""blacken-docs (==1.12.1) ; extra == 'test'"", ""pytest-timeout (>=1.4.2) ; extra == 'test'"", ""trustme (>=0.7) ; extra == 'test'"", ""cogapp (>=3.3.0) ; extra == 'test'""]",>=3.7,0.63.1,0,