id,node_id,name,full_name,private,owner,html_url,description,fork,created_at,updated_at,pushed_at,homepage,size,stargazers_count,watchers_count,language,has_issues,has_projects,has_downloads,has_wiki,has_pages,forks_count,archived,disabled,open_issues_count,license,topics,forks,open_issues,watchers,default_branch,permissions,temp_clone_token,organization,network_count,subscribers_count,readme,readme_html,allow_forking,visibility,is_template,template_repository,web_commit_signoff_required,has_discussions
175321497,MDEwOlJlcG9zaXRvcnkxNzUzMjE0OTc=,csv-diff,simonw/csv-diff,0,9599,https://github.com/simonw/csv-diff,Python CLI tool and library for diffing CSV and JSON files,0,2019-03-13T01:11:26Z,2022-07-29T20:01:02Z,2022-07-29T20:00:59Z,,34,198,198,Python,1,1,1,1,0,29,0,0,18,apache-2.0,"[""click"", ""csv"", ""datasette-io"", ""datasette-tool"", ""diff"", ""git-scraping""]",29,18,198,main,"{""admin"": false, ""maintain"": false, ""push"": false, ""triage"": false, ""pull"": false}",,,29,7,"# csv-diff
Tool for viewing the difference between two CSV, TSV or JSON files. See [Generating a commit log for San Francisco’s official list of trees](https://simonwillison.net/2019/Mar/13/tree-history/) (and the [sf-tree-history repo commit log](https://github.com/simonw/sf-tree-history/commits)) for background information on this project.
## Installation
pip install csv-diff
## Usage
Consider two CSV files:
`one.csv`
id,name,age
1,Cleo,4
2,Pancakes,2
`two.csv`
id,name,age
1,Cleo,5
3,Bailey,1
`csv-diff` can show a human-readable summary of differences between the files:
$ csv-diff one.csv two.csv --key=id
1 row changed, 1 row added, 1 row removed
1 row changed
Row 1
age: ""4"" => ""5""
1 row added
id: 3
name: Bailey
age: 1
1 row removed
id: 2
name: Pancakes
age: 2
The `--key=id` option means that the `id` column should be treated as the unique key, to identify which records have changed.
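To illustrate what treating a column as the unique key means, here is a minimal sketch that uses only the standard library. It is not the implementation used by `csv-diff`; the helper names `load_keyed` and `keyed_diff` are hypothetical and exist only for this example.

```python
import csv
import io


def load_keyed(text, key):
    # Index every row by the value found in its key column.
    return {row[key]: row for row in csv.DictReader(io.StringIO(text))}


def keyed_diff(previous, current):
    # Keys present only in the new file are additions, keys present only
    # in the old file are removals.
    added = [current[k] for k in current if k not in previous]
    removed = [previous[k] for k in previous if k not in current]
    changed = []
    for k, old_row in previous.items():
        new_row = current.get(k)
        if new_row is None or new_row == old_row:
            continue
        # Record [old, new] pairs for each column whose value differs.
        changes = {
            col: [old_row[col], new_row[col]]
            for col in old_row
            if col in new_row and old_row[col] != new_row[col]
        }
        if changes:
            changed.append({'key': k, 'changes': changes})
    return {'added': added, 'removed': removed, 'changed': changed}


one = 'id,name,age\n1,Cleo,4\n2,Pancakes,2\n'
two = 'id,name,age\n1,Cleo,5\n3,Bailey,1\n'
result = keyed_diff(load_keyed(one, 'id'), load_keyed(two, 'id'))
print(result)
```

Without a key there would be no way to tell that row 1 was edited rather than removed and re-added, which is why the key column has to identify each record uniquely.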
The tool will automatically detect if your files are comma- or tab-separated. You can override this automatic detection and force the tool to use a specific format using `--format=tsv` or `--format=csv`.
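One way this kind of detection can be done is with `csv.Sniffer` from the standard library. The sketch below is an assumption about the general approach, not a description of how `csv-diff` is implemented, and `detect_delimiter` is a hypothetical helper.

```python
import csv


def detect_delimiter(sample):
    # Ask the sniffer to choose between comma and tab only.
    dialect = csv.Sniffer().sniff(sample, delimiters=',\t')
    return dialect.delimiter


comma_sample = 'id,name,age\n1,Cleo,4\n2,Pancakes,2\n'
tab_sample = 'id\tname\tage\n1\tCleo\t4\n2\tPancakes\t2\n'
print(repr(detect_delimiter(comma_sample)))
print(repr(detect_delimiter(tab_sample)))
```

Sniffing is a heuristic, so passing an explicit `--format` is the safer choice when a file is small or irregular.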
You can also feed it JSON files, provided they are a JSON array of objects where each object has the same keys. Use `--format=json` if your input files are JSON.
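The expected JSON shape can be checked before diffing. The following sketch assumes nothing about the internals of `csv-diff`; `load_json_rows` is a hypothetical helper that validates the array-of-objects requirement and indexes the rows by key.

```python
import json


def load_json_rows(text, key):
    rows = json.loads(text)
    # The input must be a JSON array whose items are all objects.
    if not isinstance(rows, list) or not all(isinstance(r, dict) for r in rows):
        raise ValueError('expected a JSON array of objects')
    # Every object must expose the same set of keys as the first one.
    if rows:
        expected = set(rows[0])
        for row in rows:
            if set(row) != expected:
                raise ValueError('every object must have the same keys')
    return {row[key]: row for row in rows}


text = json.dumps([{'id': 1, 'name': 'Cleo'}, {'id': 2, 'name': 'Pancakes'}])
rows = load_json_rows(text, 'id')
print(rows[1]['name'])
```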
Use `--show-unchanged` to include full details of the unchanged values for rows with at least one change in the diff output:
$ csv-diff one.csv two.csv --key=id --show-unchanged
1 row changed
id: 1
age: ""4"" => ""5""
Unchanged:
name: ""Cleo""
You can use the `--json` option to get a machine-readable difference:
$ csv-diff one.csv two.csv --key=id --json
{
""added"": [
{
""id"": ""3"",
""name"": ""Bailey"",
""age"": ""1""
}
],
""removed"": [
{
""id"": ""2"",
""name"": ""Pancakes"",
""age"": ""2""
}
],
""changed"": [
{
""key"": ""1"",
""changes"": {
""age"": [
""4"",
""5""
]
}
}
],
""columns_added"": [],
""columns_removed"": []
}
## As a Python library
You can also import the Python library into your own code like so:
from csv_diff import load_csv, compare
diff = compare(
load_csv(open(""one.csv""), key=""id""),
load_csv(open(""two.csv""), key=""id"")
)
`diff` will now contain the same data structure as the output in the `--json` example above.
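As an example of working with that structure, the sketch below builds a one-line summary from a diff dictionary. The `summarize` function is hypothetical and written for this example; the sample dictionary mirrors the `--json` output shown earlier.

```python
def summarize(diff):
    # Build a summary such as '1 row changed, 1 row added, 1 row removed'.
    parts = []
    for field in ('changed', 'added', 'removed'):
        count = len(diff.get(field, []))
        if count:
            noun = 'row' if count == 1 else 'rows'
            parts.append(f'{count} {noun} {field}')
    return ', '.join(parts)


diff = {
    'added': [{'id': '3', 'name': 'Bailey', 'age': '1'}],
    'removed': [{'id': '2', 'name': 'Pancakes', 'age': '2'}],
    'changed': [{'key': '1', 'changes': {'age': ['4', '5']}}],
    'columns_added': [],
    'columns_removed': [],
}
print(summarize(diff))
```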
If the columns in the CSV have changed, those added or removed columns will be ignored when calculating changes made to specific rows.
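The effect of ignoring added or removed columns can be sketched as follows. This is an illustration of the behaviour described above under the assumption that only columns present in both files are compared; `column_changes` and `row_changes` are hypothetical helpers, not part of the library.

```python
def column_changes(previous_columns, current_columns):
    # Split the columns into added, removed and shared groups.
    added = [c for c in current_columns if c not in previous_columns]
    removed = [c for c in previous_columns if c not in current_columns]
    shared = [c for c in previous_columns if c in current_columns]
    return added, removed, shared


def row_changes(old_row, new_row, shared):
    # Compare values only for columns that exist in both files.
    return {c: [old_row[c], new_row[c]] for c in shared if old_row[c] != new_row[c]}


old_row = {'id': '1', 'name': 'Cleo', 'age': '4'}
new_row = {'id': '1', 'name': 'Cleo', 'weight': '12'}
added, removed, shared = column_changes(list(old_row), list(new_row))
changes = row_changes(old_row, new_row, shared)
print(added, removed, changes)
```

Here the `age` column was dropped and a `weight` column was introduced, yet the row itself reports no changes because its shared columns are identical.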
## As a Docker container
### Build the image
$ docker build -t csvdiff .
### Run the container
$ docker run --rm -v $(pwd):/files csvdiff
Suppose the current directory contains two CSV files, `one.csv` and `two.csv`:
$ docker run --rm -v $(pwd):/files csvdiff one.csv two.csv
## Alternatives
- [csvdiff](https://github.com/aswinkarthik/csvdiff) is a ""fast diff tool for comparing CSV files"" - you may get better results from this than from `csv-diff` against larger files.
","
The --key=id option means that the id column should be treated as the unique key, to identify which records have changed.
The tool will automatically detect if your files are comma- or tab-separated. You can override this automatic detection and force the tool to use a specific format using --format=tsv or --format=csv.
You can also feed it JSON files, provided they are a JSON array of objects where each object has the same keys. Use --format=json if your input files are JSON.
Use --show-unchanged to include full details of the unchanged values for rows with at least one change in the diff output: