id,node_id,name,full_name,private,owner,html_url,description,fork,created_at,updated_at,pushed_at,homepage,size,stargazers_count,watchers_count,language,has_issues,has_projects,has_downloads,has_wiki,has_pages,forks_count,archived,disabled,open_issues_count,license,topics,forks,open_issues,watchers,default_branch,permissions,temp_clone_token,organization,network_count,subscribers_count,readme,readme_html,allow_forking,visibility,is_template,template_repository,web_commit_signoff_required,has_discussions
175321497,MDEwOlJlcG9zaXRvcnkxNzUzMjE0OTc=,csv-diff,simonw/csv-diff,0,9599,https://github.com/simonw/csv-diff,Python CLI tool and library for diffing CSV and JSON files,0,2019-03-13T01:11:26Z,2022-07-29T20:01:02Z,2022-07-29T20:00:59Z,,34,198,198,Python,1,1,1,1,0,29,0,0,18,apache-2.0,"[""click"", ""csv"", ""datasette-io"", ""datasette-tool"", ""diff"", ""git-scraping""]",29,18,198,main,"{""admin"": false, ""maintain"": false, ""push"": false, ""triage"": false, ""pull"": false}",,,29,7,"# csv-diff

[![PyPI](https://img.shields.io/pypi/v/csv-diff.svg)](https://pypi.org/project/csv-diff/)
[![Changelog](https://img.shields.io/github/v/release/simonw/csv-diff?include_prereleases&label=changelog)](https://github.com/simonw/csv-diff/releases)
[![Tests](https://github.com/simonw/csv-diff/workflows/Test/badge.svg)](https://github.com/simonw/csv-diff/actions?query=workflow%3ATest)
[![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](https://github.com/simonw/csv-diff/blob/main/LICENSE)

Tool for viewing the difference between two CSV, TSV or JSON files. See [Generating a commit log for San Francisco’s official list of trees](https://simonwillison.net/2019/Mar/13/tree-history/) (and the [sf-tree-history repo commit log](https://github.com/simonw/sf-tree-history/commits)) for background information on this project.

## Installation

    pip install csv-diff

## Usage

Consider two CSV files:

`one.csv`

    id,name,age
    1,Cleo,4
    2,Pancakes,2

`two.csv`

    id,name,age
    1,Cleo,5
    3,Bailey,1

`csv-diff` can show a human-readable summary of differences between the files:

    $ csv-diff one.csv two.csv --key=id
    1 row changed, 1 row added, 1 row removed

    1 row changed

      Row 1
        age: ""4"" => ""5""

    1 row added

      id: 3
      name: Bailey
      age: 1

    1 row removed

      id: 2
      name: Pancakes
      age: 2

The `--key=id` option means that the `id` column should be treated as the unique key, to identify which records have changed.
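
To illustrate why a unique key matters, here is a minimal sketch of key-based comparison using only the Python standard library. It is not the implementation used by `csv-diff`: the helper names `load_keyed` and `keyed_diff` are hypothetical, and the sketch assumes both files share the same columns.

```python
import csv
import io


def load_keyed(text, key):
    # Hypothetical helper: index each row by the value of its key column.
    return {row[key]: row for row in csv.DictReader(io.StringIO(text))}


def keyed_diff(previous, current):
    # Keys only in current are additions; keys only in previous are removals.
    added = [current[k] for k in current if k not in previous]
    removed = [previous[k] for k in previous if k not in current]
    changed = []
    for k in previous:
        if k not in current:
            continue
        # Record [old, new] for each shared column whose value differs.
        changes = {
            col: [previous[k][col], current[k][col]]
            for col in previous[k]
            if col in current[k] and previous[k][col] != current[k][col]
        }
        if changes:
            changed.append({'key': k, 'changes': changes})
    return {'added': added, 'removed': removed, 'changed': changed}


one = 'id,name,age\n1,Cleo,4\n2,Pancakes,2\n'
two = 'id,name,age\n1,Cleo,5\n3,Bailey,1\n'
result = keyed_diff(load_keyed(one, 'id'), load_keyed(two, 'id'))
print(result['changed'])
```

Without a key, the only way to line rows up is by position, so a single inserted row would make every row after it look changed.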

The tool will automatically detect if your files are comma- or tab-separated. You can override this automatic detection and force the tool to use a specific format using `--format=tsv` or `--format=csv`.

You can also feed it JSON files, provided they are a JSON array of objects where each object has the same keys. Use `--format=json` if your input files are JSON.
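
If you are unsure whether a JSON file has the required shape, a quick check along the following lines may help. This is only a sketch: `is_uniform_object_array` is a hypothetical helper, not part of `csv-diff`, and treating an empty array as invalid is an assumption made here for simplicity.

```python
import json


def is_uniform_object_array(text):
    # True when text parses to a non-empty list of objects sharing one key set.
    data = json.loads(text)
    if not isinstance(data, list) or not data:
        return False
    if not all(isinstance(item, dict) for item in data):
        return False
    first_keys = set(data[0])
    return all(set(item) == first_keys for item in data)


rows = [
    {'id': '1', 'name': 'Cleo', 'age': '4'},
    {'id': '2', 'name': 'Pancakes', 'age': '2'},
]
print(is_uniform_object_array(json.dumps(rows)))  # prints: True
```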

Use `--show-unchanged` to include full details of the unchanged values for rows with at least one change in the diff output:

    $ csv-diff one.csv two.csv --key=id --show-unchanged
    1 row changed

      id: 1
        age: ""4"" => ""5""

        Unchanged:
          name: ""Cleo""

You can use the `--json` option to get a machine-readable difference:

    $ csv-diff one.csv two.csv --key=id --json
    {
        ""added"": [
            {
                ""id"": ""3"",
                ""name"": ""Bailey"",
                ""age"": ""1""
            }
        ],
        ""removed"": [
            {
                ""id"": ""2"",
                ""name"": ""Pancakes"",
                ""age"": ""2""
            }
        ],
        ""changed"": [
            {
                ""key"": ""1"",
                ""changes"": {
                    ""age"": [
                        ""4"",
                        ""5""
                    ]
                }
            }
        ],
        ""columns_added"": [],
        ""columns_removed"": []
    }

## As a Python library

You can also import the Python library into your own code like so:

    from csv_diff import load_csv, compare
    diff = compare(
        load_csv(open(""one.csv""), key=""id""),
        load_csv(open(""two.csv""), key=""id"")
    )

`diff` will now contain the same data structure as the output in the `--json` example above.

If the columns in the CSV have changed, those added or removed columns will be ignored when calculating changes made to specific rows.
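
As a rough illustration of working with that structure, the sketch below builds a one-line summary from it. The dictionary is copied by hand from the `--json` example so the snippet runs without `csv_diff` installed, and `summarize` is a hypothetical helper, not part of the library.

```python
def summarize(diff):
    # Count the entries in each list and describe the non-empty ones.
    parts = []
    for field in ('changed', 'added', 'removed'):
        count = len(diff.get(field, []))
        if count:
            noun = 'row' if count == 1 else 'rows'
            parts.append(f'{count} {noun} {field}')
    return ', '.join(parts)


# Same shape as the --json output shown earlier.
diff = {
    'added': [{'id': '3', 'name': 'Bailey', 'age': '1'}],
    'removed': [{'id': '2', 'name': 'Pancakes', 'age': '2'}],
    'changed': [{'key': '1', 'changes': {'age': ['4', '5']}}],
    'columns_added': [],
    'columns_removed': [],
}
print(summarize(diff))  # prints: 1 row changed, 1 row added, 1 row removed
```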

## As a Docker container

### Build the image

    $ docker build -t csvdiff .

### Run the container

    $ docker run --rm -v $(pwd):/files csvdiff

Suppose the current directory contains two CSV files, `one.csv` and `two.csv`:

    $ docker run --rm -v $(pwd):/files csvdiff one.csv two.csv
    
## Alternatives

- [csvdiff](https://github.com/aswinkarthik/csvdiff) is a ""fast diff tool for comparing CSV files"" - you may get better results from this than from `csv-diff` against larger files.
","<div id=""readme"" class=""md"" data-path=""README.md""><article class=""markdown-body entry-content container-lg"" itemprop=""text""><h1 dir=""auto""><a id=""user-content-csv-diff"" class=""anchor"" aria-hidden=""true"" href=""#user-content-csv-diff""><svg class=""octicon octicon-link"" viewBox=""0 0 16 16"" version=""1.1"" width=""16"" height=""16"" aria-hidden=""true""><path fill-rule=""evenodd"" d=""M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z""></path></svg></a>csv-diff</h1>
<p dir=""auto""><a href=""https://pypi.org/project/csv-diff/"" rel=""nofollow""><img src=""https://camo.githubusercontent.com/75784b8c5ee65df6e894c25c0efe54d360e3b6a33714da9aa0c6bb86ede1f153/68747470733a2f2f696d672e736869656c64732e696f2f707970692f762f6373762d646966662e737667"" alt=""PyPI"" data-canonical-src=""https://img.shields.io/pypi/v/csv-diff.svg"" style=""max-width: 100%;""></a>
<a href=""https://github.com/simonw/csv-diff/releases""><img src=""https://camo.githubusercontent.com/a8ece0f4436cb61b1524e3722c02363faf7aa45fe7e62f6634604f2c30421517/68747470733a2f2f696d672e736869656c64732e696f2f6769746875622f762f72656c656173652f73696d6f6e772f6373762d646966663f696e636c7564655f70726572656c6561736573266c6162656c3d6368616e67656c6f67"" alt=""Changelog"" data-canonical-src=""https://img.shields.io/github/v/release/simonw/csv-diff?include_prereleases&amp;label=changelog"" style=""max-width: 100%;""></a>
<a href=""https://github.com/simonw/csv-diff/actions?query=workflow%3ATest""><img src=""https://github.com/simonw/csv-diff/workflows/Test/badge.svg"" alt=""Tests"" style=""max-width: 100%;""></a>
<a href=""https://github.com/simonw/csv-diff/blob/main/LICENSE""><img src=""https://camo.githubusercontent.com/1698104e976c681143eb0841f9675c6f802bb7aa832afc0c7a4e719b1f3cf955/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f6c6963656e73652d417061636865253230322e302d626c75652e737667"" alt=""License"" data-canonical-src=""https://img.shields.io/badge/license-Apache%202.0-blue.svg"" style=""max-width: 100%;""></a></p>
<p dir=""auto"">Tool for viewing the difference between two CSV, TSV or JSON files. See <a href=""https://simonwillison.net/2019/Mar/13/tree-history/"" rel=""nofollow"">Generating a commit log for San Francisco’s official list of trees</a> (and the <a href=""https://github.com/simonw/sf-tree-history/commits"">sf-tree-history repo commit log</a>) for background information on this project.</p>
<h2 dir=""auto""><a id=""user-content-installation"" class=""anchor"" aria-hidden=""true"" href=""#user-content-installation""><svg class=""octicon octicon-link"" viewBox=""0 0 16 16"" version=""1.1"" width=""16"" height=""16"" aria-hidden=""true""><path fill-rule=""evenodd"" d=""M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z""></path></svg></a>Installation</h2>
<div class=""snippet-clipboard-content notranslate position-relative overflow-auto"" data-snippet-clipboard-copy-content=""pip install csv-diff""><pre class=""notranslate""><code>pip install csv-diff
</code></pre></div>
<h2 dir=""auto""><a id=""user-content-usage"" class=""anchor"" aria-hidden=""true"" href=""#user-content-usage""><svg class=""octicon octicon-link"" viewBox=""0 0 16 16"" version=""1.1"" width=""16"" height=""16"" aria-hidden=""true""><path fill-rule=""evenodd"" d=""M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z""></path></svg></a>Usage</h2>
<p dir=""auto"">Consider two CSV files:</p>
<p dir=""auto""><code>one.csv</code></p>
<div class=""snippet-clipboard-content notranslate position-relative overflow-auto"" data-snippet-clipboard-copy-content=""id,name,age
1,Cleo,4
2,Pancakes,2""><pre class=""notranslate""><code>id,name,age
1,Cleo,4
2,Pancakes,2
</code></pre></div>
<p dir=""auto""><code>two.csv</code></p>
<div class=""snippet-clipboard-content notranslate position-relative overflow-auto"" data-snippet-clipboard-copy-content=""id,name,age
1,Cleo,5
3,Bailey,1""><pre class=""notranslate""><code>id,name,age
1,Cleo,5
3,Bailey,1
</code></pre></div>
<p dir=""auto""><code>csv-diff</code> can show a human-readable summary of differences between the files:</p>
<div class=""snippet-clipboard-content notranslate position-relative overflow-auto"" data-snippet-clipboard-copy-content=""$ csv-diff one.csv two.csv --key=id
1 row changed, 1 row added, 1 row removed

1 row changed

  Row 1
    age: &quot;4&quot; =&gt; &quot;5&quot;

1 row added

  id: 3
  name: Bailey
  age: 1

1 row removed

  id: 2
  name: Pancakes
  age: 2""><pre class=""notranslate""><code>$ csv-diff one.csv two.csv --key=id
1 row changed, 1 row added, 1 row removed

1 row changed

  Row 1
    age: ""4"" =&gt; ""5""

1 row added

  id: 3
  name: Bailey
  age: 1

1 row removed

  id: 2
  name: Pancakes
  age: 2
</code></pre></div>
<p dir=""auto"">The <code>--key=id</code> option means that the <code>id</code> column should be treated as the unique key, to identify which records have changed.</p>
<p dir=""auto"">The tool will automatically detect if your files are comma- or tab-separated. You can override this automatic detection and force the tool to use a specific format using <code>--format=tsv</code> or <code>--format=csv</code>.</p>
<p dir=""auto"">You can also feed it JSON files, provided they are a JSON array of objects where each object has the same keys. Use <code>--format=json</code> if your input files are JSON.</p>
<p dir=""auto"">Use <code>--show-unchanged</code> to include full details of the unchanged values for rows with at least one change in the diff output:</p>
<div class=""snippet-clipboard-content notranslate position-relative overflow-auto"" data-snippet-clipboard-copy-content=""% csv-diff one.csv two.csv --key=id --show-unchanged
1 row changed

  id: 1
    age: &quot;4&quot; =&gt; &quot;5&quot;

    Unchanged:
      name: &quot;Cleo&quot;""><pre class=""notranslate""><code>% csv-diff one.csv two.csv --key=id --show-unchanged
1 row changed

  id: 1
    age: ""4"" =&gt; ""5""

    Unchanged:
      name: ""Cleo""
</code></pre></div>
<p dir=""auto"">You can use the <code>--json</code> option to get a machine-readable difference:</p>
<div class=""snippet-clipboard-content notranslate position-relative overflow-auto"" data-snippet-clipboard-copy-content=""$ csv-diff one.csv two.csv --key=id --json
{
    &quot;added&quot;: [
        {
            &quot;id&quot;: &quot;3&quot;,
            &quot;name&quot;: &quot;Bailey&quot;,
            &quot;age&quot;: &quot;1&quot;
        }
    ],
    &quot;removed&quot;: [
        {
            &quot;id&quot;: &quot;2&quot;,
            &quot;name&quot;: &quot;Pancakes&quot;,
            &quot;age&quot;: &quot;2&quot;
        }
    ],
    &quot;changed&quot;: [
        {
            &quot;key&quot;: &quot;1&quot;,
            &quot;changes&quot;: {
                &quot;age&quot;: [
                    &quot;4&quot;,
                    &quot;5&quot;
                ]
            }
        }
    ],
    &quot;columns_added&quot;: [],
    &quot;columns_removed&quot;: []
}""><pre class=""notranslate""><code>$ csv-diff one.csv two.csv --key=id --json
{
    ""added"": [
        {
            ""id"": ""3"",
            ""name"": ""Bailey"",
            ""age"": ""1""
        }
    ],
    ""removed"": [
        {
            ""id"": ""2"",
            ""name"": ""Pancakes"",
            ""age"": ""2""
        }
    ],
    ""changed"": [
        {
            ""key"": ""1"",
            ""changes"": {
                ""age"": [
                    ""4"",
                    ""5""
                ]
            }
        }
    ],
    ""columns_added"": [],
    ""columns_removed"": []
}
</code></pre></div>
<h2 dir=""auto""><a id=""user-content-as-a-python-library"" class=""anchor"" aria-hidden=""true"" href=""#user-content-as-a-python-library""><svg class=""octicon octicon-link"" viewBox=""0 0 16 16"" version=""1.1"" width=""16"" height=""16"" aria-hidden=""true""><path fill-rule=""evenodd"" d=""M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z""></path></svg></a>As a Python library</h2>
<p dir=""auto"">You can also import the Python library into your own code like so:</p>
<div class=""snippet-clipboard-content notranslate position-relative overflow-auto"" data-snippet-clipboard-copy-content=""from csv_diff import load_csv, compare
diff = compare(
    load_csv(open(&quot;one.csv&quot;), key=&quot;id&quot;),
    load_csv(open(&quot;two.csv&quot;), key=&quot;id&quot;)
)""><pre class=""notranslate""><code>from csv_diff import load_csv, compare
diff = compare(
    load_csv(open(""one.csv""), key=""id""),
    load_csv(open(""two.csv""), key=""id"")
)
</code></pre></div>
<p dir=""auto""><code>diff</code> will now contain the same data structure as the output in the <code>--json</code> example above.</p>
<p dir=""auto"">If the columns in the CSV have changed, those added or removed columns will be ignored when calculating changes made to specific rows.</p>
<h2 dir=""auto""><a id=""user-content-as-a-docker-container"" class=""anchor"" aria-hidden=""true"" href=""#user-content-as-a-docker-container""><svg class=""octicon octicon-link"" viewBox=""0 0 16 16"" version=""1.1"" width=""16"" height=""16"" aria-hidden=""true""><path fill-rule=""evenodd"" d=""M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z""></path></svg></a>As a Docker container</h2>
<h3 dir=""auto""><a id=""user-content-build-the-image"" class=""anchor"" aria-hidden=""true"" href=""#user-content-build-the-image""><svg class=""octicon octicon-link"" viewBox=""0 0 16 16"" version=""1.1"" width=""16"" height=""16"" aria-hidden=""true""><path fill-rule=""evenodd"" d=""M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z""></path></svg></a>Build the image</h3>
<div class=""snippet-clipboard-content notranslate position-relative overflow-auto"" data-snippet-clipboard-copy-content=""$ docker build -t csvdiff .""><pre class=""notranslate""><code>$ docker build -t csvdiff .
</code></pre></div>
<h3 dir=""auto""><a id=""user-content-run-the-container"" class=""anchor"" aria-hidden=""true"" href=""#user-content-run-the-container""><svg class=""octicon octicon-link"" viewBox=""0 0 16 16"" version=""1.1"" width=""16"" height=""16"" aria-hidden=""true""><path fill-rule=""evenodd"" d=""M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z""></path></svg></a>Run the container</h3>
<div class=""snippet-clipboard-content notranslate position-relative overflow-auto"" data-snippet-clipboard-copy-content=""$ docker run --rm -v $(pwd):/files csvdiff""><pre class=""notranslate""><code>$ docker run --rm -v $(pwd):/files csvdiff
</code></pre></div>
<p dir=""auto"">Suppose the current directory contains two CSV files, <code>one.csv</code> and <code>two.csv</code>:</p>
<div class=""snippet-clipboard-content notranslate position-relative overflow-auto"" data-snippet-clipboard-copy-content=""$ docker run --rm -v $(pwd):/files csvdiff one.csv two.csv""><pre class=""notranslate""><code>$ docker run --rm -v $(pwd):/files csvdiff one.csv two.csv
</code></pre></div>
<h2 dir=""auto""><a id=""user-content-alternatives"" class=""anchor"" aria-hidden=""true"" href=""#user-content-alternatives""><svg class=""octicon octicon-link"" viewBox=""0 0 16 16"" version=""1.1"" width=""16"" height=""16"" aria-hidden=""true""><path fill-rule=""evenodd"" d=""M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z""></path></svg></a>Alternatives</h2>
<ul dir=""auto"">
<li><a href=""https://github.com/aswinkarthik/csvdiff"">csvdiff</a> is a ""fast diff tool for comparing CSV files"" - you may get better results from this than from <code>csv-diff</code> against larger files.</li>
</ul>
</article></div>",1,public,0,,0,