
repos: 274293597


id: 274293597
node_id: MDEwOlJlcG9zaXRvcnkyNzQyOTM1OTc=
name: datasette-block-robots
full_name: simonw/datasette-block-robots
private: 0
owner: 9599
html_url: https://github.com/simonw/datasette-block-robots
description: Datasette plugin that blocks robots and crawlers using robots.txt
fork: 0
created_at: 2020-06-23T02:52:23Z
updated_at: 2022-08-30T16:13:40Z
pushed_at: 2022-08-30T16:25:38Z
homepage: https://datasette.io/plugins/datasette-block-robots
size: 21
stargazers_count: 2
watchers_count: 2
language: Python
has_issues: 1
has_projects: 1
has_downloads: 1
has_wiki: 1
has_pages: 0
forks_count: 0
archived: 0
disabled: 0
open_issues_count: 0
license:
topics: ["datasette", "datasette-io", "datasette-plugin", "robots-txt"]
forks: 0
open_issues: 0
watchers: 2
default_branch: main
permissions: {"admin": false, "maintain": false, "push": false, "triage": false, "pull": false}
temp_clone_token:
organization:
network_count: 0
subscribers_count: 2
readme:

# datasette-block-robots

[![PyPI](https://img.shields.io/pypi/v/datasette-block-robots.svg)](https://pypi.org/project/datasette-block-robots/) [![Changelog](https://img.shields.io/github/v/release/simonw/datasette-block-robots?label=changelog)](https://github.com/simonw/datasette-block-robots/releases) [![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](https://github.com/simonw/datasette-block-robots/blob/master/LICENSE)

Datasette plugin that blocks robots and crawlers using robots.txt

## Installation

Install this plugin in the same environment as Datasette.

    $ pip install datasette-block-robots

## Usage

Having installed the plugin, `/robots.txt` on your Datasette instance will return the following:

    User-agent: *
    Disallow: /

This asks all robots and crawlers not to visit any of the pages on your site.

Here's a demo of the plugin in action: https://sqlite-generate-demo.datasette.io/robots.txt

## Configuration

By default the plugin blocks all access to the site, using `Disallow: /`.

If you want search engines to index the index page without crawling the database, table or row pages themselves, use the following:

```json
{
    "plugins": {
        "datasette-block-robots": {
            "allow_only_index": true
        }
    }
}
```

This will return a `/robots.txt` like so:

    User-agent: *
    Disallow: /db1
    Disallow: /db2

The file contains one `Disallow` line for every attached database.
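One way to sanity-check the `allow_only_index` output above is to feed it to Python's standard-library `urllib.robotparser`, which applies simple prefix matching to `Disallow` paths. This is a sketch, not the plugin's code: `index_only_robots_txt()` and `check()` are hypothetical helpers, and `db1`, `db2` and `db10` are placeholder database names. The helper only reproduces the documented output format.

```python
from urllib.robotparser import RobotFileParser


def index_only_robots_txt(database_names):
    """Hypothetical helper: build the robots.txt body documented for
    allow_only_index, with one Disallow line per attached database."""
    lines = ["User-agent: *"]
    lines.extend("Disallow: /{}".format(name) for name in database_names)
    return "\n".join(lines)


def check(robots_txt, paths, user_agent="*"):
    """Return {path: allowed?} as judged by the standard-library parser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {path: parser.can_fetch(user_agent, path) for path in paths}


robots_txt = index_only_robots_txt(["db1", "db2"])
print(robots_txt)

results = check(robots_txt, ["/", "/db1/mytable", "/db2/mytable/1", "/db10"])
for path, allowed in results.items():
    # "/" is allowed; everything under /db1 or /db2 is blocked.
    # /db10 is blocked too, because "/db1" is a prefix of "/db10".
    print(path, "allowed" if allowed else "blocked")
```

Because `Disallow` values are matched as path prefixes, a database whose name starts with another database's name (such as `db10` next to `db1`) is also covered by the shorter rule.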
To block access to specific areas of the site using custom paths, add this to your `metadata.json` configuration file:

```json
{
    "plugins": {
        "datasette-block-robots": {
            "disallow": ["/mydatabase/mytable"]
        }
    }
}
```

This will result in a `/robots.txt` that looks like this:

    User-agent: *
    Disallow: /mydatabase/mytable

Alternatively, you can set the full contents of the `robots.txt` file using the `literal` configuration option. Here's how to do that if you are using YAML rather than JSON and have a `metadata.yml` file:

```yaml
plugins:
  datasette-block-robots:
    literal: |-
      User-agent: *
      Disallow: /
      User-agent: Bingbot
      User-agent: Googlebot
      Disallow:
```

This example blocks all crawlers except Googlebot and Bingbot, which are allowed to crawl the entire site.

## Extending this with other plugins

This plugin adds a new [plugin hook](https://docs.datasette.io/en/stable/plugin_hooks.html) to Datasette called `block_robots_extra_lines()`, which other plugins can use to add their own lines to the `robots.txt` file.

The hook can optionally accept these parameters:

- `datasette`: The current [Datasette instance](https://docs.datasette.io/en/stable/internals.html#datasette-class). You can use this to execute SQL queries or read plugin configuration settings.
- `request`: The [Request object](https://docs.datasette.io/en/stable/internals.html#request-object) representing the incoming request to `/robots.txt`.

The hook should return a list of strings, each representing a line to be added to the `robots.txt` file.

It can also return an `async def` function, which will be called and awaited to produce the list of lines. Use this option if you need to make `await` calls inside your hook implementation.
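The two return styles described above (a plain list of lines, or an `async def` function that produces one) can be illustrated with a small standalone sketch that runs without Datasette installed. `resolve_extra_lines()`, `sitemap_hook()` and `query_hook()` are hypothetical names chosen for illustration; this is not the plugin's actual dispatch code.

```python
import asyncio
import inspect


async def resolve_extra_lines(hook_results):
    """Hypothetical resolver: merge the values returned by several
    block_robots_extra_lines() implementations into one list of lines.

    Each value may be None (the hook had nothing to add), a list of
    strings, or an async function that is called and awaited to get one.
    """
    lines = []
    for result in hook_results:
        if result is None:
            continue
        if callable(result):
            # An async def function must be called to get a coroutine
            result = result()
        if inspect.isawaitable(result):
            result = await result
        lines.extend(result)
    return lines


def sitemap_hook():
    # Synchronous style: return the lines directly.
    return ["Sitemap: http://example.com/sitemap.xml"]


def query_hook():
    # Async style: return an inner async function for the caller to await.
    async def inner():
        await asyncio.sleep(0)  # stand-in for an awaited database query
        return ["Disallow: /mydatabase/mytable"]

    return inner


extra = asyncio.run(resolve_extra_lines([sitemap_hook(), query_hook(), None]))
print("\n".join(extra))
```

The `callable()` check comes before the awaitable check because an `async def` function has to be called before there is anything to await; a plain list is neither callable nor awaitable, so it passes straight through.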
This example uses the hook to add a `Sitemap: http://example.com/sitemap.xml` line to the `robots.txt` file:

```python
from datasette import hookimpl


@hookimpl
def block_robots_extra_lines(datasette, request):
    return [
        "Sitemap: {}".format(datasette.absolute_url(request, "/sitemap.xml")),
    ]
```

This example blocks access to paths based on a database query:

```python
from datasette import hookimpl


@hookimpl
def block_robots_extra_lines(datasette):
    async def inner():
        # Query the default database (the first one attached)
        db = datasette.get_database()
        result = await db.execute("select path from mytable")
        # One Disallow line for each row returned
        return [
            "Disallow: /{}".format(row["path"])
            for row in result
        ]

    # Return the async function itself so the plugin can await it
    return inner
```

[datasette-sitemap](https://datasette.io/plugins/datasette-sitemap) is an example of a plugin that uses this hook.

## Development

To set up this plugin locally, first check out the code. Then create a new virtual environment:

    cd datasette-block-robots
    python3 -m venv venv
    source venv/bin/activate

Or if you are using `pipenv`:

    pipenv shell

Now install the dependencies and test dependencies:

    pip install -e '.[test]'

To run the tests:

    pytest

allow_forking: 1
visibility: public
is_template: 0
template_repository:
web_commit_signoff_required: 0
has_discussions:

Links from other tables

  • 6 rows from repo in releases