diff --git a/README.md b/README.md
index 0de9d15d00b3e5fbb38cc8f6a38e68981fbda8e4..3e4d973caa53132604c65355fe910e0fd480ee4d 100644
--- a/README.md
+++ b/README.md
@@ -29,8 +29,8 @@ as arguments on the command line:
 
     $ crawl http://example.com/
 
 By default, the tool will store the output WARC file and its own
-database in the current directory. This can be controlled with the
-*--output* and *--state* command-line options.
+temporary crawl database in the current directory. This can be
+controlled with the *--output* and *--state* command-line options.
 
 The crawling scope is controlled with a set of overlapping checks:
@@ -44,6 +44,29 @@ The crawling scope is controlled with a set of overlapping checks:
 
 If the program is interrupted, running it again with the same command
 line from the same directory will cause it to resume crawling from
-where it stopped. At the end of a successful crawl, the database will
-be removed (unless you specify the *--keep* option, for debugging
-purposes).
+where it stopped. At the end of a successful crawl, the temporary
+crawl database will be removed (unless you specify the *--keep*
+option, for debugging purposes).
+
+It is possible to tell the crawler to exclude URLs matching specific
+regex patterns by using the *--exclude* or *--exclude-from-file*
+options. These options may be repeated multiple times. The crawler
+comes with its own built-in set of URI regular expressions meant to
+avoid calendars, admin panels of common CMS applications, and other
+well-known pitfalls. This list is sourced from the
+[ArchiveBot](https://github.com/ArchiveTeam/ArchiveBot) project.
+
+## Limitations
+
+Like most crawlers, this one has a number of limitations:
+
+* it completely ignores *robots.txt*. You can make such policy
+  decisions yourself by turning the robots.txt into a list of patterns
+  to be used with *--exclude-from-file* (see the sketch below).
+* it does not embed a JavaScript engine, so JavaScript-rendered
+  elements will not be detected.
+* CSS parsing is limited (uses regular expressions), so some *url()*
+  resources might not be detected.
+* it expects reasonably well-formed HTML, so it may fail to extract
+  links from particularly broken pages.
+* support for \<object\> and \<video\> tags is limited.
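+
+For example (the patterns, file name and exact flag values below are
+only illustrative), a crawl that skips a CMS admin area and writes to
+a specific WARC file might look like this:
+
+    $ crawl --exclude '/wp-admin/' --exclude '\?action=edit' \
+        --output example.warc.gz http://example.com/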
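+
+A very rough sketch of the robots.txt workaround mentioned in the
+limitations above might look like this (each *Disallow* path is
+treated as a plain regex, which only approximates robots.txt
+semantics, and the file name is arbitrary):
+
+    $ curl -s http://example.com/robots.txt \
+        | awk '/^Disallow:/ && $2 { print $2 }' > robots-excludes.txt
+    $ crawl --exclude-from-file robots-excludes.txt http://example.com/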