- Jun 19, 2021
ale authored
This is an internal inconsistency that should be investigated.
- Aug 20, 2020
ale authored
- Jul 30, 2020
ale authored
This allows users of crawl-as-a-library to recover from unexpected errors as a last resort.
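A minimal sketch of this kind of last-resort recovery in Go. It assumes the library wraps each user-supplied handler call in a deferred `recover()` that turns a panic into an ordinary error. The `Handler` type and the `safeHandle` function are hypothetical names for illustration, not crawl's actual API.

```go
package main

import "fmt"

// Handler is a hypothetical stand-in for a user-supplied page handler.
type Handler func(url string) error

// safeHandle calls h and converts a panic inside it into an ordinary
// error, so the caller can log it and keep going instead of crashing.
func safeHandle(h Handler, url string) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("handler panic on %s: %v", url, r)
		}
	}()
	return h(url)
}

func main() {
	boom := func(string) error { panic("unexpected nil map") }
	ok := func(string) error { return nil }
	fmt.Println(safeHandle(boom, "http://example.com/"))
	fmt.Println(safeHandle(ok, "http://example.com/"))
}
```

The named return value `err` is what allows the deferred function to replace the result after a panic has unwound `h`.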
- Feb 17, 2020
ale authored
To do this, it has to be plumbed through the queue and the Handler interface, but it should allow fetching the resources associated with a page via IncludeRelatedScope even when the page is behind a redirect.
- Jan 20, 2019
ale authored
Introduce an interface to decouple the Enqueue functionality from the Crawler implementation.
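A sketch of what such a decoupling interface can look like in Go. It assumes a single-method interface that exposes only enqueueing, so handlers can add links without depending on the concrete crawler type. The names `Publisher`, `memQueue` and `handlePage` and the `Enqueue(url, depth)` signature are hypothetical, not necessarily crawl's.

```go
package main

import "fmt"

// Publisher is a hypothetical interface exposing only the enqueue
// operation, so code that discovers links does not need the crawler.
type Publisher interface {
	Enqueue(url string, depth int) error
}

// memQueue is a toy in-memory implementation, handy in tests.
type memQueue struct {
	items []string
}

func (q *memQueue) Enqueue(url string, depth int) error {
	q.items = append(q.items, fmt.Sprintf("%d:%s", depth, url))
	return nil
}

// handlePage is a hypothetical handler that depends only on Publisher.
func handlePage(p Publisher, links []string, depth int) error {
	for _, l := range links {
		if err := p.Enqueue(l, depth+1); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	q := &memQueue{}
	if err := handlePage(q, []string{"http://example.com/a", "http://example.com/b"}, 0); err != nil {
		panic(err)
	}
	fmt.Println(q.items)
}
```

Because the handler only sees the interface, a test can pass in a fake queue like `memQueue` instead of a full crawler.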
- Jan 19, 2019
ale authored
The whole URLInfo structure, while neat, is unused except for checking whether we have already seen a specific URL. The presence check is now also limited to Enqueue().
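A minimal sketch of a presence check that lives only in the enqueue path, using a plain set of URLs in place of a per-URL metadata structure. The in-memory map is an assumption for illustration; a real crawler would more likely keep this state in its database. `seenQueue` and `newSeenQueue` are hypothetical names.

```go
package main

import "fmt"

// seenQueue is a hypothetical queue whose Enqueue method is the only
// place that checks for duplicates; a set of URLs is all it stores.
type seenQueue struct {
	seen  map[string]struct{}
	queue []string
}

func newSeenQueue() *seenQueue {
	return &seenQueue{seen: make(map[string]struct{})}
}

// Enqueue adds url unless it was already seen and reports whether it
// was added.
func (q *seenQueue) Enqueue(url string) bool {
	if _, ok := q.seen[url]; ok {
		return false
	}
	q.seen[url] = struct{}{}
	q.queue = append(q.queue, url)
	return true
}

func main() {
	q := newSeenQueue()
	for _, u := range []string{"http://a/", "http://b/", "http://a/"} {
		fmt.Println(u, q.Enqueue(u))
	}
}
```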
- Dec 27, 2018
ale authored
- Aug 31, 2018
ale authored
ale authored
Makes it possible to retry requests that fail with temporary HTTP errors (429, 500, etc.).
ale authored
Handler errors are fatal, so that an error writing the WARC output will cause the crawl to abort.
ale authored
Detect write errors (both to the database and to the WARC output) and abort with an error message. Also fix a number of harmless lint warnings.
- Dec 19, 2017
- Dec 18, 2017
ale authored
The native Go implementation of LevelDB.
- Jul 03, 2015
ale authored
- Jun 29, 2015
- Dec 20, 2014
- Dec 19, 2014
ale authored