  1. 26 Aug, 2020 2 commits
  2. 24 Aug, 2020 2 commits
  3. 23 Aug, 2020 4 commits
  4. 20 Aug, 2020 1 commit
  5. 30 Jul, 2020 2 commits
  6. 17 Feb, 2020 2 commits
    • Fix the Handler in cmd/links · 082784b7
      ale authored
    • Propagate the link tag through redirects · 533f4725
      ale authored
      To do this, the tag has to be plumbed through the queue and the
      Handler interface, but it should allow the resources associated
      with a page to be fetched via the IncludeRelatedScope even when the
      page is behind a redirect.
  7. 04 Dec, 2019 1 commit
  8. 13 Nov, 2019 2 commits
  9. 07 Oct, 2019 2 commits
  10. 26 Sep, 2019 3 commits
  11. 20 Jan, 2019 1 commit
  12. 19 Jan, 2019 1 commit
    • Replace URLInfo with a simple URL presence check · cce28f44
      ale authored
      The whole URLInfo structure, while neat, is unused except to check
      whether a specific URL has already been seen.
      
      The presence check is also now limited to Enqueue().
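Here is a rough Go sketch of the simplified approach in cce28f44. It assumes that the only per-URL state needed is whether the URL has been seen, and that this check happens solely in `Enqueue()`. The `crawlQueue` type and its in-memory map are hypothetical stand-ins for whatever storage crawl actually uses.

```go
package main

import "fmt"

// crawlQueue is a hypothetical queue that records only whether a URL has
// been seen, with no per-URL metadata beyond that.
type crawlQueue struct {
	seen    map[string]struct{} // set of URLs already enqueued
	pending []string            // URLs waiting to be fetched
}

func newCrawlQueue() *crawlQueue {
	return &crawlQueue{seen: make(map[string]struct{})}
}

// Enqueue adds u unless it has already been seen and reports whether it was
// added. This is the only place the presence check is performed.
func (q *crawlQueue) Enqueue(u string) bool {
	if _, ok := q.seen[u]; ok {
		return false
	}
	q.seen[u] = struct{}{}
	q.pending = append(q.pending, u)
	return true
}

func main() {
	q := newCrawlQueue()
	fmt.Println(q.Enqueue("https://example.com/")) // true
	fmt.Println(q.Enqueue("https://example.com/")) // false: already seen
	fmt.Println(len(q.pending))                    // 1
}
```

Because the check sits only in `Enqueue()`, no other code path needs to consult per-URL records. A duplicate link is simply dropped at insertion time.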
  13. 02 Jan, 2019 1 commit
    • Add multi-file output · c5ec7eb8
      ale authored
      The output stage can now write to size-limited, rotating WARC files
      using a user-specified pattern, so that output files are always
      unique.
  14. 28 Dec, 2018 1 commit
  15. 27 Dec, 2018 2 commits
  16. 06 Dec, 2018 1 commit
  17. 02 Sep, 2018 4 commits
  18. 31 Aug, 2018 7 commits
  19. 30 Aug, 2018 1 commit
    • Mention trickle as a possible bandwidth limiter · 86a0bd2d
      ale authored
      Since crawl does not provide bandwidth limiting itself, point users
      to an alternative. If crawl ever implements it on its own, the notice
      can be removed.