Commit Graph

  • 36aa2e8 (HEAD -> master) Add archivebot-log-extract-ignores by JustAnotherArchivist 2021-02-21 04:40:41 +0000
  • 5b731fb Fix compatibility with wpull 2.x by JustAnotherArchivist 2021-02-09 05:09:38 +0000
  • 743e058 Fix confusing error message when lxml is not installed by JustAnotherArchivist 2021-02-09 05:06:38 +0000
  • 491a80a Add warc-tiny scrape command for parsing HTTP responses using wpull and extracting links by JustAnotherArchivist 2021-02-09 03:35:13 +0000
  • fd2728f Add archivebot-irccloud-paste by JustAnotherArchivist 2021-02-04 23:45:30 +0000
  • 4eff3c3 Refactor, strip query/fragment by JustAnotherArchivist 2021-01-15 17:23:46 +0000
  • 821cacf Add --help by JustAnotherArchivist 2021-01-15 17:23:30 +0000
  • caffeba Add parent-urls by JustAnotherArchivist 2021-01-15 17:09:25 +0000
  • 77ec76b Add --urls and --nodl options by JustAnotherArchivist 2021-01-09 16:51:47 +0000
  • 06cf71f Fix gofile.io download: getServer is not used by the website anymore, and getUpload no longer returns the MD5 by JustAnotherArchivist 2021-01-09 16:49:42 +0000
  • bff1490 Add github-list-repos by JustAnotherArchivist 2020-12-22 01:07:15 +0000
  • bf695d6 Fix channel URLs by JustAnotherArchivist 2020-12-21 16:49:17 +0000
  • dde4464 Cover two more rare URLs by JustAnotherArchivist 2020-11-26 04:44:37 +0000
  • bbf2d2c Be more lenient regarding slashes to catch things with collapsed URLs in paths etc. by JustAnotherArchivist 2020-11-26 04:42:35 +0000
  • 362f66e Handle youtube-nocookie.com and fix removenonyt mode not recognising CC domains by JustAnotherArchivist 2020-11-26 04:05:01 +0000
  • 81e2b4b Refine patterns by JustAnotherArchivist 2020-11-26 03:43:07 +0000
  • 9974d46 Stop trying to rewrite patterns for percent encoding by JustAnotherArchivist 2020-11-26 03:48:59 +0000
  • 0ee83bc Refactor by JustAnotherArchivist 2020-11-26 02:49:20 +0000
  • b66260c Add youtube-extract by JustAnotherArchivist 2020-11-25 22:07:35 +0000
  • d82dff8 Add ETA column by JustAnotherArchivist 2020-11-21 03:40:13 +0000
  • 01274e4 Prevent constantly moving bytes around for better performance on large chunked records by JustAnotherArchivist 2020-11-11 01:50:50 +0000
  • 77d9f61 Colourise output by JustAnotherArchivist 2020-11-07 00:32:51 +0000
  • 6512669 Refactor and compare file list as well by JustAnotherArchivist 2020-11-07 00:27:38 +0000
  • 8e0cb30 Add atdash mode by JustAnotherArchivist 2020-10-20 17:29:58 +0000
  • 5fe595d Record wrapper script in meta WARC as well by JustAnotherArchivist 2020-09-26 19:32:13 +0000
  • c1def0e Fix S3_WITH_LIST_URLS being defined (but empty) when --with-list-urls is not used by JustAnotherArchivist 2020-09-26 18:45:34 +0000
  • 398cbfd Add s3-bucket-list-qwarc, rewritten s3-bucket-list on top of qwarc by JustAnotherArchivist 2020-09-26 17:41:19 +0000
  • 80084e0 Another alternative and performance/memory comparison by JustAnotherArchivist 2020-09-25 18:50:25 +0000
  • 6a288a6 Use grep instead, which is faster but uses more memory by JustAnotherArchivist 2020-09-25 17:45:45 +0000
  • 4d274e6 Add dedupe by JustAnotherArchivist 2020-09-25 17:30:43 +0000
  • a4af8e6 Add IE6 UA by JustAnotherArchivist 2020-09-20 02:14:24 +0000
  • ac27743 Add Googlebot UA by JustAnotherArchivist 2020-09-20 02:06:14 +0000
  • 0181e53 Treat NXDOMAIN and no A/AAAA record errors as ok by JustAnotherArchivist 2020-09-11 16:35:19 +0000
  • 41c2a9d Add support for alternative xmlns by JustAnotherArchivist 2020-09-11 01:36:56 +0000
  • 830e9db Treat redirects as successful retrievals by JustAnotherArchivist 2020-09-11 01:31:36 +0000
  • 7a999c9 Ignore redirects by JustAnotherArchivist 2020-09-11 01:16:17 +0000
  • 579d589 Add a script to extract errors from wpull 2.x logs by JustAnotherArchivist 2020-09-11 01:06:43 +0000
  • d60948e Verbosity by JustAnotherArchivist 2020-08-18 18:02:50 +0000
  • a9a4792 Fix server validation by JustAnotherArchivist 2020-08-18 18:01:53 +0000
  • 57e2e26 Support multi-file uploads by JustAnotherArchivist 2020-08-18 17:59:09 +0000
  • 02c967f Add gofile.io download script by JustAnotherArchivist 2020-08-18 17:43:16 +0000
  • a83d28d Add WARC/1.1 support by JustAnotherArchivist 2020-07-16 00:46:49 +0000
  • ba2f7db Merge warc-peek repository into little-things by JustAnotherArchivist 2020-07-15 22:04:37 +0000
  • 79fc113 Merge kill-wpull-connections repository into little-things by JustAnotherArchivist 2020-07-15 19:59:36 +0000
  • b4bb9ba Switch to HTTPS by JustAnotherArchivist 2020-07-15 16:07:47 +0000
  • 9f3c7b3 Support negative filter values for date columns as relative to the current datetime by JustAnotherArchivist 2020-07-09 18:49:09 +0000
  • c7151ef Add script for checking whether a file on transfer.notkiska.pw was archived correctly with AB by JustAnotherArchivist 2020-07-04 01:40:00 +0000
  • 4c90bac Shield values in colons with angled brackets by JustAnotherArchivist 2020-07-02 02:01:30 +0000
  • f51adcc Add --meta mode for dump-responses which prefixes each line with information about the file and record by JustAnotherArchivist 2020-07-02 01:41:18 +0000
  • 9cc1f41 Pass the filename in NewFile events by JustAnotherArchivist 2020-07-02 01:40:39 +0000
  • a38efc3 Introduce a way to provide additional arguments to processors by JustAnotherArchivist 2020-07-02 01:40:14 +0000
  • ecf6678 Fix deb file URLs by JustAnotherArchivist 2020-06-16 19:14:06 +0000
  • 1e5fbed Fix log message going to stdout by JustAnotherArchivist 2020-06-16 18:37:00 +0000
  • 3a2cea1 Add script for recursing over Debian repos by JustAnotherArchivist 2020-06-16 00:05:56 +0000
  • 4f12f73 Refactor filtering and add --pyfilter by JustAnotherArchivist 2020-06-04 18:15:33 +0000
  • 785f13e Add --replace-{concurrency,delay} by JustAnotherArchivist 2020-06-04 17:53:10 +0000
  • 5067c40 Fix errors on invalid filter or sort values by JustAnotherArchivist 2020-06-04 17:42:17 +0000
  • 7e2befc +x by JustAnotherArchivist 2020-05-31 15:23:24 +0000
  • 49376db Decode HTTP request bodies by JustAnotherArchivist 2020-05-28 22:32:37 +0000
  • 171ca42 Disable truncation when stdout is not a terminal by JustAnotherArchivist 2020-03-29 01:52:40 +0000
  • 9763370 Truncate URLs by default to fit the terminal width by JustAnotherArchivist 2020-03-29 01:47:40 +0000
  • 1bc1487 Add script for extracting remaining wpull 2 queue by JustAnotherArchivist 2020-03-11 21:02:55 +0000
  • d3c0035 Make con-d-commands mode an alias of the corresponding format by JustAnotherArchivist 2020-03-11 20:58:41 +0000
  • b4fe6dd Reorder arguments to make more sense by JustAnotherArchivist 2020-03-11 20:51:17 +0000
  • c547fc6 Add format mode by JustAnotherArchivist 2020-03-11 20:49:44 +0000
  • 7fde199 Add --mode con-d-commands, replace --dashboard-regex with --mode dashboard-regex by JustAnotherArchivist 2020-03-11 02:23:36 +0000
  • 0a6f83b Add --dashboard-regex by JustAnotherArchivist 2020-03-10 23:04:58 +0000
  • 05ed1e0 Add more columns by JustAnotherArchivist 2020-03-07 22:48:32 +0000
  • 3f7d84a Refactor the Bash/Python abomination into a pure Python script so I get to keep my sanity while editing by JustAnotherArchivist 2020-03-07 22:19:35 +0000
  • cf879c8 Refactor into something more flexible to the addition of new columns by JustAnotherArchivist 2020-03-07 19:02:50 +0000
  • d7bd8de Add --dates option by JustAnotherArchivist 2020-02-28 20:27:16 +0000
  • 236278f Fix decoding of links on Facebook profiles by JustAnotherArchivist 2020-02-24 03:10:57 +0000
  • d7a07d1 Normalise domain name to lower-case before further processing by JustAnotherArchivist 2020-02-24 01:18:18 +0000
  • e655080 Add support for Facebook /pages/category/Category/Name-ID URLs by JustAnotherArchivist 2020-02-19 17:53:45 +0000
  • daa1a95 Proper URL decoding by JustAnotherArchivist 2020-02-18 16:53:41 +0000
  • 1bee1cd Add support for Facebook /people/Name/ID URLs by JustAnotherArchivist 2020-02-17 13:26:46 +0000
  • 00107c0 Add support for YouTube /c/X URLs by JustAnotherArchivist 2020-02-17 13:25:19 +0000
  • b59b820 Add support for wiki list entries with options by JustAnotherArchivist 2020-02-12 03:12:30 +0000
  • d5953ca Use old Opera UA for Twitter to force the old design by JustAnotherArchivist 2020-02-10 18:24:27 +0000
  • 1fa57d4 Fix extraction on Wix sites from JSON inside a data attribute by JustAnotherArchivist 2020-02-10 18:23:36 +0000
  • 4a74216 Suppress output if there are no matched jobs by JustAnotherArchivist 2020-02-10 01:18:11 +0000
  • fe72d57 Add filtering based on substrings anywhere in the string and on regex by JustAnotherArchivist 2020-02-10 00:47:19 +0000
  • cf30a53 Add case-insensitive filtering by JustAnotherArchivist 2020-02-10 00:42:36 +0000
  • 711e444 Highlight jobs that have been inactive for over 6 hours by JustAnotherArchivist 2020-02-02 05:28:17 +0000
  • b291903 Fix sorting on numerical columns by JustAnotherArchivist 2020-02-02 05:27:05 +0000
  • 257b578 Add descending sort by JustAnotherArchivist 2020-02-02 05:18:08 +0000
  • 6e7449d Support column names in any capitalisation by JustAnotherArchivist 2020-02-02 05:11:16 +0000
  • e5e7bdf Add more filtering options by JustAnotherArchivist 2020-02-02 05:08:48 +0000
  • c611420 Remove options from usage line by JustAnotherArchivist 2020-02-02 05:03:54 +0000
  • 824eb5e Add script for getting an AB job overview table by JustAnotherArchivist 2020-02-02 04:38:28 +0000
  • 34c1a58 Fix detection of multiple transfer encodings by JustAnotherArchivist 2019-12-11 02:34:56 +0000
  • 195df08 Fix marker loop on some filenames due to lacking HTML entity processing by JustAnotherArchivist 2019-12-03 04:48:21 +0000
  • 3cc3a1e Fix nested tags by JustAnotherArchivist 2019-12-03 04:43:35 +0000
  • 5c90748 Handle broken pipe on stdout by JustAnotherArchivist 2019-11-23 02:41:58 +0000
  • b38349e Fix duplicate slashes by JustAnotherArchivist 2019-11-23 02:36:33 +0000
  • f23e4cc Retry on internal errors by JustAnotherArchivist 2019-11-23 02:31:14 +0000
  • bfe5f59 Add marker loop detection by JustAnotherArchivist 2019-11-22 17:04:07 +0000
  • 66bdef3 Take a bucket URL argument instead of hostname + bucketname by JustAnotherArchivist 2019-11-22 17:00:02 +0000
  • e385c1d Limit curl to 10 seconds by JustAnotherArchivist 2019-11-18 01:40:42 +0000
  • 7416244 Replace curl-archivebot-ua with a more general curl-ua script that supports different UAs selected by aliases by JustAnotherArchivist 2019-11-13 18:00:10 +0000