Commit Graph

  • c2f6f50 Handle actual 429 by JustAnotherArchivist 2022-07-17 20:33:29 +0000
  • ccf4d67 Allow negative offsets to peek near the end of the file by JustAnotherArchivist 2022-06-12 01:13:28 +0000
  • 4798154 Fix URLs without a path by JustAnotherArchivist 2022-06-01 04:08:34 +0000
  • 1830d67 Add ia-cdx-search-subdomains by JustAnotherArchivist 2022-06-01 04:05:05 +0000
  • 565be7b Fix by JustAnotherArchivist 2022-05-21 21:27:54 +0000
  • e2085e6 Add cloudflare-email-decode by JustAnotherArchivist 2022-04-27 19:43:08 +0000
  • 73f35f5 Fix infinite loop when file ends with something that is not a WARC record by JustAnotherArchivist 2022-04-25 20:46:49 +0000
  • 06d60a7 Bump read size by JustAnotherArchivist 2022-04-25 20:46:22 +0000
  • 3e0b70b Handle processes with too many open connections by JustAnotherArchivist 2022-04-18 21:45:21 +0000
  • df7b25c Error on unknown options by JustAnotherArchivist 2022-04-10 00:49:04 +0000
  • 4bd4f5a Fix 'Argument list too long' error when using --urls-from-stdin with many URLs by JustAnotherArchivist 2022-04-07 17:55:48 +0000
  • e20d35a Fix crash on 429 by JustAnotherArchivist 2022-04-07 03:48:09 +0000
  • cef6143 Add --urls-from-stdin by JustAnotherArchivist 2022-04-02 17:46:06 +0000
  • b5cf049 Add Wasabi by JustAnotherArchivist 2022-03-26 00:40:05 +0000
  • d2afd13 Add s3-bucket-find-direct-url by JustAnotherArchivist 2022-03-26 00:37:29 +0000
  • 9598846 Make S3 response pattern matching more flexible (so it also works on Scaleway) by JustAnotherArchivist 2022-03-25 20:35:03 +0000
  • a9a03d3 Add urlsort by JustAnotherArchivist 2022-03-19 22:39:04 +0000
  • 9798cc1 Typo by JustAnotherArchivist 2022-03-16 20:10:18 +0000
  • d193637 Add kill-connections by JustAnotherArchivist 2022-03-16 20:06:53 +0000
  • 6cfe8e5 Make job a global variable in --pyfilter expressions so it can be used in genexps by JustAnotherArchivist 2022-03-07 03:14:08 +0000
  • a4627fa Queue derives with `ia tasks` instead of this manual curl rubbish by JustAnotherArchivist 2022-02-11 02:20:28 +0000
  • c68b310 Always print the parts value if there is an upload ID by JustAnotherArchivist 2022-01-18 00:19:48 +0000
  • fdc3c3d Support float values for --partsize with M or G suffix by JustAnotherArchivist 2022-01-18 00:19:38 +0000
  • 002c1eb Wait until item exists by JustAnotherArchivist 2022-01-11 22:26:05 +0000
  • 142a5a9 Get rid of asyncio by JustAnotherArchivist 2022-01-11 04:25:34 +0000
  • b6663ae Add concurrency by JustAnotherArchivist 2022-01-11 04:22:09 +0000
  • 22f2e68 Add JSONL output option for S3 listing by JustAnotherArchivist 2022-01-08 18:02:23 +0000
  • bfebe9a Fix only sending partial file contents on retries by JustAnotherArchivist 2021-12-26 16:46:11 +0000
  • 39b3b77 Add support for IA_CONFIG_FILE environment variable by JustAnotherArchivist 2021-12-17 10:58:10 +0000
  • 7ed2906 Add progress bar by JustAnotherArchivist 2021-12-17 10:39:41 +0000
  • 58f0f0f Fix being unable to resume an upload that crashed in the first part by JustAnotherArchivist 2021-12-17 10:37:43 +0000
  • 74485c3 Require decompressed WARCs with warc-tiny by JustAnotherArchivist 2021-12-06 20:16:51 +0000
  • e247901 Add at-tracker-sample-user-item-size by JustAnotherArchivist 2021-12-06 02:39:33 +0000
  • a14939b Add base64url by JustAnotherArchivist 2021-12-06 02:32:24 +0000
  • 5c2ce7e Add cdx-chunk by JustAnotherArchivist 2021-12-04 07:40:34 +0000
  • fe0b020 Add support for reading from stdin by JustAnotherArchivist 2021-12-04 05:32:51 +0000
  • 1010769 Handle connection errors by JustAnotherArchivist 2021-11-29 15:05:48 +0000
  • 1acdc88 Add ia-upload-stream by JustAnotherArchivist 2021-11-29 14:28:26 +0000
  • 360c4d9 Add youtube-extract-rapid by JustAnotherArchivist 2021-11-26 22:11:41 +0000
  • d07b5a7 Remove debugging prints by JustAnotherArchivist 2021-11-26 20:40:09 +0000
  • bf5e065 Add URL/percent decoding tool by JustAnotherArchivist 2021-11-26 08:59:50 +0000
  • 11485d9 Add infrastructure for simple C-based tools by JustAnotherArchivist 2021-11-26 08:59:05 +0000
  • c50a8fd Fix 'Dictionary mismatch' error when very small dicts are used because the temporary file isn't written to disk before zstdcat gets executed by JustAnotherArchivist 2021-11-22 23:10:19 +0000
  • 5bc3d4b Fix crash on an empty response by JustAnotherArchivist 2021-11-21 16:00:34 +0000
  • 7f25c09 Catch other connection errors by JustAnotherArchivist 2021-11-19 01:32:25 +0000
  • f835280 Handle ConnectionResetError by JustAnotherArchivist 2021-11-18 17:25:35 +0000
  • 0b34268 Catch socket.timeout, which is a separate exception class from TimeoutError before Python 3.10 by JustAnotherArchivist 2021-11-18 07:47:11 +0000
  • 0f7a2b3 Log number of results on a page by JustAnotherArchivist 2021-11-18 03:37:09 +0000
  • 628aeb0 Handle rate limiting by JustAnotherArchivist 2021-11-18 03:36:29 +0000
  • d3ea3ce Switch from urllib to http.client to reuse connections by JustAnotherArchivist 2021-11-18 03:35:52 +0000
  • 8f7619f Add retries by JustAnotherArchivist 2021-11-17 22:54:26 +0000
  • f98fdd5 Fix printing HTTP response line to stdout instead of stderr by JustAnotherArchivist 2021-11-17 07:42:04 +0000
  • c9400ac Fix recognition of command without optional parts by JustAnotherArchivist 2021-11-17 07:38:53 +0000
  • 5ca15a7 Add concurrency support by JustAnotherArchivist 2021-11-17 04:14:59 +0000
  • 191948c Print number of modified records on requeueing by JustAnotherArchivist 2021-11-15 01:07:32 +0000
  • 5121524 Log retrieval of showNumPages by JustAnotherArchivist 2021-11-11 21:49:47 +0000
  • aba7a1b Replace resumeKey pagination with page number pagination by JustAnotherArchivist 2021-11-11 21:46:18 +0000
  • d57324a Add --where for arbitrary conditions by JustAnotherArchivist 2021-11-10 09:51:10 +0000
  • fed6438 Invert count/write logic by JustAnotherArchivist 2021-11-10 09:49:28 +0000
  • f914b6a Also reset the status_code on requeueing by JustAnotherArchivist 2021-11-10 09:37:00 +0000
  • 303bb69 Add ia-cdx-search by JustAnotherArchivist 2021-11-10 08:40:13 +0000
  • 0b45f7b Swap syntaxes by JustAnotherArchivist 2021-11-02 21:04:51 +0000
  • b2c9ea2 Refactor by JustAnotherArchivist 2021-11-02 20:57:21 +0000
  • eaf53e1 Add alphabetseq by JustAnotherArchivist 2021-11-02 20:44:18 +0000
  • c9c8b7e Add ia-wait-item-tasks by JustAnotherArchivist 2021-10-25 20:19:56 +0000
  • b440b35 Handle ancient /?v= URLs by JustAnotherArchivist 2021-10-07 16:05:46 +0000
  • 0044281 Add YouTube channel listing script by JustAnotherArchivist 2021-09-30 02:23:26 +0000
  • 1686e04 Add a timeout to prevent potentially indefinite blocking by JustAnotherArchivist 2021-09-17 04:08:01 +0000
  • 2fc9652 Add support for other instances and full-instance listing by JustAnotherArchivist 2021-09-11 18:39:51 +0000
  • b72da47 Fix org repo listing on new design/site structure by JustAnotherArchivist 2021-08-19 18:04:09 +0000
  • ce7a069 Add --jsonl option by JustAnotherArchivist 2021-08-17 17:25:20 +0000
  • 9412f0c Add azure-storage-list by JustAnotherArchivist 2021-08-17 17:25:01 +0000
  • 696e221 Add support for password-protected folders by JustAnotherArchivist 2021-08-15 21:39:30 +0000
  • 158c1f1 Fix usage error by JustAnotherArchivist 2021-08-15 21:39:11 +0000
  • 53bfe46 Basic error checks by JustAnotherArchivist 2021-08-15 21:38:26 +0000
  • 8c61208 Restore MD5 check as the API returns it again by JustAnotherArchivist 2021-08-15 21:37:03 +0000
  • 8554c01 Fix gofile.io download to the new getFolder endpoint and download server structure by JustAnotherArchivist 2021-08-15 21:36:19 +0000
  • a246bad Add support for Shorts by JustAnotherArchivist 2021-08-11 22:39:42 +0000
  • 6d019e6 Fix removenonyt performance by using simpler fixed-string patterns instead of a PCRE by JustAnotherArchivist 2021-08-11 22:39:27 +0000
  • b27a428 Fix usage notes from URLs to lines on stdin by JustAnotherArchivist 2021-08-11 22:38:17 +0000
  • c4b62c2 Fix piping when reads return less data than expected by JustAnotherArchivist 2021-07-27 05:16:00 +0000
  • dba6d1f Fix stderr printing by JustAnotherArchivist 2021-07-27 05:04:30 +0000
  • 6e5a019 Always decode stdin with surrogateescape to avoid breaking on binary input by JustAnotherArchivist 2021-07-27 03:33:51 +0000
  • e48fb9d Tighten patterns for user and custom channel URLs so they can handle HTML input more easily by JustAnotherArchivist 2021-07-27 03:24:14 +0000
  • 9cbc3f7 Extract playlist and channel IDs from watch URLs by JustAnotherArchivist 2021-07-27 03:22:08 +0000
  • 80bf010 Percent-decode each line only once by JustAnotherArchivist 2021-07-27 03:20:04 +0000
  • f1fcfab Add support for reading warc.zst from stdin by JustAnotherArchivist 2021-07-27 00:56:46 +0000
  • d5f646f Add zstdwarccat by JustAnotherArchivist 2021-07-26 23:07:56 +0000
  • 4415c8d Add support for img.youtube.com (old thumbnails) by JustAnotherArchivist 2021-07-26 15:23:12 +0000
  • 50a0fcc Fix performance regression due to 479c2684 by JustAnotherArchivist 2021-07-25 19:47:05 +0000
  • 479c268 Fix whitespace handling by JustAnotherArchivist 2021-07-25 19:23:41 +0000
  • 56f21d1 Add aggressive video ID v parameter extraction by JustAnotherArchivist 2021-07-25 18:36:38 +0000
  • 99c83eb Handle optional slash in generic watch matcher by JustAnotherArchivist 2021-07-25 18:32:05 +0000
  • 9f88f76 Handle a few more odd and rare URLs by JustAnotherArchivist 2021-07-25 18:19:16 +0000
  • a0f3b16 Handle youtu.be case variations and port numbers by JustAnotherArchivist 2021-07-25 18:07:52 +0000
  • 273d3ed Handle gaming.youtube.com by JustAnotherArchivist 2021-07-25 17:57:01 +0000
  • 0cb61f4 Add b64grep by JustAnotherArchivist 2021-07-25 04:22:47 +0000
  • 8e6e47d Fix ytimg extraction by JustAnotherArchivist 2021-07-22 19:17:03 +0000
  • 0b13758 Add Bugzilla URL list generator by JustAnotherArchivist 2021-07-11 21:06:55 +0000
  • ce0ae88 Add ia-verify-file by JustAnotherArchivist 2021-07-08 18:53:02 +0000