c2f6f50
Handle actual 429 by
2022-07-17 20:33:29 +0000
ccf4d67
Allow negative offsets to peek near the end of the file by
2022-06-12 01:13:28 +0000
4798154
Fix URLs without a path by
2022-06-01 04:08:34 +0000
1830d67
Add ia-cdx-search-subdomains by
2022-06-01 04:05:05 +0000
565be7b
Fix by
2022-05-21 21:27:54 +0000
e2085e6
Add cloudflare-email-decode by
2022-04-27 19:43:08 +0000
73f35f5
Fix infinite loop when file ends with something that is not a WARC record by
2022-04-25 20:46:49 +0000
06d60a7
Bump read size by
2022-04-25 20:46:22 +0000
3e0b70b
Handle processes with too many open connections by
2022-04-18 21:45:21 +0000
df7b25c
Error on unknown options by
2022-04-10 00:49:04 +0000
4bd4f5a
Fix 'Argument list too long' error when using --urls-from-stdin with many URLs by
2022-04-07 17:55:48 +0000
e20d35a
Fix crash on 429 by
2022-04-07 03:48:09 +0000
cef6143
Add --urls-from-stdin by
2022-04-02 17:46:06 +0000
b5cf049
Add Wasabi by
2022-03-26 00:40:05 +0000
d2afd13
Add s3-bucket-find-direct-url by
2022-03-26 00:37:29 +0000
9598846
Make S3 response pattern matching more flexible (so it also works on Scaleway) by
2022-03-25 20:35:03 +0000
a9a03d3
Add urlsort by
2022-03-19 22:39:04 +0000
9798cc1
Typo by
2022-03-16 20:10:18 +0000
d193637
Add kill-connections by
2022-03-16 20:06:53 +0000
6cfe8e5
Make job a global variable in --pyfilter expressions so it can be used in genexps by
2022-03-07 03:14:08 +0000
a4627fa
Queue derives with `ia tasks` instead of this manual curl rubbish by
2022-02-11 02:20:28 +0000
c68b310
Always print the parts value if there is an upload ID by
2022-01-18 00:19:48 +0000
fdc3c3d
Support float values for --partsize with M or G suffix by
2022-01-18 00:19:38 +0000
002c1eb
Wait until item exists by
2022-01-11 22:26:05 +0000
142a5a9
Get rid of asyncio by
2022-01-11 04:25:34 +0000
b6663ae
Add concurrency by
2022-01-11 04:22:09 +0000
22f2e68
Add JSONL output option for S3 listing by
2022-01-08 18:02:23 +0000
bfebe9a
Fix only sending partial file contents on retries by
2021-12-26 16:46:11 +0000
39b3b77
Add support for IA_CONFIG_FILE environment variable by
2021-12-17 10:58:10 +0000
7ed2906
Add progress bar by
2021-12-17 10:39:41 +0000
58f0f0f
Fix being unable to resume an upload that crashed in the first part by
2021-12-17 10:37:43 +0000
74485c3
Require decompressed WARCs with warc-tiny by
2021-12-06 20:16:51 +0000
e247901
Add at-tracker-sample-user-item-size by
2021-12-06 02:39:33 +0000
a14939b
Add base64url by
2021-12-06 02:32:24 +0000
5c2ce7e
Add cdx-chunk by
2021-12-04 07:40:34 +0000
fe0b020
Add support for reading from stdin by
2021-12-04 05:32:51 +0000
1010769
Handle connection errors by
2021-11-29 15:05:48 +0000
1acdc88
Add ia-upload-stream by
2021-11-29 14:28:26 +0000
360c4d9
Add youtube-extract-rapid by
2021-11-26 22:11:41 +0000
d07b5a7
Remove debugging prints by
2021-11-26 20:40:09 +0000
bf5e065
Add URL/percent decoding tool by
2021-11-26 08:59:50 +0000
11485d9
Add infrastructure for simple C-based tools by
2021-11-26 08:59:05 +0000
c50a8fd
Fix 'Dictionary mismatch' error when very small dicts are used because the temporary file isn't written to disk before zstdcat gets executed by
2021-11-22 23:10:19 +0000
5bc3d4b
Fix crash on an empty response by
2021-11-21 16:00:34 +0000
7f25c09
Catch other connection errors by
2021-11-19 01:32:25 +0000
f835280
Handle ConnectionResetError by
2021-11-18 17:25:35 +0000
0b34268
Catch socket.timeout, which is a separate exception class from TimeoutError before Python 3.10 by
2021-11-18 07:47:11 +0000
0f7a2b3
Log number of results on a page by
2021-11-18 03:37:09 +0000
628aeb0
Handle rate limiting by
2021-11-18 03:36:29 +0000
d3ea3ce
Switch from urllib to http.client to reuse connections by
2021-11-18 03:35:52 +0000
8f7619f
Add retries by
2021-11-17 22:54:26 +0000
f98fdd5
Fix printing HTTP response line to stdout instead of stderr by
2021-11-17 07:42:04 +0000
c9400ac
Fix recognition of command without optional parts by
2021-11-17 07:38:53 +0000
5ca15a7
Add concurrency support by
2021-11-17 04:14:59 +0000
191948c
Print number of modified records on requeueing by
2021-11-15 01:07:32 +0000
5121524
Log retrieval of showNumPages by
2021-11-11 21:49:47 +0000
aba7a1b
Replace resumeKey pagination with page number pagination by
2021-11-11 21:46:18 +0000
d57324a
Add --where for arbitrary conditions by
2021-11-10 09:51:10 +0000
fed6438
Invert count/write logic by
2021-11-10 09:49:28 +0000
f914b6a
Also reset the status_code on requeueing by
2021-11-10 09:37:00 +0000
303bb69
Add ia-cdx-search by
2021-11-10 08:40:13 +0000
0b45f7b
Swap syntaxes by
2021-11-02 21:04:51 +0000
b2c9ea2
Refactor by
2021-11-02 20:57:21 +0000
eaf53e1
Add alphabetseq by
2021-11-02 20:44:18 +0000
c9c8b7e
Add ia-wait-item-tasks by
2021-10-25 20:19:56 +0000
b440b35
Handle ancient /?v= URLs by
2021-10-07 16:05:46 +0000
0044281
Add YouTube channel listing script by
2021-09-30 02:23:26 +0000
1686e04
Add a timeout to prevent potentially indefinite blocking by
2021-09-17 04:08:01 +0000
2fc9652
Add support for other instances and full-instance listing by
2021-09-11 18:39:51 +0000
b72da47
Fix org repo listing on new design/site structure by
2021-08-19 18:04:09 +0000
ce7a069
Add --jsonl option by
2021-08-17 17:25:20 +0000
9412f0c
Add azure-storage-list by
2021-08-17 17:25:01 +0000
696e221
Add support for password-protected folders by
2021-08-15 21:39:30 +0000
158c1f1
Fix usage error by
2021-08-15 21:39:11 +0000
53bfe46
Basic error checks by
2021-08-15 21:38:26 +0000
8c61208
Restore MD5 check as the API returns it again by
2021-08-15 21:37:03 +0000
8554c01
Fix gofile.io download to the new getFolder endpoint and download server structure by
2021-08-15 21:36:19 +0000
a246bad
Add support for Shorts by
2021-08-11 22:39:42 +0000
6d019e6
Fix removenonyt performance by using simpler fixed-string patterns instead of a PCRE by
2021-08-11 22:39:27 +0000
b27a428
Fix usage notes from URLs to lines on stdin by
2021-08-11 22:38:17 +0000
c4b62c2
Fix piping when reads return less data than expected by
2021-07-27 05:16:00 +0000
dba6d1f
Fix stderr printing by
2021-07-27 05:04:30 +0000
6e5a019
Always decode stdin with surrogateescape to avoid breaking on binary input by
2021-07-27 03:33:51 +0000
e48fb9d
Tighten patterns for user and custom channel URLs so they can handle HTML input more easily by
2021-07-27 03:24:14 +0000
9cbc3f7
Extract playlist and channel IDs from watch URLs by
2021-07-27 03:22:08 +0000
80bf010
Percent-decode each line only once by
2021-07-27 03:20:04 +0000
f1fcfab
Add support for reading warc.zst from stdin by
2021-07-27 00:56:46 +0000
d5f646f
Add zstdwarccat by
2021-07-26 23:07:56 +0000
4415c8d
Add support for img.youtube.com (old thumbnails) by
2021-07-26 15:23:12 +0000
50a0fcc
Fix performance regression due to 479c2684 by
2021-07-25 19:47:05 +0000
479c268
Fix whitespace handling by
2021-07-25 19:23:41 +0000
56f21d1
Add aggressive video ID v parameter extraction by
2021-07-25 18:36:38 +0000
99c83eb
Handle optional slash in generic watch matcher by
2021-07-25 18:32:05 +0000
9f88f76
Handle a few more odd and rare URLs by
2021-07-25 18:19:16 +0000
a0f3b16
Handle youtu.be case variations and port numbers by
2021-07-25 18:07:52 +0000
273d3ed
Handle gaming.youtube.com by
2021-07-25 17:57:01 +0000
0cb61f4
Add b64grep by
2021-07-25 04:22:47 +0000
8e6e47d
Fix ytimg extraction by
2021-07-22 19:17:03 +0000
0b13758
Add Bugzilla URL list generator by
2021-07-11 21:06:55 +0000
ce0ae88
Add ia-verify-file by
2021-07-08 18:53:02 +0000