0668267
s/transfer.notkiska.pw/transfer.archivete.am/
2021-04-19 04:55:36 +0000
0f477ca
Replicate the GitHub script interface for convenience
2021-04-19 04:30:22 +0000
8f50291
Add --git-urls and --gitgud-complete-items
2021-04-19 04:29:57 +0000
e3b77bf
Ignore TLS issues
2021-04-01 07:20:41 +0000
d915a18
Pass through datetime, math, re, and time to --pyfilter
2021-04-01 07:18:51 +0000
5486b6a
Add --grafana-eta
2021-04-01 06:58:05 +0000
885001a
Fix --no-table help text (leftover from pre-pure-Python)
2021-04-01 06:57:44 +0000
fd8d502
Nit: replace double quotes with single quotes
2021-04-01 06:56:25 +0000
ff096bf
Fix display of zero timestamps
2021-04-01 06:52:29 +0000
db3e79b
Print repository URLs instead of names
2021-04-01 03:49:23 +0000
22744fe
Add script for listing repos of a user or group on GitLab.com
2021-04-01 03:46:31 +0000
5678b58
Add script for requeueing skipped URLs due to too many failed attempts on wpull crawls
2021-03-29 23:43:00 +0000
f05a8a7
Clean up wpull DB commands
2021-03-29 23:16:40 +0000
cbebafe
Colourise sha1sum output
2021-03-08 01:46:21 +0000
18a3305
Fix handling of filenames with spaces and ampersands
2021-03-08 01:45:34 +0000
788b257
Handle more domains and case variations
2021-02-28 03:30:45 +0000
36aa2e8
Add archivebot-log-extract-ignores
2021-02-21 04:40:41 +0000
5b731fb
Fix compatibility with wpull 2.x
2021-02-09 05:09:38 +0000
743e058
Fix confusing error message when lxml is not installed
2021-02-09 05:06:38 +0000
491a80a
Add warc-tiny scrape command for parsing HTTP responses using wpull and extracting links
2021-02-09 03:35:13 +0000
fd2728f
Add archivebot-irccloud-paste
2021-02-04 23:45:30 +0000
4eff3c3
Refactor, strip query/fragment
2021-01-15 17:23:46 +0000
821cacf
Add --help
2021-01-15 17:23:30 +0000
caffeba
Add parent-urls
2021-01-15 17:09:25 +0000
77ec76b
Add --urls and --nodl options
2021-01-09 16:51:47 +0000
06cf71f
Fix gofile.io download: getServer is not used by the website anymore, and getUpload no longer returns the MD5
2021-01-09 16:49:42 +0000
bff1490
Add github-list-repos
2020-12-22 01:07:15 +0000
bf695d6
Fix channel URLs
2020-12-21 16:49:17 +0000
dde4464
Cover two more rare URLs
2020-11-26 04:44:37 +0000
bbf2d2c
Be more lenient regarding slashes to catch things with collapsed URLs in paths etc.
2020-11-26 04:42:35 +0000
362f66e
Handle youtube-nocookie.com and fix removenonyt mode not recognising CC domains
2020-11-26 04:05:01 +0000
81e2b4b
Refine patterns
2020-11-26 03:43:07 +0000
9974d46
Stop trying to rewrite patterns for percent encoding
2020-11-26 03:48:59 +0000
0ee83bc
Refactor
2020-11-26 02:49:20 +0000
b66260c
Add youtube-extract
2020-11-25 22:07:35 +0000
d82dff8
Add ETA column
2020-11-21 03:40:13 +0000
01274e4
Prevent constantly moving bytes around for better performance on large chunked records
2020-11-11 01:50:50 +0000
77d9f61
Colourise output
2020-11-07 00:32:51 +0000
6512669
Refactor and compare file list as well
2020-11-07 00:27:38 +0000
8e0cb30
Add atdash mode
2020-10-20 17:29:58 +0000
5fe595d
Record wrapper script in meta WARC as well
2020-09-26 19:32:13 +0000
c1def0e
Fix S3_WITH_LIST_URLS being defined (but empty) when --with-list-urls is not used
2020-09-26 18:45:34 +0000
398cbfd
Add s3-bucket-list-qwarc, rewritten s3-bucket-list on top of qwarc
2020-09-26 17:41:19 +0000
80084e0
Another alternative and performance/memory comparison
2020-09-25 18:50:25 +0000
6a288a6
Use grep instead, which is faster but uses more memory
2020-09-25 17:45:45 +0000
4d274e6
Add dedupe
2020-09-25 17:30:43 +0000
a4af8e6
Add IE6 UA
2020-09-20 02:14:24 +0000
ac27743
Add Googlebot UA
2020-09-20 02:06:14 +0000
0181e53
Treat NXDOMAIN and no A/AAAA record errors as ok
2020-09-11 16:35:19 +0000
41c2a9d
Add support for alternative xmlns
2020-09-11 01:36:56 +0000
830e9db
Treat redirects as successful retrievals
2020-09-11 01:31:36 +0000
7a999c9
Ignore redirects
2020-09-11 01:16:17 +0000
579d589
Add a script to extract errors from wpull 2.x logs
2020-09-11 01:06:43 +0000
d60948e
Verbosity
2020-08-18 18:02:50 +0000
a9a4792
Fix server validation
2020-08-18 18:01:53 +0000
57e2e26
Support multi-file uploads
2020-08-18 17:59:09 +0000
02c967f
Add gofile.io download script
2020-08-18 17:43:16 +0000
a83d28d
Add WARC/1.1 support
2020-07-16 00:46:49 +0000
ba2f7db
Merge warc-peek repository into little-things
2020-07-15 22:04:37 +0000
79fc113
Merge kill-wpull-connections repository into little-things
2020-07-15 19:59:36 +0000
b4bb9ba
Switch to HTTPS
2020-07-15 16:07:47 +0000
9f3c7b3
Support negative filter values for date columns as relative to the current datetime
2020-07-09 18:49:09 +0000
c7151ef
Add script for checking whether a file on transfer.notkiska.pw was archived correctly with AB
2020-07-04 01:40:00 +0000
4c90bac
Shield values in colons with angled brackets
2020-07-02 02:01:30 +0000
f51adcc
Add --meta mode for dump-responses which prefixes each line with information about the file and record
2020-07-02 01:41:18 +0000
9cc1f41
Pass the filename in NewFile events
2020-07-02 01:40:39 +0000
a38efc3
Introduce a way to provide additional arguments to processors
2020-07-02 01:40:14 +0000
ecf6678
Fix deb file URLs
2020-06-16 19:14:06 +0000
1e5fbed
Fix log message going to stdout
2020-06-16 18:37:00 +0000
3a2cea1
Add script for recursing over Debian repos
2020-06-16 00:05:56 +0000
4f12f73
Refactor filtering and add --pyfilter
2020-06-04 18:15:33 +0000
785f13e
Add --replace-{concurrency,delay}
2020-06-04 17:53:10 +0000
5067c40
Fix errors on invalid filter or sort values
2020-06-04 17:42:17 +0000
7e2befc
+x
2020-05-31 15:23:24 +0000
49376db
Decode HTTP request bodies
2020-05-28 22:32:37 +0000
171ca42
Disable truncation when stdout is not a terminal
2020-03-29 01:52:40 +0000
9763370
Truncate URLs by default to fit the terminal width
2020-03-29 01:47:40 +0000
1bc1487
Add script for extracting remaining wpull 2 queue
2020-03-11 21:02:55 +0000
d3c0035
Make con-d-commands mode an alias of the corresponding format
2020-03-11 20:58:41 +0000
b4fe6dd
Reorder arguments to make more sense
2020-03-11 20:51:17 +0000
c547fc6
Add format mode
2020-03-11 20:49:44 +0000
7fde199
Add --mode con-d-commands, replace --dashboard-regex with --mode dashboard-regex
2020-03-11 02:23:36 +0000
0a6f83b
Add --dashboard-regex
2020-03-10 23:04:58 +0000
05ed1e0
Add more columns
2020-03-07 22:48:32 +0000
3f7d84a
Refactor the Bash/Python abomination into a pure Python script so I get to keep my sanity while editing
2020-03-07 22:19:35 +0000
cf879c8
Refactor into something more flexible to the addition of new columns
2020-03-07 19:02:50 +0000
d7bd8de
Add --dates option
2020-02-28 20:27:16 +0000
236278f
Fix decoding of links on Facebook profiles
2020-02-24 03:10:57 +0000
d7a07d1
Normalise domain name to lower-case before further processing
2020-02-24 01:18:18 +0000
e655080
Add support for Facebook /pages/category/Category/Name-ID URLs
2020-02-19 17:53:45 +0000
daa1a95
Proper URL decoding
2020-02-18 16:53:41 +0000
1bee1cd
Add support for Facebook /people/Name/ID URLs
2020-02-17 13:26:46 +0000
00107c0
Add support for YouTube /c/X URLs
2020-02-17 13:25:19 +0000
b59b820
Add support for wiki list entries with options
2020-02-12 03:12:30 +0000
d5953ca
Use old Opera UA for Twitter to force the old design
2020-02-10 18:24:27 +0000
1fa57d4
Fix extraction on Wix sites from JSON inside a data attribute
2020-02-10 18:23:36 +0000
4a74216
Suppress output if there are no matched jobs
2020-02-10 01:18:11 +0000
fe72d57
Add filtering based on substrings anywhere in the string and on regex
2020-02-10 00:47:19 +0000
cf30a53
Add case-insensitive filtering
2020-02-10 00:42:36 +0000
711e444
Highlight jobs that have been inactive for over 6 hours
2020-02-02 05:28:17 +0000