The little things give you away... A collection of various small helper stuff
JustAnotherArchivist ca206e162e Disable deletion by default 1 week ago
.gitignore Add infrastructure for simple C-based tools 2 years ago
.make-and-exec Warnings are bad, mmkay? 11 months ago
.urldecode-test Get rid of Makefile for more control; add proper debug build support 11 months ago
.warc-dump-responses-test Add test for warc-dump-responses 5 months ago
.youtube-extract-rapid-test Get rid of Makefile for more control; add proper debug build support 11 months ago
LICENSE Initial commit 5 years ago
README.md Initial commit 5 years ago
alphabetseq Swap syntaxes 2 years ago
archivebot-blogspot Fix HTTPS handling 5 years ago
archivebot-compress-db Add archivebot-compress-db 4 months ago
archivebot-fix-queue-counters Fix TypeError 1 year ago
archivebot-high-resources Replace archivebot-high-memory with more capable archivebot-high-resources 4 months ago
archivebot-irccloud-paste Add archivebot-irccloud-paste 3 years ago
archivebot-jobid-calculation More snscrape helper tools 5 years ago
archivebot-jobs Not-so-new new ArchiveBot domain 1 year ago
archivebot-list-stuck-requests Fix line endings 5 years ago
archivebot-log-extract-ignores Add archivebot-log-extract-ignores 3 years ago
archivebot-monitor-job-queue First set of little things 5 years ago
archivebot-pipelines-count-jobs Add archivebot-pipelines-count-jobs 5 months ago
archivebot-youtube Add helper for AB/chromebot-ing YouTube channels and users 5 years ago
at-tracker-sample-user-item-size Add at-tracker-sample-user-item-size 2 years ago
azure-storage-list Add --jsonl option 2 years ago
b64grep Add b64grep 2 years ago
base64url Add base64url 2 years ago
bencode2json Add bencode2json 1 year ago
bing-scrape Fix extraction of search results 6 months ago
bugzilla-url-list Add Bugzilla URL list generator 2 years ago
cdx-chunk Add cdx-chunk 2 years ago
cloudflare-email-decode Add cloudflare-email-decode 2 years ago
combine-by-prefix Add combine-by-prefix 2 years ago
curl-ia Add header mode (e.g. for tasks API) 1 year ago
curl-ua Add IE6 UA 3 years ago
deb-repo-urls Fix deb file URLs 3 years ago
dedupe Another alternative and performance/memory comparison 3 years ago
dir-to-ia Disable deletion by default 1 week ago
europarl-meps-collect Add script for scraping MEP links from europarl.europa.eu 4 years ago
extract-urls-for-archiveteam-projects Add wpull2-extract-ignored-offsite and extract-urls-for-archiveteam-projects 4 months ago
foolfuuka-search Better workaround for the 5000 results limit; works for FoolFuuka 2.0.1 and up 5 years ago
format-size Split out size formatting 5 years ago
fos-ftp-upload First set of little things 5 years ago
get-crx4chrome-urls First set of little things 5 years ago
github-list-repos Fix org listings not including archived repos 1 week ago
gitlab-list-repos Add support for other instances and full-instance listing 2 years ago
gofile.io-dl Add support for password-protected folders 2 years ago
html-extract-stupid Handle [...] and [...] 10 months ago
http-response-bodies Add http-response-bodies 1 year ago
http-response-bodies.c Fix extra LF between chunks 9 months ago
ia-cdx-search Fix error when no arguments are provided 1 year ago
ia-cdx-search-subdomains Fix URLs without a path 1 year ago
ia-derive Queue derives with `ia tasks` instead of this manual curl rubbish 2 years ago
ia-files-xml-to-jsonl Guarantee stable output order 2 years ago
ia-upload-progress Proper script for tracking size of uploaded data 5 years ago
ia-upload-stream Fix single-part upload 1 week ago
ia-verify-file Add a timeout to prevent potentially indefinite blocking 2 years ago
ia-wait-item-tasks Handle error tasks by exiting non-zero 11 months ago
iasha1check Fix output sometimes appearing after prompt 1 year ago
ix.io-upload Allow overriding the "remote filename" 5 years ago
kill-connections Handle processes with too many open connections 2 years ago
kill-wpull-connections Merge kill-wpull-connections repository into little-things 3 years ago
killcx-all-https First set of little things 5 years ago
mastodon-enumerate-users Enumerate users on a Mastodon instance 5 years ago
mastodon-outdated Finding outdated Mastodon instances 5 years ago
moinmoin-url-list Add moinmoin-url-list 3 months ago
parent-urls Refactor, strip query/fragment 3 years ago
pipelines-launch-in-tmux-windows First set of little things 5 years ago
pipelines-monitor-tmux-wget-outcomes Monitor how a pipeline's wget processes are faring 5 years ago
pipelines-stop-gracefully First set of little things 5 years ago
reddit-pushshift-search Add Bing, Reddit/Pushshift, and FoolFuuka scrapers 5 years ago
run-every-five-minutes First set of little things 5 years ago
s3-bucket-find-direct-url Add support for PermanentRedirect error responses 6 months ago
s3-bucket-list Enable line buffering on list URLs FD 7 months ago
s3-bucket-list-qwarc Add JSONL output option for S3 listing 2 years ago
snscrape-extract Add support for Twitter hashtag extraction 4 years ago
snscrape-facebook-user Silence by default 5 years ago
snscrape-instagram-user Silence by default 5 years ago
snscrape-prepare-commands Add support for Twitter hashtag extraction 4 years ago
snscrape-tmux Update tmux session commands 4 years ago
snscrape-twitter-filter Filter Twitter hashtag scrapes based on account scrapes 5 years ago
snscrape-twitter-hashtag Extract external links from Twitter 4 years ago
snscrape-twitter-user Extract external links from Twitter 4 years ago
snscrape-upload Print Instagram ignore immediately after upload instead of at the end 5 years ago
snscrape-vk-user Silence by default 5 years ago
snscrape-wiki-transfer-merge Helper tools for snscrape and the wiki pages 5 years ago
social-media-extract-profile-link Fix decoding of links on Facebook profiles 4 years ago
sum-sizes Avoid float roundtrip for integer values 5 months ago
tar-many-files-progress First set of little things 5 years ago
tcp-closer Add tcp-closer command 5 years ago
torrent-tiny Fix negative ints 1 year ago
transfer.archivete.am-upload Handle HTTP/2 lowercase headers 3 years ago
transfer.notkiska.pw-check-ia Switch to HTTPS 3 years ago
uniqify Add uniqify 5 years ago
uniqify-recent Add uniqify-recent 2 months ago
url-normalise Normalise domain name to lower-case before further processing 4 years ago
urldecode Add URL/percent decoding tool 2 years ago
urldecode.c Fix unused argc and argv error 11 months ago
urlsort Add urlsort 2 years ago
warc-dump-responses Add warc-dump-responses 1 year ago
warc-dump-responses.c Fix error when the terminating CRLFCRLF of a record is truncated 5 months ago
warc-peek Allow negative offsets to peek near the end of the file 1 year ago
warc-size Split out size formatting 5 years ago
warc-tiny Fix empty files being considered valid WARCs 9 months ago
website-extract-social-media Add support for Facebook /pages/category/Category/Name-ID URLs 4 years ago
wget-spider-estimate-size First set of little things 5 years ago
wiki-list-to-main Add ArchiveBot wiki list helper 4 years ago
wiki-recursive-extract-normalise Fix deduplication within each section processing 4 years ago
wiki-sections-sort Add wiki-sections-sort 4 years ago
wiki-website-extract-social-media Add script for automatic social media discovery 4 years ago
wpull1-parallel-progress-monitor First set of little things 5 years ago
wpull1-progress-monitor First set of little things 5 years ago
wpull2-extract-ignored Remove filtering of onsite URLs because it's unreliable 4 months ago
wpull2-extract-remaining Clean up wpull DB commands 3 years ago
wpull2-log-colourise Add wpull2-log-colourise 1 year ago
wpull2-log-extract-errors Treat NXDOMAIN and no A/AAAA record errors as ok 3 years ago
wpull2-requeue Error on unknown options 2 years ago
wpull2-url-origin Clean up wpull DB commands 3 years ago
youtube-channel-list.py Use _type instead of key check hack 1 year ago
youtube-extract Exclude backslashes in channel patterns 1 year ago
youtube-extract-rapid Add youtube-extract-rapid 2 years ago
youtube-extract-rapid.c Add youtube-extract-rapid 2 years ago
youtube-filter-autogen-channels Add youtube-filter-autogen-channels 4 years ago
zstdwarccat Fix 'Dictionary mismatch' error when very small dicts are used because the temporary file isn't written to disk before zstdcat gets executed 2 years ago

README.md

Over the past few years, I’ve written and accumulated a number of useful little things to help with archival-related tasks. This repository collects them. I hope someone finds some of them useful.

License (applies to all programs in this repository)

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see https://www.gnu.org/licenses/.