The little things give you away... A collection of various small helper stuff
JustAnotherArchivist c50a8fd796 Fix 'Dictionary mismatch' error when very small dicts are used because the temporary file isn't written to disk before zstdcat gets executed 2 years ago
LICENSE Initial commit 5 years ago
README.md Initial commit 5 years ago
alphabetseq Swap syntaxes 2 years ago
archivebot-blogspot Fix HTTPS handling 5 years ago
archivebot-high-memory Support python3 in any directory instead of just /usr/bin 4 years ago
archivebot-irccloud-paste Add archivebot-irccloud-paste 3 years ago
archivebot-jobid-calculation More snscrape helper tools 5 years ago
archivebot-jobs Pass through datetime, math, re, and time to --pyfilter 3 years ago
archivebot-list-stuck-requests Fix line endings 5 years ago
archivebot-log-extract-ignores Add archivebot-log-extract-ignores 3 years ago
archivebot-monitor-job-queue First set of little things 5 years ago
archivebot-youtube Add helper for AB/chromebot-ing YouTube channels and users 5 years ago
azure-storage-list Add --jsonl option 2 years ago
b64grep Add b64grep 2 years ago
bing-scrape Add Bing, Reddit/Pushshift, and FoolFuuka scrapers 5 years ago
bugzilla-url-list Add Bugzilla URL list generator 2 years ago
combine-by-prefix Add combine-by-prefix 2 years ago
curl-ua Add IE6 UA 3 years ago
deb-repo-urls Fix deb file URLs 3 years ago
dedupe Another alternative and performance/memory comparison 3 years ago
europarl-meps-collect Add script for scraping MEP links from europarl.europa.eu 5 years ago
foolfuuka-search Better workaround for the 5000 results limit; works for FoolFuuka 2.0.1 and up 5 years ago
format-size Split out size formatting 5 years ago
fos-ftp-upload First set of little things 5 years ago
get-crx4chrome-urls First set of little things 5 years ago
github-list-repos Fix org repo listing on new design/site structure 2 years ago
gitlab-list-repos Add support for other instances and full-instance listing 2 years ago
gofile.io-dl Add support for password-protected folders 2 years ago
ia-cdx-search Fix crash on an empty response 2 years ago
ia-derive Add script to queue derive on IA 5 years ago
ia-files-xml-to-jsonl Guarantee stable output order 3 years ago
ia-upload-progress Proper script for tracking size of uploaded data 5 years ago
ia-verify-file Add a timeout to prevent potentially indefinite blocking 2 years ago
ia-wait-item-tasks Add ia-wait-item-tasks 2 years ago
iasha1check Colourise sha1sum output 3 years ago
ix.io-upload Allow overriding the "remote filename" 5 years ago
kill-wpull-connections Merge kill-wpull-connections repository into little-things 3 years ago
killcx-all-https First set of little things 5 years ago
mastodon-enumerate-users Enumerate users on a Mastodon instance 5 years ago
mastodon-outdated Finding outdated Mastodon instances 5 years ago
parent-urls Refactor, strip query/fragment 3 years ago
pipelines-launch-in-tmux-windows First set of little things 5 years ago
pipelines-monitor-tmux-wget-outcomes Monitor how a pipeline's wget processes are faring 5 years ago
pipelines-stop-gracefully First set of little things 5 years ago
reddit-pushshift-search Add Bing, Reddit/Pushshift, and FoolFuuka scrapers 5 years ago
run-every-five-minutes First set of little things 5 years ago
s3-bucket-list Ignore TLS issues 3 years ago
s3-bucket-list-qwarc Record wrapper script in meta WARC as well 3 years ago
snscrape-extract Add support for Twitter hashtag extraction 4 years ago
snscrape-facebook-user Silence by default 5 years ago
snscrape-instagram-user Silence by default 5 years ago
snscrape-prepare-commands Add support for Twitter hashtag extraction 4 years ago
snscrape-tmux Update tmux session commands 4 years ago
snscrape-twitter-filter Filter Twitter hashtag scrapes based on account scrapes 5 years ago
snscrape-twitter-hashtag Extract external links from Twitter 5 years ago
snscrape-twitter-user Extract external links from Twitter 5 years ago
snscrape-upload Print Instagram ignore immediately after upload instead of at the end 5 years ago
snscrape-vk-user Silence by default 5 years ago
snscrape-wiki-transfer-merge Helper tools for snscrape and the wiki pages 5 years ago
social-media-extract-profile-link Fix decoding of links on Facebook profiles 4 years ago
sum-sizes Add sum-sizes 2 years ago
tar-many-files-progress First set of little things 5 years ago
tcp-closer Add tcp-closer command 5 years ago
transfer.archivete.am-upload Handle HTTP/2 lowercase headers 3 years ago
transfer.notkiska.pw-check-ia Switch to HTTPS 3 years ago
uniqify Add uniqify 5 years ago
url-normalise Normalise domain name to lower-case before further processing 4 years ago
warc-peek Add WARC/1.1 support 3 years ago
warc-size Split out size formatting 5 years ago
warc-tiny Fix compatibility with wpull 2.x 3 years ago
website-extract-social-media Add support for Facebook /pages/category/Category/Name-ID URLs 4 years ago
wget-spider-estimate-size First set of little things 5 years ago
wiki-list-to-main Add ArchiveBot wiki list helper 5 years ago
wiki-recursive-extract-normalise Fix deduplication within each section processing 4 years ago
wiki-sections-sort Add wiki-sections-sort 4 years ago
wiki-website-extract-social-media Add script for automatic social media discovery 4 years ago
wpull1-parallel-progress-monitor First set of little things 5 years ago
wpull1-progress-monitor First set of little things 5 years ago
wpull2-extract-remaining Clean up wpull DB commands 3 years ago
wpull2-log-extract-errors Treat NXDOMAIN and no A/AAAA record errors as ok 3 years ago
wpull2-requeue Print number of modified records on requeueing 2 years ago
wpull2-url-origin Clean up wpull DB commands 3 years ago
youtube-channel-list.py Add YouTube channel listing script 2 years ago
youtube-extract Handle ancient /?v= URLs 2 years ago
youtube-filter-autogen-channels Add youtube-filter-autogen-channels 4 years ago
zstdwarccat Fix 'Dictionary mismatch' error when very small dicts are used because the temporary file isn't written to disk before zstdcat gets executed 2 years ago

README.md

Over the past few years, I’ve written and accumulated a number of useful little things to help with archival-related tasks. This repository collects them. I hope someone finds some of them useful.

License (applies to all programs in this repository)

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see https://www.gnu.org/licenses/.