The little things give you away... A collection of various small helper stuff
 
JustAnotherArchivist c50a8fd796 Fix 'Dictionary mismatch' error when very small dicts are used because the temporary file isn't written to disk before zstdcat gets executed 2 years ago
LICENSE Initial commit 5 years ago
README.md Initial commit 5 years ago
alphabetseq Swap syntaxes 2 years ago
archivebot-blogspot Fix HTTPS handling 5 years ago
archivebot-high-memory Support python3 in any directory instead of just /usr/bin 4 years ago
archivebot-irccloud-paste Add archivebot-irccloud-paste 3 years ago
archivebot-jobid-calculation More snscrape helper tools 5 years ago
archivebot-jobs Pass through datetime, math, re, and time to --pyfilter 3 years ago
archivebot-list-stuck-requests Fix line endings 5 years ago
archivebot-log-extract-ignores Add archivebot-log-extract-ignores 3 years ago
archivebot-monitor-job-queue First set of little things 5 years ago
archivebot-youtube Add helper for AB/chromebot-ing YouTube channels and users 5 years ago
azure-storage-list Add --jsonl option 2 years ago
b64grep Add b64grep 2 years ago
bing-scrape Add Bing, Reddit/Pushshift, and FoolFuuka scrapers 5 years ago
bugzilla-url-list Add Bugzilla URL list generator 2 years ago
combine-by-prefix Add combine-by-prefix 2 years ago
curl-ua Add IE6 UA 3 years ago
deb-repo-urls Fix deb file URLs 3 years ago
dedupe Another alternative and performance/memory comparison 3 years ago
europarl-meps-collect Add script for scraping MEP links from europarl.europa.eu 5 years ago
foolfuuka-search Better workaround for the 5000 results limit; works for FoolFuuka 2.0.1 and up 5 years ago
format-size Split out size formatting 5 years ago
fos-ftp-upload First set of little things 5 years ago
get-crx4chrome-urls First set of little things 5 years ago
github-list-repos Fix org repo listing on new design/site structure 2 years ago
gitlab-list-repos Add support for other instances and full-instance listing 2 years ago
gofile.io-dl Add support for password-protected folders 2 years ago
ia-cdx-search Fix crash on an empty response 2 years ago
ia-derive Add script to queue derive on IA 5 years ago
ia-files-xml-to-jsonl Guarantee stable output order 3 years ago
ia-upload-progress Proper script for tracking size of uploaded data 5 years ago
ia-verify-file Add a timeout to prevent potentially indefinite blocking 2 years ago
ia-wait-item-tasks Add ia-wait-item-tasks 2 years ago
iasha1check Colourise sha1sum output 3 years ago
ix.io-upload Allow overriding the "remote filename" 5 years ago
kill-wpull-connections Merge kill-wpull-connections repository into little-things 3 years ago
killcx-all-https First set of little things 5 years ago
mastodon-enumerate-users Enumerate users on a Mastodon instance 5 years ago
mastodon-outdated Finding outdated Mastodon instances 5 years ago
parent-urls Refactor, strip query/fragment 3 years ago
pipelines-launch-in-tmux-windows First set of little things 5 years ago
pipelines-monitor-tmux-wget-outcomes Monitor how a pipeline's wget processes are faring 5 years ago
pipelines-stop-gracefully First set of little things 5 years ago
reddit-pushshift-search Add Bing, Reddit/Pushshift, and FoolFuuka scrapers 5 years ago
run-every-five-minutes First set of little things 5 years ago
s3-bucket-list Ignore TLS issues 3 years ago
s3-bucket-list-qwarc Record wrapper script in meta WARC as well 3 years ago
snscrape-extract Add support for Twitter hashtag extraction 4 years ago
snscrape-facebook-user Silence by default 5 years ago
snscrape-instagram-user Silence by default 5 years ago
snscrape-prepare-commands Add support for Twitter hashtag extraction 4 years ago
snscrape-tmux Update tmux session commands 4 years ago
snscrape-twitter-filter Filter Twitter hashtag scrapes based on account scrapes 5 years ago
snscrape-twitter-hashtag Extract external links from Twitter 5 years ago
snscrape-twitter-user Extract external links from Twitter 5 years ago
snscrape-upload Print Instagram ignore immediately after upload instead of at the end 5 years ago
snscrape-vk-user Silence by default 5 years ago
snscrape-wiki-transfer-merge Helper tools for snscrape and the wiki pages 5 years ago
social-media-extract-profile-link Fix decoding of links on Facebook profiles 4 years ago
sum-sizes Add sum-sizes 2 years ago
tar-many-files-progress First set of little things 5 years ago
tcp-closer Add tcp-closer command 5 years ago
transfer.archivete.am-upload Handle HTTP/2 lowercase headers 3 years ago
transfer.notkiska.pw-check-ia Switch to HTTPS 3 years ago
uniqify Add uniqify 5 years ago
url-normalise Normalise domain name to lower-case before further processing 4 years ago
warc-peek Add WARC/1.1 support 3 years ago
warc-size Split out size formatting 5 years ago
warc-tiny Fix compatibility with wpull 2.x 3 years ago
website-extract-social-media Add support for Facebook /pages/category/Category/Name-ID URLs 4 years ago
wget-spider-estimate-size First set of little things 5 years ago
wiki-list-to-main Add ArchiveBot wiki list helper 5 years ago
wiki-recursive-extract-normalise Fix deduplication within each section processing 4 years ago
wiki-sections-sort Add wiki-sections-sort 4 years ago
wiki-website-extract-social-media Add script for automatic social media discovery 4 years ago
wpull1-parallel-progress-monitor First set of little things 5 years ago
wpull1-progress-monitor First set of little things 5 years ago
wpull2-extract-remaining Clean up wpull DB commands 3 years ago
wpull2-log-extract-errors Treat NXDOMAIN and no A/AAAA record errors as ok 3 years ago
wpull2-requeue Print number of modified records on requeueing 2 years ago
wpull2-url-origin Clean up wpull DB commands 3 years ago
youtube-channel-list.py Add YouTube channel listing script 2 years ago
youtube-extract Handle ancient /?v= URLs 2 years ago
youtube-filter-autogen-channels Add youtube-filter-autogen-channels 4 years ago
zstdwarccat Fix 'Dictionary mismatch' error when very small dicts are used because the temporary file isn't written to disk before zstdcat gets executed 2 years ago

README.md

Over the past few years, I’ve written and accumulated a number of useful little things to help with archival-related tasks. This repository collects them. I hope someone finds some of them useful.

License (applies to all programs in this repository)

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see https://www.gnu.org/licenses/.