The little things give you away... A collection of various small helper tools
JustAnotherArchivist c50a8fd796: Fix 'Dictionary mismatch' error when very small dicts are used because the temporary file isn't written to disk before zstdcat gets executed (2 years ago)
LICENSE: Initial commit (5 years ago)
README.md: Initial commit (5 years ago)
alphabetseq: Swap syntaxes (2 years ago)
archivebot-blogspot: Fix HTTPS handling (5 years ago)
archivebot-high-memory: Support python3 in any directory instead of just /usr/bin (4 years ago)
archivebot-irccloud-paste: Add archivebot-irccloud-paste (3 years ago)
archivebot-jobid-calculation: More snscrape helper tools (5 years ago)
archivebot-jobs: Pass through datetime, math, re, and time to --pyfilter (3 years ago)
archivebot-list-stuck-requests: Fix line endings (5 years ago)
archivebot-log-extract-ignores: Add archivebot-log-extract-ignores (3 years ago)
archivebot-monitor-job-queue: First set of little things (5 years ago)
archivebot-youtube: Add helper for AB/chromebot-ing YouTube channels and users (5 years ago)
azure-storage-list: Add --jsonl option (2 years ago)
b64grep: Add b64grep (2 years ago)
bing-scrape: Add Bing, Reddit/Pushshift, and FoolFuuka scrapers (5 years ago)
bugzilla-url-list: Add Bugzilla URL list generator (2 years ago)
combine-by-prefix: Add combine-by-prefix (2 years ago)
curl-ua: Add IE6 UA (3 years ago)
deb-repo-urls: Fix deb file URLs (3 years ago)
dedupe: Another alternative and performance/memory comparison (3 years ago)
europarl-meps-collect: Add script for scraping MEP links from europarl.europa.eu (5 years ago)
foolfuuka-search: Better workaround for the 5000 results limit; works for FoolFuuka 2.0.1 and up (5 years ago)
format-size: Split out size formatting (5 years ago)
fos-ftp-upload: First set of little things (5 years ago)
get-crx4chrome-urls: First set of little things (5 years ago)
github-list-repos: Fix org repo listing on new design/site structure (2 years ago)
gitlab-list-repos: Add support for other instances and full-instance listing (2 years ago)
gofile.io-dl: Add support for password-protected folders (2 years ago)
ia-cdx-search: Fix crash on an empty response (2 years ago)
ia-derive: Add script to queue derive on IA (5 years ago)
ia-files-xml-to-jsonl: Guarantee stable output order (3 years ago)
ia-upload-progress: Proper script for tracking size of uploaded data (5 years ago)
ia-verify-file: Add a timeout to prevent potentially indefinite blocking (2 years ago)
ia-wait-item-tasks: Add ia-wait-item-tasks (2 years ago)
iasha1check: Colourise sha1sum output (3 years ago)
ix.io-upload: Allow overriding the "remote filename" (5 years ago)
kill-wpull-connections: Merge kill-wpull-connections repository into little-things (3 years ago)
killcx-all-https: First set of little things (5 years ago)
mastodon-enumerate-users: Enumerate users on a Mastodon instance (5 years ago)
mastodon-outdated: Finding outdated Mastodon instances (5 years ago)
parent-urls: Refactor, strip query/fragment (3 years ago)
pipelines-launch-in-tmux-windows: First set of little things (5 years ago)
pipelines-monitor-tmux-wget-outcomes: Monitor how a pipeline's wget processes are faring (5 years ago)
pipelines-stop-gracefully: First set of little things (5 years ago)
reddit-pushshift-search: Add Bing, Reddit/Pushshift, and FoolFuuka scrapers (5 years ago)
run-every-five-minutes: First set of little things (5 years ago)
s3-bucket-list: Ignore TLS issues (3 years ago)
s3-bucket-list-qwarc: Record wrapper script in meta WARC as well (3 years ago)
snscrape-extract: Add support for Twitter hashtag extraction (4 years ago)
snscrape-facebook-user: Silence by default (5 years ago)
snscrape-instagram-user: Silence by default (5 years ago)
snscrape-prepare-commands: Add support for Twitter hashtag extraction (4 years ago)
snscrape-tmux: Update tmux session commands (4 years ago)
snscrape-twitter-filter: Filter Twitter hashtag scrapes based on account scrapes (5 years ago)
snscrape-twitter-hashtag: Extract external links from Twitter (5 years ago)
snscrape-twitter-user: Extract external links from Twitter (5 years ago)
snscrape-upload: Print Instagram ignore immediately after upload instead of at the end (5 years ago)
snscrape-vk-user: Silence by default (5 years ago)
snscrape-wiki-transfer-merge: Helper tools for snscrape and the wiki pages (5 years ago)
social-media-extract-profile-link: Fix decoding of links on Facebook profiles (4 years ago)
sum-sizes: Add sum-sizes (2 years ago)
tar-many-files-progress: First set of little things (5 years ago)
tcp-closer: Add tcp-closer command (5 years ago)
transfer.archivete.am-upload: Handle HTTP/2 lowercase headers (3 years ago)
transfer.notkiska.pw-check-ia: Switch to HTTPS (3 years ago)
uniqify: Add uniqify (5 years ago)
url-normalise: Normalise domain name to lower-case before further processing (4 years ago)
warc-peek: Add WARC/1.1 support (3 years ago)
warc-size: Split out size formatting (5 years ago)
warc-tiny: Fix compatibility with wpull 2.x (3 years ago)
website-extract-social-media: Add support for Facebook /pages/category/Category/Name-ID URLs (4 years ago)
wget-spider-estimate-size: First set of little things (5 years ago)
wiki-list-to-main: Add ArchiveBot wiki list helper (5 years ago)
wiki-recursive-extract-normalise: Fix deduplication within each section processing (4 years ago)
wiki-sections-sort: Add wiki-sections-sort (4 years ago)
wiki-website-extract-social-media: Add script for automatic social media discovery (4 years ago)
wpull1-parallel-progress-monitor: First set of little things (5 years ago)
wpull1-progress-monitor: First set of little things (5 years ago)
wpull2-extract-remaining: Clean up wpull DB commands (3 years ago)
wpull2-log-extract-errors: Treat NXDOMAIN and no A/AAAA record errors as ok (3 years ago)
wpull2-requeue: Print number of modified records on requeueing (2 years ago)
wpull2-url-origin: Clean up wpull DB commands (3 years ago)
youtube-channel-list.py: Add YouTube channel listing script (2 years ago)
youtube-extract: Handle ancient /?v= URLs (2 years ago)
youtube-filter-autogen-channels: Add youtube-filter-autogen-channels (4 years ago)
zstdwarccat: Fix 'Dictionary mismatch' error when very small dicts are used because the temporary file isn't written to disk before zstdcat gets executed (2 years ago)

README.md

Over the past few years, I’ve written and accumulated a number of useful little things to help with archival-related tasks. This repository collects them. I hope someone finds some of them useful.

License (applies to all programs in this repository)

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see https://www.gnu.org/licenses/.