The little things give you away... A collection of various small helper stuff
JustAnotherArchivist c50a8fd796 Fix 'Dictionary mismatch' error when very small dicts are used because the temporary file isn't written to disk before zstdcat gets executed 2 years ago
LICENSE Initial commit 5 years ago
README.md Initial commit 5 years ago
alphabetseq Swap syntaxes 2 years ago
archivebot-blogspot Fix HTTPS handling 5 years ago
archivebot-high-memory Support python3 in any directory instead of just /usr/bin 4 years ago
archivebot-irccloud-paste Add archivebot-irccloud-paste 3 years ago
archivebot-jobid-calculation More snscrape helper tools 5 years ago
archivebot-jobs Pass through datetime, math, re, and time to --pyfilter 3 years ago
archivebot-list-stuck-requests Fix line endings 5 years ago
archivebot-log-extract-ignores Add archivebot-log-extract-ignores 3 years ago
archivebot-monitor-job-queue First set of little things 5 years ago
archivebot-youtube Add helper for AB/chromebot-ing YouTube channels and users 5 years ago
azure-storage-list Add --jsonl option 2 years ago
b64grep Add b64grep 2 years ago
bing-scrape Add Bing, Reddit/Pushshift, and FoolFuuka scrapers 5 years ago
bugzilla-url-list Add Bugzilla URL list generator 2 years ago
combine-by-prefix Add combine-by-prefix 2 years ago
curl-ua Add IE6 UA 3 years ago
deb-repo-urls Fix deb file URLs 3 years ago
dedupe Another alternative and performance/memory comparison 3 years ago
europarl-meps-collect Add script for scraping MEP links from europarl.europa.eu 5 years ago
foolfuuka-search Better workaround for the 5000 results limit; works for FoolFuuka 2.0.1 and up 5 years ago
format-size Split out size formatting 5 years ago
fos-ftp-upload First set of little things 5 years ago
get-crx4chrome-urls First set of little things 5 years ago
github-list-repos Fix org repo listing on new design/site structure 2 years ago
gitlab-list-repos Add support for other instances and full-instance listing 2 years ago
gofile.io-dl Add support for password-protected folders 2 years ago
ia-cdx-search Fix crash on an empty response 2 years ago
ia-derive Add script to queue derive on IA 5 years ago
ia-files-xml-to-jsonl Guarantee stable output order 3 years ago
ia-upload-progress Proper script for tracking size of uploaded data 5 years ago
ia-verify-file Add a timeout to prevent potentially indefinite blocking 2 years ago
ia-wait-item-tasks Add ia-wait-item-tasks 2 years ago
iasha1check Colourise sha1sum output 3 years ago
ix.io-upload Allow overriding the "remote filename" 5 years ago
kill-wpull-connections Merge kill-wpull-connections repository into little-things 3 years ago
killcx-all-https First set of little things 5 years ago
mastodon-enumerate-users Enumerate users on a Mastodon instance 5 years ago
mastodon-outdated Finding outdated Mastodon instances 5 years ago
parent-urls Refactor, strip query/fragment 3 years ago
pipelines-launch-in-tmux-windows First set of little things 5 years ago
pipelines-monitor-tmux-wget-outcomes Monitor how a pipeline's wget processes are faring 5 years ago
pipelines-stop-gracefully First set of little things 5 years ago
reddit-pushshift-search Add Bing, Reddit/Pushshift, and FoolFuuka scrapers 5 years ago
run-every-five-minutes First set of little things 5 years ago
s3-bucket-list Ignore TLS issues 3 years ago
s3-bucket-list-qwarc Record wrapper script in meta WARC as well 3 years ago
snscrape-extract Add support for Twitter hashtag extraction 4 years ago
snscrape-facebook-user Silence by default 5 years ago
snscrape-instagram-user Silence by default 5 years ago
snscrape-prepare-commands Add support for Twitter hashtag extraction 4 years ago
snscrape-tmux Update tmux session commands 4 years ago
snscrape-twitter-filter Filter Twitter hashtag scrapes based on account scrapes 5 years ago
snscrape-twitter-hashtag Extract external links from Twitter 5 years ago
snscrape-twitter-user Extract external links from Twitter 5 years ago
snscrape-upload Print Instagram ignore immediately after upload instead of at the end 5 years ago
snscrape-vk-user Silence by default 5 years ago
snscrape-wiki-transfer-merge Helper tools for snscrape and the wiki pages 5 years ago
social-media-extract-profile-link Fix decoding of links on Facebook profiles 4 years ago
sum-sizes Add sum-sizes 2 years ago
tar-many-files-progress First set of little things 5 years ago
tcp-closer Add tcp-closer command 5 years ago
transfer.archivete.am-upload Handle HTTP/2 lowercase headers 3 years ago
transfer.notkiska.pw-check-ia Switch to HTTPS 3 years ago
uniqify Add uniqify 5 years ago
url-normalise Normalise domain name to lower-case before further processing 4 years ago
warc-peek Add WARC/1.1 support 3 years ago
warc-size Split out size formatting 5 years ago
warc-tiny Fix compatibility with wpull 2.x 3 years ago
website-extract-social-media Add support for Facebook /pages/category/Category/Name-ID URLs 4 years ago
wget-spider-estimate-size First set of little things 5 years ago
wiki-list-to-main Add ArchiveBot wiki list helper 5 years ago
wiki-recursive-extract-normalise Fix deduplication within each section processing 4 years ago
wiki-sections-sort Add wiki-sections-sort 4 years ago
wiki-website-extract-social-media Add script for automatic social media discovery 4 years ago
wpull1-parallel-progress-monitor First set of little things 5 years ago
wpull1-progress-monitor First set of little things 5 years ago
wpull2-extract-remaining Clean up wpull DB commands 3 years ago
wpull2-log-extract-errors Treat NXDOMAIN and no A/AAAA record errors as ok 3 years ago
wpull2-requeue Print number of modified records on requeueing 2 years ago
wpull2-url-origin Clean up wpull DB commands 3 years ago
youtube-channel-list.py Add YouTube channel listing script 2 years ago
youtube-extract Handle ancient /?v= URLs 2 years ago
youtube-filter-autogen-channels Add youtube-filter-autogen-channels 4 years ago
zstdwarccat Fix 'Dictionary mismatch' error when very small dicts are used because the temporary file isn't written to disk before zstdcat gets executed 2 years ago

README.md

Over the past few years, I’ve written and accumulated a number of useful little things to help with archival-related tasks. This repository collects them. I hope someone finds some of them useful.

License (applies to all programs in this repository)

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see https://www.gnu.org/licenses/.