The little things give you away... A collection of various small helper stuff
JustAnotherArchivist c50a8fd796 Fix 'Dictionary mismatch' error when very small dicts are used because the temporary file isn't written to disk before zstdcat gets executed 2 years ago
LICENSE Initial commit 5 years ago
README.md Initial commit 5 years ago
alphabetseq Swap syntaxes 2 years ago
archivebot-blogspot Fix HTTPS handling 5 years ago
archivebot-high-memory Support python3 in any directory instead of just /usr/bin 4 years ago
archivebot-irccloud-paste Add archivebot-irccloud-paste 3 years ago
archivebot-jobid-calculation More snscrape helper tools 5 years ago
archivebot-jobs Pass through datetime, math, re, and time to --pyfilter 3 years ago
archivebot-list-stuck-requests Fix line endings 5 years ago
archivebot-log-extract-ignores Add archivebot-log-extract-ignores 3 years ago
archivebot-monitor-job-queue First set of little things 5 years ago
archivebot-youtube Add helper for AB/chromebot-ing YouTube channels and users 5 years ago
azure-storage-list Add --jsonl option 2 years ago
b64grep Add b64grep 2 years ago
bing-scrape Add Bing, Reddit/Pushshift, and FoolFuuka scrapers 5 years ago
bugzilla-url-list Add Bugzilla URL list generator 2 years ago
combine-by-prefix Add combine-by-prefix 2 years ago
curl-ua Add IE6 UA 3 years ago
deb-repo-urls Fix deb file URLs 3 years ago
dedupe Another alternative and performance/memory comparison 3 years ago
europarl-meps-collect Add script for scraping MEP links from europarl.europa.eu 5 years ago
foolfuuka-search Better workaround for the 5000 results limit; works for FoolFuuka 2.0.1 and up 5 years ago
format-size Split out size formatting 5 years ago
fos-ftp-upload First set of little things 5 years ago
get-crx4chrome-urls First set of little things 5 years ago
github-list-repos Fix org repo listing on new design/site structure 2 years ago
gitlab-list-repos Add support for other instances and full-instance listing 2 years ago
gofile.io-dl Add support for password-protected folders 2 years ago
ia-cdx-search Fix crash on an empty response 2 years ago
ia-derive Add script to queue derive on IA 5 years ago
ia-files-xml-to-jsonl Guarantee stable output order 3 years ago
ia-upload-progress Proper script for tracking size of uploaded data 5 years ago
ia-verify-file Add a timeout to prevent potentially indefinite blocking 2 years ago
ia-wait-item-tasks Add ia-wait-item-tasks 2 years ago
iasha1check Colourise sha1sum output 3 years ago
ix.io-upload Allow overriding the "remote filename" 5 years ago
kill-wpull-connections Merge kill-wpull-connections repository into little-things 3 years ago
killcx-all-https First set of little things 5 years ago
mastodon-enumerate-users Enumerate users on a Mastodon instance 5 years ago
mastodon-outdated Finding outdated Mastodon instances 5 years ago
parent-urls Refactor, strip query/fragment 3 years ago
pipelines-launch-in-tmux-windows First set of little things 5 years ago
pipelines-monitor-tmux-wget-outcomes Monitor how a pipeline's wget processes are faring 5 years ago
pipelines-stop-gracefully First set of little things 5 years ago
reddit-pushshift-search Add Bing, Reddit/Pushshift, and FoolFuuka scrapers 5 years ago
run-every-five-minutes First set of little things 5 years ago
s3-bucket-list Ignore TLS issues 3 years ago
s3-bucket-list-qwarc Record wrapper script in meta WARC as well 3 years ago
snscrape-extract Add support for Twitter hashtag extraction 4 years ago
snscrape-facebook-user Silence by default 5 years ago
snscrape-instagram-user Silence by default 5 years ago
snscrape-prepare-commands Add support for Twitter hashtag extraction 4 years ago
snscrape-tmux Update tmux session commands 4 years ago
snscrape-twitter-filter Filter Twitter hashtag scrapes based on account scrapes 5 years ago
snscrape-twitter-hashtag Extract external links from Twitter 5 years ago
snscrape-twitter-user Extract external links from Twitter 5 years ago
snscrape-upload Print Instagram ignore immediately after upload instead of at the end 5 years ago
snscrape-vk-user Silence by default 5 years ago
snscrape-wiki-transfer-merge Helper tools for snscrape and the wiki pages 5 years ago
social-media-extract-profile-link Fix decoding of links on Facebook profiles 4 years ago
sum-sizes Add sum-sizes 2 years ago
tar-many-files-progress First set of little things 5 years ago
tcp-closer Add tcp-closer command 5 years ago
transfer.archivete.am-upload Handle HTTP/2 lowercase headers 3 years ago
transfer.notkiska.pw-check-ia Switch to HTTPS 3 years ago
uniqify Add uniqify 5 years ago
url-normalise Normalise domain name to lower-case before further processing 4 years ago
warc-peek Add WARC/1.1 support 3 years ago
warc-size Split out size formatting 5 years ago
warc-tiny Fix compatibility with wpull 2.x 3 years ago
website-extract-social-media Add support for Facebook /pages/category/Category/Name-ID URLs 4 years ago
wget-spider-estimate-size First set of little things 5 years ago
wiki-list-to-main Add ArchiveBot wiki list helper 5 years ago
wiki-recursive-extract-normalise Fix deduplication within each section processing 4 years ago
wiki-sections-sort Add wiki-sections-sort 4 years ago
wiki-website-extract-social-media Add script for automatic social media discovery 4 years ago
wpull1-parallel-progress-monitor First set of little things 5 years ago
wpull1-progress-monitor First set of little things 5 years ago
wpull2-extract-remaining Clean up wpull DB commands 3 years ago
wpull2-log-extract-errors Treat NXDOMAIN and no A/AAAA record errors as ok 3 years ago
wpull2-requeue Print number of modified records on requeueing 2 years ago
wpull2-url-origin Clean up wpull DB commands 3 years ago
youtube-channel-list.py Add YouTube channel listing script 2 years ago
youtube-extract Handle ancient /?v= URLs 2 years ago
youtube-filter-autogen-channels Add youtube-filter-autogen-channels 4 years ago
zstdwarccat Fix 'Dictionary mismatch' error when very small dicts are used because the temporary file isn't written to disk before zstdcat gets executed 2 years ago

README.md

Over the past few years, I’ve written and accumulated a number of useful little things to help with archival-related tasks. This repository collects them. I hope someone finds some of them useful.

License (applies to all programs in this repository)

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see https://www.gnu.org/licenses/.