The little things give you away... A collection of various small helper tools
Latest commit dde4464555 by JustAnotherArchivist: Cover two more rare URLs (1 week ago)
LICENSE: Initial commit (1 year ago)
README.md: Initial commit (1 year ago)
archivebot-blogspot: Fix HTTPS handling (1 year ago)
archivebot-high-memory: Support python3 in any directory instead of just /usr/bin (1 year ago)
archivebot-jobid-calculation: More snscrape helper tools (1 year ago)
archivebot-jobs: Add ETA column (1 week ago)
archivebot-list-stuck-requests: Fix line endings (1 year ago)
archivebot-monitor-job-queue: First set of little things (1 year ago)
archivebot-youtube: Add helper for AB/chromebot-ing YouTube channels and users (1 year ago)
bing-scrape: Add Bing, Reddit/Pushshift, and FoolFuuka scrapers (1 year ago)
curl-ua: Add IE6 UA (2 months ago)
deb-repo-urls: Fix deb file URLs (5 months ago)
dedupe: Another alternative and performance/memory comparison (2 months ago)
europarl-meps-collect: Add script for scraping MEP links from europarl.europa.eu (1 year ago)
foolfuuka-search: Better workaround for the 5000 results limit; works for FoolFuuka 2.0.1 and up (1 year ago)
format-size: Split out size formatting (1 year ago)
fos-ftp-upload: First set of little things (1 year ago)
get-crx4chrome-urls: First set of little things (1 year ago)
gofile.io-dl: Verbosity (3 months ago)
ia-derive: Add script to queue derive on IA (1 year ago)
ia-upload-progress: Proper script for tracking size of uploaded data (1 year ago)
iasha1check: Colourise output (3 weeks ago)
ix.io-upload: Allow overriding the "remote filename" (1 year ago)
kill-wpull-connections: Merge kill-wpull-connections repository into little-things (4 months ago)
killcx-all-https: First set of little things (1 year ago)
mastodon-enumerate-users: Enumerate users on a Mastodon instance (1 year ago)
mastodon-outdated: Finding outdated Mastodon instances (1 year ago)
pipelines-launch-in-tmux-windows: First set of little things (1 year ago)
pipelines-monitor-tmux-wget-outcomes: Monitor how a pipeline's wget processes are faring (1 year ago)
pipelines-stop-gracefully: First set of little things (1 year ago)
reddit-pushshift-search: Add Bing, Reddit/Pushshift, and FoolFuuka scrapers (1 year ago)
run-every-five-minutes: First set of little things (1 year ago)
s3-bucket-list: Add support for alternative xmlns (2 months ago)
s3-bucket-list-qwarc: Record wrapper script in meta WARC as well (2 months ago)
snscrape-extract: Add support for Twitter hashtag extraction (1 year ago)
snscrape-facebook-user: Silence by default (1 year ago)
snscrape-instagram-user: Silence by default (1 year ago)
snscrape-prepare-commands: Add support for Twitter hashtag extraction (1 year ago)
snscrape-tmux: Update tmux session commands (1 year ago)
snscrape-twitter-filter: Filter Twitter hashtag scrapes based on account scrapes (1 year ago)
snscrape-twitter-hashtag: Extract external links from Twitter (1 year ago)
snscrape-twitter-user: Extract external links from Twitter (1 year ago)
snscrape-upload: Print Instagram ignore immediately after upload instead of at the end (1 year ago)
snscrape-vk-user: Silence by default (1 year ago)
snscrape-wiki-transfer-merge: Helper tools for snscrape and the wiki pages (1 year ago)
social-media-extract-profile-link: Fix decoding of links on Facebook profiles (9 months ago)
tar-many-files-progress: First set of little things (1 year ago)
tcp-closer: Add tcp-closer command (1 year ago)
transfer.notkiska.pw-check-ia: Switch to HTTPS (4 months ago)
transfer.notkiska.pw-upload: Print deletion URL on stderr (1 year ago)
uniqify: Add uniqify (1 year ago)
url-normalise: Normalise domain name to lower-case before further processing (9 months ago)
warc-peek: Add WARC/1.1 support (4 months ago)
warc-size: Split out size formatting (1 year ago)
warc-tiny: Prevent constantly moving bytes around for better performance on large chunked records (3 weeks ago)
website-extract-social-media: Add support for Facebook /pages/category/Category/Name-ID URLs (9 months ago)
wget-spider-estimate-size: First set of little things (1 year ago)
wiki-list-to-main: Add ArchiveBot wiki list helper (1 year ago)
wiki-recursive-extract-normalise: Fix deduplication within each section processing (1 year ago)
wiki-sections-sort: Add wiki-sections-sort (1 year ago)
wiki-website-extract-social-media: Add script for automatic social media discovery (1 year ago)
wpull1-parallel-progress-monitor: First set of little things (1 year ago)
wpull1-progress-monitor: First set of little things (1 year ago)
wpull2-extract-remaining: Add script for extracting remaining wpull 2 queue (8 months ago)
wpull2-log-extract-errors: Treat NXDOMAIN and no A/AAAA record errors as ok (2 months ago)
wpull2-url-origin: Fixed version which handles multiple roots correctly (1 year ago)
youtube-extract: Cover two more rare URLs (1 week ago)
youtube-filter-autogen-channels: Add youtube-filter-autogen-channels (1 year ago)

README.md

Over the past few years, I’ve written and accumulated a number of useful little things to help with archival-related tasks. This repository collects them. I hope someone finds some of them useful.

License (applies to all programs in this repository)

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see https://www.gnu.org/licenses/.