JustAnotherArchivist
5ca90c3b7d
Update tmux session commands
4 years ago
JustAnotherArchivist
679923d37d
Add support for Twitter hashtag extraction
4 years ago
JustAnotherArchivist
663383830c
Add support for lists
4 years ago
JustAnotherArchivist
d85d142def
Handle parameters on Twitter URLs
5 years ago
JustAnotherArchivist
5984565417
Handle Twitter URLs with trailing slash
5 years ago
JustAnotherArchivist
8647ccaa8f
Support subdomain-less Facebook URLs
5 years ago
JustAnotherArchivist
66ec0c93c4
Handle more Facebook URLs
5 years ago
JustAnotherArchivist
baa8a566bd
Add script for scraping MEP links from europarl.europa.eu
5 years ago
JustAnotherArchivist
c2413b2c4f
Add ArchiveBot wiki list helper
5 years ago
JustAnotherArchivist
72818019bc
Extract external links from Twitter
5 years ago
JustAnotherArchivist
b262d893da
Silence by default
5 years ago
JustAnotherArchivist
6fb9587a2b
More flexible normalisation
5 years ago
JustAnotherArchivist
06be216f4c
Print Instagram ignore immediately after upload instead of at the end
5 years ago
JustAnotherArchivist
1be4ed829b
Add helper for AB/chromebot-ing YouTube channels and users
5 years ago
JustAnotherArchivist
2a7a4ea6dc
Fix HTTPS handling
5 years ago
JustAnotherArchivist
a812cb5fc2
More snscrape helper tools
5 years ago
JustAnotherArchivist
3ee3ffc340
Generate commands for Blogspot
5 years ago
JustAnotherArchivist
5090a8ad02
Enumerate users on a Mastodon instance
5 years ago
JustAnotherArchivist
0000d8ffd9
Add script to queue derive on IA
5 years ago
JustAnotherArchivist
6dc711c54e
Further helper scripts for snscrape: normalising usernames and extracting them from a list of URLs
5 years ago
JustAnotherArchivist
e3a37455ba
Add uniqify
5 years ago
JustAnotherArchivist
321067819c
Proper script for tracking size of uploaded data
5 years ago
JustAnotherArchivist
5c654cb16b
Split out size formatting
5 years ago
JustAnotherArchivist
de2cdc0aae
curl with ArchiveBot UA
5 years ago
JustAnotherArchivist
89ccd68b59
Helper tools for snscrape and the wiki pages
5 years ago
JustAnotherArchivist
f2e836d2e9
Add support for differently formatted digests
5 years ago
JustAnotherArchivist
94c4f76570
Fix crash when a digest is missing from a record
5 years ago
JustAnotherArchivist
ef78a3318c
Colour only the header field names but not the values
5 years ago
JustAnotherArchivist
9ce4653094
Document colouring and usage
5 years ago
JustAnotherArchivist
e7c5d82254
Coloured WARCs?!
5 years ago
JustAnotherArchivist
70b413f5c1
Better events: include raw WARC header data and separate HTTP requests into headers and body
5 years ago
JustAnotherArchivist
641bc7a207
Fix infinite loop at end of WARC
5 years ago
JustAnotherArchivist
a700e8e2fe
Add tcp-closer command
5 years ago
JustAnotherArchivist
859c75a591
Add tool for WARC verification and extraction
5 years ago
JustAnotherArchivist
e867a2327f
Replace urlencoded @ symbol
The fix for https://github.com/dutchcoders/transfer.sh/issues/215 led to @ being encoded as %40 in filenames in the URL returned, which is awkward when working with social media scrapes since ArchiveBot normalises it to @ again.
5 years ago
JustAnotherArchivist
cbd952024b
Workaround for hash no longer needed with current transfer.sh code
5 years ago
JustAnotherArchivist
61431c2054
Add VK scraping helper
5 years ago
JustAnotherArchivist
d6ff566c4d
Instagram always uses lower-case usernames
5 years ago
JustAnotherArchivist
138c2a2d39
Get rid of post-processing now that snscrape (dev version) has clean URLs
Keep the dirty URLs on Instagram because they're not that dirty and are linked from the profile pages. I usually throw it into ArchiveBot anyway such that it grabs the non-"taken-by" URLs as well.
5 years ago
JustAnotherArchivist
27b0d2da75
Better username capitalisation extraction method
5 years ago
JustAnotherArchivist
3aa828a0ac
transfer.kiska.pw -> transfer.notkiska.pw
5 years ago
JustAnotherArchivist
63f4a8b3d3
transfer.sh -> transfer.kiska.pw
5 years ago
JustAnotherArchivist
0168d50f62
Automatically fix capitalisation of Facebook and Twitter usernames
5 years ago
JustAnotherArchivist
db0104b3c8
Get correct capitalisation for a Facebook username
5 years ago
JustAnotherArchivist
4a1a9a10e0
Allow overriding the "remote filename"
5 years ago
JustAnotherArchivist
769f95808e
Add ix.io upload script
5 years ago
JustAnotherArchivist
c79721337b
+x
5 years ago
JustAnotherArchivist
c30dcf5985
Finding outdated Mastodon instances
5 years ago
JustAnotherArchivist
1748a6b607
Better workaround for the 5000 results limit; works for FoolFuuka 2.0.1 and up
5 years ago
JustAnotherArchivist
fd680551df
Add Bing, Reddit/Pushshift, and FoolFuuka scrapers
5 years ago