JustAnotherArchivist
a700e8e2fe
Add tcp-closer command
5 years ago
JustAnotherArchivist
859c75a591
Add tool for WARC verification and extraction
5 years ago
JustAnotherArchivist
e867a2327f
Replace urlencoded @ symbol
The fix for https://github.com/dutchcoders/transfer.sh/issues/215 led to @ being encoded as %40 in filenames in the URL returned, which is awkward when working with social media scrapes since ArchiveBot normalises it to @ again.
5 years ago
JustAnotherArchivist
cbd952024b
Workaround for hash no longer needed with current transfer.sh code
5 years ago
JustAnotherArchivist
61431c2054
Add VK scraping helper
5 years ago
JustAnotherArchivist
d6ff566c4d
Instagram always uses lower-case usernames
5 years ago
JustAnotherArchivist
138c2a2d39
Get rid of post-processing now that snscrape (dev version) has clean URLs
Keep the dirty URLs on Instagram because they're not that dirty and are linked from the profile pages. I usually throw it into ArchiveBot anyway such that it grabs the non-"taken-by" URLs as well.
5 years ago
JustAnotherArchivist
27b0d2da75
Better username capitalisation extraction method
5 years ago
JustAnotherArchivist
3aa828a0ac
transfer.kiska.pw -> transfer.notkiska.pw
5 years ago
JustAnotherArchivist
63f4a8b3d3
transfer.sh -> transfer.kiska.pw
5 years ago
JustAnotherArchivist
0168d50f62
Automatically fix capitalisation of Facebook and Twitter usernames
5 years ago
JustAnotherArchivist
db0104b3c8
Get correct capitalisation for a Facebook username
5 years ago
JustAnotherArchivist
4a1a9a10e0
Allow overriding the "remote filename"
5 years ago
JustAnotherArchivist
769f95808e
Add ix.io upload script
5 years ago
JustAnotherArchivist
c79721337b
+x
5 years ago
JustAnotherArchivist
c30dcf5985
Finding outdated Mastodon instances
5 years ago
JustAnotherArchivist
1748a6b607
Better workaround for the 5000 results limit; works for FoolFuuka 2.0.1 and up
5 years ago
JustAnotherArchivist
fd680551df
Add Bing, Reddit/Pushshift, and FoolFuuka scrapers
5 years ago
JustAnotherArchivist
ede77ad142
Filter Twitter hashtag scrapes based on account scrapes
5 years ago
JustAnotherArchivist
57ef544c6c
Fix line endings
5 years ago
JustAnotherArchivist
07c3e7baaa
Add snscrape helpers
5 years ago
JustAnotherArchivist
b7e3a703d8
Monitor how a pipeline's wget processes are faring
5 years ago
JustAnotherArchivist
168f61b39a
Quote filename so it works with any weird characters in the paths
(Last reconstructed commit from text file full of different versions)
5 years ago
JustAnotherArchivist
8f77c8c72a
xargs -r flag to not run the second find if the first produces no results (GNU extension)
5 years ago
JustAnotherArchivist
9d7a4096f9
Pipe into second find directly
5 years ago
JustAnotherArchivist
e3a4bf6a47
Replace slow lsof with procfs access
5 years ago
JustAnotherArchivist
4a83a54616
Print host for each stuck request
5 years ago
JustAnotherArchivist
2b2c65f034
Print PID
5 years ago
JustAnotherArchivist
fadb70e297
Fixed version which handles multiple roots correctly
5 years ago
JustAnotherArchivist
d10a1d3675
First set of little things
5 years ago
JustAnotherArchivist
a00607f28e
Initial commit
5 years ago