131 Commits (236278f0b49ed70789209d3fcde223f21041f052)
 

Author SHA1 Message Date
  JustAnotherArchivist 236278f0b4 Fix decoding of links on Facebook profiles 4 years ago
  JustAnotherArchivist d7a07d1d99 Normalise domain name to lower-case before further processing 4 years ago
  JustAnotherArchivist e655080e20 Add support for Facebook /pages/category/Category/Name-ID URLs 4 years ago
  JustAnotherArchivist daa1a95792 Proper URL decoding 4 years ago
  JustAnotherArchivist 1bee1cdcc7 Add support for Facebook /people/Name/ID URLs 4 years ago
  JustAnotherArchivist 00107c0ef0 Add support for YouTube /c/X URLs 4 years ago
  JustAnotherArchivist b59b82041c Add support for wiki list entries with options 4 years ago
  JustAnotherArchivist d5953ca95c Use old Opera UA for Twitter to force the old design 4 years ago
  JustAnotherArchivist 1fa57d41a3 Fix extraction on Wix sites from JSON inside a data attribute 4 years ago
  JustAnotherArchivist 4a742162d0 Suppress output if there are no matched jobs 4 years ago
  JustAnotherArchivist fe72d57d7e Add filtering based on substrings anywhere in the string and on regex 4 years ago
  JustAnotherArchivist cf30a53f82 Add case-insensitive filtering 4 years ago
  JustAnotherArchivist 711e444e8e Highlight jobs that have been inactive for over 6 hours 4 years ago
  JustAnotherArchivist b2919030ab Fix sorting on numerical columns 4 years ago
  JustAnotherArchivist 257b578fbe Add descending sort 4 years ago
  JustAnotherArchivist 6e7449d137 Support column names in any capitalisation 4 years ago
  JustAnotherArchivist e5e7bdf8af Add more filtering options 4 years ago
  JustAnotherArchivist c611420be9 Remove options from usage line 4 years ago
  JustAnotherArchivist 824eb5e353 Add script for getting an AB job overview table 4 years ago
  JustAnotherArchivist 34c1a58034 Fix detection of multiple transfer encodings 4 years ago
  JustAnotherArchivist 195df08cd5 Fix marker loop on some filenames due to lacking HTML entity processing 4 years ago
  JustAnotherArchivist 3cc3a1ed38 Fix nested tags 4 years ago
  JustAnotherArchivist 5c907488e1 Handle broken pipe on stdout 4 years ago
  JustAnotherArchivist b38349e91f Fix duplicate slashes 4 years ago
  JustAnotherArchivist f23e4cc71e Retry on internal errors 4 years ago
  JustAnotherArchivist bfe5f59e25 Add marker loop detection 4 years ago
  JustAnotherArchivist 66bdef3247 Take a bucket URL argument instead of hostname + bucketname 4 years ago
  JustAnotherArchivist e385c1d302 Limit curl to 10 seconds 4 years ago
  JustAnotherArchivist 74162445aa Replace curl-archivebot-ua with a more general curl-ua script that supports different UAs selected by aliases 4 years ago
  JustAnotherArchivist 9d712d64d7 Ignore certain URLs on Twitter and Instagram entirely 4 years ago
  JustAnotherArchivist 87826d4844 Use line variable instead of prefix+url 4 years ago
  JustAnotherArchivist 163aacf13c Print deletion URL on stderr 4 years ago
  JustAnotherArchivist 486a593f15 Add support for more weird Facebook URLs 4 years ago
  JustAnotherArchivist 256a94443e Fix deduplication within each section processing 4 years ago
  JustAnotherArchivist 98d77ecc96 Deduplicate output 4 years ago
  JustAnotherArchivist 6ce64baf87 Remove redundant url-normalise after the extraction 4 years ago
  JustAnotherArchivist 318183148e Fix URL extraction from Facebook profile overview pages 4 years ago
  JustAnotherArchivist 869ade27eb Separate names in stderr annotations for the various url-normalise processes 4 years ago
  JustAnotherArchivist 79f0bd4332 Normalise URLs everywhere to reduce duplicates 4 years ago
  JustAnotherArchivist dc4efcfbfb One URL normalisation script to rule them all 4 years ago
  JustAnotherArchivist 0f13a1fadd Add verbosity options, and annotate stderr on wiki-recursive-extract 4 years ago
  JustAnotherArchivist 3ec816cd04 Add script for link extraction from social media profiles 4 years ago
  JustAnotherArchivist 5285c406d9 Add script for recursive website and social media discovery 4 years ago
  JustAnotherArchivist 2be9ca922e Ignore more useless Facebook links 4 years ago
  JustAnotherArchivist c3b0e5543e Add support for facebook.com/pg/something 4 years ago
  JustAnotherArchivist 7c389f1fef Add support for hashbang fragments on Twitter links 4 years ago
  JustAnotherArchivist c56736bc4a Ignore /intent on Twitter 4 years ago
  JustAnotherArchivist 4f34753788 Add support for Instagram posts and ignore spurious links from the CDN 4 years ago
  JustAnotherArchivist ad030f5d21 Add support for Facebook pages and groups 4 years ago
  JustAnotherArchivist cd0b3f6214 Ignore /vi/* on YouTube (video thumbnails) 4 years ago