JustAnotherArchivist
9763370976
Truncate URLs by default to fit the terminal width
4 years ago
JustAnotherArchivist
1bc1487ecc
Add script for extracting remaining wpull 2 queue
4 years ago
JustAnotherArchivist
d3c00353da
Make con-d-commands mode an alias of the corresponding format
4 years ago
JustAnotherArchivist
b4fe6dd754
Reorder arguments to make more sense
4 years ago
JustAnotherArchivist
c547fc6c6b
Add format mode
4 years ago
JustAnotherArchivist
7fde199151
Add --mode con-d-commands, replace --dashboard-regex with --mode dashboard-regex
4 years ago
JustAnotherArchivist
0a6f83b1b8
Add --dashboard-regex
4 years ago
JustAnotherArchivist
05ed1e004b
Add more columns
4 years ago
JustAnotherArchivist
3f7d84ab12
Refactor the Bash/Python abomination into a pure Python script so I get to keep my sanity while editing
4 years ago
JustAnotherArchivist
cf879c86c9
Refactor into something that makes adding new columns easier
4 years ago
JustAnotherArchivist
d7bd8de09d
Add --dates option
4 years ago
JustAnotherArchivist
236278f0b4
Fix decoding of links on Facebook profiles
4 years ago
JustAnotherArchivist
d7a07d1d99
Normalise domain name to lower-case before further processing
4 years ago
JustAnotherArchivist
e655080e20
Add support for Facebook /pages/category/Category/Name-ID URLs
4 years ago
JustAnotherArchivist
daa1a95792
Proper URL decoding
4 years ago
JustAnotherArchivist
1bee1cdcc7
Add support for Facebook /people/Name/ID URLs
4 years ago
JustAnotherArchivist
00107c0ef0
Add support for YouTube /c/X URLs
4 years ago
JustAnotherArchivist
b59b82041c
Add support for wiki list entries with options
4 years ago
JustAnotherArchivist
d5953ca95c
Use old Opera UA for Twitter to force the old design
4 years ago
JustAnotherArchivist
1fa57d41a3
Fix extraction on Wix sites from JSON inside a data attribute
Example: https://www.martinedocourt.ch/
4 years ago
JustAnotherArchivist
4a742162d0
Suppress output if there are no matched jobs
4 years ago
JustAnotherArchivist
fe72d57d7e
Add filtering based on substrings anywhere in the string and on regex
4 years ago
JustAnotherArchivist
cf30a53f82
Add case-insensitive filtering
4 years ago
JustAnotherArchivist
711e444e8e
Highlight jobs that have been inactive for over 6 hours
4 years ago
JustAnotherArchivist
b2919030ab
Fix sorting on numerical columns
4 years ago
JustAnotherArchivist
257b578fbe
Add descending sort
4 years ago
JustAnotherArchivist
6e7449d137
Support column names in any capitalisation
4 years ago
JustAnotherArchivist
e5e7bdf8af
Add more filtering options
4 years ago
JustAnotherArchivist
c611420be9
Remove options from usage line
4 years ago
JustAnotherArchivist
824eb5e353
Add script for getting an AB job overview table
4 years ago
JustAnotherArchivist
34c1a58034
Fix detection of multiple transfer encodings
4 years ago
JustAnotherArchivist
195df08cd5
Fix marker loop on some filenames due to lacking HTML entity processing
E.g. https://audio-market-dev.s3.amazonaws.com/?marker=media/23/Hard%20Style%20Producer
4 years ago
JustAnotherArchivist
3cc3a1ed38
Fix nested tags
E.g. the <Owner> tag, which contains <ID> and <DisplayName>, as on https://appengage-video.s3.amazonaws.com/
4 years ago
JustAnotherArchivist
5c907488e1
Handle broken pipe on stdout
4 years ago
JustAnotherArchivist
b38349e91f
Fix duplicate slashes
4 years ago
JustAnotherArchivist
f23e4cc71e
Retry on internal errors
4 years ago
JustAnotherArchivist
bfe5f59e25
Add marker loop detection
4 years ago
JustAnotherArchivist
66bdef3247
Take a bucket URL argument instead of hostname + bucketname
4 years ago
JustAnotherArchivist
e385c1d302
Limit curl to 10 seconds
4 years ago
JustAnotherArchivist
74162445aa
Replace curl-archivebot-ua with a more general curl-ua script that supports different UAs selected by aliases
4 years ago
JustAnotherArchivist
9d712d64d7
Ignore certain URLs on Twitter and Instagram entirely
4 years ago
JustAnotherArchivist
87826d4844
Use line variable instead of prefix+url
4 years ago
JustAnotherArchivist
163aacf13c
Print deletion URL on stderr
4 years ago
JustAnotherArchivist
486a593f15
Add support for more weird Facebook URLs
4 years ago
JustAnotherArchivist
256a94443e
Fix deduplication within each section processing
4 years ago
JustAnotherArchivist
98d77ecc96
Deduplicate output
This uses mawk's extensions `-W interactive` and `delete array`. It will probably work with some other AWK implementations as well, but for now it depends on mawk explicitly.
4 years ago
JustAnotherArchivist
6ce64baf87
Remove redundant url-normalise after the extraction
Since all input is run through url-normalise before processing, and all output of the website and social media extraction is also normalised, it is not necessary to normalise again at the end.
4 years ago
JustAnotherArchivist
318183148e
Fix URL extraction from Facebook profile overview pages
4 years ago
JustAnotherArchivist
869ade27eb
Separate names in stderr annotations for the various url-normalise processes
4 years ago
JustAnotherArchivist
79f0bd4332
Normalise URLs everywhere to reduce duplicates
4 years ago