The little things give you away... A collection of various small helper stuff

```bash
#!/bin/bash
# When scraping accounts and hashtags that overlap, this can be used to filter the accounts' tweets out of the hashtag scrapes.
# Starting with account and hashtag scrapes in twitter-@* and twitter-#*, respectively:
for f in twitter-#*; do comm -23 <(sort < "$f") <(sort twitter-@*) > "${f}-fixed"; done
for f in *-fixed; do { grep -vF '/status/' "$f"; grep -F '/status/' "$f" | sort -t'/' -k6,6nr; } > "${f}-sorted"; done
for f in *-fixed-sorted; do mv "$f" "${f%-fixed-sorted}-filtered"; done
# A plain global `sort -r` doesn't reverse the order here: a key with its own ordering option (the n in -k6,6n) doesn't inherit global options, so the r has to go on the key itself (-k6,6nr).
# There's certainly a cleaner way that doesn't involve sorting and then restoring the inverse chronological order.
```
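One possibly cleaner route, sketched here as an alternative rather than as the author's method, is to let `grep` compute the difference directly by reading every account-scrape line as a fixed-string, whole-line pattern (`-vxFf`). Nothing gets sorted, so each hashtag scrape keeps its original line order and there is no order to restore. This assumes one entry per line and that the scrapes are already in inverse chronological order; the function name `filter_hashtag_scrapes` is hypothetical, and the `-filtered` output suffix simply mirrors the script above.

```bash
#!/bin/bash
# Order-preserving alternative sketch (not the method used in the script above).
# Assumes one tweet URL per line and that each hashtag scrape is already in
# inverse chronological order. filter_hashtag_scrapes is a hypothetical name.
shopt -s nullglob  # unmatched globs expand to nothing instead of themselves

filter_hashtag_scrapes() {
  local f
  local -a accounts=(twitter-@*)
  for f in twitter-#*; do
    # Don't re-filter output written by an earlier run.
    [[ $f == *-filtered ]] && continue
    if (( ${#accounts[@]} > 0 )); then
      # -v: keep non-matching lines, -x: match whole lines only,
      # -F: fixed strings, -f: read the patterns (all account-scrape lines) from a file.
      grep -vxFf <(cat "${accounts[@]}") -- "$f" > "${f}-filtered"
    else
      # No account scrapes: nothing to remove.
      cp -- "$f" "${f}-filtered"
    fi
  done
}
```

Define it in a shell whose working directory holds the scrapes, then run `filter_hashtag_scrapes`. Unlike the script above, it leaves non-`/status/` lines where they were instead of moving them to the top, and it drops every copy of a matching line, whereas `comm -23` only removes as many copies as appear in the account scrapes.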