Filter Twitter hashtag scrapes based on account scrapes

master
JustAnotherArchivist 5 years ago
commit ede77ad142
1 changed file with 9 additions and 0 deletions

snscrape-twitter-filter (+9, -0)

@@ -0,0 +1,9 @@
#!/bin/bash
# When scraping accounts and hashtags which have some overlap, this can be used to filter out the accounts' tweets from the hashtag scrapes
# Starting with account and hashtag scrapes in twitter-@* and twitter-#*, respectively:
for f in twitter-#*; do comm -23 <(sort <"$f") <(cat twitter-@* | sort) > "${f}-fixed"; done
for f in *-fixed; do { grep -vF '/status/' "$f"; grep -F '/status/' "$f" | sort -t'/' -k6,6n | tac; } > "${f}-sorted"; done
for f in *-fixed-sorted; do mv -- "$f" "${f/-fixed-sorted/-filtered}"; done

# sort -r should work, but for some reason it doesn't, hence the tac...
# There's certainly a cleaner way which doesn't involve sorting and then restoring the inverse chronological order.
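
One possible shape for the cleaner approach mentioned in the last comment is to drop the account lines directly, without sorting, so the hashtag scrape keeps whatever order it already had. This is only a sketch and not part of the commit: `filter_scrape` is a hypothetical helper name, it assumes a grep that accepts `-f -` to read patterns from standard input (GNU grep does), and it matches whole identical lines as fixed strings. Unlike `comm`, it removes every copy of a duplicated line.

```shell
#!/bin/bash
# Hypothetical helper (not in the original commit): print the lines of a
# hashtag scrape that do not occur in any of the given account scrapes,
# preserving the hashtag scrape's original line order.
filter_scrape() {
    local hashtag_file=$1
    shift
    # -F: treat patterns as fixed strings, -x: match whole lines only,
    # -v: keep the lines that match none of the patterns,
    # -f -: read the patterns (the account scrape lines) from stdin.
    cat -- "$@" | grep -vxF -f - -- "$hashtag_file"
}

# Possible usage, assuming the same twitter-@* / twitter-#* file naming:
# for f in twitter-#*; do filter_scrape "$f" twitter-@* > "${f}-filtered"; done
```

Because nothing is sorted, the `sort ... | tac` step is no longer needed if the scrape was already in inverse chronological order. Note that grep exits with status 1 when every line is filtered out, which matters under `set -e`.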
