A method to grab the comments from YouTube videos
 
JustAnotherArchivist 660f315d43 Fix trailing percent-encoded equals signs on initial extraction 3 years ago

  • Requires qwarc and realpath from GNU coreutils 8.23+.
  • Execute as comments VIDEOID where VIDEOID is the 11-character video ID from YouTube.
  • You can also pass multiple video IDs at once: comments VIDEOID1 VIDEOID2 ... They are processed sequentially.
  • Comments are grabbed in all available sort orders (i.e. “top” and “new”, though “top” is retrieved twice since YT returns two continuation tokens for it), including replies and nesting.
  • The “top” sort order sometimes doesn’t return all comments. The reason for this is unclear.
  • On videos with many comments, the “top” sort order retrieval fails quickly because the continuation token becomes too long, causing an HTTP 413 Request Entity Too Large error. (The “new” sort order should succeed.)
  • Everything’s written to a few files in the current directory called youtube-comments-VIDEOID-DATE*.
  • After the retrieval has finished cleanly and you’re satisfied with the results, you can delete the .db and .log files. The former is essentially useless, and the contents of the latter are included in the -meta.warc.gz file.
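
The cleanup step described in the last point can be sketched as a small Python helper. This is not part of the repository; the function names are hypothetical, and the video ID pattern (11 characters from the URL-safe base64 alphabet) is an assumption, since the notes above only state that the ID has 11 characters. The helper relies solely on the youtube-comments-VIDEOID-DATE* naming and the .db and .log extensions mentioned above, and it defaults to a dry run so nothing is deleted by accident.

```python
import re
from pathlib import Path

# Assumption: IDs use the URL-safe base64 alphabet. The README itself
# only says that a video ID is 11 characters long.
VIDEO_ID_RE = re.compile(r"[A-Za-z0-9_-]{11}")


def is_plausible_video_id(video_id: str) -> bool:
    """Return True if video_id looks like an 11-character YouTube video ID."""
    return VIDEO_ID_RE.fullmatch(video_id) is not None


def find_leftovers(video_id: str, directory: Path = Path(".")) -> list[Path]:
    """List the .db and .log files written for video_id in directory."""
    prefix = f"youtube-comments-{video_id}-"
    return sorted(
        p
        for p in directory.iterdir()
        if p.is_file()
        and p.name.startswith(prefix)
        and p.suffix in {".db", ".log"}
    )


def remove_leftovers(
    video_id: str, directory: Path = Path("."), dry_run: bool = True
) -> list[Path]:
    """Delete the leftover files unless dry_run is set; return what matched."""
    if not is_plausible_video_id(video_id):
        raise ValueError(f"not a plausible video ID: {video_id!r}")
    leftovers = find_leftovers(video_id, directory)
    if not dry_run:
        for path in leftovers:
            path.unlink()
    return leftovers
```

Calling `remove_leftovers("VIDEOID")` with the default `dry_run=True` only reports the matching files; pass `dry_run=False` once you have checked the WARC files. Files such as the -meta.warc.gz output are never matched, because only the .db and .log suffixes are considered.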