77 commits (6bdcfe71f03c83ea41a2b1e5a9683d90a404663c)
 

Author SHA1 Message Date
  JustAnotherArchivist 6bdcfe71f0 Refactor database creation and item generation: call `Item.generate()` on every qwarc run and dedupe its output, allowing the addition of further items by modifying the spec file 3 years ago
  JustAnotherArchivist c878241f24 Switch from concurrent.futures.CancelledError to asyncio.CancelledError 3 years ago
  JustAnotherArchivist 749158b97a Use the Future's result directly rather than awaiting again 3 years ago
  JustAnotherArchivist 5c6169ee4d Bump Python version classifiers 3 years ago
  JustAnotherArchivist a85e80ffa2 Configurable request timeout 3 years ago
  JustAnotherArchivist 429ac94689 Make it possible to override and remove headers 3 years ago
  JustAnotherArchivist e40be54578 Document verify_ssl parameter 3 years ago
  JustAnotherArchivist d3437bde19 Move default headers to qwarc.const 3 years ago
  JustAnotherArchivist 17fc3499ff Fix infinite loop in workaround for aiohttp issue 4630 3 years ago
  JustAnotherArchivist b6003af1e5 Work around aiohttp bug on parsing chunked transfer encoding responses when the buffer ends in an unfortunate spot 4 years ago
  JustAnotherArchivist 1678075a89 Log traceback on exceptions raised from an item 4 years ago
  JustAnotherArchivist 4ff8b260a1 Don't close raw data tempfiles until the response gets GC'd 4 years ago
  JustAnotherArchivist 4d9e4d8fe8 Fix ClientResponse._read returning more than nbytes if the entire response fits into the first block fed into the parser 4 years ago
  JustAnotherArchivist 2895f4bfdf Catch TypeError in Content-Length parsing 4 years ago
  JustAnotherArchivist 8358ba9131 Add support for only reading part of the response into memory 4 years ago
  JustAnotherArchivist 939978beec Handle EOF from the HTTP payload parser correctly 4 years ago
  JustAnotherArchivist b1a1c03f7e Handle STOP file and high memory usage before full disk to allow stopping while the disk is above the limit 4 years ago
  JustAnotherArchivist dd44d9b174 Adjust logging levels: log individual request failures only at WARNING and cancelled tasks at ERROR level 4 years ago
  JustAnotherArchivist 820384fe1e Stop deduping small responses 4 years ago
  JustAnotherArchivist 91035d769c Catch exceptions in Item.process and mark the items as errors instead of crashing 4 years ago
  JustAnotherArchivist 69984765b3 Fix taskType typo silencing cancellation warnings 4 years ago
  JustAnotherArchivist 461cedbbde Avoid temporary files created by warcio due to not knowing the record payload length 4 years ago
  JustAnotherArchivist c263ad0b03 Return ClientResponse object from fetch only if the retrieval was successful 4 years ago
  JustAnotherArchivist cb0d11284e Write only successful retrievals (i.e. ones that don't cause an exception) to WARC 4 years ago
  JustAnotherArchivist 1214409a0b Flush big responses to a temporary file instead of trying to keep everything in-memory 4 years ago
  JustAnotherArchivist 37dbcfad21 Don't write responses to WARC that triggered an exception 4 years ago
  JustAnotherArchivist 93df9cd18d Get rid of the temporary extra log file and read the plain file instead 4 years ago
  JustAnotherArchivist 08c3d55376 Add comment on block digest workaround (cf. f14a664b) 4 years ago
  JustAnotherArchivist 413435b7fb Work around warcio not writing the correct WARC-Profile header for revisit records on WARC/1.1 4 years ago
  JustAnotherArchivist 08d96b37c5 Support deep/multiple inheritance from Item 4 years ago
  JustAnotherArchivist 9d8de13775 Add Item.flush_subitems to flush the new subitems to the database while the item is still being processed 4 years ago
  JustAnotherArchivist 50b936b18c Refactor QWARC class to keep relevant variables in instance attributes instead of local variables 4 years ago
  JustAnotherArchivist c5d8d93166 Remove stray whitespace 4 years ago
  JustAnotherArchivist 8ee9b20718 Remove WARC-Target-URI header from warcinfo record 4 years ago
  JustAnotherArchivist f14a664b1c Work around warcio not writing a block digest for warcinfo records (https://github.com/webrecorder/warcio/issues/87) 4 years ago
  JustAnotherArchivist 7d53577522 Add parameter for disabling SSL/TLS certificate validation 4 years ago
  JustAnotherArchivist 7e049423a4 The memory leak has vanished as of CPython 3.7.3 4 years ago
  JustAnotherArchivist bd14ab3901 Fix crash due to closing the log handler on reaching the max WARC size 4 years ago
  JustAnotherArchivist 08117630b0 Remove warcinfo record in each data WARC and refer to the process's warcinfo record in the meta WARC instead 4 years ago
  JustAnotherArchivist 26aab15605 urn:X-qwarc instead of urn:qwarc 4 years ago
  JustAnotherArchivist 50d46ad51c Use log filename in the target URI of the log resource record 4 years ago
  JustAnotherArchivist e093211496 Set content type for resource records 4 years ago
  JustAnotherArchivist ae46b53401 Always write a WARC-Warcinfo-ID header 4 years ago
  JustAnotherArchivist 23fcdd4026 Write microsecond dates for request and response records 4 years ago
  JustAnotherArchivist 3030ad10ab Mark private API accordingly 4 years ago
  JustAnotherArchivist e0b4104d21 Remove log handler before writing log record since that requires closing the stream 4 years ago
  JustAnotherArchivist 6cfd352f68 Write WARC/1.1 files 4 years ago
  JustAnotherArchivist e1ad5c232e Write warcinfo and resource records in meta WARC on firing up qwarc rather than at the end 4 years ago
  JustAnotherArchivist f038cf91db Fix unfound distribution handling 4 years ago
  JustAnotherArchivist a5dfd5c805 Write spec file + its dependencies and command line to meta WARC 4 years ago
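
Note on commit c878241f24 (switch to asyncio.CancelledError): the log does not state the motivation. One likely reason, offered here as an assumption, is that since Python 3.8 `asyncio.CancelledError` is a separate class from `concurrent.futures.CancelledError`, so an `except` clause for the latter no longer catches a cancelled task. The sketch below uses only the standard library; `worker` and `main` are hypothetical names and are not taken from qwarc.

```python
import asyncio
import concurrent.futures


async def worker():
    # Hypothetical stand-in for a long-running item; not qwarc code.
    await asyncio.sleep(3600)


async def main():
    task = asyncio.ensure_future(worker())
    await asyncio.sleep(0)  # yield once so the task actually starts
    task.cancel()
    try:
        await task
    except concurrent.futures.CancelledError:
        # Matches on Python 3.7, where both names refer to the same class.
        return "caught concurrent.futures.CancelledError"
    except asyncio.CancelledError:
        # Matches on Python 3.8+, where the asyncio class is distinct.
        return "caught asyncio.CancelledError"


outcome = asyncio.run(main())
print(outcome)
```

Catching `asyncio.CancelledError` works on both 3.7 and 3.8+, which would make it the safer name for code that awaits asyncio tasks.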