- Run the cleanup code on exceptions (e.g. ^C). Skipping it previously had several effects; most notably, the log file was not written to the meta WARC.
- Cancel remaining tasks, which avoids a pile of asyncio warnings and errors on crashes.
- Close the DB before the WARC, or rather, close the WARC last. This is mostly a semantic change to further ensure that the log written to the meta WARC is as complete as possible.
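The three points above can be sketched with plain asyncio. This is a minimal illustration, not the project's actual code: `worker`, `run`, `main`, and the `events` list are hypothetical names, a `RuntimeError` stands in for a crash or ^C, and the "close" steps are only recorded as strings.

```python
import asyncio


async def worker(n, events):
    # Hypothetical long-running task; records when it gets cancelled.
    try:
        await asyncio.sleep(3600)
    except asyncio.CancelledError:
        events.append(f'worker {n} cancelled')
        raise


async def run(events):
    tasks = [asyncio.ensure_future(worker(n, events)) for n in range(2)]
    try:
        await asyncio.sleep(0)  # let the workers start
        raise RuntimeError('simulated crash')
    finally:
        # Cleanup runs on exceptions too: cancel what is left, wait for the
        # cancellations to finish, then close the DB and finally the WARC.
        for task in tasks:
            task.cancel()
        await asyncio.gather(*tasks, return_exceptions=True)
        events.append('close DB')
        events.append('close WARC')


def main():
    events = []
    try:
        asyncio.run(run(events))
    except RuntimeError:
        events.append('crashed')
    return events
```

Waiting on the cancelled tasks with `return_exceptions=True` is what keeps asyncio from complaining about pending tasks when the loop shuts down.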
As a consequence, SpecDependencies.extra can now be any data type that can be put into JSON; unhashable types previously caused a crash due to the lru_cache.
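The crash mentioned here is standard `functools.lru_cache` behaviour: arguments are used as cache keys and must therefore be hashable. A small sketch, with hypothetical function names and no claim about how the cache was actually removed or replaced:

```python
import functools
import json


@functools.lru_cache(maxsize=None)
def describe_cached(extra):
    # lru_cache hashes `extra` to build the cache key.
    return json.dumps(extra)


def describe_plain(extra):
    # Without the cache, anything JSON-serialisable is accepted.
    return json.dumps(extra, sort_keys=True)


def cached_call_fails(extra):
    # Returns True if the cached variant rejects the argument.
    try:
        describe_cached(extra)
    except TypeError:
        return True
    return False
```

A tuple passes through the cached variant, while a dict or list raises `TypeError: unhashable type` before the function body even runs.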
The DB creation operates with a DB lock, so that code can't run while another process is filling the DB; it would block on obtaining the lock a few lines prior instead.
It is rarely necessary to access the history, and the tuple return value clutters the spec file code.
As a consequence, it's no longer possible to return None if an error occurred without losing the history.
To replace that, this also introduces a DummyClientResponse, which loosely mimics ClientResponse, has the same history attribute, and evaluates to False when cast to bool (such that the intuitive `if response` works as expected).
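A minimal sketch of what such a falsy placeholder could look like. Only the class name, the `history` attribute, and the bool behaviour come from the description above; the constructor signature, the contents of `history`, and the `summarize` helper are assumptions for illustration.

```python
class DummyClientResponse:
    """Falsy stand-in returned when no real response is available."""

    def __init__(self, history=None):
        # Assumed representation: a tuple of earlier attempts.
        self._history = tuple(history) if history is not None else ()

    @property
    def history(self):
        return self._history

    def __bool__(self):
        # Makes `if response:` behave like the old `None` check.
        return False


def summarize(response):
    # Hypothetical caller: the history stays reachable even on failure.
    if response:
        return 'ok'
    return f'failed after {len(response.history)} attempt(s)'
```

Compared with returning `None`, the caller keeps a single return value and can still inspect what happened before the failure.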
If a response ends with '0\r\n' or '0\r\n\r', ClientResponse._read loops forever trying to read 4 more bytes.
In addition, bump that read to 1 KiB for better worst-case performance.
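The general shape of the fix can be illustrated with a generic read loop. This is not the actual `ClientResponse._read` code: it assumes the bytes in question are the final `0\r\n\r\n` of a chunked body, and `find_terminator` is a hypothetical helper. The key point is that an empty read means end of data and must end the loop.

```python
import io


def find_terminator(fp, terminator=b'0\r\n\r\n', chunk_size=1024):
    """Return True if `terminator` occurs in the remaining data of `fp`.

    Returns False at end of data instead of retrying the read forever.
    """
    window = b''
    while True:
        data = fp.read(chunk_size)
        if not data:
            # EOF: a truncated terminator such as b'0\r\n' ends up here.
            return False
        window += data
        if terminator in window:
            return True
        # Keep only the tail that could still start a match.
        window = window[-(len(terminator) - 1):]
```

Reading 1 KiB at a time instead of a few bytes keeps the number of read calls low when a lot of data precedes the terminator.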
Closing the raw data tempfiles immediately on connection reuse caused any response reading to fail with an I/O error if another request started on the same connection in the meantime. Delaying the closing until the response object falls out of scope and gets GC'd ensures that as long as there is a reference to that object, it can be read from, at the expense of a possibly larger memory overhead.
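One way to tie a tempfile's lifetime to the object that reads from it is `weakref.finalize`. The sketch below uses that as a stand-in technique; whether the project uses a finalizer, `__del__`, or something else is not stated here, and the `Response` class is hypothetical.

```python
import tempfile
import weakref


class Response:
    """Hypothetical response object backed by a raw-data tempfile."""

    def __init__(self, payload):
        self._raw = tempfile.TemporaryFile()
        self._raw.write(payload)
        # Close the tempfile only once this object is garbage-collected,
        # not when the underlying connection is reused.
        self._finalizer = weakref.finalize(self, self._raw.close)

    def read(self):
        self._raw.seek(0)
        return self._raw.read()
```

As long as any reference to the `Response` exists, `read()` keeps working; the cost is that the tempfile stays open for the same duration.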
For small responses, the additional headers for the revisit outweigh the payload truncation savings. The chosen limit of 100 bytes is completely arbitrary and not backed by any real-world data.
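The decision can be sketched as a simple size check. The 100-byte limit is taken from the text; the function name, its parameters, and whether the comparison is inclusive are assumptions.

```python
REVISIT_MIN_PAYLOAD = 100  # bytes; arbitrary, per the note above


def record_type(payload_length, is_duplicate, threshold=REVISIT_MIN_PAYLOAD):
    # Assumption: payloads of exactly `threshold` bytes still become revisits.
    if is_duplicate and payload_length >= threshold:
        return 'revisit'
    # Small or non-duplicate payloads are stored in full.
    return 'response'
```

For example, a duplicate 50-byte payload is written as a full response record, while a duplicate 5000-byte payload becomes a revisit.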