JustAnotherArchivist
f025c4e9f3
Add extensive debug logging
3 years ago
JustAnotherArchivist
dbe1ed71ab
"Freeze" log file object before writing to WARC to ensure that further log messages aren't picked up
This is a workaround for https://github.com/webrecorder/warcio/issues/90
3 years ago
JustAnotherArchivist
a4cf1a4225
Fix str_get_all_between yielding half-overlapping matches
3 years ago
JustAnotherArchivist
15203bd991
Handle redirect traps/loops
3 years ago
JustAnotherArchivist
f8f5258197
Track redirect depth
3 years ago
JustAnotherArchivist
a3d6fb35f8
Turn response handlers into kwarg-only functions for easier extendability without breaking existing code
3 years ago
JustAnotherArchivist
a91cc23d47
Simplify get_software_info's signature to just the extra dependency packages
As a consequence, SpecDependencies.extra can now be any data type that can be put into JSON; unhashable types previously caused a crash due to the lru_cache.
3 years ago
JustAnotherArchivist
b30ccf8bf8
Move response/exception history to ClientResponse.qhistory
It is rarely necessary to access the history, and the tuple return value clutters the spec file code.
As a consequence, it's no longer possible to return None if an error occurred without losing the history.
To replace that, this also introduces a DummyClientResponse, which is kind of ClientResponse-like, has the same qhistory attribute, and evaluates to False when cast to bool (such that the intuitive `if response` works as expected).
3 years ago
JustAnotherArchivist
03336e4988
Add item to response handler arguments (e.g. for logging)
3 years ago
JustAnotherArchivist
cb0d11284e
Write only successful retrievals (i.e. ones that don't cause an exception) to WARC
4 years ago
JustAnotherArchivist
1214409a0b
Flush big responses to a temporary file instead of trying to keep everything in-memory
4 years ago
JustAnotherArchivist
37dbcfad21
Don't write responses to WARC that triggered an exception
For example, if the connection breaks while retrieving a response but after the headers have been parsed, the response body would be incomplete.
4 years ago
JustAnotherArchivist
f038cf91db
Fix unfound distribution handling
4 years ago
JustAnotherArchivist
a5dfd5c805
Write spec file + its dependencies and command line to meta WARC
4 years ago
JustAnotherArchivist
e99e2304c9
Write meta WARC with log file
4 years ago
JustAnotherArchivist
85d78cee13
Add warcinfo record with version information on Python, system, and dependencies
4 years ago
JustAnotherArchivist
6fafd32685
Error when the retries are exceeded
4 years ago
JustAnotherArchivist
8647d6b396
Use f-strings instead of str.format
4 years ago
JustAnotherArchivist
85f6f7bd82
Make qwarc.utils.handle_response_limit_error_retries more useful by passing the deferring handler as an argument
5 years ago
JustAnotherArchivist
2d52e78d85
Fix reference to aiohttp.CientError
5 years ago
JustAnotherArchivist
e892a6b6a7
Initial commit
5 years ago
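
The commit "Move response/exception history to ClientResponse.qhistory" above describes a response-like placeholder that evaluates to False and still carries the history. A minimal sketch of that pattern follows. It is not qwarc's actual `DummyClientResponse`: the class name `DummyResponse`, its constructor signature, and the `describe` helper are hypothetical, and the history is assumed to be a plain sequence.

```python
class DummyResponse:
    """Hypothetical stand-in for a failed retrieval; not qwarc's actual class."""

    def __init__(self, qhistory=()):
        # Earlier attempts (responses or exceptions), stored as a tuple.
        # The attribute name mirrors the one in the commit message.
        self.qhistory = tuple(qhistory)

    def __bool__(self):
        # Always falsy, so `if response:` skips the success branch
        # while the history stays reachable on the object.
        return False


def describe(response):
    # Hypothetical caller: branch on truthiness, then inspect the history.
    if response:
        return "ok"
    return f"failed after {len(response.qhistory)} attempt(s)"
```

Returning `None` on failure would make the same `if response` check work, but, as the commit message notes, the history would then be lost; a falsy object keeps both.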