JustAnotherArchivist
f025c4e9f3
Add extensive debug logging
3 years ago
JustAnotherArchivist
dbe1ed71ab
"Freeze" log file object before writing to WARC to ensure that further log messages aren't picked up
This is a workaround for https://github.com/webrecorder/warcio/issues/90
3 years ago
JustAnotherArchivist
a4cf1a4225
Fix str_get_all_between yielding half-overlapping matches
3 years ago
JustAnotherArchivist
15203bd991
Handle redirect traps/loops
3 years ago
JustAnotherArchivist
f8f5258197
Track redirect depth
3 years ago
JustAnotherArchivist
a3d6fb35f8
Turn response handlers into kwarg-only functions for easier extendability without breaking existing code
3 years ago
JustAnotherArchivist
a91cc23d47
Simplify get_software_info's signature to just the extra dependency packages
As a consequence, SpecDependencies.extra can now be any data type that can be put into JSON; unhashable types previously caused a crash due to the lru_cache.
3 years ago
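The lru_cache crash mentioned in the commit above can be reproduced in isolation. `functools.lru_cache` hashes a function's arguments to build its cache key, so passing an unhashable value such as a dict or list raises `TypeError` before the function body runs. The sketch below is a hypothetical stand-in (`cached_info` and `call` are illustrative names, not qwarc's `get_software_info`).

```python
import functools
import json


@functools.lru_cache(maxsize=None)
def cached_info(extra):
    # Hypothetical stand-in: serialize the extra dependency data to JSON.
    return json.dumps(extra, sort_keys=True)


def call(extra):
    """Return (ok, result) where result is the JSON string or the error type name."""
    try:
        return True, cached_info(extra)
    except TypeError as e:
        # lru_cache builds its key by hashing the arguments; dicts and lists are unhashable.
        return False, type(e).__name__


print(call(("aiohttp", "3.8")))  # tuple: hashable, so it is cached without issue
print(call({"aiohttp": "3.8"}))  # dict: lru_cache raises TypeError while building the key
```

Without the cache, any JSON-serializable value, hashable or not, can be passed through.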
JustAnotherArchivist
b30ccf8bf8
Move response/exception history to ClientResponse.qhistory
It is rarely necessary to access the history, and the tuple return value clutters the spec file code.
As a consequence, it's no longer possible to return None if an error occurred without losing the history.
To replace that, this also introduces a DummyClientResponse, which is kind of ClientResponse-like, has the same qhistory attribute, and evaluates to False when cast to bool (such that the intuitive `if response` works as expected).
3 years ago
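The DummyClientResponse idea described in the commit above comes down to defining `__bool__`: the object carries the same `qhistory` attribute as a real response but evaluates to False. The minimal sketch below borrows only the attribute and class names from the commit message. The class bodies, `FakeClientResponse`, and `fetch` are hypothetical and are not qwarc's actual code.

```python
class FakeClientResponse:
    """Hypothetical stand-in for a successful response that carries its history."""

    def __init__(self, status, qhistory):
        self.status = status
        self.qhistory = qhistory  # assumed: tuple of earlier (response, exception) attempts


class DummyClientResponse:
    """Returned when every attempt failed: keeps the history but is falsy."""

    def __init__(self, qhistory=()):
        self.qhistory = qhistory

    def __bool__(self):
        # Makes `if response:` take the failure branch.
        return False


def fetch(succeed):
    # Hypothetical fetch: one failed attempt, then either success or giving up.
    history = ((None, ConnectionError("reset")),)
    if succeed:
        return FakeClientResponse(200, history)
    return DummyClientResponse(history)


for ok in (True, False):
    response = fetch(ok)
    if response:
        print("got", response.status, "after", len(response.qhistory), "failed attempt(s)")
    else:
        print("failed; history length", len(response.qhistory))
```

Callers keep the plain `if response:` check, and the history stays available in both branches.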
JustAnotherArchivist
03336e4988
Add item to response handler arguments (e.g. for logging)
3 years ago
JustAnotherArchivist
cb0d11284e
Write only successful retrievals (i.e. ones that don't cause an exception) to WARC
4 years ago
JustAnotherArchivist
1214409a0b
Flush big responses to a temporary file instead of trying to keep everything in-memory
4 years ago
JustAnotherArchivist
37dbcfad21
Don't write responses to WARC that triggered an exception
For example, if the connection breaks while retrieving a response but after the headers have been parsed, the response body would be incomplete.
4 years ago
JustAnotherArchivist
f038cf91db
Fix unfound distribution handling
4 years ago
JustAnotherArchivist
a5dfd5c805
Write spec file + its dependencies and command line to meta WARC
4 years ago
JustAnotherArchivist
e99e2304c9
Write meta WARC with log file
4 years ago
JustAnotherArchivist
85d78cee13
Add warcinfo record with version information on Python, system, and dependencies
4 years ago
JustAnotherArchivist
6fafd32685
Error when the retries are exceeded
4 years ago
JustAnotherArchivist
8647d6b396
Use f-strings instead of str.format
4 years ago
JustAnotherArchivist
85f6f7bd82
Make qwarc.utils.handle_response_limit_error_retries more useful by passing the deferring handler as an argument
5 years ago
JustAnotherArchivist
2d52e78d85
Fix reference to aiohttp.CientError
5 years ago
JustAnotherArchivist
e892a6b6a7
Initial commit
5 years ago