Treat redirects as successful retrievals

master
JustAnotherArchivist 3 years ago
parent
commit
830e9dbc43
1 changed file with 3 additions and 1 deletion

wpull2-log-extract-errors (+3 -1)

@@ -27,6 +27,8 @@ then
2020-09-10 23:54:25,000 - wpull.processor.web - INFO - Fetched ‘https://example.org/error-429-successful-second-retry’: 429 OK. Length: 1234 [text/html; charset=utf-8].
2020-09-10 23:54:25,000 - wpull.processor.web - INFO - Fetched ‘https://example.org/error-429-successful-second-retry’: 429 OK. Length: 1234 [text/html; charset=utf-8].
2020-09-10 23:54:25,000 - wpull.processor.web - INFO - Fetched ‘https://example.org/error-429-successful-second-retry’: 200 OK. Length: 1234 [text/html; charset=utf-8].
+2020-09-10 23:54:25,000 - wpull.processor.web - INFO - Fetched ‘https://example.org/error-429-successful-retry-with-redirect’: 429 OK. Length: 1234 [text/html; charset=utf-8].
+2020-09-10 23:54:25,000 - wpull.processor.web - INFO - Fetched ‘https://example.org/error-429-successful-retry-with-redirect’: 302 OK. Length: 1234 [text/html; charset=utf-8].
2020-09-10 23:54:25,000 - wpull.processor.base - ERROR - Fetching ‘https://example.org/error-dns-successful-retry’ encountered an error: DNS resolution failed: [Errno -2] Name or service not known
2020-09-10 23:54:25,000 - wpull.processor.web - INFO - Fetched ‘https://example.org/error-dns-successful-retry’: 200 OK. Length: 1234 [text/html; charset=utf-8].
EOF
@@ -55,6 +57,6 @@ fi

# Logic: extract all lines of interest, process them such that they only contain a + or - indicating success or error plus the URL, filter the errors with the successes in awk.
# The output order is as each URL appears for the first time in the log. Since awk doesn't preserve the insertion order on iteration, keep the line number and sort the output on that.
-grep -F -e ' - ERROR - Fetching ‘' -e ' - INFO - Fetched ‘' | grep -Fv '’: 30' | sed 's,^.*‘\(.*\)’: \(200\|204\|304\|401\|403\|404\|405\|410\) .*$,+ \1,; s,^.*‘\(.*\)’.*$,- \1,' | awk '/^\+ / { successes[$2] = 1; } /^- / && ! ($2 in successes) { errors[$2] = NR; } END { for (url in errors) { if (! (url in successes)) { print errors[url] " " url; } } }' | sort -n | cut -d' ' -f2-
+grep -F -e ' - ERROR - Fetching ‘' -e ' - INFO - Fetched ‘' | sed 's,^.*‘\(.*\)’: \(200\|204\|30[0-8]\|401\|403\|404\|405\|410\) .*$,+ \1,; s,^.*‘\(.*\)’.*$,- \1,' | awk '/^\+ / { successes[$2] = 1; } /^- / && ! ($2 in successes) { errors[$2] = NR; } END { for (url in errors) { if (! (url in successes)) { print errors[url] " " url; } } }' | sort -n | cut -d' ' -f2-

# Faster version without preserving order: grep -F -e ' - ERROR - Fetching ‘' -e ' - INFO - Fetched ‘' | sed 's,^.*‘\(.*\)’: \(200\|204\|304\|401\|403\|404\|405\|410\) .*$,+ \1,; s,^.*‘\(.*\)’.*$,- \1,' | awk '/^\+ / { successes[$2] = 1; } /^- / && ! ($2 in successes) { errors[$2] = 1; } END { for (url in errors) { if (! (url in successes)) { print url; } } }'
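
The effect of the change can be seen by running both pipelines on a small log. The sketch below wraps the old and new pipelines from the diff in two functions and feeds them the same input. The function names (`extract_errors_old`, `extract_errors`, `sample_log`) and the sample URLs are assumptions made for illustration and are not part of the script. Like the original, it relies on the `\|` alternation accepted by GNU sed.

```shell
#!/bin/bash
# Sketch only: function names and sample URLs are hypothetical.

# Pipeline before the commit: 30x lines are dropped entirely, so a redirect
# can never cancel out an earlier error for the same URL.
extract_errors_old() {
  grep -F -e ' - ERROR - Fetching ‘' -e ' - INFO - Fetched ‘' |
    grep -Fv '’: 30' |
    sed 's,^.*‘\(.*\)’: \(200\|204\|304\|401\|403\|404\|405\|410\) .*$,+ \1,; s,^.*‘\(.*\)’.*$,- \1,' |
    awk '/^\+ / { successes[$2] = 1; } /^- / && ! ($2 in successes) { errors[$2] = NR; } END { for (url in errors) { if (! (url in successes)) { print errors[url] " " url; } } }' |
    sort -n | cut -d' ' -f2-
}

# Pipeline after the commit: statuses 300 to 308 are marked as successes.
extract_errors() {
  grep -F -e ' - ERROR - Fetching ‘' -e ' - INFO - Fetched ‘' |
    sed 's,^.*‘\(.*\)’: \(200\|204\|30[0-8]\|401\|403\|404\|405\|410\) .*$,+ \1,; s,^.*‘\(.*\)’.*$,- \1,' |
    awk '/^\+ / { successes[$2] = 1; } /^- / && ! ($2 in successes) { errors[$2] = NR; } END { for (url in errors) { if (! (url in successes)) { print errors[url] " " url; } } }' |
    sort -n | cut -d' ' -f2-
}

# Made-up log lines in the same format as the test data in the diff.
sample_log() {
  cat <<'EOF'
2020-09-10 23:54:25,000 - wpull.processor.web - INFO - Fetched ‘https://example.org/retry-then-redirect’: 429 OK. Length: 1234 [text/html; charset=utf-8].
2020-09-10 23:54:25,000 - wpull.processor.web - INFO - Fetched ‘https://example.org/retry-then-redirect’: 302 OK. Length: 1234 [text/html; charset=utf-8].
2020-09-10 23:54:25,000 - wpull.processor.web - INFO - Fetched ‘https://example.org/always-500’: 500 OK. Length: 1234 [text/html; charset=utf-8].
EOF
}

echo 'old pipeline:'
sample_log | extract_errors_old
echo 'new pipeline:'
sample_log | extract_errors
```

With this input the old pipeline reports both URLs, because the 302 line is filtered out before it can offset the earlier 429. The new pipeline reports only `https://example.org/always-500`.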
