Quellcode durchsuchen

Treat redirects as successful retrievals

master
JustAnotherArchivist vor 3 Jahren
Ursprung
Commit
830e9dbc43
1 geänderte Dateien mit 3 neuen und 1 gelöschten Zeilen
  1. +3
    -1
      wpull2-log-extract-errors

+ 3
- 1
wpull2-log-extract-errors Datei anzeigen

@@ -27,6 +27,8 @@ then
2020-09-10 23:54:25,000 - wpull.processor.web - INFO - Fetched ‘https://example.org/error-429-successful-second-retry’: 429 OK. Length: 1234 [text/html; charset=utf-8].
2020-09-10 23:54:25,000 - wpull.processor.web - INFO - Fetched ‘https://example.org/error-429-successful-second-retry’: 429 OK. Length: 1234 [text/html; charset=utf-8].
2020-09-10 23:54:25,000 - wpull.processor.web - INFO - Fetched ‘https://example.org/error-429-successful-second-retry’: 200 OK. Length: 1234 [text/html; charset=utf-8].
2020-09-10 23:54:25,000 - wpull.processor.web - INFO - Fetched ‘https://example.org/error-429-successful-retry-with-redirect’: 429 OK. Length: 1234 [text/html; charset=utf-8].
2020-09-10 23:54:25,000 - wpull.processor.web - INFO - Fetched ‘https://example.org/error-429-successful-retry-with-redirect’: 302 OK. Length: 1234 [text/html; charset=utf-8].
2020-09-10 23:54:25,000 - wpull.processor.base - ERROR - Fetching ‘https://example.org/error-dns-successful-retry’ encountered an error: DNS resolution failed: [Errno -2] Name or service not known
2020-09-10 23:54:25,000 - wpull.processor.web - INFO - Fetched ‘https://example.org/error-dns-successful-retry’: 200 OK. Length: 1234 [text/html; charset=utf-8].
EOF
@@ -55,6 +57,6 @@ fi

# Logic: extract all lines of interest, process them such that they only contain a + or - indicating success or error plus the URL, filter the errors with the successes in awk.
# The output order is as each URL appears for the first time in the log. Since awk doesn't preserve the insertion order on iteration, keep the line number and sort the output on that.
grep -F -e ' - ERROR - Fetching ‘' -e ' - INFO - Fetched ‘' | grep -Fv '’: 30' | sed 's,^.*‘\(.*\)’: \(200\|204\|304\|401\|403\|404\|405\|410\) .*$,+ \1,; s,^.*‘\(.*\)’.*$,- \1,' | awk '/^\+ / { successes[$2] = 1; } /^- / && ! ($2 in successes) { errors[$2] = NR; } END { for (url in errors) { if (! (url in successes)) { print errors[url] " " url; } } }' | sort -n | cut -d' ' -f2-
grep -F -e ' - ERROR - Fetching ‘' -e ' - INFO - Fetched ‘' | sed 's,^.*‘\(.*\)’: \(200\|204\|30[0-8]\|401\|403\|404\|405\|410\) .*$,+ \1,; s,^.*‘\(.*\)’.*$,- \1,' | awk '/^\+ / { successes[$2] = 1; } /^- / && ! ($2 in successes) { errors[$2] = NR; } END { for (url in errors) { if (! (url in successes)) { print errors[url] " " url; } } }' | sort -n | cut -d' ' -f2-

# Faster version without preserving order: grep -F -e ' - ERROR - Fetching ‘' -e ' - INFO - Fetched ‘' | sed 's,^.*‘\(.*\)’: \(200\|204\|304\|401\|403\|404\|405\|410\) .*$,+ \1,; s,^.*‘\(.*\)’.*$,- \1,' | awk '/^\+ / { successes[$2] = 1; } /^- / && ! ($2 in successes) { errors[$2] = 1; } END { for (url in errors) { if (! (url in successes)) { print url; } } }'

Laden…
Abbrechen
Speichern