Преглед изворни кода

Treat redirects as successful retrievals

master
JustAnotherArchivist пре 3 година
родитељ
комит
830e9dbc43
1 измењених фајлова са 3 додато и 1 уклоњено
  1. +3
    -1
      wpull2-log-extract-errors

+ 3
- 1
wpull2-log-extract-errors Прегледај датотеку

@@ -27,6 +27,8 @@ then
2020-09-10 23:54:25,000 - wpull.processor.web - INFO - Fetched ‘https://example.org/error-429-successful-second-retry’: 429 OK. Length: 1234 [text/html; charset=utf-8].
2020-09-10 23:54:25,000 - wpull.processor.web - INFO - Fetched ‘https://example.org/error-429-successful-second-retry’: 429 OK. Length: 1234 [text/html; charset=utf-8].
2020-09-10 23:54:25,000 - wpull.processor.web - INFO - Fetched ‘https://example.org/error-429-successful-second-retry’: 200 OK. Length: 1234 [text/html; charset=utf-8].
2020-09-10 23:54:25,000 - wpull.processor.web - INFO - Fetched ‘https://example.org/error-429-successful-retry-with-redirect’: 429 OK. Length: 1234 [text/html; charset=utf-8].
2020-09-10 23:54:25,000 - wpull.processor.web - INFO - Fetched ‘https://example.org/error-429-successful-retry-with-redirect’: 302 OK. Length: 1234 [text/html; charset=utf-8].
2020-09-10 23:54:25,000 - wpull.processor.base - ERROR - Fetching ‘https://example.org/error-dns-successful-retry’ encountered an error: DNS resolution failed: [Errno -2] Name or service not known
2020-09-10 23:54:25,000 - wpull.processor.web - INFO - Fetched ‘https://example.org/error-dns-successful-retry’: 200 OK. Length: 1234 [text/html; charset=utf-8].
EOF
@@ -55,6 +57,6 @@ fi

# Logic: extract all lines of interest, process them such that they only contain a + or - indicating success or error plus the URL, filter the errors with the successes in awk.
# The output order is as each URL appears for the first time in the log. Since awk doesn't preserve the insertion order on iteration, keep the line number and sort the output on that.
grep -F -e ' - ERROR - Fetching ‘' -e ' - INFO - Fetched ‘' | grep -Fv '’: 30' | sed 's,^.*‘\(.*\)’: \(200\|204\|304\|401\|403\|404\|405\|410\) .*$,+ \1,; s,^.*‘\(.*\)’.*$,- \1,' | awk '/^\+ / { successes[$2] = 1; } /^- / && ! ($2 in successes) { errors[$2] = NR; } END { for (url in errors) { if (! (url in successes)) { print errors[url] " " url; } } }' | sort -n | cut -d' ' -f2-
grep -F -e ' - ERROR - Fetching ‘' -e ' - INFO - Fetched ‘' | sed 's,^.*‘\(.*\)’: \(200\|204\|30[0-8]\|401\|403\|404\|405\|410\) .*$,+ \1,; s,^.*‘\(.*\)’.*$,- \1,' | awk '/^\+ / { successes[$2] = 1; } /^- / && ! ($2 in successes) { errors[$2] = NR; } END { for (url in errors) { if (! (url in successes)) { print errors[url] " " url; } } }' | sort -n | cut -d' ' -f2-

# Faster version without preserving order: grep -F -e ' - ERROR - Fetching ‘' -e ' - INFO - Fetched ‘' | sed 's,^.*‘\(.*\)’: \(200\|204\|304\|401\|403\|404\|405\|410\) .*$,+ \1,; s,^.*‘\(.*\)’.*$,- \1,' | awk '/^\+ / { successes[$2] = 1; } /^- / && ! ($2 in successes) { errors[$2] = 1; } END { for (url in errors) { if (! (url in successes)) { print url; } } }'

Loading…
Откажи
Сачувај