
Deduplicate output

This uses mawk's extensions `-W interactive` and `delete array`; it will probably work with certain other AWK implementations as well, but for now it depends on mawk explicitly.
master
JustAnotherArchivist 4 years ago
parent
commit
98d77ecc96
1 changed file with 2 additions and 2 deletions
wiki-recursive-extract-normalise: +2 -2

@@ -3,7 +3,7 @@
 # Everything that looks like a social media link (including YouTube) is run through social-media-extract-profile-link.
 # Everything else is run through website-extract-social-media.
 # This is done recursively until no new links are discovered anymore.
-# The output is further fed through url-normalise before and during processing to avoid equivalent but slightly different duplicates.
+# The output is further fed through url-normalise before and during processing to avoid equivalent but slightly different duplicates, and the output is deduplicated within each section at the end.
 
 verbose=
 while [[ $# -gt 0 ]]
@@ -80,4 +80,4 @@ do
 done
 done
 fi
-done
+done | mawk -W interactive '! /^\*/ { print; } /^\*/ && !seen[$0]++ { print; } /^==/ { delete seen; }'
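
The filter appended to the final `done` can be tried in isolation. The following is a minimal sketch, not the committed code: the function name `dedupe_sections` and the sample input are made up for illustration, and it assumes that lines starting with `*` are entries and lines starting with `==` are section headers, as the patterns in the filter suggest. To run under any POSIX awk rather than mawk only, it clears the array with `split("", seen)` instead of `delete seen` and omits `-W interactive`, which in mawk makes output unbuffered and input line-buffered.

```shell
#!/bin/sh
# Hypothetical helper (not part of the commit): a portable variant of the mawk filter.
dedupe_sections() {
	awk '
		# Lines that do not start with "*" (headers and anything else) pass through unchanged.
		!/^\*/ { print }
		# Lines that start with "*" are printed only on first sight within the current section.
		/^\*/ && !seen[$0]++ { print }
		# A line that starts with "==" opens a new section: forget everything seen so far.
		# split("", seen) is the portable spelling of "delete seen".
		/^==/ { split("", seen) }
	'
}

# Made-up sample input: the same entry appears twice in the first section
# and once more in the second section.
printf '%s\n' \
	'== https://example.org/ ==' \
	'* https://twitter.com/example' \
	'* https://twitter.com/example' \
	'== https://example.net/ ==' \
	'* https://twitter.com/example' \
	| dedupe_sections
```

With this input, the repeated entry is dropped from the first section but kept once in the second, because the `seen` array is reset at every `==` line. Deduplication is therefore per section, not global.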
