This repository contains scripts for collecting metadata on Docker Hub images. The `data` branch also contains the metadata itself for the ArchiveTeam-related Docker Hub profiles.

# Background

Docker Hub only exposes the latest build for each tag. The data for previous builds still exists and can be `docker pull`ed by digest (`docker pull namespace/name@sha256:DIGEST`), but there is no way to discover that digest afterwards. The only option is therefore to record those digests while they are still displayed on the web interface.

Rather than targeting only the digests, this project attempts to collect all relevant, publicly available metadata for Docker Hub repositories.

# Structure

The `master` branch contains the collection code. The `data` branch contains the metadata with its history. On the `data` branch, you will find the following structure:

* `dockerhub-metadata.profiles` contains a list of profiles whose repositories are being monitored.
* For each profile, there is a directory with the same name. It contains profile-wide metadata and a `repositories` directory, which in turn holds the repository and tag metadata for each repository.
* `dockerhub-metadata.retrieve.log` is the log of the last run.

# Requirements

* Bash 4 or higher
* Python 3.6 or higher
* Python Requests

# Usage

The script requires the above-mentioned structure in the directory where it is executed, as a safeguard against accidental execution in the wrong path. The main reason for this is that the script deletes everything in the current directory to replace it with the new version.

Yes, it's hacky. No, I don't care to change it; doing it properly would require diffing the list of targeted Docker Hub profiles and deleting directories and files as appropriate. Ultimately, some deletions have to take place, and they'll never be completely foolproof. Just run it in the right path, and everything is fine.
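
For illustration, a safeguard of this kind could be sketched in Python, which the project already requires. This is a minimal, hypothetical check and not the project's actual code. The names `check_data_directory` and `require_data_directory` are invented here, and the sketch assumes a "valid" directory is simply the `data` branch layout listed under Structure: the profiles file, the log file, and one directory per listed profile.

```python
import os

# File names taken from the data branch layout; everything else is hypothetical.
PROFILES_FILE = "dockerhub-metadata.profiles"
LOG_FILE = "dockerhub-metadata.retrieve.log"


def check_data_directory(path="."):
    """Return a list of problems with the layout; empty means it looks valid."""
    profiles_path = os.path.join(path, PROFILES_FILE)
    if not os.path.isfile(profiles_path):
        # Without the profile list there is nothing to compare against.
        return ["missing " + PROFILES_FILE]

    problems = []
    if not os.path.isfile(os.path.join(path, LOG_FILE)):
        problems.append("missing " + LOG_FILE)

    with open(profiles_path) as f:
        # One profile per line; blank lines are ignored.
        profiles = [line.strip() for line in f if line.strip()]
    if not profiles:
        problems.append(PROFILES_FILE + " lists no profiles")

    for profile in profiles:
        # Each monitored profile must already have its own directory.
        if not os.path.isdir(os.path.join(path, profile)):
            problems.append("missing directory for profile " + profile)
    return problems


def require_data_directory(path="."):
    """Abort (exit status 1) before anything is deleted if the check fails."""
    problems = check_data_directory(path)
    if problems:
        raise SystemExit("refusing to run: " + "; ".join(problems))
```

A caller would invoke `require_data_directory()` before deleting anything, so a run in the wrong path aborts with a message on stderr instead of wiping that directory.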
However, this safeguard implies that setting up the collection initially and changing the list of profiles are slightly annoying.

## Initial setup

1. In an empty directory, initialise a git repository (or clone an existing repo, create a new orphan branch, and delete the leftover files from the default branch). Set up a remote branch (or remove the `git push` from the script).
2. Create a file `dockerhub-metadata.profiles` listing one profile to be covered per line. Commit.
3. Create an empty file `dockerhub-metadata.retrieve.log` (e.g. with `touch`) and an empty directory for each profile (e.g. with Bash: `readarray -t profiles .