Metadata for the ArchiveTeam Docker Hub repositories

This repository contains scripts for collecting metadata on Docker Hub images. The metadata itself, for the ArchiveTeam-related Docker Hub profiles, is kept on the `data` branch.
# Background

Docker Hub only exposes the latest build for each tag. The data for previous builds still exists and can be pulled by digest (`docker pull namespace/name@sha256:DIGEST`), but there is no way to discover those digests once a newer build has replaced them. The only option is therefore to record the digests while they are still displayed on the web interface.

Rather than targeting only the digests, this project attempts to collect all relevant, publicly available metadata for Docker Hub repositories.
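
A minimal Python sketch of the record-keeping idea: given a tag listing, it pairs each tag with its digest and the matching `docker pull` reference, so the digest survives after the tag moves on. The input shape (a `results` array whose entries carry `name` and `digest`) is an assumption modelled on Docker Hub's JSON tag listings, and `record_digests` and `pull_reference` are hypothetical helpers, not this project's collection code.

```python
from datetime import datetime, timezone


def pull_reference(namespace, name, digest):
    """Build the `docker pull` reference for one specific build."""
    if not digest.startswith("sha256:"):
        raise ValueError(f"unexpected digest format: {digest!r}")
    return f"{namespace}/{name}@{digest}"


def record_digests(namespace, name, listing):
    """Pair each tag in `listing` with its digest and pull reference.

    ASSUMED input shape: {"results": [{"name": ..., "digest": ...}, ...]}.
    """
    seen = datetime.now(timezone.utc).isoformat()  # when the digest was observed
    records = []
    for tag in listing.get("results", []):
        digest = tag.get("digest")
        if not digest:
            continue  # skip entries without a digest rather than guessing one
        records.append({
            "tag": tag["name"],
            "digest": digest,
            "pull": pull_reference(namespace, name, digest),
            "seen": seen,
        })
    return records


if __name__ == "__main__":
    # Made-up listing standing in for a real response.
    listing = {"results": [{"name": "latest", "digest": "sha256:" + "0" * 64}]}
    for record in record_digests("example-namespace", "example-repo", listing):
        print(record["tag"], record["pull"])
```

Keeping the `seen` timestamp next to each digest records when that build was the current one for its tag.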
# Structure

The `master` branch contains the collection code. The `data` branch contains the metadata with its history.

The `data` branch has the following structure:

* `dockerhub-metadata.profiles` lists the profiles whose repositories are monitored.
* Each profile has a directory of the same name. It contains profile-wide metadata and a `repositories` directory, which in turn holds the repository and tag metadata for each of the profile's repositories.
* `dockerhub-metadata.retrieve.log` is the log of the last run.
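
A minimal Python sketch of reading this layout. It assumes one entry per repository inside each `repositories` directory; the names of the individual metadata files are not documented here, so the sketch only lists entries. `read_profiles` and `list_repositories` are hypothetical helpers, and the profile and repository names in the demo are made up.

```python
import tempfile
from pathlib import Path

PROFILES_FILE = "dockerhub-metadata.profiles"


def read_profiles(root):
    """Return the monitored profiles: one per line, blank lines ignored."""
    text = (Path(root) / PROFILES_FILE).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


def list_repositories(root):
    """Map each profile to the sorted entry names in its `repositories` directory."""
    result = {}
    for profile in read_profiles(root):
        repo_dir = Path(root) / profile / "repositories"
        if repo_dir.is_dir():
            result[profile] = sorted(entry.name for entry in repo_dir.iterdir())
        else:
            # e.g. a profile that was just added and has not been collected yet
            result[profile] = []
    return result


if __name__ == "__main__":
    # Build a throwaway layout with made-up profile and repository names.
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, PROFILES_FILE).write_text("profile-a\nprofile-b\n", encoding="utf-8")
        Path(tmp, "profile-a", "repositories", "repo-1").mkdir(parents=True)
        Path(tmp, "profile-b").mkdir()
        print(list_repositories(tmp))
```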
# Requirements

* Bash 4 or higher
* Python 3.6 or higher
* Python Requests
# Usage

The script requires the structure described above in the directory where it is executed, as a safeguard against accidentally running it in the wrong path. The main reason for this is that the script deletes everything in the current directory and replaces it with the new version. Yes, it's hacky. No, I don't care to change it; doing it properly would require diffing the list of targeted Docker Hub profiles and deleting directories and files as appropriate. Ultimately, some deletions have to take place, and they will never be completely foolproof. Just run it in the right path, and everything is fine.

However, this safeguard means that the initial setup and any change to the list of profiles are slightly annoying.
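
The script's exact checks are not spelled out in this README. The following Python sketch is one plausible pre-flight check of this kind, assuming that the profiles file, the log file and one directory per profile must all exist before anything is deleted. `check_data_directory` is a hypothetical name, not part of the script.

```python
import tempfile
from pathlib import Path

PROFILES_FILE = "dockerhub-metadata.profiles"
LOG_FILE = "dockerhub-metadata.retrieve.log"


def check_data_directory(root="."):
    """Return a list of problems; an empty list means the layout looks right."""
    root = Path(root)
    profiles_path = root / PROFILES_FILE
    if not profiles_path.is_file():
        # Without the profiles file this is almost certainly the wrong directory.
        return ["missing " + PROFILES_FILE]
    problems = []
    if not (root / LOG_FILE).is_file():
        problems.append("missing " + LOG_FILE)
    lines = profiles_path.read_text(encoding="utf-8").splitlines()
    profiles = [line.strip() for line in lines if line.strip()]
    if not profiles:
        problems.append("no profiles listed")
    for profile in profiles:
        if not (root / profile).is_dir():
            problems.append("missing directory for profile " + profile)
    return problems


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(check_data_directory(tmp))  # an empty directory fails the check
        Path(tmp, PROFILES_FILE).write_text("profile-a\n", encoding="utf-8")
        Path(tmp, LOG_FILE).touch()
        Path(tmp, "profile-a").mkdir()
        print(check_data_directory(tmp))  # no problems: safe to proceed
```

Returning a list of problems instead of raising on the first one lets a caller report everything that is wrong with the directory at once before refusing to run.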
## Initial setup

1. In an empty directory, initialise a git repository (or clone an existing repository, create a new orphan branch, and delete the files left over from the default branch). Set up a remote branch to push to (or remove the `git push` from the script).
2. Create a file `dockerhub-metadata.profiles` listing one profile per line. Commit.
3. Create an empty file `dockerhub-metadata.retrieve.log` (e.g. with `touch`) and an empty directory for each profile (e.g. with Bash: `readarray -t profiles <dockerhub-metadata.profiles; mkdir "${profiles[@]}"`).
4. Run `/path/to/this/directory/run` from within the data directory, i.e. without changing into this repository's directory.
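
Steps 2 and 3 can also be sketched in Python, for example when preparing several data directories. This is an illustration only: `initialise_data_directory` is a hypothetical helper, the profile names in the demo are made up, and the git steps (initialising, committing, configuring the remote) are deliberately left out.

```python
import tempfile
from pathlib import Path

PROFILES_FILE = "dockerhub-metadata.profiles"
LOG_FILE = "dockerhub-metadata.retrieve.log"


def initialise_data_directory(root, profiles):
    """Create the profiles file, an empty log and one directory per profile.

    Covers the file-system part of the setup only; git is not touched.
    """
    root = Path(root)
    root.mkdir(parents=True, exist_ok=True)
    # One profile per line.
    (root / PROFILES_FILE).write_text("".join(p + "\n" for p in profiles), encoding="utf-8")
    # Empty log file (like `touch`) and a directory per profile (like `mkdir`).
    (root / LOG_FILE).touch()
    for profile in profiles:
        (root / profile).mkdir(exist_ok=True)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        initialise_data_directory(tmp, ["profile-a", "profile-b"])
        print(sorted(entry.name for entry in Path(tmp).iterdir()))
```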
## Changes

To add a profile to the list, add it to the `dockerhub-metadata.profiles` file and create the corresponding directory.

To remove a profile from the list, remove it from that file and delete its directory.

It is recommended to commit changes to the profiles list manually. The next run would include them anyway, but a separate commit makes the intent clearer. Changes to the data directories, on the other hand, are best left uncommitted for the script to handle.
# License

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see <https://www.gnu.org/licenses/>.