Commit 048f7940 authored by ale's avatar ale

Merge branch 'modular' into 'master'


See merge request ale/gitlab-docker-autodep!1
parents 51747af8 4bdd388c
@@ -620,55 +620,3 @@ copy of the Program in return for a fee.
How to Apply These Terms to Your New Programs
If you develop a new program, and you want it to be of the greatest
possible use to the public, the best way to achieve this is to make it
free software which everyone can redistribute and change under these terms.
To do so, attach the following notices to the program. It is safest
to attach them to the start of each source file to most effectively
state the exclusion of warranty; and each file should have at least
the "copyright" line and a pointer to where the full notice is found.
Copyright (C) 2018 ale
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see <https://www.gnu.org/licenses/>.
Also add information on how to contact you by electronic and paper mail.
If the program does terminal interaction, make it output a short
notice like this when it starts in an interactive mode:
gitlab-docker-autodep Copyright (C) 2018 ale
This program comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
This is free software, and you are welcome to redistribute it
under certain conditions; type `show c' for details.
The hypothetical commands `show w' and `show c' should show the appropriate
parts of the General Public License. Of course, your program's commands
might be different; for a GUI interface, you would use an "about box".
You should also get your employer (if you work as a programmer) or school,
if any, to sign a "copyright disclaimer" for the program, if necessary.
For more information on this, and how to apply and follow the GNU GPL, see
<https://www.gnu.org/licenses/>.
The GNU General Public License does not permit incorporating your program
into proprietary programs. If your program is a subroutine library, you
may consider it more useful to permit linking proprietary applications with
the library. If this is what you want to do, use the GNU Lesser General
Public License instead of this License. But first, please read
<https://www.gnu.org/licenses/why-not-lgpl.html>.
Automatically rebuild all the downstream dependencies of Docker-based
projects on a Gitlab instance.
*Gitlab-deps* is a simple build orchestration toolkit: it tracks
dependencies between projects on a Gitlab instance and can
automatically rebuild dependencies when a project pipeline completes.
It scans the *master* branch of all repositories containing a
Dockerfile looking for FROM lines and navigates the resulting
dependency tree to find all projects that need to be rebuilt when
their base image (or an upstream image thereof) changes.
It can track dependencies between projects by either of two supported
mechanisms:
* projects can add a `.gitlab-deps` file to the root of their
repository, containing the fully qualified project URLs of their
dependencies;
* gitlab-deps can scan Dockerfiles (in the repository root) and
automatically infer dependencies based on FROM lines.
The implementation depends on HTTP hooks triggered by pipeline events:
gitlab-deps runs a small HTTP server to respond to these requests and
trigger new builds.
By default, since it is meant to be triggered as the last step in a CI
script, it will not navigate the dependency tree recursively
@@ -17,105 +27,130 @@ has been rebuilt.
# Installation
The tools require Python 3.
Install the tool either in a virtualenv or system-wide with any of
the standard Python installation mechanisms, for instance (using
setup.py):

    sudo python3 setup.py install
This will install the *gitlab-deps* command-line tool in
/usr/local/bin. The tool has few dependencies: just the [Gitlab
API](https://python-gitlab.readthedocs.io/) and
[Flask](https://flask.palletsprojects.com/) Python packages.
# Usage
The toolkit is split into functional components (all wrapped in a
single executable with different sub-commands):
* scan Gitlab and generate a dependency map
* manually trigger builds using the dependency map
* run a server that listens for Gitlab notifications and triggers
dependent builds
In all cases, the program is configured via command-line options.
The tools talk to Gitlab using its API, so you're going to need an
admin token in order to create new pipelines.
## Common options
The tool must be pointed at your Gitlab instance with the *--url*
command-line option, or alternatively using the `GITLAB_URL`
environment variable.
You can pass an authentication token using the *--token* or
*--token-file* command-line options. This is usually required in order
to trigger CI pipelines, or to access private projects: the access
token must have at least the *api* scope. Credentials can also be
provided via environment variables.
The tool will only examine Docker images hosted on the Docker registry
associated with the Gitlab instance. By default the registry name is
automatically derived from the server URL (adding a *registry*
prefix), but it can be changed with the *--registry* command-line
option.
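That derivation can be sketched roughly as follows (the exact
prefixing convention is an assumption based on the description above;
the hostname is illustrative):

```python
from urllib.parse import urlsplit

def derive_registry(gitlab_url):
    # Assumed convention: prepend 'registry.' to the Gitlab hostname,
    # e.g. https://gitlab.example.com -> registry.gitlab.example.com.
    return 'registry.' + urlsplit(gitlab_url).netloc

print(derive_registry('https://gitlab.example.com'))
# → registry.gitlab.example.com
```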
### Listing projects

The *list-projects* sub-command can be used to create a list of
projects (and their branches) in the Gitlab instance. It provides some
basic functionality for filtering (using the *--search* option), but
it generates output suitable for *grep*, e.g. to filter a specific
path prefix (Gitlab group):

    gitlab-deps list-projects | grep ^path/to/my/group/

or to only select "master" branches:

    gitlab-deps list-projects | grep ':master$'

The output from this command is just a list of project paths (with
namespaces) and branch names, separated by a ':', one per line.

### Scope

On larger Gitlab instances, parsing Dockerfiles for all projects can
be an expensive (long) operation. The program offers two options to
manage the scope of the dependency analysis: *--match* and *--filter*.
The former, *--match*, allows for filtering the project list on the
server side, using a Gitlab search query. The latter, *--filter*,
applies a regular expression to the project names (including
namespaces) before parsing their dependencies. Combining the two, for
example, it is possible to efficiently limit the scope of the tool to
a specific namespace:

    gitlab-deps deps --match myns --filter ^myns/ ...

Note that, when building the dependency tree:

* tags in FROM lines are ignored
* only the *master* branch of repositories is scanned for Dockerfiles

This might lead to more rebuilds than strictly necessary.
## Computing dependencies

The *deps* sub-command will scan the projects and their repositories,
and it will produce a list of all the edges in the dependency
graph. It takes a list of project_path:branch specs as input (as
produced by the *list-projects* sub-command), and it will produce a
list of edges as whitespace-separated project:branch pairs, e.g.:

    project:master dependency1:master
    project:master dependency2:master

The output format is once again meant to be processed with standard
UNIX tools such as *awk* and *grep*.
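For instance, swapping the two columns of the edge list turns forward
dependencies into reverse ones (sample data shown; any awk will do):

```shell
printf 'project:master dependency1:master\nproject:master dependency2:master\n' |
  awk '{print $2, $1}'
```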
## Configuring pipeline_events hooks
To work, gitlab-deps needs a HTTP hook for pipeline_events on all
projects that have dependencies. Since setting this up in Gitlab is a
manual and laborious process, the *set-hooks* sub-command is provided
to do this automatically using the API. The intended usage is to run
it on the right-hand side of the dependency edges (i.e. the list of
projects/branches that actually have dependencies):
    gitlab-deps deps | awk '{print $2}' | gitlab-deps set-hooks
## One-off rebuilds

The *rebuild* sub-command will trigger a rebuild of all the
dependencies of a given project, possibly waiting for the CI pipelines
to complete. Pass a qualified project name and branch as a
command-line argument. The dependency graph (list of edges as produced
by the *deps* sub-command) must also be provided, either as a file or
on standard input.
The *--recurse* option will traverse the dependency tree recursively,
waiting for CI pipelines to complete so that they are built in the
right order.
## Running the server
The gitlab-deps tool has a *server* command to start a simple HTTP
server that receives the pipeline_events webhooks from Gitlab, and
triggers builds for project dependencies.
The *server* command requires an address to bind to, specified using
the *--host* and *--port* options. It is also possible to enforce
authentication of the webhook with a secret token
using the *--webhook-auth-token* option.
When running in this mode, it is assumed that all your Docker-related
projects have webhooks set up to rebuild their dependencies, so
*gitlab-deps* will only trigger a build of the immediate
dependencies of a project.
Also note that the server does not have any TLS support: if necessary,
it is best to use a dedicated reverse proxy (Apache, NGINX, etc).
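The handler's contract can be illustrated with a hedged sketch (this
is not the actual gitlab-deps code; the field names follow Gitlab's
documented pipeline_events payload, and `should_trigger` is a
hypothetical helper):

```python
import json

def should_trigger(headers, body, auth_token=None):
    # Enforce the shared secret when --webhook-auth-token is in use:
    # Gitlab echoes it back in the X-Gitlab-Token request header.
    if auth_token and headers.get('X-Gitlab-Token') != auth_token:
        return False
    payload = json.loads(body)
    # Only successful pipeline events should rebuild dependencies.
    return (payload.get('object_kind') == 'pipeline'
            and payload.get('object_attributes', {}).get('status') == 'success')

event = json.dumps({'object_kind': 'pipeline',
                    'object_attributes': {'status': 'success'}})
print(should_trigger({'X-Gitlab-Token': 's3cret'}, event, auth_token='s3cret'))
# → True
```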
@@ -124,35 +159,13 @@ machine as Gitlab itself, and that the Gitlab authentication token is
stored in */etc/gitlab_docker_token*:
    gitlab-deps deps \
        --url=https://my.gitlab \
        --token-file=/etc/gitlab_docker_token \
      | gitlab-deps server \
        --host= --port=14001
It can be useful to run the *rebuild* command from a cron job, for
instance in order to rebuild images on a periodic schedule, and
assuming all your projects share a common base image:

    50 5 * * * root gitlab-deps rebuild $MY_BASE_IMAGE
If configuring webhooks manually (rather than with *set-hooks*),
create a new webhook with the URL `http://localhost:14001/`, and with
the *Trigger* checkbox set only on *Pipeline events*.
import re

DEFAULT_BRANCH = 'master'
def split_project_branch(project_with_branch):
    if ':' in project_with_branch:
        p, b = project_with_branch.split(':')
        return p, b
    return project_with_branch, DEFAULT_BRANCH
def list_projects(gl, search_pattern):
    projects = gl.projects.list(search=search_pattern, all=True)
    for p in projects:
        yield p.path_with_namespace
def get_branches(gl, project_names):
    for path_with_namespace in project_names:
        p = gl.projects.get(path_with_namespace)
        for b in p.branches.list():
            yield (path_with_namespace, b.name)
def has_ci(gl, project_path, branch_name):
    p = gl.projects.get(project_path)
    try:
        p.files.get(file_path='.gitlab-ci.yml', ref=branch_name)
        return True
    except Exception:
        return False
_from_rx = re.compile(r'^FROM\s+(\S+).*$', re.MULTILINE)
def get_docker_deps(gl, project_path, branch_name):
    p = gl.projects.get(project_path)
    try:
        f = p.files.get(file_path='Dockerfile', ref=branch_name)
        return _from_rx.findall(f.decode().decode('utf-8'))
    except Exception:
        return []
def get_explicit_deps(gl, project_path, branch_name):
    p = gl.projects.get(project_path)
    try:
        f = p.files.get(file_path='.gitlab-deps', ref=branch_name)
        return f.decode().decode('utf-8').split('\n')
    except Exception:
        return []
_docker_image_rx = re.compile(r'^([^/]*)(/([^:]*))?(:(.*))?$')
def docker_image_to_project(docker_image, registry_hostname):
    m = _docker_image_rx.match(docker_image)
    if m and m[1] == registry_hostname:
        # The branch is the tag, except for 'latest' which maps to the
        # default branch.
        if not m[5] or m[5] == 'latest':
            branch = DEFAULT_BRANCH
        else:
            branch = m[5]
        return m[3], branch
    return None
_url_rx = re.compile(r'^(https?://[^/]+/)([^:]+)(:.*)?$')
def url_to_project(url, gitlab_url):
    m = _url_rx.match(url)
    if m and m[1] == gitlab_url:
        return m[2], m[3][1:] if m[3] else DEFAULT_BRANCH
    return None
def not_null(l):
    return filter(None, l)
def get_deps(gl, gitlab_url, registry_hostname, project_path, branch_name):
    deps = []
    deps.extend(not_null(
        url_to_project(url, gitlab_url)
        for url in get_explicit_deps(gl, project_path, branch_name)))
    deps.extend(not_null(
        docker_image_to_project(img, registry_hostname)
        for img in get_docker_deps(gl, project_path, branch_name)))
    return deps
def list_deps(gl, gitlab_url, registry_hostname, projects):
    for project_path, branch_name in projects:
        deps = get_deps(gl, gitlab_url, registry_hostname,
                        project_path, branch_name)
        for dep_path, dep_branch in deps:
            print(f'{project_path}:{branch_name} {dep_path}:{dep_branch}')
def read_deps(fd):
    deps = {}
    for line in fd:
        src, dst = line.strip().split()
        src_project, src_branch = split_project_branch(src)
        dst_project, dst_branch = split_project_branch(dst)
        deps.setdefault((src_project, src_branch), []).append(
            (dst_project, dst_branch))
    return deps
import logging
def check_hook(gl, hook_url, webhook_token, project_path, dry_run):
    project = gl.projects.get(project_path)
    found = False
    for h in project.hooks.list():
        if h.url == hook_url and h.pipeline_events:
            found = True
    if found:
        return
    logging.info('adding pipeline_events hook to %s', project_path)
    if not dry_run:
        project.hooks.create({
            'url': hook_url,
            'pipeline_events': True,
            'token': webhook_token,
        })
import argparse
import gitlab
import logging
import os
import sys
from urllib.parse import urlsplit
from .deps import get_branches, list_projects, list_deps, \
split_project_branch, read_deps
from .hooks import check_hook
from .rebuild import rebuild_deps
from .server import run_app
def _fmtdesc(s):
    return s.strip()
def main():
    parser = argparse.ArgumentParser(
        description='Manage Gitlab project dependencies and trigger pipelines.')
    subparsers = parser.add_subparsers(dest='subparser')

    # Common options.
    common_parser = argparse.ArgumentParser(add_help=False)
    common_parser.add_argument(
        '--debug', action='store_true',
        help='increase logging level')
    common_parser.add_argument(
        '-n', '--dry-run', action='store_true', dest='dry_run',
        help='only show what would be done')
    gitlab_opts_group = common_parser.add_argument_group('gitlab options')
    gitlab_opts_group.add_argument(
        '--url', metavar='URL', help='Gitlab URL',
        default=os.getenv('GITLAB_URL'))
    gitlab_opts_group.add_argument(
        '--token-file', metavar='FILE',
        help='file containing the Gitlab authentication token')
    gitlab_opts_group.add_argument(
        '--token', metavar='TOKEN',
        help='Gitlab authentication token')
    # List projects.
    list_projects_parser = subparsers.add_parser(
        'list-projects', parents=[common_parser],
        help='list projects',
        description=_fmtdesc('''
List all projects and their branches on the Gitlab instance.

The output is a list of project paths with all their branches, separated
by a colon, one per line. Since the Gitlab 'search' API is quite
coarse, you can then filter the output for specific projects or branches
using 'grep', e.g.:

    gitlab-deps list-projects | grep ^path/to/my/group/
    gitlab-deps list-projects | grep ':master$'
'''))
    list_projects_parser.add_argument(
        '--search', metavar='QUERY',
        help='search query used to filter project list on the server side')
    # Compute deps.
    deps_parser = subparsers.add_parser(
        'deps', parents=[common_parser],
        help='build dependency map',
        description=_fmtdesc('''
Generate a map of dependencies between projects on a
Gitlab instance.

The input must consist of a list of projects along with their
branches, separated by a colon, one per line. If the branch is
unspecified, 'master' is assumed.

The output consists of pairs of project / dependency (so, these are
'forward' dependencies), for all projects/branches specified in the
input.

To obtain a list of reverse dependencies, one can simply swap the
columns in the output, e.g.:

    gitlab-deps deps < project.list | awk '{print $2, $1}'
'''), epilog=_fmtdesc('''
Input can be read from a file (if passed as an argument), or
from standard input if a filename is omitted or specified as '-'.
'''))
    deps_parser.add_argument(
        '--registry', metavar='NAME',
        help='Docker registry hostname (if empty, it will be '
        'automatically derived from --url)')
    deps_parser.add_argument(
        'input', type=argparse.FileType('r'),
        nargs='?', default=sys.stdin)
    # Setup pipeline hooks on the specified projects.
    set_hooks_parser = subparsers.add_parser(
        'set-hooks', parents=[common_parser],
        help='set pipeline hooks on projects',
        description=_fmtdesc('''
Set a HTTP hook for pipeline_events on the specified projects.

Takes a list of projects (optional branch specifiers will be ignored)
as input. Pipeline hooks are required by 'gitlab-deps server' to
trigger dependent builds, so a common way to use this command is to
feed it the right-hand side of the 'gitlab-deps deps' output, e.g.:

    gitlab-deps deps < project.list \\
        | awk '{print $2}' \\
        | gitlab-deps set-hooks --hook-url=...

using --hook-url to point at the URL of 'gitlab-deps server'.
'''), epilog=_fmtdesc('''
Input can be read from a file (if passed as an argument), or
from standard input if a filename is omitted or specified as '-'.
'''))
    set_hooks_parser.add_argument(
        '--hook-url', metavar='URL',
        help='URL for the pipeline HTTP hook')
    set_hooks_parser.add_argument(
        '--webhook-auth-token', metavar='TOKEN',
        help='secret X-Gitlab-Token for request authentication')
    set_hooks_parser.add_argument(
        'input', type=argparse.FileType('r'),
        nargs='?', default=sys.stdin)
    # Trigger rebuilds of reverse deps.
    rebuild_image_parser = subparsers.add_parser(
        'rebuild', parents=[common_parser],
        help='rebuild dependencies of a project',
        description=_fmtdesc('''
Rebuild all projects that depend on the specified project.

Takes a single project path as argument, and triggers a rebuild of its
direct dependencies. Useful for one-off rebuilds.

If the --recurse option is provided, the tool will wait for completion
of the pipeline and recursively trigger its dependencies too,
navigating the entire dependency tree.
'''), epilog=_fmtdesc('''
Project dependencies can be read from a file (if passed as an
argument), or from standard input if a filename is omitted or
specified as '-'.
'''))
    rebuild_image_parser.add_argument(
        '--recurse', action='store_true',
        help='include all dependencies recursively '
        'and wait for completion of the pipelines')
    rebuild_image_parser.add_argument(
        'project',
        help='project name (relative path, with optional branch)')
    rebuild_image_parser.add_argument(
        'deps', type=argparse.FileType('r'),
        nargs='?', default=sys.stdin)
    # Server.
    server_parser = subparsers.add_parser(
        'server', parents=[common_parser],
        help='start the HTTP server',
        description=_fmtdesc('''
Start an HTTP server that listens for Gitlab webhooks.