Commit dff62ab2 authored by ale's avatar ale

Improve the README

parent 4a9da2d5
1 merge request: !34 V3
Pipeline #53817 failed
......@@ -31,49 +31,115 @@ cautions are necessary in its development:
* having a way to independently push content to replds (which we do,
by way of the *replds* command itself)
So the advantage of *acmeserver* becomes just the integration
between the various components in a single package / binary (and
monitoring, etc).
So the advantage of *acmeserver* becomes just the integration between
the various components in a single package / binary (and monitoring,
etc).
## Usage
The tool accepts its general configuration as a YAML file, which in
turn points at one or more directories containing more YAML files with
certificate request specifications.
The general configuration controls things such as endpoints, RPC
authentication, and general ACME parameters.
### Configuration parameters
* `db_path`: path to the SQLite database
* `config_dirs`: list of paths from where to read certificate requests
* `default_challenge`: default ACME challenge type to request
(default: http-01), if not otherwise specified in certificate
requests
* `testing`: if true, do not contact the ACME API at all, instead
create self-signed certificates. Useful for those scenarios (like
CI) where validation is not possible.
* `acme` controls ACME client parameters:
* `account_key_path`: path to the private key of the ACME
account. If it does not exist, one will be automatically generated
on the first run.
* `email`: email address for ACME account registration
* `directory_url`: optional custom ACME directory URL (we'll use the
production Let's Encrypt endpoint by default)
* `key_type`: key type for certificates (*rsa* or *ecdsa*, the default)
* `http`: parameters for http-01 challenge validation:
* `enabled`: enable http-01 validation mechanism (default true)
* `dns`: parameters for dns-01 challenge validation:
* `enabled`: enable dns-01 validation mechanism (default false)
* `nameservers`: list of nameservers to update
* `tsig_key_name`: TSIG key name
* `tsig_key_algo`: TSIG key algorithm
* `tsig_key_secret`: TSIG key secret
* `output` controls the replds client parameters:
* `endpoint`: replds gRPC endpoint URL
* `prefix`: path prefix for the uploaded data
* `tls`: TLS client parameters:
* `cert`: file with certificate
* `key`: file with the private key
* `ca`: file with the CA certificate
* `http_server`: customize the RPC HTTP server
* TBD
### Certificate request parameters
Each YAML file in *config_dirs* (which must have a *.yml* extension)
can contain one or more certificate requests. Each is a dictionary
with the following attributes:
* `names`: list of DNS names for the certificate. It must contain at
least one entry. These will be the certificate's subjectAltNames.
* `path`: output path for the resulting certificate data. By default
this will be the first element of *names*.
* `challenge_type`: optionally specify a custom ACME challenge type
for this certificate.
Certificates are uniquely identified by the *names* list. It is a
syntax error to define multiple requests for the same certificate. It
is also an error to define multiple requests with the same *path*.
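
The rules above can be sketched as a small validation pass. This is only
an illustration, not the tool's actual code: the function name
`normalize_requests` is hypothetical, the requests are assumed to be
already parsed from YAML into dictionaries, and comparing *names* lists
in an order-sensitive way is an assumption.

```python
def normalize_requests(requests):
    """Apply the default path and reject duplicate names or paths.

    `requests` is a list of dicts as they might come out of a YAML parser.
    """
    seen_names = set()
    seen_paths = set()
    normalized = []
    for req in requests:
        names = list(req.get("names") or [])
        if not names:
            raise ValueError("'names' must contain at least one entry")
        # Assumption: two requests are "the same certificate" only if their
        # names lists are identical, including order.
        key = tuple(names)
        # The path defaults to the first element of names.
        path = req.get("path") or names[0]
        if key in seen_names:
            raise ValueError(f"multiple requests for names {names}")
        if path in seen_paths:
            raise ValueError(f"multiple requests with path {path!r}")
        seen_names.add(key)
        seen_paths.add(path)
        normalized.append({
            "names": names,
            "path": path,
            # None means "use default_challenge from the main configuration".
            "challenge_type": req.get("challenge_type"),
        })
    return normalized
```

For example, a request with `names: [example.com, www.example.com]` and
no `path` would be stored under the path `example.com`.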
## Internals
The purpose of this tool is to obtain certificate renewals from an
ACME API at the right times in order to always maintain validity, and
send the results to a storage system, nothing else. The tool does not
concern itself with any form of persistent storage of the
certificates, nor does it look at the state of the storage system in
order to detect what needs to be done. If data goes missing on the storage
for some reason, acmeserver won't be able to detect it, and manual
intervention will be required (some form of --force-renew invocation).
The software needs to maximize robustness as an ACME client, e.g. we
need to play well with the API rate limits, and avoid ending up in
"rate limit deadlocks" where we're not making progress due to constant
streams of erroneous requests (an occasional problem in the previous
iterations of this tool).
We run a state machine for each certificate (uniquely identified by
its full list of names), where every transition has a minimum
execution time used to support scheduled actions in the future. The
state machine is structured as follows:
* RENEW (CERT_MISSING_OR_EXPIRED) - certificate is either missing or
expired, start the renewal process by attempting validation. If an
error occurs here:
  * for rate-limit errors, set the execution time to the rate limit
    expiration
  * for any other error (local or remote), use our safe error retry time
* UPLOAD (CERT_UPDATED) - certificate has been obtained by ACME but
needs to be pushed onto storage. If the upload succeeds, the state
machine terminates by setting the state to CERT_MISSING_OR_EXPIRED
and the execution time to the renewal time. If we've gone beyond the
maximum execution time of this state (cert has expired), set state
to CERT_MISSING_OR_EXPIRED with execution time = now, and restart
the process. Otherwise keep retrying the upload.
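
As a rough sketch of how these two transitions pick their next execution
time (not the actual implementation): the names `RateLimitError`,
`ERROR_RETRY` and `UPLOAD_RETRY`, and their values, are assumptions made
for illustration only.

```python
from datetime import datetime, timedelta

ERROR_RETRY = timedelta(hours=1)      # hypothetical "safe error retry time"
UPLOAD_RETRY = timedelta(minutes=5)   # hypothetical upload retry interval


class RateLimitError(Exception):
    """Hypothetical error carrying the time at which the rate limit expires."""

    def __init__(self, retry_at):
        super().__init__("rate limited")
        self.retry_at = retry_at


def next_renew_time(now, error):
    """Execution time of the next RENEW attempt after a failed one."""
    if isinstance(error, RateLimitError):
        # Rate-limit errors: wait until the rate limit expires.
        return error.retry_at
    # Any other error, local or remote: use the safe retry time.
    return now + ERROR_RETRY


def after_upload(now, ok, cert_expiry, renewal_time):
    """Return (transition, execution_time) following an UPLOAD attempt."""
    if ok:
        # Done: schedule the next renewal.
        return ("CERT_MISSING_OR_EXPIRED", renewal_time)
    if now >= cert_expiry:
        # The certificate expired before we could upload it: start over.
        return ("CERT_MISSING_OR_EXPIRED", now)
    # Otherwise keep retrying the upload.
    return ("CERT_UPDATED", now + UPLOAD_RETRY)
```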
Note that these are not *states* but *transitions*: the state is
something acmeserver doesn't really care about. It only cares about
which operations it needs to perform (another way of saying this is
that the state is embedded in the current state of the transition log
itself).
This effectively manages two separate queues. Database-backed queues
are fine for the numbers being considered here (hundreds of
certificates, not millions), though WAL churn due to state transitions
is still a concern, especially when there are errors.
### Database queues
To be a robust ACME client, the implementation focuses on the
execution of scheduled *tasks*, which allow us to accurately implement
both the short-term and long-term retry behaviors that are critical to
proper error handling.
There are just two types of tasks:
* RENEW - attempts to renew a certificate. If it succeeds, it will
create another RENEW task scheduled a bit before the certificate
expiration time, and an UPLOAD task that will save the certificate
to storage.
* UPLOAD - upload a certificate to the storage backend. It will try to
do so until either the certificate expires (no point in uploading it
anymore), or a new UPLOAD is triggered by the RENEW task with a new
certificate.
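
The two task types can be illustrated with a minimal sketch. The field
names, the `serial` counter used to tell a newer UPLOAD from an older
one, and the 30-day `RENEW_MARGIN` are all assumptions, not details
taken from the tool.

```python
from datetime import datetime, timedelta

RENEW_MARGIN = timedelta(days=30)  # hypothetical "a bit before expiration"


def tasks_after_successful_renew(cert_id, serial, now, not_after):
    """Tasks created by a RENEW that obtained a certificate valid until not_after."""
    return [
        # Next renewal, scheduled ahead of the expiration time.
        {"type": "RENEW", "cert_id": cert_id,
         "run_at": not_after - RENEW_MARGIN},
        # Upload of the new certificate, to be attempted right away.
        {"type": "UPLOAD", "cert_id": cert_id, "serial": serial,
         "run_at": now, "deadline": not_after},
    ]


def upload_should_continue(task, now, latest_serial):
    """An UPLOAD keeps retrying until its cert expires or a newer UPLOAD exists."""
    if now >= task["deadline"]:
        return False  # certificate expired, no point in uploading it
    if task["serial"] < latest_serial:
        return False  # superseded by an UPLOAD for a newer certificate
    return True
```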
Note that these are just *operations* and they do not map to any
specific *state*. Acmeserver has no idea if you currently have a valid
certificate in storage or not: that job is best left to a monitoring
system that sits at the *end* of the certificate transport chain.
### Database implementation
The task queue is backed by a SQLite database. The configuration is
loaded in memory as a temporary table, so we can use it in JOINs and
save quite a bit of code complexity.
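
To show why a temporary table helps, here is a small sketch using
Python's standard `sqlite3` module. The table and column names
(`tasks`, `config`, `cert_id`, `run_at`, `max_run_at`) are hypothetical
and the real schema may differ; the point is only that tasks for
certificates no longer present in the configuration drop out of the
JOIN without any extra code.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open the database and create a hypothetical task table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tasks ("
        " id INTEGER PRIMARY KEY,"
        " cert_id TEXT NOT NULL,"
        " state TEXT NOT NULL,"
        " run_at INTEGER NOT NULL,"
        " max_run_at INTEGER)"
    )
    return conn


def load_config(conn, requests):
    """Load (cert_id, path) pairs into a temporary, per-connection table."""
    conn.execute(
        "CREATE TEMP TABLE IF NOT EXISTS config ("
        " cert_id TEXT PRIMARY KEY, path TEXT NOT NULL)"
    )
    conn.execute("DELETE FROM config")
    conn.executemany(
        "INSERT INTO config (cert_id, path) VALUES (?, ?)", requests
    )


def due_tasks(conn, now):
    """Tasks due at `now`, restricted to currently configured certificates."""
    rows = conn.execute(
        "SELECT t.id, t.state, c.path"
        " FROM tasks AS t JOIN config AS c ON c.cert_id = t.cert_id"
        " WHERE t.run_at <= ?"
        " ORDER BY t.run_at, t.id",
        (now,),
    )
    return rows.fetchall()
```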
The database consists of a single task table, with ID, state,
execution time and max execution time (optional). To make the storage
......@@ -93,7 +159,7 @@ A *reaper* process could also detect which paths are now no longer
used (not the same as unused IDs!), and delete those from storage as
well.
### Database encryption
### Database encryption (TODO)
Since it's possible that the database data could end up in
less-trusted environments, private key material in the db is
......