Improve the README

Commit dff62ab2 authored 2 years ago by ale
--- a/README.md
+++ b/README.md
@@ -31,49 +31,115 @@ cautions are necessary in its development:
  * having a way to independently push content to replds (which we do,
    by way of the *replds* command itself)

-  So the advantage of *acmeserver* becomes just the integration
-  between the various components in a single package / binary (and
-  monitoring, etc).
+So the advantage of *acmeserver* becomes just the integration between
+the various components in a single package / binary (and monitoring,
+etc).
+
+## Usage
+
+The tool accepts its general configuration as a YAML file, which in
+turn points at one or more directories containing more YAML files with
+certificate request specifications.
+
+The general configuration controls things such as endpoints, RPC
+authentication, and general ACME parameters.
+
+### Configuration parameters
+
+* `db_path`: path to the SQLite database
+* `config_dirs`: list of paths from where to read certificate requests
+* `default_challenge`: default ACME challenge type to request
+  (default: http-01), if not otherwise specified in certificate
+  requests
+* `testing`: if true, do not contact the ACME API at all, instead
+  create self-signed certificates. Useful for those scenarios (like
+  CI) where validation is not possible.
+* `acme` controls ACME client parameters:
+  * `account_key_path`: path to the private key of the ACME
+    account. If it does not exist, one will be automatically generated
+    on the first run.
+  * `email`: email address for ACME account registration
+  * `directory_url`: optional custom ACME directory URL (we'll use the
+    production Let's Encrypt endpoint by default)
+  * `key_type`: key type for certificates, either *rsa* or *ecdsa*
+    (default: *ecdsa*)
+  * `http`: parameters for http-01 challenge validation:
+    * `enabled`: enable http-01 validation mechanism (default true)
+  * `dns`: parameters for dns-01 challenge validation:
+    * `enabled`: enable dns-01 validation mechanism (default false)
+    * `nameservers`: list of nameservers to update
+    * `tsig_key_name`: TSIG key name
+    * `tsig_key_algo`: TSIG key algorithm
+    * `tsig_key_secret`: TSIG key secret
+* `output` controls the replds client parameters:
+  * `endpoint`: replds gRPC endpoint URL
+  * `prefix`: path prefix for the uploaded data
+  * `tls`: TLS client parameters:
+    * `cert`: file with certificate
+    * `key`: file with the private key
+    * `ca`: file with the CA certificate
+* `http_server`: customize the RPC HTTP server
+  * TBD
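+
+As an illustration, a minimal general configuration using the
+parameters above might look like the sketch below. All paths, the
+email address and the endpoint value are hypothetical placeholders,
+and the exact format expected for `endpoint` is an assumption. The
+`http_server` section is left out since its parameters are still TBD.
+
+```yaml
+# Hypothetical values throughout; only the key names come from the
+# parameter list above.
+db_path: /var/lib/acmeserver/acme.db
+config_dirs:
+  - /etc/acmeserver/certs.d
+default_challenge: http-01
+testing: false
+acme:
+  account_key_path: /var/lib/acmeserver/account.key
+  email: admin@example.com
+  key_type: ecdsa
+  http:
+    enabled: true
+  dns:
+    enabled: false
+output:
+  # Placeholder endpoint; the URL scheme and port are assumptions.
+  endpoint: https://replds.example.com:1234
+  prefix: acme
+  tls:
+    cert: /etc/acmeserver/client.pem
+    key: /etc/acmeserver/client.key
+    ca: /etc/acmeserver/ca.pem
+```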
+
+### Certificate request parameters
+
+Each YAML file in *config_dirs* (which must have a *.yml* extension)
+can contain one or more certificate requests. Each is a dictionary
+with the following attributes:
+
+* `names`: list of DNS names for the certificate. It must contain at
+  least one entry. These will be the certificate's subjectAltNames.
+* `path`: output path for the resulting certificate data. By default
+  this will be the first element of *names*.
+* `challenge_type`: optionally specify a custom ACME challenge type
+  for this certificate.
+
+Certificates are uniquely identified by the *names* list. It is a
+syntax error to define multiple requests for the same certificate. It
+is also an error to define multiple requests with the same *path*.
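+
+For example, a file in one of the *config_dirs* could define two
+requests as sketched below. This assumes that the file's top level is
+a YAML list of request dictionaries, which the description above
+suggests but does not state outright. The DNS names and the custom
+path are hypothetical.
+
+```yaml
+# Assumed layout: a top-level list with one dictionary per request.
+- names:
+    - example.com
+    - www.example.com
+  # No path given: it defaults to the first entry of names.
+- names:
+    - mail.example.com
+  path: mail
+  challenge_type: dns-01
+```
+
+The two requests have different *names* lists and different output
+paths, so they do not trigger either of the errors described above.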

 ## Internals

+The purpose of this tool is to obtain certificate renewals from an
+ACME API at the right times in order to always maintain validity, and
+send the results to a storage system, nothing else. The tool does not
+concern itself with any form of persistent storage of the
+certificates, nor does it look at the state of the storage system in
+order to detect what needs to be done. If data goes missing from
+storage for some reason, acmeserver won't be able to detect it, and
+manual intervention will be required (some form of --force-renew
+invocation).
+
 The software needs to maximize robustness as an ACME client, e.g. we
 need to play well with the API rate limits, and avoid ending up in
 "rate limit deadlocks" where we're not making progress due to constant
 streams of erroneous requests (an occasional problem in the previous
 iterations of this tool).

-We run a state machine for each certificate (uniquely identified by
-its full list of names), where every transition has a minimum
-execution time used to support scheduled actions in the future. The
-state machine is structured as follows:
-
-* RENEW (CERT_MISSING_OR_EXPIRED) - certificate is either missing or
-  expired, start the renewal process by attempting validation. If an
-  error occurs here:
-  * for rate-limit errors, set the execution time to the rate limit
-    expiration
-  * any other error (local or remote), use our safe error retry time
-* UPLOAD (CERT_UPDATED) - certificate has been obtained by ACME but
-  needs to be pushed onto storage. If the upload succeeds, the state
-  machine terminates by setting the state to CERT_MISSING_OR_EXPIRED
-  and the execution time to the renewal time. If we've gone beyond the
-  maximum execution time of this state (cert has expired), set state
-  to CERT_MISSING_OR_EXPIRED with execution time = now, and restart
-  the process. Otherwise keep retrying the upload.
-
-Note that these are not *states* they are *transitions*, the state is
-something acmeserver doesn't really care about, it only cares about
-which operations it needs to perform (another way of saying this is
-that the state is embedded in the current state of the transition log
-itself).
-
-This effectively manages two separate queues. Database-backed queues
-are fine for the numbers being considered here (hundreds of
-certificates, not millions), though WAL churn due to state transitions
-is still a concern, especially when there are errors.
-
-### Database queues
+In order to do so, the implementation focuses on the execution of
+scheduled *tasks*. These let us accurately implement both short-term
+and long-term retry behaviors, which are critical to proper error
+handling.
+
+There are just two types of tasks:
+
+* RENEW - attempts to renew a certificate. If it succeeds, it will
+  create another RENEW task scheduled a bit before the certificate
+  expiration time, and an UPLOAD task that will save the certificate
+  to storage.
+* UPLOAD - upload a certificate to the storage backend. It will try to
+  do so until either the certificate expires (no point in uploading it
+  anymore), or a new UPLOAD is triggered by the RENEW task with a new
+  certificate.
+
+Note that these are just *operations* and they do not map to any
+specific *state*. Acmeserver has no idea if you currently have a valid
+certificate in storage or not: that job is best left to a monitoring
+system that sits at the *end* of the certificate transport chain.
+
+### Database implementation
+
+The task queue is backed by a SQLite database. The configuration is
+loaded in memory as a temporary table, so we can use it in JOINs and
+save quite a bit of code complexity.

 The database consists of a single task table, with ID, state,
 execution time and max execution time (optional). To make the storage
@@ -93,7 +159,7 @@ A *reaper* process could also detect which paths are now no longer
 used (not the same as unused IDs!), and delete those from storage as
 well.

-### Database encryption
+### Database encryption (TODO)

 Since it's possible that the database data could end up in
 less-trusted environments, private key material in the db is