replds
====

Maintains a (small) set of files, replicated across multiple servers.
It is targeted at small datasets that are managed by automation
workflows and need to be propagated to machines at runtime.

Data replication is eventually consistent; conflict resolution applies
*last-write-wins* semantics. Writes are immediately forwarded to all
peers, but only one copy needs to succeed for the write to be
acknowledged. The last written data will appear on all nodes once
network partitions are resolved.

Given this replication model, the service is not safe to use with
multiple writers over an overlapping key space. For read-modify-update
workflows, it is best to implement a separate locking mechanism so
that only a single workflow accesses the data at any given time: since
the service itself provides no locking, this is necessary to prevent
unexpected out-of-order updates.

There is no dynamic cluster control: the full list of peers must be
provided to each daemon. This suggests using a configuration
management system to generate the daemon configuration.

## Configuration

The *replds* tool requires a YAML-encoded configuration file (which
you can specify with the *--config* command-line option). This file
should contain the following attributes (a complete example follows
the list):

* `client` - configuration for the *replds* client commands
  * `url` - service URL (the hostname can resolve to multiple IP addresses)
  * `tls` - TLS configuration for the client
    * `cert` - path to the certificate
    * `key` - path to the private key
    * `ca` - path to the CA file
* `server` - configuration for the *replds* server command
  * `path` - path of the locally managed repository
  * `peers` - list of URLs of cluster peers
  * `tls_client` - TLS configuration for the peer-to-peer client
    * `cert` - path to the certificate
    * `key` - path to the private key
    * `ca` - path to the CA file
* `http_server` - configuration for the HTTP server
  * `tls` - server-side TLS configuration
    * `cert` - path to the server certificate
    * `key` - path to the server's private key
    * `ca` - path to the CA used to validate clients
    * `acl` - TLS-based access controls, a list of entries with the
      following attributes:
      * `path` is a regular expression to match the request URL path
      * `cn` is a regular expression that must match the CommonName
        part of the subject of the client certificate
  * `max_inflight_requests` - maximum number of in-flight requests to
    allow before server-side throttling kicks in
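
As a sketch, a complete configuration might look like the following;
all hostnames, ports, paths and the throttling value are made-up
placeholders:

```yaml
client:
  url: https://replds.example.com:4000/
  tls:
    cert: /etc/replds/client.pem
    key: /etc/replds/client.key
    ca: /etc/replds/ca.pem

server:
  path: /var/lib/replds/foo
  peers:
    - https://host1.example.com:4000/
    - https://host2.example.com:4000/
  tls_client:
    cert: /etc/replds/peer.pem
    key: /etc/replds/peer.key
    ca: /etc/replds/ca.pem

http_server:
  tls:
    cert: /etc/replds/server.pem
    key: /etc/replds/server.key
    ca: /etc/replds/ca.pem
    # acl: see the "TLS Setup" section below
  max_inflight_requests: 100
```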

## TLS Setup

For safe usage, you will want to secure peer-to-peer and
client-to-peer communication with TLS, using separate
credentials. You can then set ACLs so that only peers may access the
*/api/internal/* URL prefix, while all clients may access everything
else under */api/*.
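
A sketch of the corresponding `acl` entries, assuming peer
certificates can be told apart from client certificates by their
CommonName (here, a hypothetical "peer-" prefix; check the
implementation for the exact matching semantics):

```yaml
http_server:
  tls:
    # ... cert / key / ca as above ...
    acl:
      # Only peers may use the internal replication API.
      - path: "^/api/internal/"
        cn: "^peer-"
      # Any client with a valid certificate may use the rest.
      - path: "^/api/"
        cn: ".*"
```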

## Service integration

The replication strategy adopted by *replds* puts severe limits on
how it can be used; however, there are at least two useful use cases
worth examining in more detail. In both cases, a single *master*
server controls the workflow (i.e. the key space is not partitioned).

### Let's Encrypt automation

In this scenario, SSL certificates are automatically generated at
runtime with Let's Encrypt (from a cron job), and we need to
propagate them to the front-end servers.

This scenario is relatively simple because the timeouts and delays
involved in the workflow are so much greater than propagation delays
and expected fault durations that data convergence is not an issue:
when we refresh an SSL certificate 30 days before its expiration, it
is fine if the application servers pick it up within a day or more.

The workflow is going to look like this (a cron-job sketch follows
the list):

* A cron job (on a single node) examines the local repository to find
  certificates that are about to expire, and renews them using the
  ACME API. We are ignoring the details of the challenge/response
  validation process as they are not relevant to data propagation
  issues.
* The cron job stores the results in *replds*.
* Periodically, the application servers are reloaded to pick up the
  new certificates, possibly via another cron job.
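
A minimal sketch of the master-side cron job, with certbot standing
in for whatever ACME client is actually in use; the exact arguments
of the *sync* command are an assumption, check the replds usage
output:

```sh
#!/bin/sh
# Hypothetical renewal cron job, run on the single master node only.
set -e

# Renew certificates that are close to expiration (certbot only
# renews those within their renewal window).
certbot renew --quiet

# Push the renewed certificates into replds so they propagate to
# the front-end servers.
replds --config /etc/replds/ssl.yml sync /etc/letsencrypt/live
```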

With an independent data reload cycle, it is possible to end up in a
situation where the application is reloaded while the certificate and
the private key do not (yet) match. One possible strategy for
handling this situation is to let the service crash, and rely on an
automatic service restart policy to keep trying to start it again
until the data is up to date: not optimal, perhaps, but simple and
guaranteed to converge.
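
A minimal sketch of such a restart policy, assuming a hypothetical
application service named *myapp* managed by systemd (Restart= and
RestartSec= are standard systemd options):

```sh
# Keep restarting the application until it starts successfully,
# i.e. until the certificate and key match again.
mkdir -p /etc/systemd/system/myapp.service.d
cat > /etc/systemd/system/myapp.service.d/restart.conf <<'EOF'
[Service]
Restart=on-failure
RestartSec=30s
EOF
systemctl daemon-reload
```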

### Package repository

Here, we need to propagate a Debian package repository across multiple
servers for redundancy. The incoming packages are sent to the *master*
repository server (in our case, over SSH), where some processing takes
place that results in a bunch of files being updated (the new
packages, and the repository metadata). This processing stage needs to
access the entire repository.

We're wrapping external functionality and tools, and they may be
complex enough that we can't simply make them use the replds API, so
we're going to let the tools use the local filesystem as they
normally would. At the same time, we can't just run the repository
tools on the filesystem copy managed by *replds* itself, because then
we would not be able to detect changes. So we run the repository
tools on a separate *staging directory*, and the final workflow
(sketched below) is:

* rsync data from the replds-managed dir to the staging dir;
* run the metadata-generation tools on the staging dir;
* synchronize the data back to replds using the *sync* command.
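
A sketch of the whole cycle as a shell script, with made-up paths
and reprepro standing in for whatever repository tool is actually in
use; the exact arguments of the *sync* command are an assumption:

```sh
#!/bin/sh
# Hypothetical processing job on the master repository server.
set -e

MANAGED=/var/lib/replds/apt    # directory managed by replds
STAGING=/srv/apt-staging       # working copy for the repository tools

# 1. Refresh the staging directory from the replds-managed copy.
rsync -a --delete "$MANAGED/" "$STAGING/"

# 2. Run the repository tools on the staging copy (here reprepro,
#    which updates the packages and the repository metadata).
reprepro -b "$STAGING" includedeb stable /incoming/*.deb

# 3. Sync the results back into replds.
replds --config /etc/replds/apt.yml sync "$STAGING"
```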

## Usage

The Debian package comes with a
[replds-instance-create](debian/replds-instance-create) script that
can be used to set up multiple replds instances. For an instance named
*foo*, the script will set up the *replds@foo* systemd service, and it
will create the *replds-foo* user and group. Add users that need to
read the repository files to that group. The configuration will be
read from */etc/replds/foo.yml*.
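
For example, setting up and using an instance named *foo* might look
like this (whether the script takes the instance name as its sole
argument is an assumption, check the script itself):

```sh
# Create the instance: user/group replds-foo, service replds@foo.
replds-instance-create foo

# Provide the configuration, then start the service.
$EDITOR /etc/replds/foo.yml
systemctl start replds@foo

# Let an existing account read the replicated files.
adduser www-data replds-foo
```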

Note that files created by the daemon will be world-readable by
default. Set the process umask if you wish to restrict this further.
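
For instance, a hypothetical systemd drop-in could set a stricter
umask for the *foo* instance (UMask= is a standard systemd service
option):

```sh
# Drop world access from files created by the daemon.
mkdir -p /etc/systemd/system/replds@foo.service.d
cat > /etc/systemd/system/replds@foo.service.d/umask.conf <<'EOF'
[Service]
UMask=0027
EOF
systemctl daemon-reload
systemctl restart replds@foo
```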