Commit a008e1cf, authored 3 years ago by ale

Extend README with info on semantics, authentication, acls, etc

parent d9105cf3
README.md +143 −8 (143 additions, 8 deletions)
replds (v2)
===
Minimalistic filesystem-like replicated data distribution service,
with *last-write-wins* semantics, authentication and ACLs. It offers a
simple client to replicate locally (on the filesystem) parts of the
database, and trigger the execution of shell scripts when they change,
which makes it particularly suited for the distribution of *secrets*
that need to be modified by online automation.
The last-write-wins model makes it suitable for scenarios where there
is either a single writer, or the key space is completely
partitioned. The absence of transactions or other database-like
features is why it is best to think of replds as a *data distribution*
solution, with a familiar filesystem-like data model.
The current limitations of the internal replication protocols make it
only suitable for small-ish datasets (i.e., fewer than 100k objects, a
few GBs total size). See the *Replication semantics* section below for
further details.
# Building
...

@@ -28,7 +31,22 @@ sufficiently recent Go environment (>= 1.14):

```
go build ./cmd/replds
```
# Running
All the functionality is available in the *replds* binary, which has
subcommands to run servers, clients, etc. You can run
```
replds help
```
to get a quick summary of the available commands and options.
## Quick tutorial
This tutorial will set up a local *replds* service cluster with
multiple instances, without authentication or ACLs, and will show how
to perform some simple operations.
Let's start by running the server. We'll use the *test.sh* script to
start 3 instances of "replds server", on different ports, that talk to
...

@@ -79,3 +97,120 @@ replds pull --store=./sub --server=localhost:12100 /data/sub
The "replds pull" program will keep running to receive incremental
updates, but now the
*sub*
local directory should have the expected
contents (files
*one*
and
*
two).
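
To double-check, one can simply list what "replds pull" has written so
far (the exact on-disk layout under the store directory is not
specified here, so we just search recursively):

```
# List the files materialized under the ./sub store directory; the
# exact nesting of paths is not documented here, hence the recursive
# search.
find ./sub -type f
```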
## Replication semantics
The service implements a *last-write-wins* conflict resolution
approach, which favors availability as opposed to consistency, to the
extent that there is **no** consistency guarantee whatsoever: there
are no transactions, there is no serialization. The API contract is
the following:
* Each object in the data store (*node*, in internal terminology) has
  a *version* associated with it.
* Clients are guaranteed that the version of objects they see will
  always increase monotonically: that is, they will never revert to
  seeing an "older" version of an object.
* The above applies to each object separately: if you store objects A
  and B in a single API call, there is no guarantee that clients will
  see A and B updated at the same time (though the current
  implementation tries really hard to make it so).
Replds instances synchronize updates between each other using a
gossip-like protocol, with periodic data exchanges. The current
implementation of this protocol is very simple and **highly**
inefficient: the periodic data exchange includes a representation of
the entire contents of the data store, so bandwidth usage scales with
the number of objects in the database! Also, due to the randomness
intrinsic to the gossip protocol, update propagation delays can only
be expressed statistically, so it is best to limit the number of
instances to a small value.
There is no concept of consensus, so writes will always succeed: the
service relies on internal synchronization to propagate the update to
all instances. In the presence of a network partition, clients may not
see these updates until the partition is resolved. This makes it
pretty much impossible for clients to implement read / modify / update
cycles, even with a single writer, unless they maintain local state
(possibly using a Watcher).
The instances store node data on disk, but keep all the nodes'
metadata in memory for speed, albeit in a highly efficient radix tree
(compressed trie) data structure.
The maximum data size is limited by the inefficiency of the
synchronization protocol, and the requirement to keep all the node
metadata in memory at all times. The current implementation is also
not very smart in that it needs to hold the whole dataset (including
node data!) in memory when doing an initial synchronization pass, to
serialize it into protobufs. All this places limits on dataset size
both in terms of number of entities (say, less than 10k) and total
size of the data (a few GBs).
## API
The public replds API is a gRPC API, defined in
[proto/replds.proto](proto/replds.proto), and consists of two RPC
methods:
* *Store*, to upload data to the service. The "replds store" command
  implements a simple command-line client for this method.
* *Watch*, for incremental synchronization of parts of the data
  store. This is a long-lived streaming RPC that implements an
  asynchronous replication protocol: the client sends a summary of
  the initial state of its database, then listens for streaming
  incremental updates. The "replds pull" command implements this
  method.
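
Purely for orientation, the overall shape of the interface is sketched
below. The method names match the description above, but the service
and message names are hypothetical placeholders; the authoritative
definitions are the ones in proto/replds.proto.

```
// Hypothetical sketch only: service and message names are made up,
// see proto/replds.proto for the real definitions.
syntax = "proto3";

service ReplDS {
  // Upload one or more nodes to the service ("replds store").
  rpc Store(StoreRequest) returns (StoreResponse);
  // Long-lived stream: the client describes its current local state,
  // then receives incremental updates ("replds pull").
  rpc Watch(WatchRequest) returns (stream WatchResponse);
}

// Placeholder messages, for illustration only.
message StoreRequest {}
message StoreResponse {}
message WatchRequest {}
message WatchResponse {}
```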
## Deployment
In order to provide high availability, you're supposed to run multiple
instances of *replds server*. The cluster model is static, i.e. each
instance needs to be told explicitly about the others using the
*--peer* command-line option.
Clients can talk to any instance, both for read and write API calls.
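
As an illustrative sketch of the static cluster model (hostnames,
ports and the exact *--peer* syntax, including whether the flag can be
repeated, are assumptions; run "replds help" for the authoritative
flags), a three-instance deployment would point every server at the
other two:

```
# Hypothetical 3-node layout: every instance lists its two peers.
# The --peer syntax shown here (repeated host:port flags) is assumed.
# On node1:
replds server --peer=node2:12100 --peer=node3:12100 [other options]
# On node2:
replds server --peer=node1:12100 --peer=node3:12100 [other options]
# On node3:
replds server --peer=node1:12100 --peer=node2:12100 [other options]
```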
### Authentication
Replds supports authentication, both of clients and server instances,
using mTLS (mutually-authenticated TLS). Clients, including server
instances when they talk to other server instances, need a valid TLS
client certificate, signed by a specific CA.
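
Replds does not mandate any particular tooling for producing these
certificates. As a generic, hypothetical illustration (file names are
made up, and an existing CA key pair is assumed), a client certificate
carrying a given CN could be minted with openssl:

```
# Hypothetical example: create a key and CSR whose CN will be the
# client identity, then sign it with an already existing CA
# (ca.crt / ca.key). Extensions such as clientAuth EKU are omitted
# for brevity.
openssl req -new -newkey rsa:2048 -nodes \
    -keyout web.key -out web.csr -subj "/CN=web.example.com"
openssl x509 -req -in web.csr -CA ca.crt -CAkey ca.key \
    -CAcreateserial -out web.crt -days 365
```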
Furthermore, the identity associated with the client certificate, in
the form of the Common Name (CN) part of the X509 subject, can be used
in *access control lists* (ACLs), to limit which clients can access
parts of the data store. In replds, an ACL consists of three parts:
* identity
* path
* operation (read / write)

When ACLs are defined, access is **denied by default**, and ACL rules
are used to allow specific accesses. Paths in ACL rules are prefixes:
an ACL for /foo will also allow access to /foo/bar etc.
ACLs are configured in a text file, one rule per line, with the
following format:
> *identity* *path* *operation*
where operation can be either `read` or `write` (possibly abbreviated
to `r` or `w`). Pass this file to "replds server" using the *--acls*
option. Lines that start with a # are considered comments and ignored.
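
For example, a small ACL file could look like this (identities and
paths are made up for illustration):

```
# Hypothetical ACL rules; identities and paths are examples only.
# The web frontend may read its own secrets.
web.example.com /secrets/web read
# The certificate automation may write anything under /certs.
certbot.example.com /certs w
```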
### Triggers
The "replds pull" command can execute user-defined triggers (shell
commands) when certain parts of the data store tree are updated. This
can be useful, for instance, to restart services with new credentials.
To load triggers, specify a directory with *--triggers-dir*. Replds
will load all files from this directory whose names do not contain a
dot (so no extensions should be used) as JSON-encoded files; each one
should be a dictionary with two attributes:
* *path*, specifying the path prefix that will activate this trigger;
* *command*, the shell command to execute.
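
A trigger file could then look like the following sketch (the path,
the command, and the dot-free file name it would be saved under are
made up for illustration):

```
{
  "path": "/certs/web",
  "command": "systemctl reload nginx"
}
```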