# accountserver
The accountserver is the main interface used to manage the database of user accounts. Other internal services use it to query and modify user account information (settings, credentials, etc.). It implements all the validation and business logic related to accounts.
Motivations for this service stem from our experience with previous account management strategies:
- the necessity for a high-level API on top of the database layer (accounts and resources, instead of nested LDAP objects or SQL tables), isolating applications from the database structure;
- the desire to have a single authoritative implementation of the data validation logic, and to have every change to the database go through it;
- a wish for a cleaner separation between the business logic of complex operations ("disable a resource", "move an account between hosts", etc) and their UI. In our ideal model, user interfaces (web panels, admin tools) are simply thin clients focused on presentation and interaction, and the logic is implemented in RPC servers.
The service is implemented as a pure RPC server, without any user interface. This approach was preferred over alternatives (such as a library for clients to embed) for a few reasons:
- information on accounts and resources might be aggregated from multiple backends (an LDAP database, Redis, MySQL for Noblogs, etc) and we'd like to limit the proliferation of network flows;
- privilege separation: only the accountserver needs write credentials for the backends;
- a single centralized service offers a simpler target for logging and monitoring.
## Data model
The data model offered by accountserver is quite simple: the top-level object is the user (an account). Each user owns a number of resources, which can be of different types. Resources have a loose hierarchical structure, and might themselves contain sub-resources, expressing an association of some kind (as for instance between a website and the associated MySQL database).
This data model is meant to work with our legacy user database, but it is slightly overkill for the simplest cases: for instance, a simple email service might consider users and their email accounts to be the same thing, while this model would present them as a user object containing an email resource with the same name. It is, however, easily adaptable to most use cases.
The schema is explicitly defined in types.go.
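As a rough illustration of that shape (the authoritative definitions live in types.go; the names and fields below are simplified assumptions, not the actual schema):

```go
// Illustrative sketch only: see types.go for the authoritative schema.
package accountserver

// User is the top-level object: an account that owns resources.
type User struct {
	Name      string      // unique username
	Shard     string      // partition assignment for sharded services
	Resources []*Resource // everything the user owns
}

// Resource is something a user owns: an email address, a website, a
// database, etc. A resource may reference a parent resource to
// express an association (e.g. a MySQL database owned by a website).
type Resource struct {
	Type     string // e.g. "email", "web", "mysql"
	Name     string
	ParentID string // optional ID of the parent resource
}
```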
## API
The service API is documented in API.md.
## Extending the service
The business logic (account creation, validation, and all the high-level operations defined on them) is currently implemented as Go code within the accountserver itself, in the actions_*.go and validators.go files.
There are specific notes on how to add and modify functionality in CONTRIBUTING.md.
## Testing
Running the integration tests requires, in addition to a working Go development environment, a JRE: the test suite starts a local test LDAP server (written in Java) in order to exercise the LDAP backend.
On a Debian system this should be enough:
```
sudo apt install default-jre-headless
```
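With the JRE in place, the tests should run with the standard Go tooling:

```
go test ./...
```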
## Usage
The accountserver daemon simply listens on a port for HTTP(S) requests. Specify the address to listen on with the --addr command-line option.
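For instance, assuming the binary is installed as accountserver, listening on an arbitrary port might look like:

```
accountserver --addr=:4040
```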
## Configuration
The configuration is stored in a YAML file, by default /etc/accountserver/config.yml. Known variables include:
- `shards`: map of shards by service, for sharded (partitioned) services
  - `available`: map of available shards by service name, e.g. `{"web": ["1", "2"]}`. Used in resource creation.
  - `allowed`: map of allowed shards by service name
- `sso`:
  - `public_key`: path to file with the SSO public key
  - `domain`: SSO domain
  - `service`: SSO service for the accountserver
  - `groups`: list of allowed groups
  - `admin_group`: a specific group that will be granted admin privileges (the ability to read/write data about users other than oneself)
- `http_server`: standard parameters for the HTTP server
  - `tls`: server-side TLS configuration
    - `cert`: path to the server certificate
    - `key`: path to the server's private key
    - `ca`: path to the CA used to validate clients
    - `acl`: TLS-based access controls, a list of entries with the following attributes:
      - `path`: a regular expression to match the request URL path
      - `cn`: a regular expression that must match the CommonName part of the subject of the client certificate
  - `max_inflight_requests`: maximum number of in-flight requests to allow before server-side throttling kicks in
- `user_meta_server`: connection parameters for the user-meta-server backend used to store user audit logs
  - `url`: URL for the user-meta-server service
  - `sharded`: if true, requests to the service will be partitioned according to the user's shard attribute
  - `tls_config`: client TLS configuration
    - `cert`: path to the client certificate
    - `key`: path to the client's private key
    - `ca`: path to the CA used to validate the server
- `auto_enable_encryption`: if true, automatically enable user-level encryption when a user changes their primary authentication (password)
- `forbidden_usernames` / `forbidden_usernames_file`: list (or file) containing forbidden usernames
- `forbidden_passwords` / `forbidden_passwords_file`: list (or file) containing forbidden passwords
- `available_domains`: list of available domains for email resources
- `website_root_dir`: root directory of user websites
- `min_password_len`: minimum password length (default 8)
- `max_password_len`: maximum password length (default 128)
- `min_username_len`: minimum username length (default 3)
- `max_username_len`: maximum username length (default 64)
- `min_backend_uid`: minimum auto-assigned UID (default 1000)
- `max_backend_uid`: maximum auto-assigned UID (default 0, i.e. disabled)
- `ldap`: configuration for the LDAP backend
  - `uri`: LDAP URI to connect to
  - `bind_dn`: LDAP bind DN
  - `bind_pw` / `bind_pw_file`: LDAP bind password, or a file to read it from
  - `base_dn`: base DN for all LDAP queries
- `pwhash`: password hashing parameters
  - `algo`: password hashing algorithm, one of `argon2` or `scrypt`
  - `params`: parameters for the selected hashing algorithm, a map whose contents depend on the chosen algorithm: `argon2` takes the `time`, `mem` and `threads` parameters (defaults 1/4/4); `scrypt` takes `n`, `r` and `p` (defaults 16384/8/1)
- `cache`: cache configuration
  - `enabled`: if true, enable a cache for User objects. Very useful to reduce latency with backends that require complex queries, such as LDAP (default false, cache disabled).
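For illustration, a minimal configuration using the LDAP backend might look like the following (all values are made-up placeholders):

```yaml
# Illustrative example only; adjust all values to your setup.
ldap:
  uri: "ldap://localhost:389"
  bind_dn: "cn=manager,dc=example,dc=org"
  bind_pw_file: "/etc/accountserver/ldap.pw"
  base_dn: "dc=example,dc=org"

sso:
  public_key: "/etc/accountserver/sso_public.key"
  domain: "example.org"
  service: "accounts/"
  admin_group: "account-admins"

pwhash:
  algo: "argon2"
  params:
    time: 1
    mem: 4
    threads: 4

cache:
  enabled: true

http_server:
  tls:
    cert: "/etc/accountserver/server.pem"
    key: "/etc/accountserver/server.key"
    ca: "/etc/accountserver/ca.pem"
    acl:
      - path: "^/api/"
        cn: "^(web-panel|admin-tool)$"
  max_inflight_requests: 100
```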
## Distributed operation
In a distributed scenario it might make sense to run multiple instances of the accountserver for reliability. The accountserver, however, is not a distributed application and includes no mechanism for managing consensus; furthermore, it relies on the characteristics of the underlying storage, which are not under its control (consider, for instance, a SQL or LDAP database with asynchronous replication).
The accountserver load is heavily skewed towards reads, and the read and write paths have very different operational characteristics. Writes require a centralized accountserver, due to the many read-modify-update cycles in our API; the only way to improve on this would be some form of highly-available, serialized storage. Writes, however, are infrequent and not critical to the operation of accountserver clients. Reads, on the other hand, are very frequent and require caching for performance and latency reasons. It follows that prioritizing reads over writes is a reasonable graceful-degradation policy for the service.
If the storage layer offers any form of read-only high availability (much easier to achieve; most asynchronously-replicated setups qualify, for instance), the accountserver can exploit it to make the read path highly available, which is essentially a distributed caching problem. Given the tolerances of the upstream applications, the only real issue is the coupling between the write path and the read path required for cache invalidation on writes.
The simplest way to make this work is the following:
- assume that the full configuration is available to each accountserver at all times: that is, each accountserver instance has a list of all the other accountserver instances;
- one of the accountserver instances is selected as the leader by some external mechanism (including manually);
- write requests are always forwarded to the leader: this keeps the client API simple, requiring no awareness of the accountserver topology;
- every accountserver instance maintains its own read cache, and reads are always served by the local accountserver, never forwarded;
- the leader accountserver, whenever it accepts a write, sends cache invalidation requests to every other accountserver instance.
The performance of this scheme is strictly no worse than that of the underlying storage, except for the possibility of serving stale data whenever an invalidation request is lost due to network trouble. This is generally an acceptable risk for our upstream applications.
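A minimal sketch of this scheme, assuming an HTTP transport (none of these names come from the actual accountserver code, and error handling is abbreviated):

```go
// Sketch of the leader-forwarding and invalidation scheme described
// above. All identifiers here are illustrative assumptions.
package replication

import (
	"io"
	"net/http"
	"net/url"
	"sync"
)

type server struct {
	leaderURL string   // empty on the leader itself
	peers     []string // all *other* instances, never ourselves

	mu    sync.Mutex
	cache map[string][]byte // local read cache, keyed by username
}

// serveWrite forwards writes to the leader when running as a
// follower, so clients never need to be aware of the topology.
func (s *server) serveWrite(w http.ResponseWriter, r *http.Request) {
	if s.leaderURL != "" {
		req, err := http.NewRequestWithContext(
			r.Context(), r.Method, s.leaderURL+r.URL.Path, r.Body)
		if err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		resp, err := http.DefaultClient.Do(req)
		if err != nil {
			http.Error(w, err.Error(), http.StatusBadGateway)
			return
		}
		defer resp.Body.Close()
		w.WriteHeader(resp.StatusCode)
		io.Copy(w, resp.Body)
		return
	}

	// We are the leader: apply the write to storage (elided), then
	// invalidate the local cache entry and fan out best-effort
	// invalidations to all peers. A lost invalidation only means
	// temporarily stale reads, which we accept.
	user := r.FormValue("username")
	s.mu.Lock()
	delete(s.cache, user)
	s.mu.Unlock()
	for _, peer := range s.peers {
		go http.Post(peer+"/invalidate?username="+url.QueryEscape(user),
			"text/plain", nil)
	}
}
```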
### Configuration
To enable distributed operation, set the attributes under the `replication` configuration variable:

- `replication`:
  - `leader_url`: URL of the leader accountserver instance. When this field is set, write requests to this instance will be forwarded (transparently to the caller) to this URL.
  - `peers`: list of peer URLs for the other accountserver instances. Do not include the current instance in this list, or you will create unexpected feedback loops.
  - `tls`: client TLS configuration
    - `cert`: path to the client certificate
    - `key`: path to the client's private key
    - `ca`: path to the CA used to validate the other instances
Note that setting `peers` is only necessary if the cache is enabled (see the Configuration section above). Due to implementation details, all instances should share the same setting for `cache.enabled`.
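For example, the configuration for a hypothetical follower instance accounts2.example.org, with accounts1 as the leader and accounts3 as a third replica, might look like:

```yaml
# Illustrative values for a follower instance (accounts2).
replication:
  leader_url: "https://accounts1.example.org:4040"
  peers:
    - "https://accounts1.example.org:4040"
    - "https://accounts3.example.org:4040"
  tls:
    cert: "/etc/accountserver/client.pem"
    key: "/etc/accountserver/client.key"
    ca: "/etc/accountserver/ca.pem"
```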