# accountserver
The accountserver is the main interface used to manage the database of user accounts. Other internal services use it to query and modify user account information (settings, credentials, etc.). It implements all the validation and business logic related to accounts.
Motivations for this service stem from our experience with previous account management strategies:
- the necessity for a high-level API on top of the database layer (accounts and resources, instead of nested LDAP objects or SQL tables), isolating applications from the database structure;
- the desire to have a single authoritative implementation of the data validation logic, and to have every change to the database go through it;
- a wish for a cleaner separation between the business logic of complex operations ("disable a resource", "move an account between hosts", etc) and their UI. In our ideal model, user interfaces (web panels, admin tools) are simply thin clients focused on presentation and interaction, and the logic is implemented in RPC servers.
The service is implemented as a pure RPC server, without any user interface. This approach was preferred over alternatives (such as a library for clients to embed) for a few reasons:
- information on accounts and resources might be aggregated from multiple backends (an LDAP database, Redis, MySQL for Noblogs, etc) and we'd like to limit the proliferation of network flows;
- privilege separation: only the accountserver needs write credentials for the backends;
- a single centralized service offers a simpler target for logging and monitoring.
## Data model
The data model offered by accountserver is quite simple: the top-level object is the user (an account). Each user owns a number of resources, which can be of different types. Resources have a loose hierarchical structure, and might themselves contain sub-resources, expressing an association of some kind (as for instance between a website and the associated MySQL database).
This data model is meant to work with our legacy user database, but it is slightly overkill for the simplest cases: for instance, a simple email service might consider users and their email accounts to be the same thing, while this model would present them as a user object containing an email resource with the same name. It is, however, easily adaptable to most use cases.
The schema is explicitly defined in types.go.
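As a rough illustration of that shape (the authoritative definitions live in types.go; the names and fields below are simplified assumptions, not the actual schema):

```go
// Illustrative sketch only: see types.go for the authoritative schema.
package accountserver

// User is the top-level object: an account that owns resources.
type User struct {
	Name      string      // unique username
	Shard     string      // partition assignment for sharded services
	Resources []*Resource // everything the user owns
}

// Resource is something a user owns: an email address, a website, a
// database, etc. A resource may reference a parent resource to
// express an association (e.g. a MySQL database owned by a website).
type Resource struct {
	Type     string // e.g. "email", "web", "mysql"
	Name     string
	ParentID string // optional ID of the parent resource
}
```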
## API
The service API is documented in API.md.
## Extending the service
The business logic (account creation, validation, and all the high-level operations defined on them) is currently implemented as Go code within the accountserver itself, in the actions_*.go and validators.go files.
There are specific notes on how to add and modify functionality in CONTRIBUTING.md.
## Testing
Running the integration tests requires, in addition to a working Go development environment, a JRE: the test suite starts a local test LDAP server (written in Java) in order to exercise the LDAP backend.
On a Debian system this should be enough:
```
sudo apt install default-jre-headless
```
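With the JRE in place, the tests should run with the standard Go tooling:

```
go test ./...
```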
## Usage
The accountserver daemon simply listens on a port for HTTP(S) requests. Specify the address to listen on with the --addr command-line option.
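For instance, assuming the binary is installed as accountserver, listening on an arbitrary port might look like:

```
accountserver --addr=:4040
```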
## Configuration
The configuration is stored in a YAML file, by default /etc/accountserver/config.yml. Known variables include:
- `shards`: map of shards by service, for sharded (partitioned) services
  - `available`: map of available shards by service name, e.g. `{"web": ["1", "2"]}`. Used in resource creation.
  - `allowed`: map of allowed shards by service name
- `sso`:
  - `public_key`: path to file with the SSO public key
  - `domain`: SSO domain
  - `service`: SSO service for the accountserver
  - `groups`: list of allowed groups
  - `admin_group`: a specific group that will be granted admin privileges (the ability to read/write data about users other than oneself)
- `http_server`: standard parameters for the HTTP server
  - `tls`: server-side TLS configuration
    - `cert`: path to the server certificate
    - `key`: path to the server's private key
    - `ca`: path to the CA used to validate clients
    - `acl`: TLS-based access controls, a list of entries with the following attributes:
      - `path`: a regular expression to match the request URL path
      - `cn`: a regular expression that must match the CommonName part of the subject of the client certificate
  - `max_inflight_requests`: maximum number of in-flight requests to allow before server-side throttling kicks in
- `user_meta_server`: connection parameters for the user-meta-server backend used to store user audit logs
  - `url`: URL for the user-meta-server service
  - `sharded`: if true, requests to the service will be partitioned according to the user's shard attribute
  - `tls_config`: client TLS configuration
    - `cert`: path to the client certificate
    - `key`: path to the client's private key
    - `ca`: path to the CA used to validate the server
- `auto_enable_encryption`: if true, automatically enable user-level encryption when a user changes their primary authentication (password)
- `forbidden_usernames` / `forbidden_usernames_file`: list (or file) containing forbidden usernames
- `forbidden_passwords` / `forbidden_passwords_file`: list (or file) containing forbidden passwords
- `available_domains`: list of available domains for email resources
- `website_root_dir`: root directory of user websites
- `min_password_len`: minimum password length (default 8)
- `max_password_len`: maximum password length (default 128)
- `min_username_len`: minimum username length (default 3)
- `max_username_len`: maximum username length (default 64)
- `min_backend_uid`: minimum auto-assigned UID (default 1000)
- `max_backend_uid`: maximum auto-assigned UID (default 0, i.e. disabled)
- `ldap`: configuration for the LDAP backend
  - `uri`: LDAP URI to connect to
  - `bind_dn`: LDAP bind DN
  - `bind_pw` / `bind_pw_file`: LDAP bind password, or a file to read it from
  - `base_dn`: base DN for all LDAP queries
- `pwhash`: password hashing parameters
  - `algo`: password hashing algorithm, one of `argon2` or `scrypt`
  - `params`: parameters for the selected hashing algorithm, a map whose contents depend on the chosen algorithm: `argon2` takes the `time`, `mem` and `threads` parameters (defaults 1/4/4); `scrypt` takes `n`, `r` and `p` (defaults 16384/8/1)
- `cache`: cache configuration
  - `enabled`: if true, enable a cache for User objects. Very useful to reduce latency with backends that require complex queries, such as LDAP (default false, cache disabled).
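For illustration, a minimal configuration using the LDAP backend might look like the following (all values are made-up placeholders):

```yaml
# Illustrative example only; adjust all values to your setup.
ldap:
  uri: "ldap://localhost:389"
  bind_dn: "cn=manager,dc=example,dc=org"
  bind_pw_file: "/etc/accountserver/ldap.pw"
  base_dn: "dc=example,dc=org"

sso:
  public_key: "/etc/accountserver/sso_public.key"
  domain: "example.org"
  service: "accounts/"
  admin_group: "account-admins"

pwhash:
  algo: "argon2"
  params:
    time: 1
    mem: 4
    threads: 4

cache:
  enabled: true

http_server:
  tls:
    cert: "/etc/accountserver/server.pem"
    key: "/etc/accountserver/server.key"
    ca: "/etc/accountserver/ca.pem"
    acl:
      - path: "^/api/"
        cn: "^(web-panel|admin-tool)$"
  max_inflight_requests: 100
```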
## Distributed operation
In a distributed scenario it might make sense to run multiple instances of the accountserver for reliability. The accountserver, however, is not a distributed application and includes no mechanism for managing consensus; furthermore, it relies on the characteristics of the underlying storage, which are not under its control (consider, for instance, a SQL or LDAP database with asynchronous replication).
The accountserver load is heavily skewed towards reads, and the read and write paths have very different operational characteristics. Writes require a centralized accountserver, due to the many read-modify-update cycles in our API; the only way to improve on this would be some form of highly-available, serialized storage. Writes, however, are infrequent and not critical to the operation of accountserver clients. Reads, on the other hand, are very frequent and require caching for performance and latency reasons. It follows that prioritizing reads over writes is a reasonable graceful-degradation policy for the service.
If the storage layer offers any form of read-only high availability (much easier to achieve; most asynchronously-replicated setups qualify, for instance), the accountserver can exploit it to make the read path highly available, which is essentially a distributed caching problem. Given the tolerances of the upstream applications, the only real issue is the coupling between the write path and the read path required for cache invalidation on writes.
The simplest way to make this work is the following:
- assume that the full configuration is available to each accountserver at all times: that is, each accountserver instance has a list of all the other accountserver instances;
- one of the accountserver instances is selected as the leader by some external mechanism (including manually);
- write requests are always forwarded to the leader: this keeps the client API simple, requiring no awareness of the accountserver topology;
- every accountserver instance maintains its own read cache, and reads are always served by the local accountserver, never forwarded;
- the leader accountserver, whenever it accepts a write, sends cache invalidation requests to every other accountserver instance.
The performance of this scheme is strictly no worse than that of the underlying storage, except for the possibility of serving stale data whenever an invalidation request is lost due to network trouble. This is generally an acceptable risk for our upstream applications.
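A minimal sketch of this scheme, assuming an HTTP transport (none of these names come from the actual accountserver code, and error handling is abbreviated):

```go
// Sketch of the leader-forwarding and invalidation scheme described
// above. All identifiers here are illustrative assumptions.
package replication

import (
	"io"
	"net/http"
	"net/url"
	"sync"
)

type server struct {
	leaderURL string   // empty on the leader itself
	peers     []string // all *other* instances, never ourselves

	mu    sync.Mutex
	cache map[string][]byte // local read cache, keyed by username
}

// serveWrite forwards writes to the leader when running as a
// follower, so clients never need to be aware of the topology.
func (s *server) serveWrite(w http.ResponseWriter, r *http.Request) {
	if s.leaderURL != "" {
		req, err := http.NewRequestWithContext(
			r.Context(), r.Method, s.leaderURL+r.URL.Path, r.Body)
		if err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		resp, err := http.DefaultClient.Do(req)
		if err != nil {
			http.Error(w, err.Error(), http.StatusBadGateway)
			return
		}
		defer resp.Body.Close()
		w.WriteHeader(resp.StatusCode)
		io.Copy(w, resp.Body)
		return
	}

	// We are the leader: apply the write to storage (elided), then
	// invalidate the local cache entry and fan out best-effort
	// invalidations to all peers. A lost invalidation only means
	// temporarily stale reads, which we accept.
	user := r.FormValue("username")
	s.mu.Lock()
	delete(s.cache, user)
	s.mu.Unlock()
	for _, peer := range s.peers {
		go http.Post(peer+"/invalidate?username="+url.QueryEscape(user),
			"text/plain", nil)
	}
}
```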
### Configuration
To enable distributed operation, set the attributes under the `replication` configuration variable:

- `replication`:
  - `leader_url`: URL of the leader accountserver instance. When this field is set, write requests to this instance will be forwarded (transparently to the caller) to this URL.
  - `peers`: list of peer URLs for the other accountserver instances. Do not include the current instance in this list, or you will create unexpected feedback loops.
  - `tls`: client TLS configuration
    - `cert`: path to the client certificate
    - `key`: path to the client's private key
    - `ca`: path to the CA used to validate the other instances
Note that setting `peers` is only necessary if the cache is enabled (see the Configuration section above). Due to implementation details, all instances should share the same setting for `cache.enabled`.
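For example, the configuration for a hypothetical follower instance accounts2.example.org, with accounts1 as the leader and accounts3 as a third replica, might look like:

```yaml
# Illustrative values for a follower instance (accounts2).
replication:
  leader_url: "https://accounts1.example.org:4040"
  peers:
    - "https://accounts1.example.org:4040"
    - "https://accounts3.example.org:4040"
  tls:
    cert: "/etc/accountserver/client.pem"
    key: "/etc/accountserver/client.key"
    ca: "/etc/accountserver/ca.pem"
```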