Commit e533b12b authored by ale

Initial import from old repository

FLOAT
====
*float* is a configuration management toolkit to manage
container-based services on bare-metal hardware. It is implemented as
a series of Ansible plugins and roles that you should use from your
own Ansible configuration.
# Documentation
More detailed documentation about *float* is available in the *docs/*
subdirectory:
* [Guide to Ansible integration](docs/ansible.md)
* [Configuration reference](docs/configuration.md)
* [Infrastructure services](docs/service_mesh.md)
* [HTTP router](docs/http_router.md)
Ansible
=======
This document describes how the infrastructure uses Ansible, how to
use it as a user, and how to extend it as a service developer.
# Features
The toolkit is implemented as a set of Ansible plugins and roles,
meant to be integrated into your own Ansible configuration. These
plugins and roles augment the Ansible functionality in a few useful
ways:
### GPG integration
You can use GPG to encrypt files containing host and group
variables. Such files must have a `.yml.gpg` extension. Ansible will
then decrypt them at runtime (use of gpg-agent is advised). The same holds
for the *ansible-vault* password file.
This is useful when you're using GPG to manage the root trust repository.
This functionality is implemented by the
[gpg_vars](../plugins/vars/gpg_vars.py) plugin.
### Automated credentials management
Service-level credentials are automatically managed by the toolkit and
are encrypted with *ansible-vault*. This includes an internal X509 PKI
for TLS service authentication, an SSH PKI for hosts, and application
credentials defined in *passwords.yml*. See the *Credentials* section
of the [configuration reference](configuration.md) for specific
details.
All autogenerated credentials are stored in the *credentials_dir*
specified in the top-level Float configuration, which will usually
point at a separate git repository (or a temporary directory for test
environments).
### Integration of services and Ansible roles
The toolkit defines Ansible host groups for each service, to make it
easy to customize services with Ansible roles. For instance, suppose
that a service *foo* is defined in *services.yml*, and you have
created your own *foo* Ansible role to go with it (with some
foo-specific host setup). You can then tie the two together in your
playbook by making use of the *foo* host group:
```
- hosts: foo
  roles:
    - foo
```
# Usage
The toolkit lets you define container-based services, which may not
require any configuration beyond the service definition, as well as
Ansible-based services. It does so primarily by taking over the
Ansible inventory (see the [configuration reference](configuration.md)
for details). It is expected that you will have your own Ansible
configuration, with service-specific roles and configuration,
extending the base roles and making use of the toolkit's features.
So, to use the toolkit, you will have to include it from your own
Ansible configuration and specify the inventory and service
configuration in its own format.
There are some (minimal) requirements on how your Ansible environment
should be set up for this to work:
* you must have a *group_vars/all* directory (this is where we'll
write the autogenerated application credentials)
* you must include *playbooks/all.yml* from the toolkit source
directory at the beginning of your playbook
* you must use the *run-playbook* wrapper instead of running
*ansible-playbook*
## Ansible environment setup how-to
Let's walk through creating an example Ansible configuration for your
project.
First, check out the base *float* repository somewhere. We'll store
everything related to this project below a top-level directory called
*~/myproject*. We'll put the *float* repository in the *float*
subdirectory.
```
$ mkdir ~/myproject
$ git clone ... ~/myproject/float
```
Let's create the directory with our own Ansible configuration in the
*ansible* subdirectory:
```
$ mkdir ~/myproject/ansible
```
And put a top-level Ansible configuration file (*ansible.cfg*) in
there that refers to the toolkit repository location:
```
$ cat >ansible.cfg <<EOF
[defaults]
roles_path = ../float/roles:./roles
inventory_plugins = ../float/plugins/inventory
action_plugins = ../float/plugins/action
vars_plugins = ../float/plugins/vars
[inventory]
enable_plugins = float
EOF
```
This will look for plugins and base roles in *~/myproject/float*, and
it will load our own Ansible roles and config from
*~/myproject/ansible*.
We're going to need a place to store global configuration. Since *float*
requires a *group_vars/all* directory anyway, we can use that and put
some global variables in
*~/myproject/ansible/group_vars/all/config.yml*:
```
$ mkdir -p ~/myproject/ansible/group_vars/all
$ cat > ~/myproject/ansible/group_vars/all/config.yml <<EOF
---
domain: internal.myproject.org
domain_public:
- myproject.org
EOF
```
Then you can create the main configuration file (*float.yml*), the
host inventory (*hosts.yml*), and the service definition
(*services.yml*). Check out the [configuration
reference](configuration.md) for details.
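For reference, a minimal *float.yml* could look like the sketch below (the
file locations are placeholders for this walkthrough; see the configuration
reference for the full list of attributes):
```
---
services_file: services.yml
hosts_file: hosts.yml
credentials_dir: credentials/
plugin: float
```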
Finally, we are going to set up a basic playbook in *site.yml* that
will just run all the playbooks in the main repository:
```
---
- import_playbook: ../float/playbooks/all.yml
```
Now you can create your own service-specific Ansible configuration and
roles based on this skeleton.
## Running playbooks
The *run-playbook* wrapper makes some necessary fixes to the
environment and invokes *ansible-playbook* with the same command-line
arguments, so you should use it whenever you would use
*ansible-playbook*.
The ansible-vault setup is mandatory, so you are going to have to pass
the location of the ansible-vault encryption passphrase to Ansible via
the environment. Just use the `ANSIBLE_VAULT_PASSWORD_FILE` variable
as you normally would, with one additional feature: if the filename
ends in *.gpg*, the passphrase will be decrypted using GPG.
With respect to the previous example:
```
$ echo secret > vault-pw
$ ANSIBLE_VAULT_PASSWORD_FILE=vault-pw \
../float/run-playbook -i config.yml site.yml
```
## Initialize the permanent credentials
Before you can run Ansible to set up the services in your config,
there is one more step that needs to be done. In order to bootstrap
the internal PKIs and generate the application credentials (which do not
expire, and remain valid until revoked), you need to invoke
the playbook in *playbooks/init-credentials.yml*:
```
$ ANSIBLE_VAULT_PASSWORD_FILE=vault-pw \
../float/run-playbook -i config.yml ../float/playbooks/init-credentials.yml
```
This will write a bunch of files in your *credentials_dir*, including
the private keys for the various PKIs (X509, SSH, etc), and a
*secrets.yml* file containing the autogenerated application
credentials.
These files should of course be kept private when setting up a
production environment.
## Credentials
The system uses two major types of credentials:
* *managed* credentials for accounts, services, etc. - these can be
generated automatically, based on the top-level description in
*passwords.yml*. They are stored encrypted with *ansible-vault*.
* *root* credentials, which need to be provided externally, including
for instance the *ansible-vault* password used for managed
credentials, and other third-party secrets (like credentials for a
private Docker registry, etc.).
All services that require per-host secrets, such as SSH and the
internal X509 PKI, manage those secrets directly on the hosts
themselves, renewing them when necessary. Those secrets are not stored
in a central location.
This means that for normal usage (i.e. except when new credentials are
added), the credentials repository is read-only, which makes it easier
to integrate deployment with CI systems.
The expectation is that, for production environments, it will be saved
in private git repositories. Temporary setups such as test
environments, on the other hand, need no persistence as managed
credentials can simply be re-created every time.
# Implementation details
These are details of how parts of the infrastructure are implemented,
useful if you want to understand how a service is deployed, or how to
write a new one.
## Scheduler
The *float* service scheduler sets a large number of host variables and
global configuration parameters. In the Ansible host scope, the
following variables will be defined:
* `services` holds all the service metadata, in a dictionary indexed
by service name;
* `service_assignments` is a {service: [hosts]} dictionary with all
the service assignments;
* `enable_<service>`, defined for each service, evaluates to
true on the hosts assigned to that service (note: dashes in the
service name are converted to underscores);
* `enabled_services` contains the list of enabled services on this
host;
* `disabled_services` contains the list of disabled services on this host;
* `enabled_containers` contains a list of dictionaries describing the
containers that should be active on this host. The dictionaries have
the following attributes:
* `service` is the service metadata
* `container` is the container metadata
* `disabled_containers` has the same information as the above, but for
disabled containers;
* `<service>_master` is true on the host where the master instance is
scheduled, and false elsewhere. This variable is only defined for
services using static master election (i.e. where *master_election*
is true in the service metadata).
The scheduler also defines new dynamic Ansible groups based on
service assignments:
* For each service, create a host group named after the service, whose
members are the hosts assigned to the service.
* For each network overlay defined in the inventory, create a host
group named `overlay-<name>` whose members are the hosts on that
overlay.
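For illustration, here is a minimal playbook sketch that uses these
scheduler-provided groups and variables (the *foo* service and the *backend*
overlay are hypothetical names):
```
- hosts: foo
  tasks:
    - name: Report whether this host runs the foo master instance
      debug:
        msg: "foo master: {{ foo_master | default(false) }}"

- hosts: overlay-backend
  tasks:
    - name: List the services enabled on this overlay host
      debug:
        var: enabled_services
```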
# Writing new service roles
Configuration
=============
The toolkit configuration is split into two parts, the *service
metadata*, containing definitions of the known services, and a *host
inventory*, with information about each host. A number of global
variables are also required, to customize the results for your
application.
All files are YAML-encoded and should usually have a *.yml* extension.
# Main configuration
The service and host configuration is then turned into an Ansible
inventory and a bunch of Ansible variables by an [inventory
plugin](../plugins/inventory/float.py). The toolkit is configured with
a single main configuration file, pointing at the other resources,
which will be used as the *inventory* in the Ansible command-line
tool.
A minimal example of a working config:
```
---
services_file: services.yml
hosts_file: hosts.yml
credentials_dir: credentials/
plugin: float
```
The attributes supported are:
`services_file` points at the location of the file containing the
service metadata definition.
`hosts_file` points at the location of the hosts inventory.
`passwords_file` points at the configuration of the application
credentials (passwords).
`credentials_dir` points at the directory where autogenerated
service-level credentials (PKI-related) will be stored. This is often
a separate git repository.
`plugin` must always have the literal value *float*.
Relative paths are interpreted as relative to the directory containing
the main configuration file itself.
# Host configuration (inventory)
The inventory file defines *hosts* and *groups*, and custom variables
associated with those. It's just another way of defining an Ansible
inventory that is easy for us to inspect programmatically.
The groups defined here can be used in your own Ansible playbook, but
most importantly are used in *services.yml* to make scheduling
decisions (see [Scheduling](#scheduling) below).
The inventory file must contain a dictionary encoded in YAML
format. The top-level attributes supported are:
`hosts` must contain a dictionary of *name*: *attributes* pairs
defining all the hosts in the inventory;
`group_vars` can contain a dictionary of *group\_name*: *attributes*
pairs that define group variables.
## Host variables
Host variables can be standard Ansible variables (SSH parameters, etc.,
usually with an *ansible_* prefix), but some host variables are special:
`ip` (mandatory) is the public IPv4 address of this host, the one that
other hosts can use to reach it over the public Internet
`ip6` (optional) is the IPv6 version of the above
`ip_<name>` (optional) defines the IPv4 address for this host on the
overlay network named *name*
`groups` (optional) is a list of groups that this host should be a
member of.
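For example, a host attached to an overlay network named *vpn* (a
hypothetical name; all addresses below are placeholders) could be defined as:
```
hosts:
  host3:
    ansible_host: 192.168.10.12
    ip: 192.168.10.12
    ip6: "2001:db8::12"
    ip_vpn: 10.42.0.3
    groups: [vagrant]
```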
## Example
An example of a valid inventory file (for a hypothetical Vagrant
environment):
```yaml
---
hosts:
  host1:
    ansible_host: 192.168.10.10
    ip: 192.168.10.10
    groups: [vagrant]
  host2:
    ansible_host: 192.168.10.11
    ip: 192.168.10.11
    groups: [vagrant]
group_vars:
  vagrant:
    ansible_become: true
    ansible_user: vagrant
    ansible_ssh_private_key_file: "~/.vagrant.d/insecure_private_key"
```
This defines two hosts (*host1* and *host2*), both part of the
*vagrant* group. Some Ansible variables are defined, both at the host
and the group level, to set Vagrant-specific connection parameters.
# Service metadata
The service metadata file (*services.yml*) is a dictionary encoded in
YAML format, where keys are service names and values contain the
associated metadata. This file is consumed by the static service
scheduler that assigns services to hosts, and by the Ansible
automation in order to define configuration variables.
Metadata for services that are part of the core infrastructure ships
embedded with the software, so when writing your own `services.yml`
file, you only need to add your services to it.
Service metadata is encoded as a dictionary of *service name*:
*service attributes* pairs, each defining a separate
service. Supported attributes can be grouped in categories for
clarity:
### Scheduling
Attributes that affect how a service is scheduled on the available
hosts.
`scheduling_group`: Only schedule the service on hosts of the
specified host group. By default, schedule on all hosts.
`num_instances`: Run a limited number of instances of the service
(selected among the hosts identified by the `scheduling_group`). By
default this is set to `all`, which will run an instance on every
host.
`master_election`: If true, pick one of the hosts as master/leader
(default is false).
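As a sketch, a hypothetical *mydb* service restricted to hosts in a
*backend* group (both names are made up) could use these attributes as
follows:
```
mydb:
  scheduling_group: backend
  num_instances: 2
  master_election: true
```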
### Credentials
`service_credentials`: A list of dictionaries, one for each service
credential that should be generated for this service.
Each credential object supports the following attributes:
`name` (mandatory): Name for this set of credentials, usually the same
as the service name. Certificates will be stored in a directory with
this name below `/etc/credentials/x509`.
`enable_client`: Whether to generate a client certificate (true by
default).
`client_cert_mode`: Key usage bits to set on the client
certificate. One of *client*, *server*, or *both*, the default is
*client*.
`enable_server`: Whether to generate a server certificate (true by
default).
`server_cert_mode`: Key usage bits to set on the server
certificate. One of *client*, *server* or *both*, the default is
*server*.
`extra_san`: Additional DNS domains to add as subjectAltName fields in
the generated server certificate. This should be a list. The internal
domain name will be appended to all entries.
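A sketch of a credentials definition for a hypothetical *myservice* that
only needs a server certificate, with an extra DNS name (the alias is a
placeholder):
```
myservice:
  service_credentials:
    - name: myservice
      enable_client: false
      extra_san:
        - myservice-alias
```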
### Monitoring
If monitoring endpoints are defined for a service, the monitoring
infrastructure will automatically scrape them.
`monitoring_endpoints`: List of monitoring endpoints exported by the
service.
Each element in the monitoring endpoints list can have the following
attributes:
`job_name`: Job name in Prometheus, defaults to the service name.
`type`: Selects the service discovery mechanism used by Prometheus to
find the service endpoints. One of *static* or *dynamic*.
`port`: Port where the `/metrics` endpoint is exported.
`scheme`: HTTP scheme for the service endpoint. The default is *https*.
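For example, a hypothetical service exporting Prometheus metrics over plain
HTTP on port 9090 (all values are placeholders) could declare:
```
myservice:
  monitoring_endpoints:
    - job_name: myservice
      port: 9090
      type: static
      scheme: http
```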
### Global HTTP routing
Services can define *public* HTTP endpoints, which will be exported as
subdomains of the public domain name by the global HTTP
front-ends.
`public_endpoints`: List of endpoints exported by the service that
should be made available to end users via the service HTTP router.
Elements in the public endpoints list can have the following attributes:
`name`: Public name of the service. This can be different from the
service name, for instance you might want to export the internal
*prometheus* service as *monitoring* under the user-facing external
domain name. This name will be prepended to *domain_public* to obtain
the public FQDN to use. Alternatively, you can define one or more
*domains*.
`domains`: List of fully qualified server names for this endpoint, in
alternative to specifying a short *name*.
`port`: Port where the service is running.
`scheme`: HTTP scheme for the service endpoint. The default is *https*.
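Following the *prometheus* / *monitoring* example above, the corresponding
definition might look like this sketch (port and scheme are assumptions):
```
prometheus:
  public_endpoints:
    - name: monitoring
      port: 9090
      scheme: http
```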
### Containers
Services can either be configured via an Ansible role, or by deploying
Docker containers (or both). Definitions for these containers are part
of the service metadata.
`containers`: List of containerized instances that make up the
service (for container-based services).
Each object in this list represents a containerized image to be
run. Supported attributes include:
`name`: Name of the container. It is possible to have containers with
the same name in different services.
`image`: Which Docker image to run.
`port`: Port exposed by the Docker image. It will be exposed on the
host network interface.
`docker_options`: Additional options to be passed to the `docker run`
command.
`args`: Arguments to be passed to the container entry point.
`volumes`: Map of *source*:*target* paths to bind-mount in the
container.
Containers will be registered on the DNS-based dynamic service
discovery mechanism as *service*-*container*.
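A sketch of a container definition for a hypothetical *myapp* service (the
image name, port and paths are placeholders):
```
myapp:
  containers:
    - name: http
      image: registry.example.com/myapp/http:latest
      port: 8080
      docker_options: --log-driver=syslog
      volumes:
        /etc/myapp: /etc/myapp
```
With this definition, the container would be registered in dynamic service
discovery as *myapp-http*.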
#### Naming scheme for container-based services
In order to avoid having confusing names for Docker containers and
systemd units, it's best to define upfront a naming scheme for
container-based services, or the results will get chaotic really fast.
Some generic guidelines that may be useful:
* keep service names short and meaningful (e.g. *web* is probably not
a good name for a service);
* name containers based on the exposed protocol (*http*, *sql* etc),
to emphasize what they do over what they are.
## Examples
Let's look at some example *services.yml* files:
```
myservice:
  num_instances: 2
  service_credentials:
    - name: myservice
      enable_client: false
  public_endpoints:
    - name: myservice
      type: static
      port: 1234
```
This defines an Ansible-based service, of which we'll run two
instances. The service exposes an HTTP server on port 1234, which,
assuming *domain_public* is set to `mydomain.com`, will be available
at https://myservice.mydomain.com/ on the nginx service gateways of
the *core* role. Communication between the HTTPS gateway and the
service goes over HTTPS internally using auto-generated credentials.
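And a sketch of a container-based service, complementing the Ansible-based
example above (the image name is a placeholder):
```
mydb:
  num_instances: 1
  service_credentials:
    - name: mydb
  containers:
    - name: sql
      image: registry.example.com/mydb/sql:latest
      port: 5432
```
This would run a single instance of the *sql* container, expose port 5432 on
the host network interface of the host it is scheduled on, and register the
container in dynamic service discovery as *mydb-sql*.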