Architecture Overview
autoradio's design is based on a few assumptions:
- scaling up / down should be easy (eventually automated)
- individual nodes can be somewhat unreliable / unstable
- the bottleneck is bandwidth
- we can steer client traffic with redirects (either at the HTTP level, or with the m3u -> stream link)
Internally, autoradio is split into a small number of components:
- the node, which controls an Icecast daemon on the same machine;
- the frontend, which responds to client requests and either redirects them to a different frontend or proxies the stream to a node;
- the transcoder, which re-encodes streams by running liquidsoap.
These components coordinate with each other through etcd, which also serves as the data store for the application-level configuration (streams, users, etc.). Runtime information, which is primarily used by the load-balancing algorithm to determine utilization, is shared via a separate gossip-like protocol to reduce the load on etcd.
Inter-process coordination is achieved using a small number of coordination primitives:
- A presence primitive, which registers endpoints for specific services (named ip:port pairs). It is then possible to retrieve the list of all endpoints, or just one with a specific name.
- A leader election primitive, which picks a single leader among the participants and can return the leader's endpoint at any time. It is possible to observe the state of the election without participating in it.
We use leader election for tasks that must be unique across the cluster, such as selecting the Icecast master or running the transcoder for a specific stream.
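The following is a minimal sketch of how these two primitives could be built on etcd's client and concurrency packages; the key layout, TTLs and function names are illustrative, not autoradio's actual schema.

```go
package coordination

import (
	"context"

	clientv3 "go.etcd.io/etcd/client/v3"
	"go.etcd.io/etcd/client/v3/concurrency"
)

// RegisterPresence announces a named ip:port endpoint for a service by
// writing it under a lease, so the registration disappears automatically
// if the process dies.
func RegisterPresence(ctx context.Context, cli *clientv3.Client, service, name, addr string) error {
	// Bind the key to a short-lived lease that we keep alive in the background.
	lease, err := cli.Grant(ctx, 10)
	if err != nil {
		return err
	}
	if _, err := cli.Put(ctx, "/presence/"+service+"/"+name, addr, clientv3.WithLease(lease.ID)); err != nil {
		return err
	}
	ch, err := cli.KeepAlive(ctx, lease.ID)
	if err != nil {
		return err
	}
	go func() {
		// Drain keepalive responses until the context is cancelled.
		for range ch {
		}
	}()
	return nil
}

// ListEndpoints returns all endpoints currently registered for a service.
func ListEndpoints(ctx context.Context, cli *clientv3.Client, service string) ([]string, error) {
	resp, err := cli.Get(ctx, "/presence/"+service+"/", clientv3.WithPrefix())
	if err != nil {
		return nil, err
	}
	var addrs []string
	for _, kv := range resp.Kvs {
		addrs = append(addrs, string(kv.Value))
	}
	return addrs, nil
}

// RunElection campaigns for leadership of a task (e.g. "icecast") and
// blocks until this process becomes the leader. Non-participants can
// watch the same election prefix via Election.Observe or Election.Leader.
func RunElection(ctx context.Context, cli *clientv3.Client, task, addr string) (*concurrency.Election, error) {
	session, err := concurrency.NewSession(cli, concurrency.WithTTL(10))
	if err != nil {
		return nil, err
	}
	el := concurrency.NewElection(session, "/election/"+task)
	if err := el.Campaign(ctx, addr); err != nil {
		return nil, err
	}
	return el, nil
}
```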
In previous autoradio implementations, the node and the frontend were separate binaries. But when optimizing for bandwidth, it doesn't make much sense for a frontend to proxy a stream to a different host: why not send the client to that host in the first place (except for sources)? So nodes and frontends were always co-located on the same host, since it was better to always proxy frontend requests to the local Icecast and rely on Icecast's relaying capabilities to cut inter-node bandwidth.
Autoradio version 2 merges the node and the frontend into the same binary: removing the implicit co-location assumption simplifies the code considerably and reduces the necessary coordination. The transcoder, on the other hand, moves into a separate binary, allowing the deployment of transcode-only nodes that do not participate in the streaming cluster (useful because transcoding is heavily CPU-bound). In this scheme:
- the nodes register presence for the following services:
  - status, to transfer status information
  - icecast, for the public Icecast address sent to clients
- and they run the following leader elections:
  - icecast, to determine the Icecast master
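Building on the hypothetical helpers sketched above, a node's startup wiring might look roughly like this (startNode and the log calls are illustrative):

```go
// startNode registers the node's presence and joins the Icecast election.
// It assumes the RegisterPresence/RunElection helpers from the sketch above
// plus the standard library "log" package.
func startNode(ctx context.Context, cli *clientv3.Client, nodeName, statusAddr, icecastAddr string) error {
	// Announce the status endpoint used by the gossip protocol.
	if err := RegisterPresence(ctx, cli, "status", nodeName, statusAddr); err != nil {
		return err
	}
	// Announce the public Icecast address sent to clients.
	if err := RegisterPresence(ctx, cli, "icecast", nodeName, icecastAddr); err != nil {
		return err
	}
	// Campaign for the Icecast master election in the background:
	// Campaign blocks until this node becomes the leader.
	go func() {
		if _, err := RunElection(ctx, cli, "icecast", icecastAddr); err != nil {
			log.Printf("icecast election failed: %v", err)
			return
		}
		// Once elected, the node would reconfigure its local Icecast as master.
	}()
	return nil
}
```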
The frontend then uses these registrations depending on the type of the incoming request (sketched below):
- DNS requests (HA-focused) will return all IPs in the icecast presence set;
- HTTP client requests (LB-focused) will return a single IP picked from the icecast presence set by the load-balancing algorithm;
- HTTP source requests will be forwarded to the icecast leader.
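The three routing decisions could look roughly like this, again building on the hypothetical helpers above; pickByLoad is a placeholder for the real load-balancing algorithm:

```go
// dnsEndpoints serves DNS requests (HA): return every registered Icecast endpoint.
func dnsEndpoints(ctx context.Context, cli *clientv3.Client) ([]string, error) {
	return ListEndpoints(ctx, cli, "icecast")
}

// clientRedirect serves HTTP client requests (LB): pick a single endpoint.
func clientRedirect(ctx context.Context, cli *clientv3.Client) (string, error) {
	addrs, err := ListEndpoints(ctx, cli, "icecast")
	if err != nil {
		return "", err
	}
	return pickByLoad(addrs), nil
}

// sourceTarget serves HTTP source requests: forward to the current Icecast leader.
func sourceTarget(ctx context.Context, el *concurrency.Election) (string, error) {
	resp, err := el.Leader(ctx)
	if err != nil {
		return "", err
	}
	return string(resp.Kvs[0].Value), nil
}

// pickByLoad stands in for the load-balancing algorithm; here it just
// returns the first endpoint.
func pickByLoad(addrs []string) string {
	if len(addrs) == 0 {
		return ""
	}
	return addrs[0]
}
```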
The status presence set is used by the gossip algorithm to pick a random node to send the next status update to.
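As an illustration, a single gossip step could look like the following, reusing the hypothetical ListEndpoints helper; the NodeStatus fields and the /status HTTP endpoint are assumptions (this also needs the standard bytes, encoding/json, math/rand and net/http imports):

```go
// NodeStatus stands in for the runtime information exchanged via gossip.
type NodeStatus struct {
	Name      string `json:"name"`
	Listeners int    `json:"listeners"`
}

// gossipOnce pushes our current status to one randomly chosen peer
// from the "status" presence set.
func gossipOnce(ctx context.Context, cli *clientv3.Client, self NodeStatus) error {
	peers, err := ListEndpoints(ctx, cli, "status")
	if err != nil || len(peers) == 0 {
		return err
	}
	peer := peers[rand.Intn(len(peers))]
	body, err := json.Marshal(self)
	if err != nil {
		return err
	}
	req, err := http.NewRequestWithContext(ctx, http.MethodPost, "http://"+peer+"/status", bytes.NewReader(body))
	if err != nil {
		return err
	}
	req.Header.Set("Content-Type", "application/json")
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	resp.Body.Close()
	return nil
}
```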