Commit d6024274, authored 10 years ago by ale
documentation on tunable parameters
parent 3f109e1b
TUNING.rst
+100
-0
100 additions, 0 deletions
TUNING.rst
0 → 100644
======================
autoradio Tuning Guide
======================

This document attempts to provide a high-level overview of the
trade-offs involved in tuning the free parameters of an autoradio
cluster. While autoradio works with the default settings out of the
box in testing environments, most real-world deployments will require
some tuning.

Etcd
----
The default settings for etcd are tuned for a local (LAN) network
environment. In the case of a geographically distributed cluster,
the default timeouts are so low that it's unlikely that a consensus
will ever be reached. You'll want to set both the peer heartbeat
interval and the election timeout to higher values. A reasonable value
for the heartbeat interval is 5x to 10x the maximum inter-node latency
in your cluster, while the election timeout should be at least 3 times
the heartbeat interval.

With our etcd package, you can set these values in
``/etc/default/etcd`` (values are in milliseconds)::

    DAEMON_OPTS="--peer-heartbeat-interval=1000 --peer-election-timeout=3000"

Increasing the etcd timeouts causes a corresponding increase in the
time required to reach consensus and elect a new etcd master in case
of node failure. It is advisable to set the radiod master election TTL
to a value greater than the etcd peer election timeout.
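
The rules of thumb in this section can be turned into a small
calculation. The following is a minimal sketch, not part of autoradio
or etcd: the function names and the default multipliers are
illustrative assumptions, and the TTL check assumes that the radiod
TTL has already been converted to milliseconds.

```python
import math


def etcd_timeouts(max_latency_ms, heartbeat_factor=5, election_factor=3):
    """Derive etcd peer timeouts (in milliseconds) from inter-node latency.

    heartbeat_factor follows the 5x to 10x guideline above, and
    election_factor the "at least 3 times the heartbeat" guideline.
    """
    heartbeat_ms = math.ceil(max_latency_ms * heartbeat_factor)
    election_ms = math.ceil(heartbeat_ms * election_factor)
    return heartbeat_ms, election_ms


def daemon_opts(heartbeat_ms, election_ms):
    """Render the DAEMON_OPTS line for /etc/default/etcd."""
    return ('DAEMON_OPTS="--peer-heartbeat-interval=%d '
            '--peer-election-timeout=%d"' % (heartbeat_ms, election_ms))


def ttl_is_safe(master_election_ttl_ms, election_ms):
    """True if the radiod master election TTL exceeds the etcd election timeout.

    Assumption: both values are expressed in milliseconds.
    """
    return master_election_ttl_ms > election_ms


if __name__ == "__main__":
    # Example: 200 ms maximum latency between the farthest nodes.
    heartbeat, election = etcd_timeouts(200)
    print(daemon_opts(heartbeat, election))
```

With a maximum inter-node latency of 200 ms and the lowest suggested
multipliers, this reproduces the values used in the ``DAEMON_OPTS``
example above.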

Radiod timeouts
---------------

Similar considerations, with respect to latency, apply to the presence
and master-election protocols that are run by autoradio itself. These
are controlled by radiod's ``--heartbeat`` and
``--master-election-ttl`` command-line flags. For these time values,
though, there are further considerations to be made:

Presence
~~~~~~~~
The node presence heartbeat sets the lower time bound for peers to
discover that a node is down, and stop sending client requests to it.
It also determines how often node utilization is propagated to the
peers. Stale utilization data is less of a concern when the load
balancing policy uses query cost estimators (as it does by default).

Setting this value too low, depending on the number of nodes in the
cluster, will cause excessive churn on etcd, leading to unnecessary
intra-cluster network traffic. As a side effect of the churn, watches
on etcd data will expire more often (due to the log position
increasing beyond the allowed horizon), which will cause more frequent
reloads of the full configuration, causing even more unnecessary
network traffic and increasing the load on etcd.
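
To get a feeling for the churn described above, a back-of-the-envelope
model can help. The sketch below rests on two assumptions that are not
stated in this document: that every node performs exactly one etcd
write per presence heartbeat, and that etcd keeps a bounded history of
recent events (``history_size``, set here to 1000 as a placeholder;
check the value for your etcd version). The function names are
hypothetical.

```python
def presence_write_rate(num_nodes, heartbeat_s):
    """etcd writes per second, assuming one write per node per heartbeat."""
    return num_nodes / heartbeat_s


def watch_horizon_seconds(num_nodes, heartbeat_s, history_size=1000):
    """Rough time before a watch on a rarely changing key falls out of
    the retained event history and forces a full reload.

    history_size is an assumed parameter, not an autoradio setting.
    """
    return history_size / presence_write_rate(num_nodes, heartbeat_s)


if __name__ == "__main__":
    for heartbeat in (1, 5, 10):
        horizon = watch_horizon_seconds(10, heartbeat)
        print("heartbeat=%ds -> about %.0fs before a watch expires"
              % (heartbeat, horizon))
```

Under these assumptions the horizon scales linearly with the heartbeat
interval and inversely with the number of nodes, which is why a value
that is fine for a small cluster may be too aggressive for a large one.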

Master Election
~~~~~~~~~~~~~~~

The node master election timeout determines how quickly a source
(assuming it retries continuously on error) will be able to reconnect
to the cluster if the node that is currently the master becomes
unavailable.

Capacity
--------

One of the nice properties of the autoradio traffic control logic is
the ability to reject incoming traffic when the cluster reaches its
maximum capacity, to prevent overload and ensure that existing
connections are served reliably. This is of course only possible if
the capacity limits are set to match reality. Since these values
usually can't be guessed by autoradio, they must be set using
command-line arguments.

Autoradio models capacity along two separate dimensions: outbound
bandwidth and the number of connected listeners. CPU and memory are
not included because their incremental cost per request is negligible. Limits
can be set separately for each node in the cluster, by passing the
``--bwlimit`` and ``--max-clients`` command-line flags to ``radiod``.
The traffic control logic is then able to use utilization metrics to
make decisions about where to send traffic. For details on how this is
done, and how to control it, check the Go source documentation for the
``fe/lbv2`` package.
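
The general idea of capacity-aware traffic control can be illustrated
with a short sketch. This is not the ``fe/lbv2`` implementation: the
``Node`` class, its field names, and the "least utilized node wins"
rule are assumptions made for illustration only. A node is treated as
full when either of its two limits is reached, and a request is
rejected when every node is full.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    name: str
    max_clients: int        # hypothetical mirror of --max-clients, must be > 0
    bwlimit: float          # hypothetical mirror of --bwlimit, must be > 0
    listeners: int = 0      # currently connected listeners
    bandwidth: float = 0.0  # current outbound bandwidth, same unit as bwlimit

    def utilization(self) -> float:
        """Utilization along the most constrained dimension (1.0 means full)."""
        return max(self.listeners / self.max_clients,
                   self.bandwidth / self.bwlimit)


def pick_node(nodes: List[Node]) -> Optional[Node]:
    """Return the least utilized node with spare capacity, or None to
    signal that the request should be rejected."""
    candidates = [n for n in nodes if n.utilization() < 1.0]
    if not candidates:
        return None
    return min(candidates, key=Node.utilization)


if __name__ == "__main__":
    cluster = [
        Node("a", max_clients=100, bwlimit=1000.0, listeners=50, bandwidth=100.0),
        Node("b", max_clients=100, bwlimit=1000.0, listeners=10, bandwidth=900.0),
        Node("c", max_clients=100, bwlimit=1000.0, listeners=20, bandwidth=200.0),
    ]
    target = pick_node(cluster)
    print(target.name if target else "rejected: cluster at capacity")
```

Returning ``None`` rather than falling back to an overloaded node is
what protects existing listeners, and it only works if the limits
reflect what each node can really sustain.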

The default traffic control policy only checks the number of
listeners, because it usually makes the most sense to express the
global cluster capacity in those terms (bandwidth is hardly a good
metric in the presence of variable bitrate streams, for instance). The
disadvantage is that finding the "real" maximum capacity numbers for a
given node might take some experimentation.
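
A starting point for that experimentation can be derived from the
available outbound bandwidth and the nominal stream bitrate. The
sketch below is only a rough estimate under stated assumptions: the
function name is hypothetical, both inputs are in kbit/s, and the
headroom percentage is an arbitrary safety margin, not an autoradio
setting.

```python
def estimate_max_clients(outbound_kbps, stream_kbps, headroom_pct=80):
    """Initial guess for --max-clients on one node.

    Uses only headroom_pct percent of the outbound bandwidth, divided
    by the nominal per-listener bitrate. Integer arithmetic avoids
    rounding surprises.
    """
    return (outbound_kbps * headroom_pct) // (100 * stream_kbps)


if __name__ == "__main__":
    # Example: 100 Mbit/s uplink, 128 kbit/s streams, 20% headroom.
    print(estimate_max_clients(100_000, 128))
```

With variable bitrate streams the nominal bitrate is only an average,
so the result should be treated as an initial value to refine under
real load.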