Commit 08f924d4 authored by ale

Immediately run a slave-cmd when losing mastership

Also updated the README with a bit more information.
parent b3473177
@@ -5,3 +5,54 @@ A simple utility that runs a master-election protocol on top of
[etcd](https://github.com/coreos/etcd), and that can run
user-specified commands on state transitions. It is meant as a
building block for highly available services.
# Usage
The tool will attempt to acquire an etcd lock (currently
`/me/`*service_name*`/lock`). If it succeeds, it will run the command
specified by *--master-cmd*, and it will consider itself to be the
master until one of the following conditions is true:
* the *masterelection* tool itself is terminated
* the connection to *etcd* becomes unavailable
If the tool fails to acquire the lock, it will run the command
specified by *--slave-cmd*, and it will start monitoring the lock
for changes (such as TTL expiry), waiting for the opportunity to
acquire it again. Whenever another node acquires the lock, it will
run *--slave-cmd* again with the new master address.
Commands started by *masterelection* can be long-lived (like spawning
a daemon) or short-lived (sending an IPC message). In either case, on
every state change event the tool will kill the previously running
command with SIGTERM, if it is still running, and immediately spawn
the new one.
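For a long-lived hook, something like the sketch below can work (the
daemon name and path are hypothetical placeholders): running the
daemon in the foreground and `exec`ing it from the script means the
SIGTERM sent on the next state change reaches the daemon itself rather
than an intermediate shell, assuming the signal is delivered to the
spawned process.

    #!/bin/sh
    # Hypothetical long-lived --master-cmd hook. exec replaces the shell,
    # so the SIGTERM sent on the next state change hits the daemon
    # directly instead of a wrapper process.
    exec /usr/sbin/mydaemon --foreground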
State is passed to commands via environment variables:
* `IS_MASTER` will be either 1 or 0
* `MASTER_ADDR` will contain the address of the current master
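A hook can read these variables directly from its environment. Below
is a minimal sketch of such a hook (the script and log file paths are
hypothetical); it could be passed as either *--master-cmd* or
*--slave-cmd* and simply logs the transition:

    #!/bin/sh
    # Hypothetical state-change hook: masterelection exports IS_MASTER and
    # MASTER_ADDR in the environment of the command it runs.
    if [ "$IS_MASTER" = "1" ]; then
        echo "$(date): promoted to master" >> /var/log/masterelection.log
    else
        echo "$(date): demoted to slave, master is '$MASTER_ADDR'" >> /var/log/masterelection.log
    fi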
## Failure modes
As long as the connection to etcd is active, the state seen by the
tool is consistent. Issues arise when the connection to etcd is lost:
in that case the tool favors stability and will not issue a state
change if it held the slave role. This behavior would be problematic
for a master, though: a master isolated by a network partition would
keep believing it is the master, making later reconciliation difficult
if the remaining nodes form a quorum and elect a new one. So, when the
etcd connection is lost while holding mastership, the tool runs
*--slave-cmd* with an empty MASTER_ADDR.
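A *--slave-cmd* hook can therefore tell a normal demotion apart from a
lost etcd connection by checking whether MASTER_ADDR is empty. A
minimal sketch (the actions shown are placeholders):

    #!/bin/sh
    # Hypothetical --slave-cmd hook: an empty MASTER_ADDR means the etcd
    # connection was lost and the current master is unknown.
    if [ -z "$MASTER_ADDR" ]; then
        echo "etcd unreachable, master unknown; stopping writes as a precaution"
    else
        echo "following the new master at $MASTER_ADDR"
    fi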
## Examples
A simple (and somewhat naive) example that controls the replication
setup of an already-running MySQL instance, assuming you are using
Global Transaction Identifiers:
    $ masterelection --name=$MYHOSTNAME --service-addr=$MYADDR:3306 \
        --master-cmd="mysql -e 'STOP SLAVE; RESET MASTER'" \
        --slave-cmd="mysql -e 'CHANGE MASTER TO MASTER_HOST=\'\$MASTER_ADDR\''"
Ok, I may have gotten the quoting wrong, but you get the idea :)
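If the quoting gets out of hand, one way to sidestep it is to move the
logic into a small wrapper script and pass that as the hook instead.
A hypothetical sketch (the script path and SQL are illustrative, and
just as naive as the example above):

    #!/bin/sh
    # /usr/local/bin/mysql-follow (hypothetical): point replication at the
    # master address exported by masterelection.
    mysql -e "STOP SLAVE; CHANGE MASTER TO MASTER_HOST='$MASTER_ADDR'; START SLAVE;"

which can then be invoked without any escaping:

    $ masterelection --name=$MYHOSTNAME --service-addr=$MYADDR:3306 \
        --master-cmd="mysql -e 'STOP SLAVE; RESET MASTER'" \
        --slave-cmd=/usr/local/bin/mysql-follow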
@@ -227,6 +227,16 @@ func runMasterElection(ctx context.Context, api etcdclient.KeysAPI, lockPath, se
	} else {
		// Success, we are now the master.
		err = runMaster(ctx, api, lockPath, self, stateFn)
		// Once we are not the master anymore, there's
		// the possibility that we have lost access to
		// etcd. Issue a state change to slave with
		// unknown master, for safety.
		//
		// TODO: it would be better to wait for a
		// little while, just in case we can
		// successfully reconnect right away and do a
		// single master -> slave transition.
		stateFn(ctx, stateChangeMsg{isMaster: false})
	}
	if err == context.Canceled {