DJRandom
DJRandom is an online music library application with a web interface, designed for small groups of people who want to share their media collections. On one hand, it is a half-joking educational experiment in designing simple distributed systems (which is why it won't support millions of users); on the other, it has proven capable of scaling to a few machines holding terabytes of data.
Design
The application is designed to be modular, split into a few key components using minimal interfaces:
- the database, which offers a plain key/value interface, with range-based scanning (the current implementation is single-hosted but very simple);
- the index, for full-text search on song metadata;
- the storage layer (a "blob store"), which holds the song data, again modeled as key/value pairs;
- a queuing system for background processing.
Each of these components can be replaced by a client for another, better system. Many such systems exist and are open source, and the interfaces can adapt to the specific semantics of a large number of them.
But the default djrandom implementation is self-contained and does not depend on any external services. See the file services/README.rst for further details on the various service implementations.
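To make the "plain key/value interface with range-based scanning" concrete, here is a minimal sketch in Go. The `Database` interface, the `memDB` type and the key layout are hypothetical names chosen for illustration; they are not taken from the djrandom source, whose actual interface may differ.

```go
package main

import (
	"fmt"
	"sort"
)

// Database is a hypothetical minimal key/value interface with range scans.
// It is an illustration only, not the interface used by djrandom.
type Database interface {
	Get(key string) ([]byte, bool)
	Put(key string, value []byte)
	// Scan calls fn for every key in [start, end), in ascending key order.
	Scan(start, end string, fn func(key string, value []byte))
}

// memDB is a toy single-host implementation backed by a map.
type memDB struct {
	data map[string][]byte
}

func newMemDB() *memDB {
	return &memDB{data: make(map[string][]byte)}
}

func (db *memDB) Get(key string) ([]byte, bool) {
	v, ok := db.data[key]
	return v, ok
}

func (db *memDB) Put(key string, value []byte) {
	db.data[key] = value
}

func (db *memDB) Scan(start, end string, fn func(key string, value []byte)) {
	// Collect the keys in range, then sort them to get a stable order.
	keys := make([]string, 0, len(db.data))
	for k := range db.data {
		if k >= start && k < end {
			keys = append(keys, k)
		}
	}
	sort.Strings(keys)
	for _, k := range keys {
		fn(k, db.data[k])
	}
}

// scanKeys collects the keys returned by a range scan.
func scanKeys(db Database, start, end string) []string {
	var out []string
	db.Scan(start, end, func(k string, _ []byte) { out = append(out, k) })
	return out
}

func main() {
	var db Database = newMemDB()
	db.Put("song/0002", []byte("title-2"))
	db.Put("song/0001", []byte("title-1"))
	db.Put("user/example", []byte("profile"))
	// "0" sorts right after "/", so this range covers every "song/" key.
	fmt.Println(scanKeys(db, "song/", "song0"))
}
```

Any backend that can provide these three operations (an embedded store such as LevelDB, or a client to a remote database) could sit behind an interface of this shape.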
All background jobs are run as mapreduces: again the current implementation is very simple and does not run workers on multiple machines.
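As a rough illustration of what a single-process mapreduce looks like, the sketch below runs a map phase, groups the intermediate values by key, and then reduces each group. All names (`runMapReduce`, `countByArtist`, the `artist|title` record format) are assumptions made for this example and do not describe the djrandom mapreduce package.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// KV is an intermediate key/value pair emitted by the map phase.
type KV struct {
	Key   string
	Value int
}

// runMapReduce is a hypothetical single-process driver: no workers,
// no distribution, everything is held in memory.
func runMapReduce(inputs []string, mapFn func(string) []KV, reduceFn func(string, []int) int) map[string]int {
	// Map phase, followed by a shuffle that groups values by key.
	groups := make(map[string][]int)
	for _, in := range inputs {
		for _, kv := range mapFn(in) {
			groups[kv.Key] = append(groups[kv.Key], kv.Value)
		}
	}
	// Reduce phase: one call per distinct key.
	out := make(map[string]int, len(groups))
	for k, vs := range groups {
		out[k] = reduceFn(k, vs)
	}
	return out
}

// countByArtist emits (artist, 1) for a record of the form "artist|title".
// Malformed records are skipped.
func countByArtist(record string) []KV {
	parts := strings.SplitN(record, "|", 2)
	if len(parts) != 2 {
		return nil
	}
	return []KV{{Key: parts[0], Value: 1}}
}

// sum adds up all the values seen for a key.
func sum(_ string, values []int) int {
	total := 0
	for _, v := range values {
		total += v
	}
	return total
}

func main() {
	records := []string{"artist-a|song 1", "artist-b|song 2", "artist-a|song 3"}
	counts := runMapReduce(records, countByArtist, sum)

	// Print in sorted key order, since map iteration order is random.
	keys := make([]string, 0, len(counts))
	for k := range counts {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%s: %d\n", k, counts[k])
	}
}
```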
Building
Base dependencies:
- Go (at least version 1.2)
- LevelDB
Note that the LevelDB version in Debian wheezy is too old with respect to the Go bindings, so you might need to compile it from source.
You'll need to check out the source repository at the right place in your GOPATH:
$ go get -d git.autistici.org/ale/djrandom
Once this is done, choose whether you want to build the full suite or just the clients (which have a much smaller set of dependencies).
Building the clients
The DJRandom client suite needs the PortAudio library (version 1.9 or higher) to talk to the audio device, and either ffmpeg or the libav suite (avconv).
On a Debian-based system, apt-get install libav-tools portaudio19-dev should do.
- ensure that the GOPATH environment variable is set. For instance, the following command will use a directory called go under your $HOME:
$ export GOPATH=$HOME/go
- install the client binaries:
$ cd $GOPATH/src/git.autistici.org/ale/djrandom
$ make
$ go install ./client/...
You should now have djplay, djupload and djmpd in $GOPATH/bin.
Building the service
Server-side components have quite a few dependencies on third-party libraries, including one that is not packaged in distributions and has to be built manually (imms):
- a basic C/C++ build environment
- GNU autotools
- GNU libtool
- FFTW (v3)
- PCRE
- IMMS (see below)
On a Debian system, install the required packages with:
$ apt-get install build-essential autoconf automake libtool \
pkg-config libfftw3-dev libpcre3-dev
On OSX, using brew:
$ brew install portaudio leveldb libtool fftw pcre
To build IMMS:
$ go get -d git.autistici.org/ale/imms
$ cd $GOPATH/src/git.autistici.org/ale/imms
$ aclocal ; libtoolize ; automake --foreign --add-missing ; autoconf
$ ./configure && make && sudo make install
(on OSX, you might have to use glibtoolize instead of libtoolize).
You should now be able to build all the server-side tools with:
$ cd ../djrandom
$ make
$ go install ./server/...
$ go install ./mapreduce/...
Running the upload client
Create the ~/.djrandom.conf configuration file with the location of your media directory and your authentication credentials (an API key and a secret), in JSON format:
{
  "music_dir": "/home/user/Music",
  "auth_key": "abcdefghj....",
  "auth_secret": "blahblah...."
}
Then find a way to start djuploader in the background on every login, or whenever your machine starts. It will periodically wake up, check the music_dir, and upload whatever it finds.
You can limit the bandwidth used by the uploader by setting the bw_limit flag to a value in KBytes/sec. For other options, check djuploader --help.
Running the search client
The djplay search client will perform a search on the server and either print out the results in playlist format, or directly attempt to play them to the audio device. It uses the same configuration file as above.
It is best used together with a real audio player of some sort, for instance:
$ djplay --playlist Frank Zappa | vlc
Running the service
A normal deployment usually consists of more than one node (separate machine). Each service runs as a standalone process. The current implementation has two different sets of processes:
- db_server and task_server must run on a single node (they are in fact not distributed services);
- index_server, storage_server, djproc and djfe must run on every node.
Generate two partition tables (one for storage, one for the index) with a sufficiently large number of partitions, to ensure an approximately uniform distribution of data, and start the servers using the djrandom.init script.
If you modify the partition tables, for instance to add a new node, you should restart all the processes. Some data will be temporarily unavailable while the nodes automatically rebalance in the background.
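The following sketch shows why a large number of partitions helps spread data evenly: keys are hashed to a partition, and partitions are assigned to nodes through a table. The hash function (FNV-1a), the round-robin assignment and the names `partitionFor` and `buildTable` are all assumptions made for this example; the actual djrandom partition table format and generation tool may work differently.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// partitionFor maps a key to one of numPartitions partitions using the
// FNV-1a hash. The choice of hash is an assumption for this sketch.
func partitionFor(key string, numPartitions int) int {
	h := fnv.New32a()
	h.Write([]byte(key))
	return int(h.Sum32() % uint32(numPartitions))
}

// buildTable assigns partitions to nodes round-robin. The result maps a
// partition number (the slice index) to the node that owns it.
func buildTable(nodes []string, numPartitions int) []string {
	table := make([]string, numPartitions)
	for p := range table {
		table[p] = nodes[p%len(nodes)]
	}
	return table
}

func main() {
	nodes := []string{"node1", "node2", "node3"}
	table := buildTable(nodes, 1024)

	// Count how many synthetic keys land on each node.
	perNode := make(map[string]int)
	for i := 0; i < 30000; i++ {
		key := fmt.Sprintf("song-%d", i)
		perNode[table[partitionFor(key, len(table))]]++
	}
	for _, n := range nodes {
		fmt.Println(n, perNode[n])
	}
}
```

With many more partitions than nodes, each node owns a similar number of partitions, so the per-node key counts come out roughly equal. Note that the naive round-robin assignment used here reshuffles most partitions when a node is added; it is meant only to illustrate the lookup, not a rebalancing strategy.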