
DJRandom

DJRandom is an online music library application with a web interface, designed for small groups of people who want to share their media collections. On one hand, it is a half-joking educational experiment in designing simple distributed systems (which is why it won't support millions of users); on the other, it has proven capable of scaling to a few machines with data in the terabyte range.

Design

The application is designed to be modular, split into a few key components with minimal interfaces:

  • the database, which offers a plain key/value interface, with range-based scanning (the current implementation is single-hosted but very simple);
  • the index, for full-text search on song metadata;
  • the storage layer (a "blob store"), which holds the song data, again modeled as key/value pairs;
  • a queuing system for background processing.

Each of these components can be replaced by a client to some other, better system. Many such systems exist and are open source, and the interfaces can adapt to the specific semantics of a large number of them.

But the default djrandom implementation is self-contained and does not depend on any external services. See the file services/README.rst for further details on the various service implementations.

All background jobs are run as map-reduces; again, the current implementation is very simple and does not run workers on multiple machines.

Building

Dependencies:

  • portaudio
  • leveldb
  • libtool
  • fftw
  • pcre

To build djrandom on GNU/Linux:

$ mkdir -p $GOPATH/src/git.autistici.org/ale
$ cd $GOPATH/src/git.autistici.org/ale
$ git clone https://git.autistici.org/ale/imms.git imms
$ cd imms
$ aclocal ; libtoolize ; automake --foreign --add-missing ; autoconf
$ ./configure && make && sudo make install
$ go get -d git.autistici.org/ale/djrandom
$ cd $GOPATH/src/git.autistici.org/ale/djrandom
$ go get -v ./...
$ ./mkclientdist.sh

To build only the client, run the following command instead of go get -v ./...:

$ go get -v ./client/...

To build djrandom on OS X, you can install all the dependencies with Homebrew:

$ brew install portaudio leveldb libtool fftw pcre

You also have to use glibtoolize instead of libtoolize:

$ aclocal ; glibtoolize ; automake --foreign --add-missing ; autoconf

Running the upload client

Create the ~/.djrandom.conf configuration file with the location of your media directory and your authentication credentials (an API key and a secret), in JSON format:

{
  "music_dir": "/home/user/Music",
  "auth_key": "abcdefghj....",
  "auth_secret": "blahblah...."
}

Then find a way to start djuploader in the background on every login, or whenever your machine starts. It will periodically wake up, check the music_dir, and upload whatever it finds.
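On systems with systemd, one possible way to do this is a user service unit; the unit name and binary path below are assumptions you should adapt to your installation:

```ini
# ~/.config/systemd/user/djuploader.service (hypothetical example)
[Unit]
Description=djrandom upload client

[Service]
ExecStart=/usr/local/bin/djuploader
Restart=on-failure

[Install]
WantedBy=default.target
```

Enable it with systemctl --user enable --now djuploader. Any other mechanism that starts djuploader on login (a cron @reboot entry, a desktop autostart file) works just as well.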

It is possible to limit the bandwidth used by the uploader by setting the bw_limit flag to a value in KB/s. For other options, see djuploader --help.

Running the search client

The djplay search client will perform a search on the server and either print out the results in playlist format, or directly attempt to play them to the audio device. It uses the same configuration file as above.

It is best used together with a real audio player of some sort, for instance:

$ djplay --playlist Frank Zappa | vlc

Running the service

A normal deployment usually consists of more than one node (separate machine). Each service runs as a standalone process. The current implementation has two different sets of processes:

  • db_server and task_server must run on a single node (they are in fact not distributed services);
  • index_server, storage_server, djproc and djfe must run on every node.

Generate two partition tables (one for storage, one for the index) with a sufficiently large number of partitions, to ensure an approximately uniform distribution of data, and start the servers using the djrandom.init script.

If you modify the partition tables, for instance to add a new node, you should restart all the processes. Some data will be temporarily unavailable while the nodes automatically rebalance in the background.