Commit c3a3561e authored by ale

updates to the documentation

parent 64b6e3ec

Overview
--------
Lens is a tool for log collection and analysis. It is meant primarily
for logs in syslog format, leveraging an existing syslog collection
infrastructure, but it can be extended by writing parsers for new log
formats (as long as they are line-based).

It has a few useful features:

* **Full text indexing and search**, with Elasticsearch_ as the data
  store
* **Attribute extraction** using a simple regex-based configuration
  language (see `Pattern Extraction`_)
* **Web UI** for live analysis

To use Lens, an existing syslog server is required (it can be a
dedicated instance for remote log collection, for example), and it
needs to be able to write to a local FIFO. Pretty much any syslog
server satisfies this requirement, but it will be necessary to
configure it appropriately. See `Integration With Syslog`_ for details
on how to do so.

The other requirement is an Elasticsearch_ installation. An
Elasticsearch cluster is quite easy to set up and configure (see the
`ES installation docs`_), but for testing purposes all you need to do
is download the package and run::

    $ bin/elasticsearch

This will start an ES instance on ``localhost:9200``.
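
You can verify that the instance is answering with a trivial check
along these lines (a Python 2 sketch; Elasticsearch serves a small
JSON banner, including its version, on the root URL)::

    import urllib2

    # Fetch the ES status banner from the default address.
    print urllib2.urlopen('http://localhost:9200/').read()
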
Installation
------------
To install Lens, the only requirements are a Python interpreter (at
least version 2.5) and `setuptools`_. For Debian-based systems, this
amounts to::

    $ sudo apt-get install python python-setuptools

Install From Source
~~~~~~~~~~~~~~~~~~~
1. Clone the Lens repository::

       $ git clone http://git.autistici.org/lens2.git

2. Run the ``setup.py`` installation script::

       $ cd lens2 && sudo python setup.py install

Install From Debian Package
~~~~~~~~~~~~~~~~~~~~~~~~~~~
1. Add the repository to your APT config by placing the following
   contents in a file such as ``/etc/apt/sources.list.d/lens2.list``::

       deb http://git.autistici.org/p/lens2/debian unstable main

2. Install the repository key::

       $ wget -O- http://git.autistici.org/p/lens2/debian/repo.key | \
           sudo apt-key add -

3. Install the ``lens2`` package::

       $ sudo apt-get install lens2

The Debian package will install an init script to control the indexing
daemon in ``/etc/init.d/lens2``. To start the daemon listening on the
FIFO, run::

    $ sudo /etc/init.d/lens2 start

Tweak configuration parameters as needed in ``/etc/default/lens2``.

Integration With Syslog
-----------------------
The lens2 daemon is meant to be connected to a syslog server, usually
a 'collector' for your remote logs, via a FIFO.

Lens supports a number of syslog-like formats for its input:

``iso``
    A log format with ISO-formatted timestamps that also contains
    priority and facility information. For instance, this is the
    syslog-ng template::

        template("${S_ISODATE} ${HOST} ${FACILITY}.${PRIORITY} ${MSGHDR}${MSG}\n")

    and this one is for rsyslog::

        $template localFormat,"%timereported:::date-rfc3339% %HOSTNAME% %syslogfacility-text%.%syslogseverity-text% %syslogtag%%msg::space%\n"

    This is also the default format.

``standard``
    Log format with a traditional syslog timestamp (i.e. "Feb 10
    11:12:13") but with facility and priority (separated by a dot)
    following the host name.

``dumb``
    The default syslog file format. Its usage is discouraged, since it
    lacks facility and priority information; it is provided to allow
    importing traditional syslog log files for testing purposes.

An example configuration file for syslog-ng is provided in
``docs/syslog-ng.conf.sample`` in the source tree.
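
To give an idea of what the indexer expects, a log line in the default
``iso`` format (as produced by either template above) would look
roughly like this, with hypothetical host and message values::

    2013-02-10T11:12:13+01:00 myhost auth.info sshd[2345]: Accepted password for bob from 10.0.0.1 port 51000 ssh2
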
A note on FIFO delivery: most modern syslog servers provide features
to buffer logs in memory, or on disk, for destinations that are
temporarily unavailable or blocked. This should be enabled for the
Lens FIFO destination, to support seamless restarts of the indexer
daemon. Furthermore, the indexer daemon will block reading from the
FIFO whenever the ES backend is unavailable.
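
As a sketch of what this looks like for syslog-ng, a FIFO destination
along these lines could be used (the destination and source names and
the FIFO path are placeholders; the sample file in ``docs/`` is the
authoritative reference)::

    destination d_lens {
        # log_fifo_size() buffers messages in memory while the
        # indexer daemon is being restarted.
        pipe("/var/run/lens.fifo"
             log_fifo_size(10000)
             template("${S_ISODATE} ${HOST} ${FACILITY}.${PRIORITY} ${MSGHDR}${MSG}\n"));
    };
    log { source(s_src); destination(d_lens); };
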
Web Application Deployment
--------------------------
The Lens web UI is a pretty simple WSGI application (written using
`Flask`_) implementing a JSON API to query the underlying
Elasticsearch index. The web UI itself is mostly implemented in the
browser, using JavaScript.

The configuration is stored in a Python file containing variable
assignments. Besides the various possible Flask configuration options,
the only variable that Lens needs is ``ES_SERVER``, which should
contain either a string with the ``HOST:PORT`` address of the ES
server, or a list of them (the Python ES client library will perform
automatic server failover in this case).
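
A minimal configuration file could therefore consist of a single
assignment (the hostnames here are just examples); the sample handlers
described below expect to find it at the path pointed to by the
``APP_SETTINGS`` environment variable, e.g. ``/etc/lens/app.conf``::

    # Lens web UI configuration: plain Python variable assignments.
    ES_SERVER = ['es1.example.com:9200', 'es2.example.com:9200']
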
The deployment strategy will vary depending on your web server
environment: check out the `Flask deployment documentation`_ for the
specific details. We do provide some example handlers for common
configurations, though:

- **FastCGI** in ``docs/lens.fcgi`` (requires `flup`_)
- **mod_wsgi** in ``docs/lens.wsgi``
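
For a quick local test you can also serve the application with the
WSGI server from the Python standard library; a minimal sketch,
assuming the ``www_app.make_app()`` entry point used by the sample
handlers and a valid ``APP_SETTINGS`` configuration::

    from wsgiref.simple_server import make_server

    from lens2 import www_app

    # Serve the Lens UI on localhost:8080 until interrupted.
    make_server('localhost', 8080, www_app.make_app()).serve_forever()
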
The Lens web UI is completely stateless and uses a minimal amount of
resources (the queries are simply forwarded to Elasticsearch), so it
can easily scale horizontally, and it trivially supports HA
configurations.
Pattern Extraction
------------------
You can define patterns with placeholders that, when matched against a
record, will cause it to be tagged with extra attributes. Those
attributes are indexed and can be included in a search query.

Patterns are regular expressions: placeholders are specified by
wrapping the attribute name (uppercase) within '@' symbols, and they
will be replaced by a '(.*)' match in the final regular expression.

For example, to match failed SSH logins, and extract the *user* and
*ip* attributes, you could use the following pattern::

    Failed password for @USER@ from @IP@
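
Following the substitution rule above, this pattern turns into the
regular expression::

    Failed password for (.*) from (.*)
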
It is also possible to specify the regular expression that will be
substituted for the placeholder variable by using the
``@NAME:REGEXP@`` syntax, e.g.::

    Failed password for @USER@ from @IP:[0-9.]+@
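
The substitution rule is simple enough to show in a few lines of
Python; this is just an illustration of the mechanism described above,
not the actual lens2 implementation::

    import re

    _PLACEHOLDER = re.compile(r'@([A-Z]+)(?::([^@]+))?@')

    def compile_pattern(pattern):
        # Replace each @NAME@ / @NAME:REGEXP@ placeholder with a
        # capture group, defaulting to '.*' when no regexp is given.
        regex = _PLACEHOLDER.sub(
            lambda m: '(?P<%s>%s)' % (m.group(1).lower(), m.group(2) or '.*'),
            pattern)
        return re.compile(regex)

    rx = compile_pattern('Failed password for @USER@ from @IP:[0-9.]+@')
    m = rx.search('Failed password for root from 10.0.0.1')
    print m.groupdict()  # user: 'root', ip: '10.0.0.1'
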
A *pattern file* simply contains a list of such patterns, one by row.
Empty lines are ignored, and lines starting with ``#`` are considered
comments. Pattern files are searched for by default in
``/etc/lens/patterns.d``, using *run-dir* semantics (file names
containing dots are ignored).
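
For instance, a file such as ``/etc/lens/patterns.d/sshd`` (the file
name and the second pattern are just illustrative) could contain::

    # SSH login attempts.
    Failed password for @USER@ from @IP:[0-9.]+@
    Accepted password for @USER@ from @IP:[0-9.]+@
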
Scaling
-------
The tool is designed to scale to a reasonably large number of systems:
since it acts as a stateless connector between systems that can be
scaled independently (the syslog collection infrastructure on one
side, and the Elasticsearch cluster on the other), it is trivial to
partition it following the growth of those systems.

To give a general idea of performance, we run a testing instance on a
single machine, with a log database consisting of ~40M records, and it
can reliably insert upwards of 2500 logs/sec.

.. _Elasticsearch: http://www.elasticsearch.org/
.. _ES installation docs: http://www.elasticsearch.org/guide/reference/setup/installation.html
.. _setuptools: http://pypi.python.org/pypi/setuptools
.. _Flask: http://flask.pocoo.org/
.. _Flask deployment documentation: http://flask.pocoo.org/docs/deploying/
.. _flup: http://pypi.python.org/pypi/flup
An Elasticsearch-based syslog indexer
=====================================
*lens* is a tool to analyze syslog logs. It is most useful when your
log volume is large enough that simple analysis tools such as 'grep'
and 'awk' just won't cut it anymore.

*lens* is able to extract attributes from your log files (according
to rules that you specify), which you can then use to filter search
results.

*lens* uses `Elasticsearch`_ as its database backend and search
engine, which provides for most of its useful features.
.. _Elasticsearch: http://www.elasticsearch.org/

# Example Apache2 configuration snippet:
#
# RewriteEngine On
# FcgidInitialEnv APP_SETTINGS /etc/lens/app.conf
# RewriteCond /var/www/%{REQUEST_FILENAME} !-f
# RewriteRule ^(.*)$ /var/www/lens.fcgi$1 [L]
#
# To serve static contents from Apache just make a link from
# /var/www/static to the lens2/static dir of your lens2 installation.
# Change as needed.

from flup.server.fcgi import WSGIServer

from lens2 import www_app

if __name__ == '__main__':
    application = www_app.make_app()
    WSGIServer(application).run()

# Sample mod_wsgi driver for lens2.
#
# Example Apache2 configuration snippet:
#
# Alias /static /var/www/static
# SetEnv APP_SETTINGS /etc/lens/app.conf
# WSGIDaemonProcess lens2 threads=5
# WSGIScriptAlias / /var/www/lens.wsgi

from lens2 import www_app

application = www_app.make_app()