diff --git a/README.md b/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..a4fd76cb637dfc04aa3c40fa746f1bc47c2afc03
--- /dev/null
+++ b/README.md
@@ -0,0 +1,344 @@
+service-prober
+===
+
+A configurable Prometheus blackbox prober for complex services,
+e.g. those services whose conversations are too complex to be modeled
+by the simple *expect*-like semantics of the stock
+prometheus-blackbox-exporter.
+
+It currently has modules for IMAP-based tests (round-trip message
+delivery checks), and for script-based HTTP tests suitable for
+interacting with complex web apps (though there is no JavaScript
+support).
+
+There is one major difference in how the service-prober works with
+respect to the stock prometheus-blackbox-exporter: while the
+prometheus-blackbox-exporter splits its configuration between the
+exporter and Prometheus itself (the exporter defines probe types, the
+actual targets are defined in the Prometheus configuration), the
+service-prober configuration completely defines all probes and targets.
+To Prometheus, service-prober appears just like any other job to scrape.
+
+Another difference from the prometheus-blackbox-exporter is that here
+the probe execution intervals are not tied to the Prometheus scraping
+interval: instead, probes have their own execution schedule. This is
+because service probes can be heavyweight, and we want to stagger them
+across time instead of synchronizing them with the scrape.
+
+# Configuration
+
+The service-prober configuration is meant to be compact and
+expressive, and provides a few features meant to facilitate the
+generation of a large number of probe permutations with different
+parameters.
+
+The configuration must be in JSON format, and must consist of a
+top-level object containing two attributes:
+
+* *vars*, to define arbitrary global variables;
+* *probes*, a list of probe specifications.
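+
+As a rough sketch of this top-level shape, a minimal configuration
+could look like the following. The variable name, probe name, URL and
+interval values are purely illustrative; the probe attributes
+themselves are described in the rest of this section.
+
+```json
+{
+  "vars": {
+    "site": "https://webapp.example.com/"
+  },
+  "probes": [
+    {
+      "type": "http",
+      "name": "homepage",
+      "interval": "60s",
+      "timeout": "10s",
+      "params": {
+        "script": [
+          {
+            "type": "open",
+            "url": "${site}"
+          }
+        ]
+      }
+    }
+  ]
+}
+```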
+
+Each probe specification is an object that can have the following
+attributes:
+
+* *type*, specifying the probe type, which must be one of the
+  supported probe types (see below);
+* *name*, the probe name, which must be unique across all probes so
+  that Prometheus can distinguish them;
+* *interval*, how often to execute the probe;
+* *timeout*, the timeout on each probe execution;
+* *loop*, a list of variable names used to generate probe permutations
+  (see *Permutations* below);
+* *params*, an object containing type-specific parameters to configure
+  the exact behavior of the probe.
+
+## Variable expansion
+
+The service-prober configuration supports variable expansion in all
+string values. The syntax is shell-like, `${variable}`, and it is
+possible to navigate hierarchies of objects using dot-notation to
+separate attributes, e.g. given the following variables:
+
+```json
+{
+  "servers": {
+    "foo": {
+      "ip": "1.2.3.4",
+      "zone": "us"
+    },
+    "bar": {
+      "ip": "2.3.4.5",
+      "zone": "eu"
+    }
+  }
+}
+```
+
+one could write `${servers.foo.ip}` to obtain `1.2.3.4`. Note that
+there is no syntax to access array (list) elements, only objects.
+
+Referencing a nonexistent variable results in a fatal error.
+
+## Permutations
+
+The *loop* attribute on probe specifications allows one to generate
+permutations of parameters, resulting in multiple probes being created
+out of a single (parameterized) specification.
+
+Each member of the *loop* list should reference a variable from *vars*
+(using the dot-notation to navigate hierarchies, as outlined in the
+*Variable expansion* section), which **must** be an array. Each value
+of this array replaces the original variable in turn, generating a new
+probe each time.
+
+This is best clarified with examples.
+Suppose we have the following
+configuration, using a hypothetical *ping* probe type (which does not
+exist, but whose behavior should be intuitive) for simplicity:
+
+```json
+{
+  "vars": {
+    "servers": ["1.2.3.4", "2.3.4.5"]
+  },
+  "probes": [
+    {
+      "type": "ping",
+      "name": "ping/${servers}",
+      "loop": ["servers"],
+      "params": {
+        "addr": "${servers}"
+      }
+    }
+  ]
+}
+```
+
+We have a *servers* variable with a list of IP addresses of our
+servers. We also define a *ping* probe, which uses a *loop* attribute
+and has some parameterized values: let's assume that *params.addr*
+points the probe at the IP address to ping.
+
+What this does is generate two probes: for each value of the *servers*
+variable, service-prober creates a new global variable context where
+the *servers* variable is no longer the whole array, but just the
+selected value. This lets us use `${servers}` in the probe attributes,
+which will evaluate to a different value of the original *servers*
+array in each probe. Thus we'll have a probe named *ping/1.2.3.4*, with
+*params.addr* set to *1.2.3.4*, and another probe named *ping/2.3.4.5*
+which points instead at *2.3.4.5*.
+
+(Yes, the pluralization issue with *servers* is a bit annoying.)
+
+The array elements used for permutations don't have to be strings;
+they can also be complex objects.
+
+If multiple *loop* variables are used, probes will be created for all
+possible combinations of their values.
+
+To exemplify the previous two points, let's examine a slightly more
+complex configuration, this time using the *imap_login* probe (which
+actually exists), and trying to test IMAP user login on two different
+servers with two separate user accounts:
+
+```json
+{
+  "vars": {
+    "servers": ["imap1.example.com", "imap2.example.com"],
+    "credentials": [
+      {
+        "username": "user1@example.com",
+        "password": "password1"
+      }, {
+        "username": "user2@example.com",
+        "password": "password2"
+      }
+    ]
+  },
+  "probes": [
+    {
+      "type": "imap_login",
+      "name": "imap_login/${servers}/${credentials.username}",
+      "loop": ["servers", "credentials"],
+      "params": {
+        "addr": "${servers}:993",
+        "username": "${credentials.username}",
+        "password": "${credentials.password}"
+      }
+    }
+  ]
+}
+```
+
+This time we'll get 4 probes, one for each possible combination of
+the values of *servers* and *credentials*.
+
+It is very important to always include the loop values in the *name*
+attribute of the probe specification, as these examples do. Otherwise
+there will be multiple probes with the same name,
+violating the metric uniqueness requirements.
+
+# Probe types
+
+## imap_roundtrip
+
+The *imap_roundtrip* probe performs a round-trip check for SMTP and
+IMAP: it will try to deliver a test email message via authenticated
+SMTP, and will monitor an IMAP mailbox until it is received (or the
+timeout expires).
+
+The configuration requires the following *params*:
+
+* *dns_map*, to override DNS results (see *DNS overrides* below)
+* *imap*, with IMAP-related parameters:
+  * *addr*, host:port of the IMAP server to connect to
+  * *username*, for IMAP authentication
+  * *password*, for IMAP authentication
+  * *ssl*, SSL configuration for the IMAP connection (see *SSL
+    options* below)
+* *smtp*, with SMTP-related parameters:
+  * *addr*, host:port of the SMTP server to connect to
+  * *username*, for SMTP authentication
+  * *password*, for SMTP authentication
+  * *ssl*, SSL configuration for the SMTP connection (see *SSL
+    options* below)
+
+
+## http
+
+The *http* prober can run a series of HTTP interactions, pretending to
+be a browser (to the extent that it keeps track of cookies across
+requests), and verify that they proceed according to the configured
+expectations.
+
+The prober executes a *script*, consisting of multiple *steps*. At
+every step, it is possible to make new requests, click links in the
+page, or submit forms (both found using CSS selectors).
+
+The probe configuration requires the following *params*:
+
+* *ssl*, SSL configuration (see *SSL options* below)
+* *dns_map*, to override DNS results (see *DNS overrides* below)
+* *script*, the script to execute
+
+The script is a list of steps, each supporting the following
+attributes:
+
+* *type*, the step type, one of *open* (discard current state, request
+  a new page), *click* (find a link on the page and click it), or
+  *submit* (submit a form on the page, with the desired values)
+* *url*, the URL to request when the step type is *open*
+* *selector*, the CSS selector to use when the step type is *click* or
+  *submit*, to identify the A or FORM element respectively
+* *form_values*, an associative array of form values to submit, when
+  the step type is *submit*
+* *expected_url*, when present, is checked against the URL of the
+  current page at the end of the step (after redirects etc)
+* *expected_data*, when present, should be a string contained in the
+  body of the current page, at the end of the step.
+
+So, for instance, we could define a probe to log into a hypothetical
+web application:
+
+```json
+{
+  "vars": {
+    "username": "user1@example.com",
+    "password": "password1"
+  },
+  "probes": [
+    {
+      "type": "http",
+      "name": "pannello",
+      "interval": "10s",
+      "timeout": "10s",
+      "params": {
+        "script": [
+          {
+            "type": "open",
+            "url": "https://webapp.example.com/",
+            "expected_url": "https://webapp.example.com/login"
+          }, {
+            "type": "submit",
+            "selector": ".form-signin",
+            "form_values": {
+              "username": "${username}",
+              "password": "${password}"
+            },
+            "expected_url": "https://webapp.example.com/",
+            "expected_data": "My WebApp"
+          }, {
+            "type": "click",
+            "selector": "a.logout-link",
+            "expected_data": "Successfully Logged Out"
+          }
+        ]
+      }
+    }
+  ]
+}
+```
+
+The above probe will try to access webapp.example.com, expect a login
+form, fill it in with username and password, navigate to the web app
+itself, and log out. Of course the example uses sample CSS selectors;
+the interaction details will be very specific to the
+application.
+
+# Common configuration
+
+## DNS overrides
+
+All probes provide the capability to override DNS results, by
+specifying a *dns_map*, which consists of a simple set of hostname /
+IP address pairs. The service-prober will look up hostnames in the
+*dns_map* before going to DNS, so this mechanism can be used to target
+specific servers, e.g.:
+
+```json
+{
+  "vars": {
+    "servers": ["1.2.3.4", "2.3.4.5"]
+  },
+  "probes": [
+    {
+      "type": "imap_login",
+      "name": "imap_login/${servers}",
+      "loop": ["servers"],
+      "dns_map": {
+        "imap.example.com": "${servers}"
+      },
+      "params": {
+        "addr": "imap.example.com:993",
+        "username": "user1@example.com",
+        "password": "password1"
+      }
+    }
+  ]
+}
+```
+
+This configuration creates two probes: both will attempt to log in to
+"imap.example.com", but each one will resolve that name to a different
+IP address from *servers*.
+However, SSL validation will be performed
+correctly with a *server_name* of "imap.example.com".
+
+## SSL options
+
+By default, service-prober will validate all SSL connections using the
+system-wide CA roots, and will perform CN / subjectAltName validation
+of the certificates.
+
+All SSL connections can be configured with the same set of options,
+specified as an object with the following attributes:
+
+* *ca*, a file containing a CA certificate in PEM format, used to
+  validate the server's certificate
+* *cert*, *key*, specifying a client certificate and private key
+* *server_name*, to control SNI and to override the expected server
+  name for validation purposes
+* *skip_validation*, set to true to disable all client-side SSL
+  validation
+
+By default, the *server_name* is set to the hostname used for the
+connection, so there's no need to override it when using DNS
+remappings to target the connection at a specific IP.
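+
+As an illustration, an *http* probe for a service behind a private CA
+that also requires a client certificate might be configured roughly as
+follows. This is only a sketch: the probe name, URL and file paths are
+made up, and *cert* and *key* are assumed to take file paths in the
+same way that *ca* does.
+
+```json
+{
+  "type": "http",
+  "name": "internal-webapp",
+  "params": {
+    "ssl": {
+      "ca": "/etc/service-prober/internal-ca.pem",
+      "cert": "/etc/service-prober/client.pem",
+      "key": "/etc/service-prober/client.key"
+    },
+    "script": [
+      {
+        "type": "open",
+        "url": "https://webapp.example.com/"
+      }
+    ]
+  }
+}
+```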