Style changes

Commit 06c001b3 authored 1 year ago by ale
--- a/README.md
+++ b/README.md
@@ -6,6 +6,8 @@ its ability to *scale down* for small installations, using very few
 resources while maintaining a certain level of usefulness, offering an
 alternative to heavyweight stacks like ELK in this scenario.

+[[_TOC_]]
+
 ## Overview

 The system's functionality is split into two parts:
@@ -111,9 +113,9 @@ The flattened records are then written to
 periodically (and when they reach a certain size). These files can be
 stored remotely, on S3-like backends.

-The ingestion API endpoint is at */ingest*, and it expects a POST
-request with a ND-JSON request body: newline-delimited JSON-encoded
-records, no additional headers or footers.
+The ingestion API endpoint is at `/ingest`, and it expects a POST
+request with a ND-JSON request body (newline-delimited JSON-encoded
+records, no additional headers or footers).

 ### Schema unification

@@ -151,7 +153,7 @@ you won't see logs until the ingestion server decides it's time to
 finalize the current Parquet file. For this reason, it might be
 sensible to set the *--rotation-interval* option to a few minutes.

-The query API is at */query* and it takes a full SQL query as the *q*
+The query API is at `/query` and it takes a full SQL query as the *q*
 parameter. The response will be JSON-encoded. Since the table to query
 is created on-the-fly with every request, its name is not known in
 advance to the caller: the SQL query should contain the placeholder
@@ -214,10 +216,8 @@ the URI scheme:
 * *minio* - Generic S3-like API support. Use standard environment
  variables (MINIO_ACCESS_KEY etc) for credentials, URIs should have
  this form: `minio://hostname/bucket/path`
-
 * *s3* - AWS S3 (not ready yet). Supports URIs like
  `s3://bucket/path`
-
 * *gcs* - Google Cloud Storage (not ready yet). Supports URIs of the
  form `gcs://project_id/bucket/path`

@@ -227,9 +227,9 @@ the URI scheme:
 The server offers some debugging endpoints which might be useful to
 understand what it is doing:

-* */schema* will return the current schema in JSON format
-* */debug/schema* will return a human-readable dump of the internal
-  state of the schema guesser
+* `/schema` will return the current schema in JSON format
+* `/debug/schema` will return a human-readable dump of the internal
+  state of the schema guesser, where you can find a report on the errors encountered 

 ### Performance and Scaling

@@ -261,3 +261,10 @@ is certainly possible to run multiple instances of *pqlogd* in
 parallel, pointing them at the same storage: generated filenames are
 unique, so the query layer will maintain the aggregate view of all
 logs.
+
+Note that multiple instances of the indexer will each run their own,
+independent schema analysis, which can potentially result in different
+schemas depending on the input. This is not an issue, because what
+matters is that the schema is consistent within each individual
+Parquet file: the database engine can easily merge those together at
+query time.
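
The reworded `/ingest` paragraph in this commit describes the request format: a POST whose body is newline-delimited JSON, with no header or footer lines. As a rough illustration only, a client might build such a request as sketched below. The base URL, the example record fields, the helper names, and the `Content-Type` value are all assumptions made for this sketch; the README text shown in this diff specifies none of them.

```python
import json
import urllib.request


def encode_ndjson(records):
    """Serialize records as ND-JSON: one compact JSON object per line."""
    lines = [json.dumps(r, separators=(",", ":")) for r in records]
    # Each record is terminated by a newline; nothing is added before or after.
    return "".join(line + "\n" for line in lines).encode("utf-8")


def build_ingest_request(base_url, records):
    """Build (but do not send) a POST request for the /ingest endpoint."""
    return urllib.request.Request(
        base_url.rstrip("/") + "/ingest",
        data=encode_ndjson(records),
        # Assumption: the README does not say which Content-Type is expected.
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )


# Hypothetical records and server address, for illustration only.
records = [
    {"message": "hello", "host": "web1"},
    {"message": "bye", "host": "web2"},
]
req = build_ingest_request("http://localhost:8080", records)
body = encode_ndjson(records)
```

Passing `req` to `urllib.request.urlopen` would send the request; that step is left out here so the sketch runs without a server.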