diff --git a/README.md b/README.md
index 8a85223cef63dd55cb87c67c2f675b0b978715bc..00f48afb52df8fdafa39d0d25dedc67fc7ee63c9 100644
--- a/README.md
+++ b/README.md
@@ -6,3 +6,104 @@ support. It can deal with translated documents (where fields have multiple
 values for each language), as well as sparse multi-language corpora where
 documents might each have different languages, by providing sensible behavior
 by default.
+
+
+## Document model
+
+The index stores *documents*, which are just groups of key/value pairs
+identified by a specific unique value for a primary key (also called a
+*document ID*). The canonical document representation is a JSON-encoded map.
+
+Documents to be indexed can have an arbitrarily complex structure: they are
+transformed to a "flat" representation, where nested objects are remapped
+using a path-like syntax (dot separated) for their keys, e.g. the input
+document
+
+```json
+{
+  "id": "1",
+  "measurement": {
+    "status": "ok",
+    "duration": 3142,
+    "location": "US"
+  }
+}
+```
+
+is converted into the following flat map:
+
+```json
+{
+  "id": "1",
+  "measurement.status": "ok",
+  "measurement.duration": 3142,
+  "measurement.location": "US"
+}
+```
+
+so that, when searching, one can reference, for instance, the
+*measurement.status* field directly.
+
+## Schema
+
+Each index has an associated *schema*, that is, a description of the expected
+document fields and their types (and other metadata, too).
+
+An index can only have a single schema, which can only be modified over time
+by adding new fields. Schema changes that require reindexing, such as changing
+an existing field's type, or removing it, are not supported.
+
+The main point of the schema is to define a *field mapping*, specifying how
+fields in the document should be analyzed and indexed. If the indexer
+encounters a field that is not specified explicitly in the schema, it will
+default to a basic analyzer ("keyword") that only finds literal matches.
+
+Schema fields can have the following metadata attributes:
+
+* `type` - the field's data type, one of:
+  * *keyword* - no processing of the input text, literal matching
+  * *text* - text value
+  * *numeric* - numeric (float) value
+  * *datetime* - a timestamp value
+* `boost` - boost value for this field when searching (default 1.0)
+* `no_index` - set to true if this field should not be indexed
+* `no_store` - set to true if this field's values should not be stored, only
+  indexed
+* `aggregate` - set to true to support aggregations on this field
+
+Stored fields are the fields that will be returned in searches. By default,
+the engine will store all fields in its index.
+
+The engine will make some attempt at converting input value types where those
+conversions are possible (namely text / numeric), otherwise it will report an
+error if the value types do not match the schema expectations.
+
+### Text fields
+
+Text fields are a complex type, a set of language / text pairs, that allow
+specifying translations of the same text. The JSON encoding of a text field is
+a map with two-letter ISO language codes as keys, e.g.:
+
+```json
+{
+  "en": "Example document",
+  "it": "Documento di esempio"
+}
+```
+
+As a special case, a simple string will be interpreted as being in the
+schema's default language, so:
+
+```json
+"some example text"
+```
+
+is actually equivalent (assuming the default language is English) to:
+
+```json
+{
+  "en": "some example text"
+}
+```
+
+
diff --git a/schema.go b/schema.go
index 0ae5f8edaa261cdedd8be1865828e5a56b407715..2f8323654c576bd075fbd2ce8b60c535dec9b662 100644
--- a/schema.go
+++ b/schema.go
@@ -14,10 +14,10 @@ type JSONDoc map[string]interface{}
 
 // Field types.
 const (
-	KeywordField = "keyword"
-	TextField    = "text"
-	NumericField = "numeric"
-	DateField    = "date"
+	KeywordField  = "keyword"
+	TextField     = "text"
+	NumericField  = "numeric"
+	DateTimeField = "datetime"
 )
 
 // Field is a field in the schema.
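
Note on the "Document model" section of the README: the dot-separated flattening it describes could look roughly like the sketch below. This is not code from this change. `flattenDoc` and `flattenInto` are hypothetical names, `JSONDoc` is redeclared locally with the same shape as the type in `schema.go`, and the sketch treats every nested object as something to flatten, so it does not account for schema-aware handling such as text fields encoded as language maps.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// JSONDoc has the same shape as the type declared in schema.go.
type JSONDoc map[string]interface{}

// flattenDoc is a hypothetical helper (not part of this change) that remaps
// nested objects to dot-separated keys, as described in the README.
func flattenDoc(doc JSONDoc) JSONDoc {
	out := make(JSONDoc)
	flattenInto(out, "", doc)
	return out
}

// flattenInto walks one nesting level and writes leaf values into out,
// joining the parent path and the key with a dot.
func flattenInto(out JSONDoc, prefix string, in map[string]interface{}) {
	for k, v := range in {
		key := k
		if prefix != "" {
			key = prefix + "." + k
		}
		switch child := v.(type) {
		case map[string]interface{}:
			flattenInto(out, key, child)
		case JSONDoc:
			flattenInto(out, key, child)
		default:
			out[key] = v
		}
	}
}

func main() {
	input := `{"id": "1", "measurement": {"status": "ok", "duration": 3142, "location": "US"}}`

	var doc JSONDoc
	if err := json.Unmarshal([]byte(input), &doc); err != nil {
		panic(err)
	}
	flat := flattenDoc(doc)

	// Sort the keys so the printed order is stable across runs.
	keys := make([]string, 0, len(flat))
	for k := range flat {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%s = %v\n", k, flat[k])
	}
}
```

Arrays and scalar values are copied through unchanged in this sketch; how the real indexer treats arrays is not stated in the README.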