diff --git a/README.md b/README.md
index a7eace38f245a2961c490453451b8e90f99fa484..3875bcdbbab13a58e8b0563be39d6e12f12bc1cd 100644
--- a/README.md
+++ b/README.md
@@ -4,21 +4,52 @@ Experimental Emacs LLM autocomplete assistant
 Note: the code in this repository isn't ready for use yet.
 
 This is a simple minimalistic Emacs package that provides
-autocomplete-like functionality using LLMs. It uses the Ollama API for
-LLM access.
+autocomplete-like functionality using LLMs. It uses
+[Ollama](https://ollama.com) for LLM access.
 
 It tries to improve the autocompletion quality by providing
 *additional context* for the LLM, obtained by indexing your project's
 source code, and retrieving snippets that are relevant to the code
 being completed (a form of *RAG*).
 
-The indexing code is taken from
-[Aider](https://github.com/Aider-AI/aider) and can run as a local
-service with its own HTTP API in order to return results with low
-latency.
+# How it works
+## Code indexing
 
-## Usage
+The indexing code is taken almost verbatim from
+[Aider](https://github.com/Aider-AI/aider), specifically its RepoMap
+implementation, which uses
+[grep-ast](https://github.com/Aider-AI/grep-ast) under the hood. It's
+a very interesting implementation that uses tree-sitter to extract
+semantically useful elements from the code, and PageRank to score
+them. It's also solid, well isolated, and quite sophisticated (it
+caches parsed ASTs in a project-wide SQLite database for performance,
+among other things), so I saw no need to re-implement it.
+
+The indexer operates at the project level, where a project is
+identified by a Git repository. Note: in the current implementation,
+it's the *indexer* that implements the project abstraction, not Emacs
+(so we can't use Emacs's own project-aware tooling).
+
+For latency reasons, the indexer is implemented as a service that runs
+in the background and exposes an HTTP API. This API takes local file
+names as input, so the indexer daemon needs to have full access to the
+local filesystem. A good way to achieve this is to use systemd's user
+session management capabilities, as outlined below.
+
+## LLM access
+
+In the current implementation, we're just using the *ollama* binary to
+access the LLM. It seems relatively straightforward to eventually
+switch to a direct API call.
+
+## Emacs integration
+
+The Emacs package is very simple (and surely incorrect in a bunch of
+ways): it defines the *copilot-complete* function, which attempts to
+autocomplete the active buffer at the current position.
+
+# Usage
 
 To run the indexing service you can use Systemd. But first you need to
 install the *ecopilot_srcindex* Python package somewhere: for the
@@ -44,4 +75,6 @@
 systemctl --user enable ecopilot-srcindex.socket
 systemctl --user start ecopilot-srcindex.socket
 ```
-The service will be automatically started when necessary.
+The service will be automatically started when necessary. It needs to
+run as your user because it must read source code from the local
+filesystem.
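On the Emacs side, the README only names the *copilot-complete* function. Assuming it is an interactive command (an assumption; the package is explicitly unfinished), a user's init-file configuration might look like this sketch, where the load path, feature name, and keybinding are all hypothetical:

```elisp
;; Hypothetical init.el fragment. `copilot-complete' is named in the
;; README; everything else here is an assumption.
(add-to-list 'load-path "~/src/emacs-copilot")   ; assumed checkout path
(require 'ecopilot nil t)                        ; assumed feature name
(global-set-key (kbd "C-c <tab>") #'copilot-complete)
```

With something like this in place, `C-c TAB` (or `M-x copilot-complete`) would attempt a completion at point in the current buffer.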
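The `systemctl --user enable/start ecopilot-srcindex.socket` commands in the diff imply socket-activated systemd user units, which is what lets the service start on demand while running as your user. The unit files below are a minimal sketch under assumptions: the listen address, port, and `ExecStart` path are not taken from the repository.

```ini
# ~/.config/systemd/user/ecopilot-srcindex.socket
# Hypothetical socket unit: systemd listens and starts the service on
# the first connection (socket activation). Port is an assumption.
[Socket]
ListenStream=127.0.0.1:8787

[Install]
WantedBy=sockets.target
```

```ini
# ~/.config/systemd/user/ecopilot-srcindex.service
# Hypothetical companion service unit; the actual command depends on
# where the ecopilot_srcindex package was installed.
[Service]
ExecStart=%h/.local/bin/ecopilot-srcindex
```

After `systemctl --user daemon-reload`, `systemctl --user status ecopilot-srcindex.socket` can confirm the socket is listening; the service itself starts only when the Emacs side first connects.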
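A note on the ranking approach the diff describes: the README credits Aider's RepoMap with using tree-sitter to extract definitions/references and PageRank to score them. The sketch below is a toy illustration of that idea only, not Aider's actual code; the file names and the plain power-iteration PageRank are made up for the example.

```python
# Toy illustration (NOT Aider's RepoMap code) of scoring files by
# PageRank over a "this file references symbols defined in that file"
# graph, the general idea the README attributes to RepoMap.

def pagerank(graph, damping=0.85, iters=50):
    """Plain power-iteration PageRank over an adjacency dict."""
    nodes = list(graph)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        new = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for n, outs in graph.items():
            if outs:
                # Distribute this node's rank across its out-edges.
                share = damping * rank[n] / len(outs)
                for m in outs:
                    new[m] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                for m in nodes:
                    new[m] += damping * rank[n] / len(nodes)
        rank = new
    return rank

# Hypothetical reference graph: edges point at the files whose
# definitions a file uses.
refs = {
    "main.py": ["utils.py", "db.py"],
    "utils.py": ["db.py"],
    "db.py": [],
    "tests.py": ["main.py", "utils.py", "db.py"],
}
scores = pagerank(refs)
# db.py is referenced by every other file, so it ranks highest and
# would be the strongest candidate to include as completion context.
best = max(scores, key=scores.get)
print(best)  # → db.py
```

The real implementation ranks individual tree-sitter-extracted symbols rather than whole files, but the scoring principle is the same.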