# float issues
https://git.autistici.org/ai3/float/-/issues

## Improve container networking
https://git.autistici.org/ai3/float/-/issues/5 (ale, 2021-04-27)

Right now we simply use "docker --network=host" and manage network overlays separately. It would be nice to support more advanced container networking configurations, in particular a closer integration between net-overlays and the container scheduling itself.
More specifically, here is a possible outcome:
* containers are assigned their own IPs
* net_overlay assigns a subnet to a host, not just a single IP
* container IPs are picked out of private network ranges
There are a few challenges:
* the current service discovery layer assigns IPs to service instances. Multiple containers within a service should use separate ports on the same IP (and should be visible to each other as 'localhost'). Maybe we can do something with "docker network create", or we can bind the docker bridge and the vpn interface later somehow.
* ...

## Idea: transparent sharding of user-keyed SSO-enabled services
https://git.autistici.org/ai3/float/-/issues/14 (ale, 2019-10-25)

Currently we support sharding by publishing shard-specific URLs (e.g. https://2.webmail.my.domain). This is a very simple and efficient approach, but it has a few disadvantages:
* the sharding structure is exposed publicly
* people might bookmark links etc. which become invalid on re-sharding
In order to support partitioned services directly in the HTTP router, we have to solve the following problem: given an HTTP request, figure out which shard it should be sent to. In the general case of a complex service (where the answer isn't just in the URL itself) this is a hard problem, but the situation is different for user-partitioned, SSO-enabled services:
* the sharding key is also the username (or can be derived from it)
* the HTTP router has access to the SSO token (for this to be the case we would need to standardize all applications on using the same cookie name for SSO, but that's doable)
In this case, the HTTP router itself can look at the SSO token and route the request accordingly.
This will incur a performance overhead, as finding the backend for a username might require an RPC (an LDAP lookup, for instance), but this can be mitigated with a short-term cache. The implementation would require a new HTTP proxy layer (the alternative of writing a pile of Lua inside nginx itself is not very appealing), co-hosted with nginx like the sso-proxy. Such a proxy:
* would not perform SSO authentication itself (the backend application should do that)
* in fact it might not even validate the SSO token, just look at it
* in pseudo-code, its decision algorithm might look like this:
* unauthenticated request?
* send to a random backend (handles things like /sso_login etc)
* authenticated request?
* find backend from SSO username
* send to that backend
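The decision algorithm above could be sketched roughly like this. This is a minimal illustration, not float code: `ShardRouter`, `lookup_backend`, and the cache TTL are all hypothetical names and choices.

```python
import random
import time


class ShardRouter:
    """Sketch of the routing decision described above; all names are hypothetical."""

    def __init__(self, lookup_backend, backends, cache_ttl=60):
        self.lookup_backend = lookup_backend  # the potentially expensive RPC, e.g. an LDAP lookup
        self.backends = backends              # all known backends for the service
        self.cache = {}                       # username -> (backend, timestamp)
        self.cache_ttl = cache_ttl            # short-term cache to amortize lookups

    def route(self, sso_username):
        # Unauthenticated request: any backend can serve it (/sso_login etc.).
        if sso_username is None:
            return random.choice(self.backends)
        # Authenticated request: find the backend from the SSO username,
        # caching the result to mitigate the per-request RPC overhead.
        entry = self.cache.get(sso_username)
        now = time.time()
        if entry is None or now - entry[1] > self.cache_ttl:
            entry = (self.lookup_backend(sso_username), now)
            self.cache[sso_username] = entry
        return entry[0]
```

Note that the proxy only *reads* the username from the token; authentication and token validation remain the backend's job, as stated above.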
This would allow us to implement the above-mentioned "webmail" service like this:
* public URL is just https://webmail.my.domain
* we simply need to provide a username->backend lookup function

## It is hard to tell which process belongs to which container from ps
https://git.autistici.org/ai3/float/-/issues/40 (godog, 2019-05-11)

For example, in the output below the container name (or better, the systemd unit name) isn't mentioned anywhere:
```
root 16562 2.7 0.6 713840 6448 ? Ssl 10:24 13:13 /usr/bin/containerd
root 19863 0.0 0.1 10740 1312 ? Sl 10:27 0:04 \_ containerd-shim -namespace moby -workdir /var/lib/containerd/io.containerd.runtime.v1.linux/moby/7948083fb7690b5154238ec2dfa2c332247ad4b05cfbc59
6f80cb0e2b641ff44 -address /run/containerd/containerd.sock -containerd-binary /usr/bin/containerd -runtime-root /var/run/docker/runtime-runc
docker-+ 19881 0.0 0.0 4288 0 ? Ss 10:27 0:00 | \_ /bin/sh -c /usr/bin/memcached -vv -m ${MEM:-64} -p ${PORT:-11211} ${ENABLE_SASL:+-S}
docker-+ 19905 0.1 0.0 327252 0 ? Sl 10:27 0:45 | \_ /usr/bin/memcached -vv -m 64 -p 11212
root 23375 0.0 0.1 9396 1568 ? Sl 10:45 0:03 \_ containerd-shim -namespace moby -workdir /var/lib/containerd/io.containerd.runtime.v1.linux/moby/fdd79fbd67ea703489738eb9120c23cdbc1f918227314b3
0b7b5c90f1d490b97 -address /run/containerd/containerd.sock -containerd-binary /usr/bin/containerd -runtime-root /var/run/docker/runtime-runc
docker-+ 23413 1.2 3.0 1343960 30648 ? Ssl 10:45 5:51 | \_ /usr/share/kibana/bin/../node/bin/node --no-warnings /usr/share/kibana/bin/../src/cli serve --config /etc/kibana/kibana.yml --quiet
root 7837 0.0 0.1 9332 1384 ? Sl 10:49 0:03 \_ containerd-shim -namespace moby -workdir /var/lib/containerd/io.containerd.runtime.v1.linux/moby/61d73f72397b863b184101bce521ca57328c8b21069bb38
764963e88a3a72658 -address /run/containerd/containerd.sock -containerd-binary /usr/bin/containerd -runtime-root /var/run/docker/runtime-runc
docker-+ 7859 0.0 0.0 62288 580 ? Ss 10:49 0:00 | \_ /usr/bin/python3 /usr/local/bin/chaperone
docker-+ 8268 0.2 0.7 137040 7980 ? Sl 10:49 0:57 | \_ /usr/bin/apache_exporter -scrape_uri http://127.0.0.1:8084/server-status/?auto -telemetry.address :8184
docker-+ 8327 0.0 0.0 324660 296 ? Ss 10:49 0:07 | \_ php-fpm: master process (/etc/php/7.0/fpm/php-fpm.conf)
docker-+ 8334 0.0 0.0 324660 172 ? S 10:49 0:00 | | \_ php-fpm: pool www
docker-+ 8335 0.0 0.0 324660 172 ? S 10:49 0:00 | | \_ php-fpm: pool www
docker-+ 8336 0.0 0.0 324660 172 ? S 10:49 0:00 | | \_ php-fpm: pool www
docker-+ 8329 0.0 0.1 99276 1268 ? S 10:49 0:04 | \_ /usr/sbin/apache2 -DFOREGROUND
docker-+ 8355 0.0 0.0 25388 0 ? S 10:49 0:00 | \_ /usr/bin/logger -t apache -p local3 info
docker-+ 8358 0.2 0.0 1306132 272 ? Sl 10:49 1:12 | \_ /usr/sbin/apache2 -DFOREGROUND
docker-+ 8359 0.2 0.0 1306132 268 ? Sl 10:49 1:12 | \_ /usr/sbin/apache2 -DFOREGROUND
root 9408 0.0 0.1 9332 1552 ? Sl 10:50 0:04 \_ containerd-shim -namespace moby -workdir /var/lib/containerd/io.containerd.runtime.v1.linux/moby/33e612efc89a9a54edf63b5346469ebd39534819a2c197752e6038cb51b7b66c -address /run/containerd/containerd.sock -containerd-binary /usr/bin/containerd -runtime-root /var/run/docker/runtime-runc
root 9428 0.0 0.1 62292 1192 ? Ss 10:50 0:00 \_ /usr/bin/python3 /usr/local/bin/chaperone
root 9453 0.2 0.6 210772 6632 ? Sl 10:50 0:57 \_ /usr/bin/apache_exporter -scrape_uri http://127.0.0.1:8083/server-status/?auto -telemetry.address :8183
root 9462 0.0 0.1 89444 1588 ? S 10:50 0:05 \_ /usr/sbin/apache2 -DFOREGROUND
root 9494 0.0 0.0 25388 28 ? S 10:50 0:01 \_ /usr/bin/logger -p local3 info -t apache
www-data 9495 0.0 0.0 89168 768 ? S 10:50 0:02 \_ /usr/sbin/apache2 -DFOREGROUND
www-data 9596 0.2 0.0 1296292 704 ? Sl 10:50 1:12 \_ /usr/sbin/apache2 -DFOREGROUND
www-data 9597 0.2 0.0 1296292 624 ? Sl 10:50 1:13 \_ /usr/sbin/apache2 -DFOREGROUND
```

## Expose internal HTTP endpoints through the sso-proxy
https://git.autistici.org/ai3/float/-/issues/53 (ale, 2019-10-25)

Most services with HTTP endpoints these days also have debug information etc., and it would be useful to be able to access it externally as administrators. This is doable, but it's going to require care, as it would primarily rely on split-DNS techniques, so we'd have to be careful to maintain strict separation of internal and external lookups (right now *float* does not control resolv.conf).
Steps for implementation:
* [ ] generate DNS zones for *domain*
* at first add them on top of /etc/hosts and do not modify host.conf
* make it so there are separate internal and external zones:
* the internal zone should match what currently is in /etc/hosts
* the external zone should point all names at the frontend hosts
* [ ] set up NGINX sso-proxy entries for all service backends
* these would match the *shard*.*service*.*domain* structure, without the port
* [ ] set up ACME entries for all these names
* one single certificate for all of them? one per service, with shards as subjectAltNames? a wildcard?
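The internal/external zone split described in the steps above could be sketched as follows. Function and parameter names are invented for illustration; real float code would render actual zone files rather than tuples.

```python
def generate_zones(services, frontends, domain):
    """Hypothetical sketch: build internal and external views of the zone.

    services: {service_name: {shard_name: backend_ip}} -- internal addresses
    frontends: list of frontend host IPs
    """
    internal, external = [], []
    for service, shards in services.items():
        for shard, ip in shards.items():
            name = f"{shard}.{service}.{domain}"
            # Internal view: the same name -> backend mapping that
            # /etc/hosts currently holds.
            internal.append((name, ip))
            # External view: every name points at the frontend hosts,
            # where the sso-proxy entries terminate the requests.
            for fip in frontends:
                external.append((name, fip))
    return internal, external
```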
Alternatives to consider:
* perhaps we should simply create sharded public_endpoints manually for internal services with debug APIs? Less magic, more manual work.

## Live dataset migration
https://git.autistici.org/ai3/float/-/issues/104 (ale, 2021-11-25)

Currently the mechanism for migrating datasets is to restore the latest backup on the new host, which introduces a worst-case one-day data loss. While this is more or less fine for most of the services currently in float (which can easily tolerate data loss), and it's the right thing to do when the original host has failed, it's an ugly constraint to have when the original data is still "right there", and it would be much better to have the capability for live dataset migration.
This could easily be implemented as a global rsync service, though it would introduce an avenue for lateral data movement between hosts (in case of a local root compromise). On the other hand, this is already possible via the backup system, since we have automated transparent restores on different hosts by design.

## Consider adding a "configuration file" abstraction
https://git.autistici.org/ai3/float/-/issues/105 (ale, 2021-04-26)

While it is nice to offer the ability to configure containerized services via Ansible (because it allows arbitrary customization, besides being necessary for non-containerized services), best practice envisions service-specific Ansible roles as being responsible only for generating some configuration files, possibly using templates.
It is then worth considering, with the intent of "hiding" Ansible as much as possible unless strictly necessary, whether we could add a "configuration file" abstraction to the float service metadata, which would set up configuration files on the filesystem using Ansible templates. This would cover a lot of use cases that would then no longer require an associated trivial Ansible role for configuration.
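Purely as an illustration of the idea, a service metadata entry with such an abstraction might look like this. The `config_files` attribute and its fields are invented here, not an existing float feature:

```yaml
myservice:
  containers:
    - name: app
      image: registry.example.com/myservice:latest
  # Hypothetical "config_files" attribute: float itself would render
  # these templates, removing the need for a trivial Ansible role.
  config_files:
    - path: /etc/myservice/config.yml
      template: templates/myservice-config.yml.j2
```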
One of the obvious downsides is that it makes for a lot of ugly YAML, but this can be partially mitigated by using includes (eventually splitting service metadata down to one service per file or similar).

## Add Crowdsec support
https://git.autistici.org/ai3/float/-/issues/139 (ale, 2023-03-07)

The functionality of [crowdsec](https://www.crowdsec.net/) seems very interesting for the float reverse proxy, in particular the possibility to implement "milder" ban actions for rate limiting, such as requiring a captcha (better than outright IP blocks).

## Replace "zonetool" with "dnscontrol"
https://git.autistici.org/ai3/float/-/issues/141 (ale, 2023-03-07)

https://github.com/StackExchange/dnscontrol

## Model data control flow in logs
https://git.autistici.org/ai3/float/-/issues/143 (ale, 2023-08-22)

We're using syslog as the generalized transport for asynchronous messages, at least those that are expected to end up in a searchable database somewhere -- so it would be nice to be able to model these data flows explicitly (switching on the *log_type* attribute, for instance?) and describe them in a way that float would understand, and configure the system accordingly.
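As a rough illustration of switching on the *log_type* attribute, a dispatcher might look like this. This is a sketch with invented names, not an existing float mechanism:

```python
def route_log(record, consumers):
    """Dispatch a parsed syslog record (a dict) to every consumer that
    declared an interest in its log_type attribute (hypothetical schema)."""
    log_type = record.get("log_type", "default")
    for consumer in consumers:
        if log_type in consumer["log_types"]:
            consumer["deliver"](record)
```

A consumer here could be a searchable database as well as one of the *log watchers* mentioned below; the point is that float could derive the syslog routing configuration from such declarations.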
In line with this thinking, it would be nice to be able to set up *log consumers* that are not searchable databases, for example for the purpose of *log watching* (for periodic / real-time analysis, or alerting)...

## Replace Elasticsearch with Clickhouse
https://git.autistici.org/ai3/float/-/issues/144 (ale, 2023-08-22)

Clickhouse might be more suited to the low-resource use case, and might generally scale better to the high-resource one. We'd lose Kibana, but there is not much there that can't be replaced by a simpler dashboarding / query UI.