This is a brief overview of the Modio systems architecture and how we use Kubernetes.
Our embedded devices in the field are the source of most of our data, so I'll explain a bit about how they function.
Our devices run a custom Linux distribution for data collection and logging, batching and streaming data based on network availability. We do not use "realtime" techniques, preferring continuous small batches to improve network efficiency, and because networks turned out to be not very reliable in practice.
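The batching logic can be sketched roughly as follows; the class name, thresholds, and callback are illustrative, not our actual code:

```python
import time

class BatchUploader:
    """Accumulates data points and flushes them in small batches.

    Illustrative sketch only: flush_cb stands in for the actual
    network upload, and the thresholds are made up.
    """

    def __init__(self, flush_cb, max_points=100, max_age_s=30.0):
        self.flush_cb = flush_cb
        self.max_points = max_points
        self.max_age_s = max_age_s
        self.buffer = []
        self.oldest = None

    def add(self, key, value, timestamp):
        if self.oldest is None:
            self.oldest = time.monotonic()
        self.buffer.append((key, value, timestamp))
        # Flush when the batch is big enough or old enough:
        if (len(self.buffer) >= self.max_points
                or time.monotonic() - self.oldest >= self.max_age_s):
            self.flush()

    def flush(self):
        if self.buffer:
            self.flush_cb(self.buffer)  # real code would retry on network failure
            self.buffer = []
            self.oldest = None
```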
Different protocols are integrated and decoded on the edge, mapped to a normalized format, and stored as a tuple of key, value, timestamp.
Each key maps distinctly to all of the following:
- Data type (down to bits of precision expected from hardware)
- Name (human readable, default)
- Description (also for human consumption)
- Alerting rules
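As a sketch, the per-key metadata could be modeled like this; the field names, the example key, and the rule format are made up for illustration:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class KeyInfo:
    """Metadata that every key maps to. Field names are illustrative."""
    data_type: str      # e.g. "int16", down to expected bits of precision
    name: str           # human-readable default name
    description: str    # also for human consumption
    alert_rules: tuple  # alerting rules attached to this key

# Hypothetical registry entry for one sensor key:
REGISTRY = {
    "1234.temp.indoor": KeyInfo(
        data_type="int16",
        name="Indoor temperature",
        description="Temperature reported by the indoor sensor",
        alert_rules=("above:50", "below:-30"),
    ),
}
```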
This device is then responsible for normalizing the data it collects from meters, and storing (a hopefully small amount) for batch processing.
We currently use "modern" TLS with HTTP/2 plus client & server certificates from a separate PKI (there is no public CA trust root on the devices), and each device has its own key for identification.
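A minimal sketch of how a device-side client could set up mutual TLS against a private PKI, using Python's ssl module. The function name is made up, and the paths are optional here only so the sketch runs without real certificate files:

```python
import ssl

def build_device_context(ca_path=None, cert_path=None, key_path=None):
    """Build a client-side TLS context for mutual authentication.

    Trusts only a private PKI root (no public CA bundle) and presents
    the device's own certificate. Illustrative sketch, not our code.
    """
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    if ca_path is not None:
        # Trust only the private root, not the system store:
        ctx.load_verify_locations(cafile=ca_path)
    if cert_path is not None:
        # The device's own key identifies it to the server:
        ctx.load_cert_chain(certfile=cert_path, keyfile=key_path)
    return ctx
```

With `PROTOCOL_TLS_CLIENT`, certificate verification and hostname checking are on by default, which matches the "no unauthenticated peers" posture described above.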
This is a rough outline of our processing pipeline, not a complete description.
Ingress / Load Balancer
Nginx acts as the load balancer for a multi-provider Kubernetes cluster, splitting data-processing jobs geographically and logically.
Data is processed in batches via the submit service, which can be scaled out to be geographically close to devices and customers in order to improve network latency.
The data submit process includes:
- Validating each data point according to type rules
- Ensuring processed values are accessible
- Handling notifications & alerting systems
Data is then pushed into storage tier one, PostgreSQL.
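The validation step above can be sketched as follows; the registry layout and the type rules are illustrative assumptions, not our actual schema:

```python
def validate_point(point, registry):
    """Validate one (key, value, timestamp) tuple against type rules.

    Returns an error string, or None when the point is valid.
    """
    key, value, timestamp = point
    info = registry.get(key)
    if info is None:
        return f"unknown key: {key}"
    if info["type"] == "int16":
        # Check down to the precision expected from the hardware:
        if not isinstance(value, int) or not -2**15 <= value < 2**15:
            return f"value out of range for int16: {value!r}"
    if not isinstance(timestamp, (int, float)) or timestamp <= 0:
        return f"bad timestamp: {timestamp!r}"
    return None

def validate_batch(points, registry):
    """Split a submitted batch into accepted points and rejections."""
    accepted, rejected = [], []
    for p in points:
        err = validate_point(p, registry)
        if err is None:
            accepted.append(p)
        else:
            rejected.append((p, err))
    return accepted, rejected
```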
The api service provides our REST API, primarily giving access to historical data, events, and trends. It is also deployed in Kubernetes, and here we also have a cache layer, as the API is designed to serve data using cache-friendly request patterns.
Any data you want to store, you want to store more than once.
We use tiered PostgreSQL storage, with replication for data availability (backups, off-site) and hot+cold tiers for archival data, using a Foreign Data Wrapper (FDW) to reach older data on secondary storage. This allows us to store large amounts of historical data accurately, saving up to 90% of disk capacity for older data.
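The hot/cold split can be illustrated with a routing function that picks a table by data age. The table names and cutoff are assumptions for the sketch, not our actual schema:

```python
import time

CUTOFF_S = 365 * 24 * 3600  # illustrative: one year of "hot" data

def table_for(timestamp, now=None):
    """Pick the storage tier for a data point by its age.

    Recent rows live in a local PostgreSQL table; older rows are
    reached through an FDW-backed foreign table on secondary storage.
    """
    now = time.time() if now is None else now
    if now - timestamp < CUTOFF_S:
        return "datapoints_hot"       # local table, fast disks
    return "datapoints_archive_fdw"   # foreign table via postgres_fdw
```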
Redis is used as a cluster-internal service to be able to quickly answer the
question "what was the last value for sensor
xxx.yy.zzz", which is one of our
most common queries, and something that is surprisingly difficult to answer
efficiently in a relational database.
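The pattern can be sketched with a plain dict standing in for the Redis hash (the real service would use Redis commands such as HSET and HGET); the class and its update rule are illustrative:

```python
class LastValueCache:
    """Answer "what was the last value for sensor X?" quickly.

    Sketch of the pattern we use Redis for, with a dict standing in
    for the Redis store. Illustrative, not our actual code.
    """

    def __init__(self):
        self._last = {}

    def update(self, key, value, timestamp):
        # Keep only the newest point per key; late or out-of-order
        # batch submissions must not move the "last value" backwards.
        cur = self._last.get(key)
        if cur is None or timestamp >= cur[1]:
            self._last[key] = (value, timestamp)

    def last(self, key):
        """Return (value, timestamp) for the key, or None if unseen."""
        return self._last.get(key)
```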
Multi cloud & k8s native
Data storage (Postgres + offloading) is not managed in k8s, but past that, the entire server-side stack is horizontally scalable and easy to split geographically. We've always had a multi-cloud approach to managing our k8s environment for availability and cost reasons, and our existing clusters span multiple providers.
Our story with Kubernetes began three years ago, after a few outages caused by restarting Docker daemons during updates: no containers could be started again until the machine's container storage had been wiped and re-initialized, something that required manual attention during the New Year's Eve weekend.
We originally migrated to k8s mostly as a glorified container launcher, and have gradually come to adopt more and more of the functionality available from the platform.
Where originally our containers were "OS-level" (including init systems and more), a way to package an OS stack and deploy it consistently, they have since turned into a fairly standard Kubernetes workload of multiple individually scalable components.
Things not in Kubernetes
We do not keep databases and data storage in Kubernetes: we try to keep all our Kubernetes nodes stateless, storing no data in volumes or similar, to make the architecture more maintainable and manageable.
Neither are VPN and static IP endpoints in Kubernetes, again because of stateful loads and availability. They could in theory be migrated, but it was deemed not worth it in practice.