Optimizing disk usage of Zabbix and PostgreSQL

Here at Modio we use Zabbix a lot, both for production and other monitoring. We also use PostgreSQL for everything where we can, and while it's supported by Zabbix, it often feels like a second class citizen next to MySQL.

Due to reasons, we have to store a lot of historical values, for very long times in our systems, and sometimes for a lot of keys. And while SQL is great for a lot of things, time-series data is probably one of the worst when it comes to storage.

MySQL and Postgres store data a bit differently when it comes to on-disk usage, and with InnoDB data is sometimes stored interleaved with the index, making the two engines hard to compare. This post is focusing on PostgresSQL.

Simple table layout

For our example, we'll use a simple table like history_uint:

id	clock	value
1	1478970438	123
2	1478970439	222
3	1478970438	2233

Assuming all keys are integer, this is a s simple numerical table, with an index over (id, clock). You end up with almost as much overhead per row as you have data. Top that up with that half of the data (id, clock) will also be stored twice, in table and in the index.

For small amounts of data ( a few months ) this isn't a problem, but when you wish to store frequent data, or for longer (10 years!) of time, this causes storage requirements to grow. Just one of our installations was slightly over 300GiB of data on disk, and while this is still not big data, it is cause for some concern. For a table of say 50GiB, another 40GiB will be used for the index.

SQL is not the best format for Time-series databases, and there are several optimized datastores for just that. However, none of those integrate neatly with Zabbix.

Notable with this data is that meter based time-series data (usually) enters the system in a chronological order, the datasets on the disk will be organically sorted. Unlike if you do some data-reorganizing using for example CLUSTER or pg_repack, which are strictly sorting . How data is sorted on disk is important for how many disk seeks are done when you wish to look at historical data for a single sensor.

Housekeeper & Clustering

The houseekeeper in Zabbix is a well known cause of performance problems. When you mix storage of data records you wish to keep "forever" with those that you only wish to keep for a while, the housekeeper will go through your dataset and delete older data that it shouldn't keep. Simply doing this, will cause holes in your database files ( on disk ).

id	clock	value
1	1478970438	123
deleted	deleted	deleted
3	1478970438	234

Then when new data is inserted, it may be placed in one of the above holes,

id	clock	value
1	1478970438	123
33	2015362800	2033
3	1478970438	234

The on-disk layout can in PostgreSQL be changed following an index by using CLUSTER-command, but this requires a complete lock & rewrite of the data, while the table is completely locked for writes & reads, and it also needs a b-tree index on the column you wish to sort by. You can sort on-line (without a lock) using pg_repack, but this too requires an index to sort on.

So, for this reason, we do not run the housekeeper on the history tables, and resort to other ways of cleaning out old or unwanted data.

( Note that the points about sorting data may seem irrelevant now, but will become more relevant later. )

OID removal

Our systems have been running for a few years now, and we've been keeping up to date with Zabbix releases as they come. This means that the databases themselves were created by older Zabbix scripts and then upgraded. At some point in time, Zabbix defaulted to explicitly create Postgres tables with OID enabled. In practice, this means another key to the index, and in our case, OID's had no use at all, since they would have wrapped around repeatedly in the tables.

Worth noting is that the current Zabbix database creation scripts are no longer using OID's, but the upgrade process does not remove them.

Simply iterating over all our tables and removing OID's reduced the on-disk usage of some tables with 30-50%. The total reduction on disk was pretty amazing, simply by removing OIDs, our database on disk shrunk with almost 60%!

Disk usage graph

Index readjustment

In time series workloads. you never update a row once it's on the disk, they are inserted, then read repeatedly, and maybe later removed.

With PostgreSQL 9.5 a new type of index, BRIN (Block Range INdex) were introduced. These indexes work best when the data on disk is sorted, which we have already concluded we have. Unlike normal indexes, brin only stores min/max values for a range of blocks on disk, which allows them to be small, but which raises the cost for any lookup against the index. Each lookup that hits the index must read a range of pages from disk, so each hit becomes more expensive.

With our use of Zabbix, the most common "read" operation is a scan on the history table to find the last value reported to the system for a sensor. This value is then put in visualizations, tables and similar. This is the same that is used when Zabbix web interface lists the last timestamp and value for a host in it's last data screen.

The code for this is in PHP, and uses the ZBX_HISTORY_PERIOD=86400 constant ($period in the code below) to limit how far back in time it looks. This code follows a pretty traditional anti-pattern of doing 10 000 little queries that are all too small to be slow by themselves, yet totally causing a large time spent. This means it's not going to show in any slow query log.

    public function getLast(array $items, $limit = 1, $period = null) {  
        $rs = [];  
        foreach ($items as $item) {  
            $values = DBfetchArray(DBselect(  
                'SELECT *'.  
                ' FROM '.self::getTableName($item['value_type']).' h'.  
                ' WHERE h.itemid='.zbx_dbstr($item['itemid']).  
                    ($period ? ' AND h.clock>'.(time() - $period) : '').  
                ' ORDER BY h.clock DESC',  
                $limit  
            ));  
            if ($values) {  
                $rs[$item['itemid']] = $values;  
            }  
        }  
        return $rs;  
    }

A better way would be to pick out the value_type, and then use count(value_type) queries to get the last data. However, knowing that this query exists, and is the most common one, we can optimize for it.

For Zabbix, you cannot just replace the normal BTree index on history tables with a brin one. The above function would cause several thousand queries against the database, where each query would hit the index, and be forced to scan a block of pages on the disk. Trust me, I tried.

However, when we know that $period will be 86400, we can optimize for this specific function, by creating a btree index that covers this part, and using brin for the rest of the table. Sadly, a partial index cannot have a dynamic function in it's where-clause, so it has to be regenerated regularly.

To fix this, we run a script like the below once every night:

#!/bin/bash
database=${1:-zabbix}
cutoff=$[$(date +%s) - 2*86400]

psql -d zabbix << __EOF__
    BEGIN;
    DROP INDEX IF EXISTS history_backpeek_temp;
    CREATE INDEX history_backpeek_temp ON history (itemid, clock) WHERE clock >= ${cutoff};
    DROP INDEX IF EXISTS history_backpeek;
    ALTER INDEX history_backpeek_temp RENAME TO history_backpeek;
    COMMIT;
__EOF__

This will create a history_backpeek index, which allows the query optimizer to use that for getLast operations. As a side note, it would actually be quite nice if Postgres could use the existing index to build the new one.

With our lastValue-optimized history_backpeek index in place, we can replace the main itemid,clock btree-index with brin ones.

drop index history_1;
create index history_1 on history using brin (itemid, clock);

drop index history_str_1;
create index history_str_1 on history_str using brin (itemid, clock);

drop index history_log_1;
create index history_log_1 on history_log using brin (itemid, clock);

drop index history_text_1;
create index history_text_1 on history_text using brin (itemid, clock);

drop index history_uint_1;
create index history_uint_1 on history_uint using brin (itemid, clock);

Note that you want to shut down Zabbix & frontends when doing this, or wrap each in a transaction.

Benefits

So, I can't come here and make these claims and suggestions without some numbers to back it up with. We saw a 60% space reduction on ONE system from the OID reduction. Other systems saw a more modest 13-15% reduction in space.

The move towards brin indexes is hairier, and requires some more forethought. I have not been benchmarking history data (get history for 5-7 days via API), so I cannot tell for sure what the performance impact is.

Disk impact, this is a measure on a simple 15GiB history_uint-table:

index	size
history_uint_btree	9401 MB
history_uint_brin	688 kB
history_backpeek	743 MB

Yes, a freshly created btree index uses nearly 10GiB, 60% of the table, while the corresponding brin index uses below a megabyte. The constrained backpeek index instead uses around 5% of the table size, and covers approximately 7% of the total samples in the table.

In real numbers, replacing btree indexes with brin counts for a disk usage reduction of around 35%.

Disk usage graph

The total view after all these changes look like this:

Total disk usage of Zabbix

Sorting the table

To further optimize for a brin type index, it helps to have the table sorted. Especially if you have ever used the zabbix housekeeper.

The below will first index the entire table, then cluster (rebuild) it, and then drop the index. Warning, it will take time to run. And require a fair bit of disk space. pg_repack is a better choice as you can run it in production without locking the entire table.

Make sure you stop Zabbix first, and schedule a maintenance window.

create index history_sort on history (clock);
cluster history using history_sort;
drop index history_sort;

create index history_sort on history_uint (clock);
cluster history_uint using history_sort;
drop index history_sort;

create index history_sort on history_str (clock);
cluster history_str using history_sort;
drop index history_sort;

create index history_sort on history_log (clock);
cluster history_log using history_sort;
drop index history_sort;

create index history_sort on history_text (clock);
cluster history_text using history_sort;
drop index history_sort;

Update ( 2016-11-14 )

We found out that since web tests use an unlimited backpeek on the history_uint table when you configure it (to show when the test last ran), you may find it is impossible to modify web tests.

The patch at ZBX-11482 for the zabbix php frontend fixes that.

Update ( 2016-11-17 )

Added more complete index handling scripts, added a screenshot of replacing the indexes in production.

Total shrinkage was from 95GB down to 25GB, a reduction with approximately 75%, from removing OID and replacing the indexes.

By: D.S. Ljungmark