Thanos: Supporting Prometheus in HA mode with Persistent Storage

Milind Dhoke
11 min read · May 5, 2020


To find a remedy for Prometheus's well-known storage retention issue, I started exploring some of the solutions available in the market. I evaluated Cortex first; it is a great tool, but it mainly deals with Prometheus's storage issue. Next I evaluated Thanos, and this blog covers that evaluation and the work that came out of it.

Anyone working with Prometheus is probably aware of its main shortcomings:

Data loss on failure:

Prometheus stores its data locally in a Time Series Database (TSDB). If the Prometheus server goes down, that data can be wiped out.

No native HA mode:

Prometheus has no built-in High Availability mode, so a single instance becomes a single point of failure in a production environment.

Thanos helps address these shortcomings.

What exactly is Thanos:

Thanos is an open-source tool written in Go. It lets Prometheus run in HA mode and extends its long-term storage capabilities. In addition, it provides a global query view, downsampling and compaction.

It is easier to understand Thanos through a concrete problem statement.

Consider micro-services running in two different regions of a public cloud. To monitor the application metrics, you set up a monitoring cluster consisting of Prometheus, Grafana and Alertmanager. A single Prometheus instance scrapes metrics from the application endpoints and stores them locally, alert rules send alerts on matching conditions, and Grafana queries the Prometheus endpoint for its dashboards. The architecture would look like this:

Prometheus, Grafana, AlertManager Cluster without Thanos

Suddenly, one bad day your Prometheus instance goes down for a few hours. What could happen?

  • Collected metrics are not stored anywhere else, which means data loss.
  • Grafana cannot query anything, which means no visibility into the system.
  • Alertmanager will not fire, which means no alerts if an application failover happens.

Even if, by God's grace, nothing happens to Prometheus and everything keeps working fine, there is still a problem with the above architecture:

  • To get metrics from the applications, individual queries need to be fired against each Prometheus server, which makes analyzing the data confusing.

Now that we know the issues, let's implement Thanos to resolve them. Before that, let's understand the necessary Thanos components.

  • Thanos sidecar:
    This runs alongside each Prometheus instance and talks to Prometheus over HTTP. The sidecar has two sub-components: the StoreAPI, which lets remote queries be passed on to Prometheus, and the Shipper, which uploads data to a remote object store. This provides the long-term storage retention functionality.
  • Thanos querier:
    This is the central query unit, which queries multiple Prometheus instances through their StoreAPIs. This provides the global, multi-tenant query functionality.

These two components are enough for a minimal setup. Since the sidecar pushes data to an object store, we need to tell it where to push through a configuration file. The object store is just blob storage; it can be AWS S3, GCS, Azure Storage, a local filesystem, etc. We are using the local filesystem for this implementation.
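Just for reference, if you later wanted to point the Shipper at AWS S3 instead of a local directory, the same objstore configuration file format applies. A rough sketch follows; the bucket name, endpoint and credentials are placeholders and are not part of this setup:

# Hypothetical S3-backed bucket_config.yaml (placeholders, not used in this PoC)
type: S3
config:
  bucket: "thanos-metrics"                  # placeholder bucket name
  endpoint: "s3.us-east-1.amazonaws.com"    # placeholder region endpoint
  access_key: "<ACCESS_KEY>"
  secret_key: "<SECRET_KEY>"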

Revamped architecture with Thanos:

As you can see, a new observability cluster is introduced, where Grafana and the Thanos Querier component are deployed. Other components can also be deployed in that cluster, but they are not strictly necessary. Grafana in the observability cluster fetches metrics from the Thanos Querier.

Some extra information:

  • The querier-to-sidecar connection is over gRPC.
  • The sidecar-to-Prometheus connection is over HTTP.
  • The external_labels (cluster and replica) in each Prometheus config let the querier deduplicate data coming from HA replicas via its --query.replica-label replica flag.

Let's implement the above architecture.
Prerequisites:

Since we are using the local filesystem as the object store, we need to create a directory that will hold all the data pushed by the sidecars. I created an object-storage directory.

Prometheus requires a configuration file (prometheus.yml). For this PoC I used an individual configuration file for each Prometheus instance.

bucket_config.yaml is required by the sidecar; it tells the Shipper where to push the data.

Directory Structure:

.
├── bucket_config.yaml
├── object-storage
├── prometheus01.yml
├── prometheus02.yml
├── prometheus03.yml
└── setup.sh

prometheus01.yml (Mandatory)


global:
  scrape_interval: 15s
  evaluation_interval: 15s
  external_labels:
    cluster: csu
    replica: 0
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['prometheus01:9091']
  - job_name: 'sidecar'
    static_configs:
      - targets: ['prometheus01-sidecar:19091']

prometheus02.yml (Mandatory)

global:
  scrape_interval: 15s
  evaluation_interval: 15s
  external_labels:
    cluster: wus
    replica: 0
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['prometheus02:9092','prometheus03:9093']
  - job_name: 'sidecar'
    static_configs:
      - targets: ['prometheus02-sidecar:19092','prometheus03-sidecar:19093']

prometheus03.yml (Mandatory)

global:
  scrape_interval: 15s
  evaluation_interval: 15s
  external_labels:
    cluster: wus
    replica: 1
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['prometheus02:9092','prometheus03:9093']
  - job_name: 'sidecar'
    static_configs:
      - targets: ['prometheus02-sidecar:19092','prometheus03-sidecar:19093']

bucket_config.yaml (Mandatory)

# This storage type is used when user wants to store and access the bucket in the local filesystem
type: FILESYSTEM
config:
  directory: "/object-storage"

Deployment Strategy:

We are deploying containerized applications, and the individual docker commands are pretty long, so I wrote a script to ease the deployment.

Script: [improvements to follow]

#!/usr/bin/env bash
# Script to build monitoring cluster with thanos support.
# Actions: deploy|destroy
# Maintainer: Milind Dhoke
#
# Usage: ./setup.sh deploy [ prom|sidecar|querier|grafana|all ] | destroy [ prom|sidecar|querier|grafana|all ]
#
# set -x
# Params
action="$1"
component="$2"
create_volume() {
echo "--> Creating persistent volumes for prometheus servers"
for item in 1 2 3
do
mkdir -p $(pwd)/prometheusStorage$item
done
}
create_docker_network() {
docker network create thanos &> /dev/null
}
deploy_prom() {
echo "--> Deploying prometheus instances"
for item in 1 2 3
do
echo "--> Deploying prometheus instance #$item"
docker run -d --net=thanos --rm -v $(pwd)/prometheus0$item.yml:/etc/prometheus/prometheus.yml -p 909$item:909$item -v $(pwd)/prometheusStorage$item:/prometheus -u root --name prometheus0$item prom/prometheus \
--config.file=/etc/prometheus/prometheus.yml \
--storage.tsdb.path=/prometheus \
--web.listen-address=:909$item \
--web.enable-lifecycle \
--storage.tsdb.min-block-duration=5m \
--storage.tsdb.max-block-duration=5m \
--web.enable-admin-api &> /dev/null
sleep 3
done
curl http://localhost:9091 &> /dev/null
prom01=$?
curl http://localhost:9092 &> /dev/null
prom02=$?
curl http://localhost:9093 &> /dev/null
prom03=$?
if [[ $prom01 -eq 0 && $prom02 -eq 0 && $prom03 -eq 0 ]] ; then
echo "--> Prometheus 01, 02 ,03 got deployed on 9091,9092,9093 port respectively"
else
echo "--> Error occurred while deploying prometheus server"
exit 1
fi
echo "*"
}
deploy_sidecar() {
if [ ! -d "$(pwd)/object-storage" ]; then
mkdir object-storage
fi
echo "--> Deploying thanos sidecar for each prometheus instance in the cluster"
for item in 1 2 3
do
echo "--> Deploying sidecar for prometheus instance #$item"
docker run -d --rm --net=thanos -v $(pwd)/prometheus0$item.yml:/etc/prometheus/prometheus.yml -v $(pwd)/bucket_config.yaml:/tmp/bucket_config.yaml --name prometheus0$item-sidecar -u root \
-v $(pwd)/prometheusStorage$item:/tmp/prometheusStorage \
-v $(pwd)/object-storage:/object-storage \
thanosio/thanos:master-2020-04-27-6d4c9f33 sidecar \
--http-address 0.0.0.0:1909$item \
--grpc-address 0.0.0.0:1919$item \
--reloader.config-file /etc/prometheus/prometheus.yml \
--prometheus.url http://prometheus0$item:909$item \
--tsdb.path /tmp/prometheusStorage \
--objstore.config-file /tmp/bucket_config.yaml &> /dev/null
SCRC=$?
sleep 3
done
echo "--> Using Local volume as a object storage"
if [ $SCRC -eq 0 ]; then
echo "--> All sidecars got deployed successfully for all prometheus instances"
else
echo "--> Error while deploying sidecars"
exit 1
fi
echo "*"
}
deploy_querier(){
docker run -d --rm --net=thanos --name thanos-querier -p 29090:29090 thanosio/thanos:master-2020-04-27-6d4c9f33 query \
--http-address 0.0.0.0:29090 \
--query.replica-label replica \
--store prometheus01-sidecar:19191 \
--store prometheus02-sidecar:19192 \
--store prometheus03-sidecar:19193 &> /dev/null
if [ $? -eq 0 ]; then
echo "--> Deployed thanos querier component"
else
echo "--> Error while deploying thanos querier component"
exit 1
fi
echo "*"
}
deploy_grafana() {
echo "--> Deploying single instance of grafana"
docker run -d --name grafana --net=thanos -p 3000:3000 grafana/grafana &> /dev/null
if [ $? -eq 0 ]; then
echo "--> Grafana is up and running on 3000 port"
else
echo "--> Error while deploying grafana"
exit 1
fi
echo "*"
echo "--> Cluster is up and running"
}
case "$action" in
deploy)
case "$component" in
prom)
# preparing persistent volumes.
create_volume
# creating a docker network
create_docker_network
# deploying prometheus servers:
echo "*"
deploy_prom
;;
sidecar)
deploy_sidecar
;;
querier)
# Deploying thanos querier which queries the thanos side car endpoint over gRPC
deploy_querier
;;
grafana)
# deploy grafana instance
deploy_grafana
;;
all)
create_volume
create_docker_network
deploy_prom
deploy_sidecar
deploy_querier
deploy_grafana
;;
*)
echo "Choose an component to deploy from prom|sidecar|querier|grafana|all"
exit 1
;;
esac
;;
destroy)
case "$component" in
prom)
for container in prometheus01 prometheus02 prometheus03
do
echo "--> Removing $container residue."
docker container stop $container &> /dev/null
done
;;
sidecar)
for container in prometheus01-sidecar prometheus02-sidecar prometheus03-sidecar
do
echo "--> Removing $container residue."
docker container stop $container &> /dev/null
done
;;
querier)
echo "--> Removing querier residue."
docker container stop thanos-querier &> /dev/null
;;
grafana)
echo "--> Removing grafana residue."
docker container stop grafana &> /dev/null
docker ps -a | grep grafana &>/dev/null
if [ $? -eq 0 ]; then
docker rm -f grafana &> /dev/null
fi
;;
all)
for container in prometheus01 prometheus02 prometheus03 prometheus01-sidecar prometheus02-sidecar prometheus03-sidecar thanos-querier grafana
do
docker ps | grep $container &> /dev/null
UPRC=$?
if [ "$UPRC" -eq 0 ]; then
for container in prometheus01 prometheus02 prometheus03 prometheus01-sidecar prometheus02-sidecar prometheus03-sidecar thanos-querier grafana
do
echo "--> Removing $container residue"
docker container stop $container &> /dev/null
done
CONTAINERRC=$?
if [ "$CONTAINERRC" -eq 0 ]; then
docker ps -a | grep grafana | awk '{print $1}' | xargs docker rm -f &>/dev/null
if [ $? -eq 0 ]; then
docker rm -f grafana &> /dev/null
fi
echo "*"
echo "--> All containers drained out"
else
echo "--> Error while stopping grafana container"
exit 1
fi
else
CLUSTERDOWN=yes
fi
done
docker network ls | grep thanos &> /dev/null
if [ $? -eq 0 ]; then
echo "--> Removing docker network [thanos]"
docker network rm thanos &> /dev/null
fi
for item in prometheusStorage1 prometheusStorage2 prometheusStorage3 object-storage
do
if [ -d "$item" ]; then
echo "--> Deleting mounted storage volume: $item"
rm -rf $item
fi
done
if [ ! -z "$CLUSTERDOWN" ]; then
echo "--> Cluster is already down, nothing to tear off"
exit 0
fi
;;
*)
echo "Choose a component to destroy from prom|sidecar|querier|grafana|all"
exit 1
;;
esac
;;
*)
echo "Usage: $0 deploy [ prom|sidecar|querier|grafana|all ] | destroy [ prom|sidecar|querier|grafana|all ]"
exit 1
;;
esac

Execution Commands:

Deploying Prometheus:
./setup.sh deploy prom
Deploying the Thanos sidecars:
./setup.sh deploy sidecar
Deploying the Thanos querier:
./setup.sh deploy querier
Deploying Grafana:
./setup.sh deploy grafana
Deploying the whole cluster:
./setup.sh deploy all

Once you run the ./setup.sh deploy all command, all containers will be up and running and the applications will be served at the URLs below:

Prometheus01: http://localhost:9091/
Prometheus02: http://localhost:9092/
Prometheus03: http://localhost:9093/
Thanos Querier: http://localhost:29090/
Grafana: http://localhost:3000/

mdhoke@thanos ➤ ./setup.sh deploy all
--> Creating persistent volumes for prometheus servers
--> Deploying prometheus instances
--> Deploying prometheus instance #1
--> Deploying prometheus instance #2
--> Deploying prometheus instance #3
--> Prometheus 01, 02 ,03 got deployed on 9091,9092,9093 port respectively
*
--> Deploying thanos sidecar for each prometheus instance in the cluster
--> Deploying sidecar for prometheus instance #1
--> Deploying sidecar for prometheus instance #2
--> Deploying sidecar for prometheus instance #3
--> Using Local volume as a object storage
--> All sidecars got deployed successfully for all prometheus instances
*
--> Deployed thanos querier component
*
--> Deploying single instance of grafana
--> Grafana is up and running on 3000 port
*
--> Cluster is up and running

Let's make sure all the components are working together.
On the Thanos Querier web page (http://localhost:29090/), if you click on Stores you will find all the sidecars connected to the querier and showing green (Up).

Fire any Prometheus query on the querier and you will see the cumulative metrics from all Prometheus instances. This means our deployment is successful.
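If you prefer the command line, the querier also serves the Prometheus-compatible HTTP API, so a quick sanity check is possible with curl. A minimal sketch, using the simple "up" query as an example:

# Ask the Thanos querier for the "up" metric across all stores;
# one series per scraped target from all three Prometheus instances should come back.
curl -s 'http://localhost:29090/api/v1/query?query=up'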

By default Prometheus cuts a block of data every 2 hours; in this PoC the deployment script sets --storage.tsdb.min-block-duration and --storage.tsdb.max-block-duration to 5m, so blocks are produced (and picked up by the sidecar) much sooner. Each finished block is uploaded by the sidecar to the object store (here, the object-storage directory), and you will see blocks appearing under that directory.

Once uploads have happened, fire the following query on Thanos:

thanos_shipper_uploads_total

and you will see how many uploads the Shipper has performed.
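If you are curious which Prometheus instance produced a given block, each uploaded block carries a meta.json whose thanos section records the external labels (cluster/replica). Assuming python3 is available, a quick way to inspect one (using a block ULID from your own object-storage directory; the one below is from my listing) is:

# Pretty-print the block metadata; look for the "thanos" -> "labels" section (cluster / replica)
python3 -m json.tool object-storage/01E79PABCZQK4Q69ED7X41THDC/meta.json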

Dashboard representation in Grafana:
Everything is working fine; the last piece we need to verify is Grafana. If Grafana displays the same result as the querier, we are successful.

Let's log in to Grafana (username: admin, password: admin) and configure a data source by selecting Prometheus. It is better to name it Thanos and set the data source URL to http://localhost:29090, as shown in the image below:
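If you'd rather not click through the UI, Grafana can also pick up the data source from a provisioning file. A minimal sketch, assuming you mount it into the Grafana container under /etc/grafana/provisioning/datasources/ (the file name is arbitrary):

# thanos-datasource.yaml (hypothetical provisioning file, equivalent to the manual setup)
apiVersion: 1
datasources:
  - name: Thanos
    type: prometheus                    # the Thanos querier speaks the Prometheus API
    access: proxy                       # Grafana's backend makes the request
    url: http://thanos-querier:29090    # container name resolvable on the "thanos" docker network
    isDefault: true

Note the URL difference: with access: proxy the request comes from inside the Grafana container, so the querier's container name is used. If the manually configured http://localhost:29090 cannot be reached from inside the container, switching the URL to http://thanos-querier:29090 (or the access mode to Browser) is the usual fix.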

Now fire the same query in Grafana by creating a dashboard. You will see a result identical to the one shown on the Thanos Querier dashboard.

All looks good. Now let's check our local storage; you will see chunks of files stored by the sidecars (your data will differ):

mdhoke@object-storage ➤ tree
.
├── 01E79PABCZQK4Q69ED7X41THDC
│ ├── chunks
│ │ └── 000001
│ ├── index
│ └── meta.json
├── 01E79PAC5SGNPVKC89ZZ86RFCT
│ ├── chunks
│ │ └── 000001
│ ├── index
│ └── meta.json
├── 01E79PAGF726B8M083JQBZAVJD
│ ├── chunks
│ │ └── 000001
│ ├── index
│ └── meta.json
├── 01E7GQPPDYNQVC3EN0V6KC4MBC
│ ├── chunks
│ │ └── 000001
│ ├── index
│ └── meta.json
├── 01E7GQPQMGHZ8ATF3Q56NPR87P
│ ├── chunks
│ │ └── 000001
│ ├── index
│ └── meta.json
├── 01E7GQPRBY55JS1DV4T66HHVTY
│ ├── chunks
│ │ └── 000001
│ ├── index
│ └── meta.json
├── 01E7GQYMTKG07SF76C5JFE4WSS
│ ├── chunks
│ │ └── 000001
│ ├── index
│ └── meta.json
├── 01E7GQYQQH2MYGPH17XSKV8GFP
│ ├── chunks
│ │ └── 000001
│ ├── index
│ └── meta.json
├── 01E7GQYRNYKDB3HHFM1QPKM05D
│ ├── chunks
│ │ └── 000001
│ ├── index
│ └── meta.json
├── 01E7GR7WC04FTQN6QBH386K159
│ ├── chunks
│ │ └── 000001
│ ├── index
│ └── meta.json
├── 01E7GR7XA437Z5RRFYNVS6QB1B
│ ├── chunks
│ │ └── 000001
│ ├── index
│ └── meta.json
├── 01E7GR7ZQY0DC0PTDDGXEXFZD9
│ ├── chunks
│ │ └── 000001
│ ├── index
│ └── meta.json
├── 01E7GRH10DGY7ZZKSM6AT9MJDW
│ ├── chunks
│ │ └── 000001
│ ├── index
│ └── meta.json
├── 01E7GRH1YHH7Q0VCHS83WRB161
│ ├── chunks
│ │ └── 000001
│ ├── index
│ └── meta.json
├── 01E7GRH4C9JVPNPDAA4HMK1RRB
│ ├── chunks
│ │ └── 000001
│ ├── index
│ └── meta.json
└── debug
└── metas
├── 01E79PABCZQK4Q69ED7X41THDC.json
├── 01E79PAC5SGNPVKC89ZZ86RFCT.json
├── 01E79PAGF726B8M083JQBZAVJD.json
├── 01E7GQPPDYNQVC3EN0V6KC4MBC.json
├── 01E7GQPQMGHZ8ATF3Q56NPR87P.json
├── 01E7GQPRBY55JS1DV4T66HHVTY.json
├── 01E7GQYMTKG07SF76C5JFE4WSS.json
├── 01E7GQYQQH2MYGPH17XSKV8GFP.json
├── 01E7GQYRNYKDB3HHFM1QPKM05D.json
├── 01E7GR7WC04FTQN6QBH386K159.json
├── 01E7GR7XA437Z5RRFYNVS6QB1B.json
├── 01E7GR7ZQY0DC0PTDDGXEXFZD9.json
├── 01E7GRH10DGY7ZZKSM6AT9MJDW.json
├── 01E7GRH1YHH7Q0VCHS83WRB161.json
└── 01E7GRH4C9JVPNPDAA4HMK1RRB.json
32 directories, 60 files

Thanos can be extended with other components as well, but our main Prometheus concerns are resolved by this implementation.
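For example (not part of this PoC, just a sketch based on the Thanos store subcommand): once blocks are sitting in the object store, a Store Gateway could expose that historical data to the querier over the same StoreAPI, even after the local Prometheus retention has expired.

# Hypothetical extension: a Store Gateway serving blocks from the same bucket_config.yaml.
docker run -d --rm --net=thanos --name thanos-store \
  -v $(pwd)/bucket_config.yaml:/tmp/bucket_config.yaml \
  -v $(pwd)/object-storage:/object-storage \
  thanosio/thanos:master-2020-04-27-6d4c9f33 store \
  --data-dir /tmp/store-cache \
  --objstore.config-file /tmp/bucket_config.yaml \
  --grpc-address 0.0.0.0:19194 \
  --http-address 0.0.0.0:19094
# The querier would then need an extra "--store thanos-store:19194" flag.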

Let's destroy the cluster by executing ./setup.sh destroy all:

mdhoke@thanos ➤ ./setup.sh destroy all
--> Removing prometheus01 residue
--> Removing prometheus02 residue
--> Removing prometheus03 residue
--> Removing prometheus01-sidecar residue
--> Removing prometheus02-sidecar residue
--> Removing prometheus03-sidecar residue
--> Removing thanos-querier residue
--> Removing grafana residue
*
--> All containers drained out
--> Removing docker network [thanos]
--> Deleting mounted storage volumes
--> Deleting mounted storage volumes
--> Deleting mounted storage volumes
--> Cluster is already down, nothing to tear off
