Thanos: Supporting Prometheus in HA mode with Persistent Storage

Milind Dhoke
11 min read · May 5, 2020


To find a remedy for Prometheus's well-known storage retention issue, I started exploring some of the solutions available in the market. I evaluated Cortex first; it is a great tool, but it mainly deals with Prometheus's storage issue. Next I evaluated Thanos, and this blog covers that evaluation and the work that came out of it.

Anyone working with Prometheus is probably aware of its main shortcomings:

Data loss on failure:

Prometheus stores its data locally in a Time Series Database (TSDB). If the Prometheus server goes down, that data can be wiped out.

No native HA mode:

Prometheus has no built-in High Availability mode, so a single instance becomes a single point of failure in a production environment.

Thanos helps address these shortcomings.

What exactly is Thanos:

Thanos is an open-source tool written in Go. It lets Prometheus run in HA mode and extends its long-term storage capabilities. In addition, it provides a global query view, downsampling and compaction.

It is easier to understand Thanos through a concrete problem statement.

Consider micro-services running in two different regions of a public cloud. To monitor the application metrics, you set up a monitoring cluster consisting of Prometheus, Grafana and Alertmanager. A single Prometheus instance scrapes metrics from the application endpoints and stores them locally, alert rules send alerts on matching conditions, and Grafana queries the Prometheus endpoint for its dashboards. The architecture would look like this:

Prometheus, Grafana, AlertManager Cluster without Thanos

Suddenly, one bad day your Prometheus instance goes down for a few hours. What could happen?

  • Collected metrics are not stored anywhere else, which means data loss.
  • Grafana cannot query anything, which means no visibility into the system.
  • Alertmanager will not fire, which means no alerts if an application failover happens.

Even if, by God's grace, nothing happens to Prometheus and everything keeps working fine, there is still a problem with the above architecture:

  • To get metrics from the applications, individual queries need to be fired against each Prometheus server, which makes analyzing the data confusing.

Now that we know the issues, let's implement Thanos to resolve them. Before that, let's understand the necessary Thanos components.

  • Thanos sidecar:
    This runs alongside each Prometheus instance and talks to Prometheus over HTTP. The sidecar has two sub-components: the StoreAPI, which lets remote queries be passed on to Prometheus, and the Shipper, which uploads data to a remote object store. This provides the long-term storage retention functionality.
  • Thanos querier:
    This is the central query unit, which queries multiple Prometheus instances through their StoreAPIs. This provides the global, multi-tenant query functionality.

These two components are enough for a minimal setup. Since the sidecar pushes data to an object store, we need to tell it where to push through a configuration file. The object store is just blob storage; it can be AWS S3, GCS, Azure Storage, a local filesystem, etc. We are using the local filesystem for this implementation.
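Just for reference, if you later wanted to point the Shipper at AWS S3 instead of a local directory, the same objstore configuration file format applies. A rough sketch follows; the bucket name, endpoint and credentials are placeholders and are not part of this setup:

# Hypothetical S3-backed bucket_config.yaml (placeholders, not used in this PoC)
type: S3
config:
  bucket: "thanos-metrics"                  # placeholder bucket name
  endpoint: "s3.us-east-1.amazonaws.com"    # placeholder region endpoint
  access_key: "<ACCESS_KEY>"
  secret_key: "<SECRET_KEY>"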

Revamped architecture with Thanos:

As you can see, a new observability cluster is introduced, where Grafana and the Thanos Querier component are deployed. Other components can also be deployed in that cluster, but they are not strictly necessary. Grafana in the observability cluster fetches metrics from the Thanos Querier.

Some extra information:

  • The querier-to-sidecar connection is over gRPC.
  • The sidecar-to-Prometheus connection is over HTTP.
  • The external_labels (cluster and replica) in each Prometheus config let the querier deduplicate data coming from HA replicas via its --query.replica-label replica flag.

Let's implement the above architecture.
Prerequisites:

Since we are using the local filesystem as the object store, we need to create a directory that will hold all the data pushed by the sidecars. I created an object-storage directory.

Prometheus requires a configuration file (prometheus.yml). For this PoC I used an individual configuration file for each Prometheus instance.

bucket_config.yaml is required by the sidecar; it tells the Shipper where to push the data.

Directory Structure:

.
├── bucket_config.yaml
├── object-storage
├── prometheus01.yml
├── prometheus02.yml
├── prometheus03.yml
└── setup.sh

prometheus01.yml (Mandatory)


global:
  scrape_interval: 15s
  evaluation_interval: 15s
  external_labels:
    cluster: csu
    replica: 0
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['prometheus01:9091']
  - job_name: 'sidecar'
    static_configs:
      - targets: ['prometheus01-sidecar:19091']

prometheus02.yml (Mandatory)

global:
  scrape_interval: 15s
  evaluation_interval: 15s
  external_labels:
    cluster: wus
    replica: 0
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['prometheus02:9092','prometheus03:9093']
  - job_name: 'sidecar'
    static_configs:
      - targets: ['prometheus02-sidecar:19092','prometheus03-sidecar:19093']

prometheus03.yml (Mandatory)

global:
  scrape_interval: 15s
  evaluation_interval: 15s
  external_labels:
    cluster: wus
    replica: 1
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['prometheus02:9092','prometheus03:9093']
  - job_name: 'sidecar'
    static_configs:
      - targets: ['prometheus02-sidecar:19092','prometheus03-sidecar:19093']

bucket_config.yaml (Mandatory)

# This storage type is used when user wants to store and access the bucket in the local filesystem
type: FILESYSTEM
config:
  directory: "/object-storage"

Deployment Strategy:

We are deploying containerized applications, and the individual docker commands are pretty long, so I wrote a script to ease the deployment.

Script: [improvements to follow]

#!/usr/bin/env bash
# Script to build monitoring cluster with thanos support.
# Actions: deploy|destroy
# Maintainer: Milind Dhoke
#
# Usage: ./setup.sh deploy [ prom|sidecar|querier|grafana|all ] | destroy [ prom|sidecar|querier|grafana|all ]
#
# set -x
# Params
action="$1"
component="$2"
create_volume() {
echo "--> Creating persistent volumes for prometheus servers"
for item in 1 2 3
do
mkdir -p $(pwd)/prometheusStorage$item
done
}
create_docker_network() {
docker network create thanos &> /dev/null
}
deploy_prom() {
echo "--> Deploying prometheus instances"
for item in 1 2 3
do
echo "--> Deploying prometheus instance #$item"
docker run -d --net=thanos --rm -v $(pwd)/prometheus0$item.yml:/etc/prometheus/prometheus.yml -p 909$item:909$item -v $(pwd)/prometheusStorage$item:/prometheus -u root --name prometheus0$item prom/prometheus \
--config.file=/etc/prometheus/prometheus.yml \
--storage.tsdb.path=/prometheus \
--web.listen-address=:909$item \
--web.enable-lifecycle \
--storage.tsdb.min-block-duration=5m \
--storage.tsdb.max-block-duration=5m \
--web.enable-admin-api &> /dev/null
sleep 3
done
curl http://localhost:9091 &> /dev/null
prom01=$?
curl http://localhost:9092 &> /dev/null
prom02=$?
curl http://localhost:9093 &> /dev/null
prom03=$?
if [[ $prom01 -eq 0 && $prom02 -eq 0 && $prom03 -eq 0 ]] ; then
echo "--> Prometheus 01, 02 ,03 got deployed on 9091,9092,9093 port respectively"
else
echo "--> Error occurred while deploying prometheus server"
exit 1
fi
echo "*"
}
deploy_sidecar() {
if [ ! -d "$(pwd)/object-storage" ]; then
mkdir object-storage
fi
echo "--> Deploying thanos sidecar for each prometheus instance in the cluster"
for item in 1 2 3
do
echo "--> Deploying sidecar for prometheus instance #$item"
docker run -d --rm --net=thanos -v $(pwd)/prometheus0$item.yml:/etc/prometheus/prometheus.yml -v $(pwd)/bucket_config.yaml:/tmp/bucket_config.yaml --name prometheus0$item-sidecar -u root \
-v $(pwd)/prometheusStorage$item:/tmp/prometheusStorage \
-v $(pwd)/object-storage:/object-storage \
thanosio/thanos:master-2020-04-27-6d4c9f33 sidecar \
--http-address 0.0.0.0:1909$item \
--grpc-address 0.0.0.0:1919$item \
--reloader.config-file /etc/prometheus/prometheus.yml \
--prometheus.url http://prometheus0$item:909$item \
--tsdb.path /tmp/prometheusStorage \
--objstore.config-file /tmp/bucket_config.yaml &> /dev/null
SCRC=$?
sleep 3
done
echo "--> Using Local volume as a object storage"
if [ $SCRC -eq 0 ]; then
echo "--> All sidecars got deployed successfully for all prometheus instances"
else
echo "--> Error while deploying sidecars"
exit 1
fi
echo "*"
}
deploy_querier(){
docker run -d --rm --net=thanos --name thanos-querier -p 29090:29090 thanosio/thanos:master-2020-04-27-6d4c9f33 query \
--http-address 0.0.0.0:29090 \
--query.replica-label replica \
--store prometheus01-sidecar:19191 \
--store prometheus02-sidecar:19192 \
--store prometheus03-sidecar:19193 &> /dev/null
if [ $? -eq 0 ]; then
echo "--> Deployed thanos querier component"
else
echo "--> Error while deploying thanos querier component"
exit 1
fi
echo "*"
}
deploy_grafana() {
echo "--> Deploying single instance of grafana"
docker run -d --name grafana --net=thanos -p 3000:3000 grafana/grafana &> /dev/null
if [ $? -eq 0 ]; then
echo "--> Grafana is up and running on 3000 port"
else
echo "--> Error while deploying grafana"
exit 1
fi
echo "*"
echo "--> Cluster is up and running"
}
case "$action" in
deploy)
case "$component" in
prom)
# preparing persistent volumes.
create_volume
# creating a docker network
create_docker_network
# deploying prometheus servers:
echo "*"
deploy_prom
;;
sidecar)
deploy_sidecar
;;
querier)
# Deploying thanos querier which queries the thanos side car endpoint over gRPC
deploy_querier
;;
grafana)
# deploy grafana instance
deploy_grafana
;;
all)
create_volume
create_docker_network
deploy_prom
deploy_sidecar
deploy_querier
deploy_grafana
;;
*)
echo "Choose an component to deploy from prom|sidecar|querier|grafana|all"
exit 1
;;
esac
;;
destroy)
case "$component" in
prom)
for container in prometheus01 prometheus02 prometheus03
do
echo "--> Removing $container residue."
docker container stop $container &> /dev/null
done
;;
sidecar)
for container in prometheus01-sidecar prometheus02-sidecar prometheus03-sidecar
do
echo "--> Removing $container residue."
docker container stop $container &> /dev/null
done
;;
querier)
echo "--> Removing querier residue."
docker container stop thanos-querier &> /dev/null
;;
grafana)
echo "--> Removing grafana residue."
docker container stop grafana &> /dev/null
docker ps -a | grep grafana &>/dev/null
if [ $? -eq 0 ]; then
docker rm -f grafana &> /dev/null
fi
;;
all)
for container in prometheus01 prometheus02 prometheus03 prometheus01-sidecar prometheus02-sidecar prometheus03-sidecar thanos-querier grafana
do
docker ps | grep $container &> /dev/null
UPRC=$?
if [ "$UPRC" -eq 0 ]; then
for container in prometheus01 prometheus02 prometheus03 prometheus01-sidecar prometheus02-sidecar prometheus03-sidecar thanos-querier grafana
do
echo "--> Removing $container residue"
docker container stop $container &> /dev/null
done
CONTAINERRC=$?
if [ "$CONTAINERRC" -eq 0 ]; then
docker ps -a | grep grafana | awk '{print $1}' | xargs docker rm -f &>/dev/null
if [ $? -eq 0 ]; then
docker rm -f grafana &> /dev/null
fi
echo "*"
echo "--> All containers drained out"
else
echo "--> Error while stopping grafana container"
exit 1
fi
else
CLUSTERDOWN=yes
fi
done
docker network ls | grep thanos &> /dev/null
if [ $? -eq 0 ]; then
echo "--> Removing docker network [thanos]"
docker network rm thanos &> /dev/null
fi
for item in prometheusStorage1 prometheusStorage2 prometheusStorage3 object-storage
do
if [ -d "$item" ]; then
echo "--> Deleting mounted storage volume: $item"
rm -rf $item
fi
done
if [ ! -z "$CLUSTERDOWN" ]; then
echo "--> Cluster is already down, nothing to tear off"
exit 0
fi
;;
*)
echo "Choose a component to destroy from prom|sidecar|querier|grafana|all"
exit 1
;;
esac
;;
*)
echo "Usage: $0 deploy [ prom|sidecar|querier|grafana|all ] | destroy [ prom|sidecar|querier|grafana|all ]"
exit 1
;;
esac

Execution Commands:

Deploying Prometheus:
./setup.sh deploy prom
Deploying the Thanos sidecars:
./setup.sh deploy sidecar
Deploying the Thanos querier:
./setup.sh deploy querier
Deploying Grafana:
./setup.sh deploy grafana
Deploying the whole cluster:
./setup.sh deploy all

Once you run the ./setup.sh deploy all command, all containers will be up and running and the applications will be served at the URLs below:

Prometheus01: http://localhost:9091/
Prometheus02: http://localhost:9092/
Prometheus03: http://localhost:9093/
Thanos Querier: http://localhost:29090/
Grafana: http://localhost:3000/

mdhoke@thanos ➤ ./setup.sh deploy all
--> Creating persistent volumes for prometheus servers
--> Deploying prometheus instances
--> Deploying prometheus instance #1
--> Deploying prometheus instance #2
--> Deploying prometheus instance #3
--> Prometheus 01, 02 ,03 got deployed on 9091,9092,9093 port respectively
*
--> Deploying thanos sidecar for each prometheus instance in the cluster
--> Deploying sidecar for prometheus instance #1
--> Deploying sidecar for prometheus instance #2
--> Deploying sidecar for prometheus instance #3
--> Using Local volume as a object storage
--> All sidecars got deployed successfully for all prometheus instances
*
--> Deployed thanos querier component
*
--> Deploying single instance of grafana
--> Grafana is up and running on 3000 port
*
--> Cluster is up and running

Let's make sure all the components are working together.
On the Thanos Querier web page (http://localhost:29090/), if you click on Stores you will find all the sidecars connected to the querier and showing green (Up).

Fire any Prometheus query on the querier and you will see the cumulative metrics from all Prometheus instances. This means our deployment is successful.
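If you prefer the command line, the querier also serves the Prometheus-compatible HTTP API, so a quick sanity check is possible with curl. A minimal sketch, using the simple "up" query as an example:

# Ask the Thanos querier for the "up" metric across all stores;
# one series per scraped target from all three Prometheus instances should come back.
curl -s 'http://localhost:29090/api/v1/query?query=up'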

By default Prometheus cuts a block of data every 2 hours; in this PoC the deployment script sets --storage.tsdb.min-block-duration and --storage.tsdb.max-block-duration to 5m, so blocks are produced (and picked up by the sidecar) much sooner. Each finished block is uploaded by the sidecar to the object store (here, the object-storage directory), and you will see blocks appearing under that directory.

Once uploads have happened, fire the following query on Thanos:

thanos_shipper_uploads_total

and you will see how many uploads the Shipper has performed.
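If you are curious which Prometheus instance produced a given block, each uploaded block carries a meta.json whose thanos section records the external labels (cluster/replica). Assuming python3 is available, a quick way to inspect one (using a block ULID from your own object-storage directory; the one below is from my listing) is:

# Pretty-print the block metadata; look for the "thanos" -> "labels" section (cluster / replica)
python3 -m json.tool object-storage/01E79PABCZQK4Q69ED7X41THDC/meta.json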

Dashboard representation in Grafana:
Everything is working fine; the last piece we need to verify is Grafana. If Grafana displays the same result as the querier, we are successful.

Let's log in to Grafana (username: admin, password: admin) and configure a data source by selecting Prometheus. It is better to name it Thanos and set the data source URL to http://localhost:29090, as shown in the image below:
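If you'd rather not click through the UI, Grafana can also pick up the data source from a provisioning file. A minimal sketch, assuming you mount it into the Grafana container under /etc/grafana/provisioning/datasources/ (the file name is arbitrary):

# thanos-datasource.yaml (hypothetical provisioning file, equivalent to the manual setup)
apiVersion: 1
datasources:
  - name: Thanos
    type: prometheus                    # the Thanos querier speaks the Prometheus API
    access: proxy                       # Grafana's backend makes the request
    url: http://thanos-querier:29090    # container name resolvable on the "thanos" docker network
    isDefault: true

Note the URL difference: with access: proxy the request comes from inside the Grafana container, so the querier's container name is used. If the manually configured http://localhost:29090 cannot be reached from inside the container, switching the URL to http://thanos-querier:29090 (or the access mode to Browser) is the usual fix.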

Now fire the same query in Grafana by creating a dashboard. You will see a result identical to the one shown on the Thanos Querier dashboard.

All looks good. Now let's check our local storage; you will see chunks of files stored by the sidecars (your data will differ):

mdhoke@object-storage ➤ tree
.
├── 01E79PABCZQK4Q69ED7X41THDC
│ ├── chunks
│ │ └── 000001
│ ├── index
│ └── meta.json
├── 01E79PAC5SGNPVKC89ZZ86RFCT
│ ├── chunks
│ │ └── 000001
│ ├── index
│ └── meta.json
├── 01E79PAGF726B8M083JQBZAVJD
│ ├── chunks
│ │ └── 000001
│ ├── index
│ └── meta.json
├── 01E7GQPPDYNQVC3EN0V6KC4MBC
│ ├── chunks
│ │ └── 000001
│ ├── index
│ └── meta.json
├── 01E7GQPQMGHZ8ATF3Q56NPR87P
│ ├── chunks
│ │ └── 000001
│ ├── index
│ └── meta.json
├── 01E7GQPRBY55JS1DV4T66HHVTY
│ ├── chunks
│ │ └── 000001
│ ├── index
│ └── meta.json
├── 01E7GQYMTKG07SF76C5JFE4WSS
│ ├── chunks
│ │ └── 000001
│ ├── index
│ └── meta.json
├── 01E7GQYQQH2MYGPH17XSKV8GFP
│ ├── chunks
│ │ └── 000001
│ ├── index
│ └── meta.json
├── 01E7GQYRNYKDB3HHFM1QPKM05D
│ ├── chunks
│ │ └── 000001
│ ├── index
│ └── meta.json
├── 01E7GR7WC04FTQN6QBH386K159
│ ├── chunks
│ │ └── 000001
│ ├── index
│ └── meta.json
├── 01E7GR7XA437Z5RRFYNVS6QB1B
│ ├── chunks
│ │ └── 000001
│ ├── index
│ └── meta.json
├── 01E7GR7ZQY0DC0PTDDGXEXFZD9
│ ├── chunks
│ │ └── 000001
│ ├── index
│ └── meta.json
├── 01E7GRH10DGY7ZZKSM6AT9MJDW
│ ├── chunks
│ │ └── 000001
│ ├── index
│ └── meta.json
├── 01E7GRH1YHH7Q0VCHS83WRB161
│ ├── chunks
│ │ └── 000001
│ ├── index
│ └── meta.json
├── 01E7GRH4C9JVPNPDAA4HMK1RRB
│ ├── chunks
│ │ └── 000001
│ ├── index
│ └── meta.json
└── debug
└── metas
├── 01E79PABCZQK4Q69ED7X41THDC.json
├── 01E79PAC5SGNPVKC89ZZ86RFCT.json
├── 01E79PAGF726B8M083JQBZAVJD.json
├── 01E7GQPPDYNQVC3EN0V6KC4MBC.json
├── 01E7GQPQMGHZ8ATF3Q56NPR87P.json
├── 01E7GQPRBY55JS1DV4T66HHVTY.json
├── 01E7GQYMTKG07SF76C5JFE4WSS.json
├── 01E7GQYQQH2MYGPH17XSKV8GFP.json
├── 01E7GQYRNYKDB3HHFM1QPKM05D.json
├── 01E7GR7WC04FTQN6QBH386K159.json
├── 01E7GR7XA437Z5RRFYNVS6QB1B.json
├── 01E7GR7ZQY0DC0PTDDGXEXFZD9.json
├── 01E7GRH10DGY7ZZKSM6AT9MJDW.json
├── 01E7GRH1YHH7Q0VCHS83WRB161.json
└── 01E7GRH4C9JVPNPDAA4HMK1RRB.json
32 directories, 60 files

Thanos can be extended with other components as well, but our main Prometheus concerns are resolved by this implementation.
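For example (not part of this PoC, just a sketch based on the Thanos store subcommand): once blocks are sitting in the object store, a Store Gateway could expose that historical data to the querier over the same StoreAPI, even after the local Prometheus retention has expired.

# Hypothetical extension: a Store Gateway serving blocks from the same bucket_config.yaml.
docker run -d --rm --net=thanos --name thanos-store \
  -v $(pwd)/bucket_config.yaml:/tmp/bucket_config.yaml \
  -v $(pwd)/object-storage:/object-storage \
  thanosio/thanos:master-2020-04-27-6d4c9f33 store \
  --data-dir /tmp/store-cache \
  --objstore.config-file /tmp/bucket_config.yaml \
  --grpc-address 0.0.0.0:19194 \
  --http-address 0.0.0.0:19094
# The querier would then need an extra "--store thanos-store:19194" flag.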

Let's destroy the cluster by executing ./setup.sh destroy all:

mdhoke@thanos ➤ ./setup.sh destroy all
--> Removing prometheus01 residue
--> Removing prometheus02 residue
--> Removing prometheus03 residue
--> Removing prometheus01-sidecar residue
--> Removing prometheus02-sidecar residue
--> Removing prometheus03-sidecar residue
--> Removing thanos-querier residue
--> Removing grafana residue
*
--> All containers drained out
--> Removing docker network [thanos]
--> Deleting mounted storage volumes
--> Deleting mounted storage volumes
--> Deleting mounted storage volumes
--> Cluster is already down, nothing to tear off
