Configuring Prometheus high availability with Thanos

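Prometheus officially offers several high-availability approaches:

HA: two Prometheus instances scrape exactly the same data and sit behind a load balancer.

HA + remote storage: in addition to the basic multi-replica Prometheus setup, data is written to remote storage through remote write, which solves the storage persistence problem.

Federation: instances are partitioned by function, different shards scrape different data, and a global node stores it centrally, which addresses the scale of monitoring data.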

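The officially recommended combination of multiple replicas plus federation still runs into problems. The root cause is that Prometheus local storage has no data synchronization capability, so keeping data consistent while also guaranteeing availability is difficult, and a basic multi-replica proxy cannot meet the requirement, for example:

Most current Prometheus clustering solutions ensure data consistency from two angles, storage and query: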

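Actual requirements:

As clusters grow, the types and volume of monitoring data grow as well: master/node machine monitoring, process monitoring, performance monitoring of the four core components, pod resource monitoring, kube-state-metrics, k8s events monitoring, plugin monitoring, and so on. Besides solving the high-availability problems above, we also wanted to build a global view on top of Prometheus. The main requirements are: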

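After surveying many open-source solutions (cortex/thanos/victoria/…) and commercial products, we chose Thanos. Strictly speaking, Thanos is only a monitoring toolkit; combined with native Prometheus, it meets the requirements of long-term storage, unlimited scaling, a global view, and non-intrusiveness.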

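Thanos is a set of components. The official website lists:

Besides those listed officially, there are also:

There appear to be many components, but deployment involves only one binary, which is very convenient. Different arguments select different functions: the query component is ./thanos query and the sidecar component is ./thanos sidecar. All components ship in one binary, so there is only one copy and it is small.

This example uses three Thanos components: sidecar, store, and query.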

Prometheus configuration for docker-compose

prometheus:
  image: prom/prometheus
  volumes:
    - ./prometheus/:/etc/prometheus/
    - /usr/local/npg/prometheus_data:/prometheus
  command:
    - '--config.file=/etc/prometheus/prometheus.yml'
    - '--storage.tsdb.path=/prometheus'
    - '--storage.tsdb.max-block-duration=2h'
    - '--storage.tsdb.min-block-duration=2h'
    - '--storage.tsdb.wal-compression'
    - '--storage.tsdb.retention.time=2h'
    - '--web.console.libraries=/usr/share/prometheus/console_libraries'
    - '--web.console.templates=/usr/share/prometheus/consoles'
    - '--web.enable-admin-api'
    - '--web.enable-lifecycle'
  links:
    - "alertmanager"
  ports:
    - 9090:9090
  restart: always
  network_mode: "bridge"

web.enable-lifecycle must be enabled; it allows hot-reloading the configuration. Retention is set to 2 hours: by default Prometheus produces a block every 2 hours, and Thanos uploads that block to object storage.
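As a rough illustration of the hot reload that web.enable-lifecycle makes possible, the sketch below sends a POST to the Prometheus /-/reload endpoint using only the Python standard library. The helper names reload_url and reload_prometheus are hypothetical, and the base URL is assumed to be the http://localhost:9090 address used elsewhere in this setup.

```python
import urllib.request


def reload_url(base_url: str) -> str:
    """Build the lifecycle reload endpoint for a Prometheus base URL."""
    return base_url.rstrip("/") + "/-/reload"


def reload_prometheus(base_url: str = "http://localhost:9090",
                      timeout: float = 5.0) -> int:
    """Ask Prometheus to re-read its configuration.

    Only works when Prometheus was started with --web.enable-lifecycle.
    Returns the HTTP status code of the response.
    """
    request = urllib.request.Request(reload_url(base_url), data=b"", method="POST")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Calling reload_prometheus() after editing prometheus.yml should apply the new configuration without restarting the container; if the lifecycle flag is missing, the request is expected to fail with an HTTP error.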

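Requirements for Prometheus:

The sidecar component runs as a sidecar to the Prometheus server, deployed in the same pod or on the same host. It has two roles:

It implements Thanos' Store API on top of Prometheus' remote read API. This lets the query component, described later, treat the Prometheus server as another source of time series data without talking to the Prometheus API directly (this is the sidecar's intercepting role).

Optionally, each time Prometheus produces a TSDB block (every 2 hours), the sidecar uploads the block to an object storage bucket. This lets the Prometheus server run with a short retention while historical data remains durable and queryable through object storage.

This does not make Prometheus fully stateless: if it crashes and restarts, you lose up to 2 hours of metrics. Running Prometheus with multiple replicas reduces the risk of losing those 2 hours of data.

Sidecar configuration: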

thanos sidecar --tsdb.path /usr/local/npg/prometheus_data --prometheus.url http://localhost:9090 --objstore.config-file /root/config.yaml --http-address  0.0.0.0:19191 --grpc-address  0.0.0.0:19090

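The storage configuration file is /root/config.yaml. It can point to shared object storage or to the local file system; for convenience this demonstration uses the local file system. The file contents are: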

[root@localhost ~]# cat config.yaml
type: filesystem
config:
  directory: "/data"
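One way to check that blocks are arriving in the filesystem bucket is to read the meta.json file inside each block directory. The sketch below assumes the usual layout of one directory per block ULID directly under the bucket directory, each containing a meta.json with ulid, minTime and maxTime fields (timestamps in milliseconds). The helper names list_blocks and format_ms are hypothetical.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def list_blocks(bucket_dir):
    """Return (ulid, min_time_ms, max_time_ms) for every block under bucket_dir.

    Assumes each block lives in <bucket_dir>/<ULID>/meta.json.
    """
    blocks = []
    for meta_path in sorted(Path(bucket_dir).glob("*/meta.json")):
        meta = json.loads(meta_path.read_text())
        blocks.append((meta["ulid"], meta["minTime"], meta["maxTime"]))
    return blocks


def format_ms(timestamp_ms):
    """Render a millisecond Unix timestamp as an ISO 8601 UTC string."""
    moment = datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc)
    return moment.isoformat()
```

For the configuration above, list_blocks("/data") would be the call to try; an empty list after more than 2 hours of runtime suggests the sidecar is not uploading.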

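With the sidecar deployed, the query component can be installed.

The query component (also called the "querier") implements the Prometheus HTTP v1 API, so data in the Thanos cluster can be queried with PromQL just as in the Prometheus graph page.

In short, the sidecar exposes the Store API, and query collects data from multiple Store APIs, evaluates the query, and returns the result. Query is completely stateless and can be scaled horizontally.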
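The fan-out idea can be pictured with a deliberately simplified model: each Store API returns series keyed by their label set, and the querier merges samples for identical series, keeping one value per timestamp. This is only a sketch of the concept and not Thanos' actual merge or deduplication algorithm; the function name merge_store_results and the data shapes are assumptions made for illustration.

```python
def merge_store_results(*store_results):
    """Merge series returned by several stores.

    Each store result maps a frozenset of (label, value) pairs to a list of
    (timestamp, sample_value) tuples. Samples for the same series and the
    same timestamp are kept once; the first store that supplied them wins.
    """
    merged = {}
    for result in store_results:
        for labels, samples in result.items():
            series = merged.setdefault(labels, {})
            for timestamp, value in samples:
                series.setdefault(timestamp, value)
    # Return samples ordered by timestamp for each series.
    return {labels: sorted(series.items()) for labels, series in merged.items()}
```

In this model a sidecar would contribute the most recent samples and another store the older ones, and the caller sees a single continuous series.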

Query configuration

thanos query     --http-address 0.0.0.0:19192  --store   localhost:19090  --store localhost:19914

The first --store argument (localhost:19090) is the gRPC address of the sidecar started above.
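Because query implements the Prometheus HTTP v1 API, an instant PromQL query can be sent to it in the same way as to Prometheus itself. The sketch below assumes query listens on http://localhost:19192, as in the command above, and uses the standard /api/v1/query endpoint; the helper names build_query_url and instant_query are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(base_url: str, promql: str) -> str:
    """Build the instant-query URL for a Prometheus-compatible HTTP API."""
    params = urllib.parse.urlencode({"query": promql})
    return base_url.rstrip("/") + "/api/v1/query?" + params


def instant_query(promql: str,
                  base_url: str = "http://localhost:19192",
                  timeout: float = 10.0) -> dict:
    """Run an instant query and return the decoded JSON response."""
    with urllib.request.urlopen(build_query_url(base_url, promql),
                                timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, instant_query("up") should return a JSON document whose status field is "success" when query can reach at least one store.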

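In the query command above, ./thanos query has a --store entry for port 19914. That port belongs to the store gateway component described next.

In the sidecar configuration, if you configured object storage with objstore.config-file, your data is uploaded to the bucket periodically and only 2 hours are kept locally. How, then, do you query data older than 2 hours? The data is no longer managed by Prometheus, so how can it be fetched back from the bucket and served with exactly the same query interface?

Store gateway component: the store gateway mainly interacts with object storage and fetches data that has already been persisted there. Like the sidecar, the store gateway implements the Store API, so the query component can read historical data from it.

Configuration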

thanos store --data-dir=/store --objstore.config-file=/root/config.yaml --http-address=0.0.0.0:19904 --grpc-address=0.0.0.0:19914 --index-cache-size=250MB --sync-block-duration=5m --min-time=-2w --max-time=-1h
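The --min-time and --max-time flags accept durations relative to the current time, such as -2w, and together they bound the time range the store gateway serves. The sketch below is a small model of how such a window could be computed; it is not Thanos' own parser, it handles only the s, m, h, d and w suffixes, and the names parse_relative, store_window and in_window are hypothetical.

```python
import re
from datetime import datetime, timedelta, timezone

# Mapping from duration suffix to the matching timedelta keyword argument.
_UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days", "w": "weeks"}


def parse_relative(value: str) -> timedelta:
    """Parse a relative duration such as '-2w' or '1h' into a timedelta."""
    match = re.fullmatch(r"(-?)(\d+)([smhdw])", value)
    if match is None:
        raise ValueError(f"unsupported relative duration: {value!r}")
    sign, amount, unit = match.groups()
    delta = timedelta(**{_UNITS[unit]: int(amount)})
    return -delta if sign else delta


def store_window(now: datetime, min_time: str, max_time: str):
    """Return the (start, end) range covered for the given relative bounds."""
    return now + parse_relative(min_time), now + parse_relative(max_time)


def in_window(moment: datetime, window) -> bool:
    """Check whether a point in time falls inside the window (inclusive)."""
    start, end = window
    return start <= moment <= end
```

With bounds of -2w and -1h, a sample from three days ago falls inside the window, while a sample from 30 minutes ago does not and would have to come from the sidecar instead.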

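Because the store gateway has to pull large amounts of historical data over the network and load it into memory, it consumes a lot of CPU and memory. This component was questioned when Thanos was first released, but its current performance is acceptable; some of the problems encountered are covered later.

Store gateways can also be scaled out without limit, all reading the same bucket data.

The figure below shows a query over the last 1d of data. The results extend beyond 2 hours, which confirms that the store component is working.

The store appears on the query web page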
