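Manually Deploying Thanos 0.12.2
Preface
Prometheus does not support clustered deployment. At large scale, the performance and storage of a single Prometheus instance are limited: under heavy concurrency, I/O, memory, and CPU are easily exhausted. Workarounds such as lowering the scrape frequency, dropping unimportant metrics, or shortening the retention period add operational burden and also put the running service itself at risk.
Solutions:
(1) Spread scraping across services: deploy multiple Prometheus instances, each scraping and storing the metrics of one service or a subset of services, Pa1 ---> Sa1, Pa2 ---> Sa2
(2) Shard a service: split the service into multiple groups so that one Prometheus scrapes only one group of that service, Pa1 ---> Group1(Sa1), Pa2 ---> Group2(Sa1)
(3) Manage with Thanos: deploy several replicas of the same Prometheus (each with a Sidecar), and let Thanos Query fetch data from all Sidecars (detailed below)
Option 1: the federated approach. Multiple Prometheus instances monitor different services. This is sufficient at ordinary scale, but it is not suitable when a single service exposes too many metrics and runs a large number of replicas.
The architecture is as follows:
Independent storage can also solve the data-sharing problem, for example a time-series database that supports clustered deployment such as OpenTSDB or InfluxDB (independent storage works well, but PromQL cannot be used against it)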
remote_write:
  - url: http://server:8888/write
Query syntax comparison:
InfluxDB:
SELECT mean("value") FROM "disk_io_time" WHERE $timeFilter GROUP BY time($interval), "instance" fill(null)
Prometheus:
disk_io_time
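Option 2: solves the very-large-scale problem above, but deployment and maintenance remain complex, and retention of cold data is still limited (detailed below)
The architecture is as follows:
Option 3: Thanos simplifies the deployment and management of distributed Prometheus and provides advanced features such as a global view, long-term storage, and high availability (Cortex is an alternative)
Option 1 is simple to deploy and easy to understand, so it is not covered in detail here. The rest of this article covers options 2 and 3.
Sharded scraping of a service with Prometheus
Ways to split a service into multiple groups:
1: Use the Kubernetes EndpointSlice feature for service discovery and sharding (still in beta)
2: Skip Kubernetes service discovery. Use a sharding algorithm to assign each node to a group and register it in Consul (an earlier article on API-based automatic service discovery for Prometheus covered this). Each Prometheus instance then uses consul_sd_configs to select the group it scrapes: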
- job_name: 'group-1'
  consul_sd_configs:
    - server: server:8500
      services:
        - group-1
3: Use Kubernetes node service discovery, then shard the nodes with hashmod in the Prometheus relabel configuration, so that each Prometheus instance scrapes only one shard:
- job_name: 'group-1'
  metrics_path: /metrics/cadvisor
  scheme: https
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  kubernetes_sd_configs:
    - role: node
  tls_config:
    insecure_skip_verify: true
  relabel_configs:
    - source_labels: [__address__]
      modulus: 4
      target_label: __tmp_hash
      action: hashmod
    - source_labels: [__tmp_hash]
      regex: ^2$
      action: keep
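Parameter notes:
bearer_token_file: the ServiceAccount token presented to the kubelet, which uses webhook mode to have the apiserver perform the RBAC check
insecure_skip_verify: do not verify the kubelet's server certificate
modulus: shard the nodes into 4 groups
regex: scrape only the nodes in the third group (hash value 2, because numbering starts at 0)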
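The hashmod sharding described above can be sketched in a few lines of Python. This is a minimal illustration, not Prometheus code: it assumes that hashmod takes the MD5 digest of the source label value and reduces its low 64 bits modulo `modulus`, and the node addresses in the usage note are made up.

```python
import hashlib


def hashmod(label_value: str, modulus: int) -> int:
    """Return the shard index for a label value, in the range [0, modulus)."""
    digest = hashlib.md5(label_value.encode("utf-8")).digest()
    # Assumption: only the last 8 bytes (low 64 bits, big-endian) take part.
    return int.from_bytes(digest[8:], "big") % modulus


def targets_for_shard(addresses, modulus, shard):
    """Mimic the `keep` rule: retain addresses whose hash equals `shard`."""
    return [addr for addr in addresses if hashmod(addr, modulus) == shard]
```

For example, `targets_for_shard([f"10.0.0.{i}:10250" for i in range(1, 13)], 4, 2)` returns the subset of those hypothetical nodes that the instance configured with `regex: ^2$` would scrape. Every node lands in exactly one of the 4 shards, so running four instances with regex values 0 to 3 covers all nodes without overlap.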
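Deploying and managing distributed Prometheus with Thanos
Architecture diagram (the official diagrams are all rather blurry):
Core components:
Thanos Query: implements the Prometheus API, aggregates the data provided by downstream components, and returns it to the querying client (such as Grafana)
Thanos Sidecar: connects to Prometheus and exposes its data to Thanos Query, or uploads it to object storage for long-term retention
Thanos Store Gateway: exposes the data in object storage to Thanos Query
Thanos Ruler: evaluates monitoring data and fires alerts; it can also compute new metrics and expose them to Thanos Query and/or upload them to object storage for long-term retention
Thanos Compact: compacts and downsamples the data in object storage, which speeds up queries over long time ranges
Architecture diagram:
Query and Sidecar
Workflow:
1: Because Thanos Query implements the Prometheus API, it can be queried with PromQL
2: Thanos Query fetches data from the multiple downstream locations that store it, then aggregates and deduplicates the results before returning them to the client
3: A Sidecar is deployed with each Prometheus instance and exposes that instance's data to Thanos Query, or uploads it for storage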
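The "aggregate and deduplicate" step in the workflow above can be illustrated with a simplified sketch. It assumes series are plain `(labels, samples)` pairs and merges replicas by dropping the replica label and keeping the first value seen per timestamp. Thanos's real deduplication logic is more involved, so treat this only as a picture of the idea behind `--query.replica-label`.

```python
from collections import defaultdict


def deduplicate(series_list, replica_labels):
    """Merge series that differ only in their replica labels.

    series_list: iterable of (labels_dict, [(timestamp, value), ...]).
    replica_labels: set of label names to ignore when comparing series.
    """
    merged = defaultdict(dict)
    for labels, samples in series_list:
        # The series identity is the label set without the replica labels.
        key = tuple(sorted((k, v) for k, v in labels.items()
                           if k not in replica_labels))
        for ts, value in samples:
            # Keep the first value seen for each timestamp.
            merged[key].setdefault(ts, value)
    return {key: sorted(points.items()) for key, points in merged.items()}
```

Given two replicas of `up{job="node"}` that each missed a different scrape, the merged result contains the union of their timestamps under a single series, which is why gaps in one replica do not show up in the query result.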
Store Gateway
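Thanos Query reads data from object storage through the Store API implemented by Store Gateway
Workflow:
Thanos Store Gateway also includes optimizations that speed up data retrieval: it caches the TSDB index, and it optimizes requests to object storage (fetching all required data with as few requests as possible). Two index cache types are supported:
1: in-memory (configure it with --index-cache.config-file to reference a configuration file, or with --index-cache.config to pass the YAML config inline)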
--index-cache.config="config":
  "max_size": 10MiB
  "max_item_size": 1MiB
"type": "IN-MEMORY"
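max_size: maximum total size of the cache. max_item_size: maximum size of a single item.
2: Memcached (uses a memcached backend; configure this cache type with --index-cache.config-file to reference a configuration file, or with --index-cache.config to pass the YAML config inline)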
--index-cache.config="config":
  "addresses":
  - "dnssrv+_client._tcp.<MEMCACHED_SERVICE>.thanos.svc.cluster.local"
  "dns_provider_update_interval": "10s"
  "max_async_buffer_size": 10000
  "max_async_concurrency": 20
  "max_get_multi_batch_size": 0
  "max_get_multi_concurrency": 100
  "max_idle_connections": 100
  "max_item_size": "1MiB"
  "timeout": "500ms"
"type": "MEMCACHED"
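timeout: connection timeout
max_idle_connections: maximum number of idle connections
max_async_concurrency: maximum number of concurrent asynchronous operations
max_async_buffer_size: maximum buffer size for asynchronous operations
max_get_multi_concurrency: maximum number of concurrent connections when fetching keys; if set to 0, concurrency is unlimited
max_get_multi_batch_size: maximum number of keys a single operation should fetch; if more keys are requested, they are split internally into batches that are fetched concurrently
max_item_size: maximum size of an item stored in memcached. Set this to the same value as the memcached -I flag (1MB by default) so that no network round trips are wasted on items larger than memcached allows. If set to 0, the item size is unlimited
dns_provider_update_interval: refresh interval for DNS discovery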
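How `max_get_multi_batch_size` and `max_get_multi_concurrency` interact can be sketched as follows. This is an illustration rather than Thanos code: `fetch_batch` is a hypothetical callable standing in for one memcached multi-get, and a value of 0 is taken to mean "no limit" for both settings.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_batches(keys, max_batch_size):
    """Split keys into batches; a size of 0 (or less) means one batch."""
    keys = list(keys)
    if not keys:
        return []
    if max_batch_size <= 0:
        return [keys]
    return [keys[i:i + max_batch_size]
            for i in range(0, len(keys), max_batch_size)]


def fetch_all(keys, fetch_batch, max_batch_size, max_concurrency):
    """Fetch all keys batch by batch with bounded concurrency.

    fetch_batch(batch) must return a dict of the keys it found.
    """
    batches = split_into_batches(keys, max_batch_size)
    if not batches:
        return {}
    # 0 means unlimited concurrency: one worker per batch.
    workers = max_concurrency if max_concurrency > 0 else len(batches)
    results = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(fetch_batch, batches):
            results.update(partial)
    return results
```

With the configuration shown above (`max_get_multi_batch_size: 0`), every lookup goes out as a single batch, so the concurrency limit only matters once a batch size is set.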
Thanos also supports caching for the bucket (currently memcached only, and still experimental)
backend: memcached
backend_config:
  addresses:
    - localhost:11211
caching_config:
  chunk_subrange_size: 16000
  max_chunks_get_range_requests: 3
  chunk_object_size_ttl: 24h
  chunk_subrange_ttl: 24h
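Parameter notes:
backend_config: for memcached, this field supports all the same settings as the memcached index cache
caching_config configures the chunk cache and supports the following optional settings:
chunk_subrange_size: size of the chunk-object segments stored in the cache; this is the smallest unit the chunk cache works with
max_chunks_get_range_requests: how many "get range" sub-requests the cache may issue to fetch missing subranges
chunk_object_size_ttl: how long information about the length of a chunk file is kept in the cache
chunk_subrange_ttl: how long individual subranges are kept in the cache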
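To make `chunk_subrange_size` concrete, the sketch below computes which fixed-size, aligned subranges cover a requested byte range. It is an illustration of the idea under the assumption that subranges are aligned to multiples of the configured size; the function name is hypothetical and not part of Thanos.

```python
def aligned_subranges(offset, length, subrange_size=16000):
    """Return the aligned [start, end) subranges covering a byte range."""
    if length <= 0:
        return []
    # Round the start down to the nearest subrange boundary.
    start = (offset // subrange_size) * subrange_size
    end = offset + length
    return [(s, s + subrange_size) for s in range(start, end, subrange_size)]
```

A request for 30000 bytes starting at offset 20000 touches three subranges: `(16000, 32000)`, `(32000, 48000)`, and `(48000, 64000)`. Each of these can be looked up in the cache on its own, and only the missing ones need to be fetched from object storage.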
Ruler
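Ruler obtains global data by querying Thanos Query, computes new metrics from the configured rules and stores them, and also exposes that data to Thanos Query through the Store API
Workflow:
Thanos Query and Thanos Ruler query each other: Thanos Ruler provides Thanos Query with the newly computed metrics, and Thanos Query provides Thanos Ruler with the global raw metrics needed to compute them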
Compact
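When the time range is large, the amount of data queried is also large, which makes queries very slow. Queries over older data usually do not need full detail. Thanos Compact reads data from object storage, compacts and downsamples it, and uploads the result back to object storage. Queries over long time ranges can then read only the compacted and downsampled data, which greatly reduces the amount of data read and speeds up the query
Workflow:
Compact merges and compacts the blocks of historical data into larger objects. It adds 2 more blocks on top of each original block, so some extra space must be reserved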
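The downsampling idea can be sketched as bucketing raw samples into fixed windows and keeping a few aggregates per window. This is a simplified illustration: the 5-minute default and the choice of count, sum, min, and max are assumptions made for the example, and the real block format stores more than this.

```python
def downsample(samples, resolution_ms=300_000):
    """Aggregate (timestamp_ms, value) samples into fixed windows.

    Returns {window_start_ms: {"count", "sum", "min", "max"}} in time order.
    """
    windows = {}
    for ts, value in samples:
        start = ts - ts % resolution_ms  # start of the window containing ts
        agg = windows.get(start)
        if agg is None:
            windows[start] = {"count": 1, "sum": value,
                              "min": value, "max": value}
        else:
            agg["count"] += 1
            agg["sum"] += value
            agg["min"] = min(agg["min"], value)
            agg["max"] = max(agg["max"], value)
    return dict(sorted(windows.items()))
```

A query over several weeks can then read one aggregate per window instead of every raw sample, and an average is still recoverable as `sum / count`.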
Bucket
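A command for inspecting the data in object storage. It usually runs as a standalone command to help with troubleshooting, and it supports viewing the current bucket through a web UI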
Check
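thanos check can inspect and validate whether Prometheus rules are correct
Considerations:
The Sidecar leaves the most recent monitoring data on the local Prometheus, and Query fetches that data by calling the Store API of every Sidecar. When there are very many Sidecars, calling all of them on every query consumes a lot of resources and is slow, so monitoring pages cannot update in real time
To solve this, the project developed the Thanos Receiver component. It implements the Prometheus remote write API, so every Prometheus instance can push data to Thanos Receiver in real time and Thanos Query only has to query Thanos Receiver. Receiver also implements consistent hashing and supports clustered deployment, which solves the load problem (also covered below)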
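To show what consistent hashing buys a Receiver cluster, here is a small generic hash ring. It is not Thanos's implementation: the use of SHA-256, the virtual-node count, and the endpoint names in the usage note are all assumptions chosen for the illustration.

```python
import bisect
import hashlib


class HashRing:
    """A generic consistent-hash ring with virtual nodes."""

    def __init__(self, endpoints, vnodes=64):
        # Place `vnodes` points on the ring for every endpoint.
        self._ring = sorted(
            (self._hash(f"{ep}#{i}"), ep)
            for ep in endpoints for i in range(vnodes)
        )
        self._keys = [h for h, _ in self._ring]

    @staticmethod
    def _hash(value):
        digest = hashlib.sha256(value.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def lookup(self, series_key):
        """Return the endpoint that owns the given series key."""
        if not self._ring:
            raise ValueError("hash ring has no endpoints")
        # First ring point clockwise from the key's hash, wrapping around.
        idx = bisect.bisect_right(self._keys, self._hash(series_key))
        return self._ring[idx % len(self._ring)][1]
```

With endpoints such as `thanos-receive-0.thanos-receive.monitor.svc.cluster.local:10901`, `HashRing(endpoints).lookup('up{job="node"}')` always returns the same replica for the same series. When one endpoint is removed, only the series it owned move to other replicas, which is the property that keeps rebalancing cheap.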
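Architecture diagram:
Two ways to deploy Thanos
Prometheus was previously deployed and managed with prometheus-operator, so the CRD objects already exist. To make the setup easier to understand, Thanos is deployed manually here
Prerequisite: object storage configuration
Local filesystem storage is also supported (experimental):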
type: FILESYSTEM
config:
  directory: ""
Deploy MinIO
helm repo add azure http://mirror.azure.cn/kubernetes/charts/
helm repo update
helm pull azure/minio
Change the keys:
accessKey: "minio"
secretKey: "minio123"
Disable persistence (no PVC):
persistence:
  enabled: false
Configure the bucket:
buckets:
  - name: thanos
    policy: none
    purge: false
helm install minio -n monitor .
1: Deploy Thanos (without Receiver)
Enable the Sidecar (edit the values.yaml file of the prometheus-operator chart)
thanos:
  baseImage: quay.io/thanos/thanos
  version: v0.12.2
  objectStorageConfig:
    key: thanos.yaml
    name: thanos-objectstorage
Add the Sidecar port
Edit templates/prometheus/service.yaml in the prometheus-operator chart and add the following:
  ports:
  - name: {{ .Values.prometheus.thanos.portName }}
    port: {{ .Values.prometheus.thanos.port }}
    targetPort: {{ .Values.prometheus.thanos.targetPort }}
Then set the values under the prometheus key in values.yaml:
  thanos:
    portName: sidecar
    port: 10901
    targetPort: 10901
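Mind the leading spaces (left in on purpose so the snippet can be copied and pasted)
Create the Secret that holds the MinIO object storage configuration: thanos-storage-minio.yaml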
apiVersion: v1
kind: Secret
metadata:
  name: thanos-objectstorage
  namespace: monitor
type: Opaque
stringData:
  thanos.yaml: |-
    type: s3
    config:
      bucket: thanos
      endpoint: minio:9000
      insecure: true
      access_key: minio
      secret_key: minio123
Upgrade Prometheus: helm upgrade prometheus -n monitor .
The Sidecar deployment is now complete
Install Query
thanos-query.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app.kubernetes.io/component: query-layer
    app.kubernetes.io/instance: thanos-query
    app.kubernetes.io/name: thanos-query
    app.kubernetes.io/version: v0.12.2
  name: thanos-query
  namespace: monitor
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/component: query-layer
      app.kubernetes.io/instance: thanos-query
      app.kubernetes.io/name: thanos-query
  template:
    metadata:
      labels:
        app.kubernetes.io/component: query-layer
        app.kubernetes.io/instance: thanos-query
        app.kubernetes.io/name: thanos-query
        app.kubernetes.io/version: v0.12.2
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app.kubernetes.io/name
                  operator: In
                  values:
                  - thanos-query
              namespaces:
              - monitor
              topologyKey: kubernetes.io/hostname
            weight: 100
      containers:
      - args:
        - query
        - --log.level=debug
        - --grpc-address=0.0.0.0:10901
        - --http-address=0.0.0.0:9090
        - --query.replica-label=prometheus_replica
        - --query.replica-label=rule_replica
        - --store=prometheus-prometheus-oper-prometheus:10901
        - --store=dnssrv+_grpc._tcp.thanos-rule.monitor.svc.cluster.local
        - --store=dnssrv+_grpc._tcp.thanos-store.monitor.svc.cluster.local
        image: quay.io/thanos/thanos:v0.12.2
        livenessProbe:
          failureThreshold: 4
          httpGet:
            path: /-/healthy
            port: 9090
            scheme: HTTP
          periodSeconds: 30
        name: thanos-query
        ports:
        - containerPort: 10901
          name: grpc
        - containerPort: 9090
          name: http
        readinessProbe:
          failureThreshold: 20
          httpGet:
            path: /-/ready
            port: 9090
            scheme: HTTP
          periodSeconds: 5
        terminationMessagePolicy: FallbackToLogsOnError
      terminationGracePeriodSeconds: 120
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/component: query-layer
    app.kubernetes.io/instance: thanos-query
    app.kubernetes.io/name: thanos-query
    app.kubernetes.io/version: v0.12.2
  name: thanos-query
  namespace: monitor
spec:
  ports:
  - name: grpc
    port: 10901
    targetPort: grpc
  - name: http
    port: 9090
    targetPort: http
  selector:
    app.kubernetes.io/component: query-layer
    app.kubernetes.io/instance: thanos-query
    app.kubernetes.io/name: thanos-query
Deploy store
thanos-store.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    app.kubernetes.io/component: object-store-gateway
    app.kubernetes.io/instance: thanos-store
    app.kubernetes.io/name: thanos-store
    app.kubernetes.io/version: v0.12.2
  name: thanos-store
  namespace: monitor
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/component: object-store-gateway
      app.kubernetes.io/instance: thanos-store
      app.kubernetes.io/name: thanos-store
  serviceName: thanos-store
  template:
    metadata:
      labels:
        app.kubernetes.io/component: object-store-gateway
        app.kubernetes.io/instance: thanos-store
        app.kubernetes.io/name: thanos-store
        app.kubernetes.io/version: v0.12.2
    spec:
      containers:
      - args:
        - store
        - --data-dir=/var/thanos/store
        - --grpc-address=0.0.0.0:10901
        - --http-address=0.0.0.0:10902
        - --objstore.config=$(OBJSTORE_CONFIG)
        env:
        - name: OBJSTORE_CONFIG
          valueFrom:
            secretKeyRef:
              key: thanos.yaml
              name: thanos-objectstorage
        image: quay.io/thanos/thanos:v0.12.2
        livenessProbe:
          failureThreshold: 8
          httpGet:
            path: /-/healthy
            port: 10902
            scheme: HTTP
          periodSeconds: 30
        name: thanos-store
        ports:
        - containerPort: 10901
          name: grpc
        - containerPort: 10902
          name: http
        readinessProbe:
          failureThreshold: 20
          httpGet:
            path: /-/ready
            port: 10902
            scheme: HTTP
          periodSeconds: 5
        terminationMessagePolicy: FallbackToLogsOnError
        volumeMounts:
        - mountPath: /var/thanos/store
          name: data
          readOnly: false
      terminationGracePeriodSeconds: 120
      volumes:
      - name: data
        emptyDir: {}
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/component: object-store-gateway
    app.kubernetes.io/instance: thanos-store
    app.kubernetes.io/name: thanos-store
    app.kubernetes.io/version: v0.12.2
  name: thanos-store
  namespace: monitor
spec:
  clusterIP: None
  ports:
  - name: grpc
    port: 10901
    targetPort: 10901
  - name: http
    port: 10902
    targetPort: 10902
  selector:
    app.kubernetes.io/component: object-store-gateway
    app.kubernetes.io/instance: thanos-store
    app.kubernetes.io/name: thanos-store
Deploy rule
thanos-rule.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    app.kubernetes.io/component: rule-evaluation-engine
    app.kubernetes.io/instance: thanos-rule
    app.kubernetes.io/name: thanos-rule
    app.kubernetes.io/version: v0.12.2
  name: thanos-rule
  namespace: monitor
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/component: rule-evaluation-engine
      app.kubernetes.io/instance: thanos-rule
      app.kubernetes.io/name: thanos-rule
  serviceName: thanos-rule
  template:
    metadata:
      labels:
        app.kubernetes.io/component: rule-evaluation-engine
        app.kubernetes.io/instance: thanos-rule
        app.kubernetes.io/name: thanos-rule
        app.kubernetes.io/version: v0.12.2
    spec:
      containers:
      - args:
        - rule
        - --grpc-address=0.0.0.0:10901
        - --http-address=0.0.0.0:10902
        - --objstore.config=$(OBJSTORE_CONFIG)
        - --data-dir=/var/thanos/rule
        - --label=rule_replica="$(NAME)"
        - --alert.label-drop="rule_replica"
        - --query=dnssrv+_http._tcp.thanos-query.monitor.svc.cluster.local
        env:
        - name: NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: OBJSTORE_CONFIG
          valueFrom:
            secretKeyRef:
              key: thanos.yaml
              name: thanos-objectstorage
        image: quay.io/thanos/thanos:v0.12.2
        livenessProbe:
          failureThreshold: 24
          httpGet:
            path: /-/healthy
            port: 10902
            scheme: HTTP
          periodSeconds: 5
        name: thanos-rule
        ports:
        - containerPort: 10901
          name: grpc
        - containerPort: 10902
          name: http
        readinessProbe:
          failureThreshold: 18
          httpGet:
            path: /-/ready
            port: 10902
            scheme: HTTP
          initialDelaySeconds: 10
          periodSeconds: 5
        terminationMessagePolicy: FallbackToLogsOnError
        volumeMounts:
        - mountPath: /var/thanos/rule
          name: data
          readOnly: false
      volumes:
      - name: data
        emptyDir: {}
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/component: rule-evaluation-engine
    app.kubernetes.io/instance: thanos-rule
    app.kubernetes.io/name: thanos-rule
    app.kubernetes.io/version: v0.12.2
  name: thanos-rule
  namespace: monitor
spec:
  clusterIP: None
  ports:
  - name: grpc
    port: 10901
    targetPort: grpc
  - name: http
    port: 10902
    targetPort: http
  selector:
    app.kubernetes.io/component: rule-evaluation-engine
    app.kubernetes.io/instance: thanos-rule
    app.kubernetes.io/name: thanos-rule
Deploy bucket
thanos-bucket.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app.kubernetes.io/component: object-store-bucket-debugging
    app.kubernetes.io/instance: thanos-bucket
    app.kubernetes.io/name: thanos-bucket
    app.kubernetes.io/version: v0.12.2
  name: thanos-bucket
  namespace: monitor
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/component: object-store-bucket-debugging
      app.kubernetes.io/instance: thanos-bucket
      app.kubernetes.io/name: thanos-bucket
  template:
    metadata:
      labels:
        app.kubernetes.io/component: object-store-bucket-debugging
        app.kubernetes.io/instance: thanos-bucket
        app.kubernetes.io/name: thanos-bucket
        app.kubernetes.io/version: v0.12.2
    spec:
      containers:
      - args:
        - bucket
        - web
        - --objstore.config=$(OBJSTORE_CONFIG)
        env:
        - name: OBJSTORE_CONFIG
          valueFrom:
            secretKeyRef:
              key: thanos.yaml
              name: thanos-objectstorage
        image: quay.io/thanos/thanos:v0.12.2
        livenessProbe:
          failureThreshold: 4
          httpGet:
            path: /-/healthy
            port: 10902
            scheme: HTTP
          periodSeconds: 30
        name: thanos-bucket
        ports:
        - containerPort: 10902
          name: http
        readinessProbe:
          failureThreshold: 20
          httpGet:
            path: /-/ready
            port: 10902
            scheme: HTTP
          periodSeconds: 5
        terminationMessagePolicy: FallbackToLogsOnError
      terminationGracePeriodSeconds: 120
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/component: object-store-bucket-debugging
    app.kubernetes.io/instance: thanos-bucket
    app.kubernetes.io/name: thanos-bucket
    app.kubernetes.io/version: v0.12.2
  name: thanos-bucket
  namespace: monitor
spec:
  ports:
  - name: http
    port: 10902
    targetPort: http
  selector:
    app.kubernetes.io/component: object-store-bucket-debugging
    app.kubernetes.io/instance: thanos-bucket
    app.kubernetes.io/name: thanos-bucket
Deploy compact
thanos-compact.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    app.kubernetes.io/component: database-compactor
    app.kubernetes.io/instance: thanos-compact
    app.kubernetes.io/name: thanos-compact
    app.kubernetes.io/version: v0.12.2
  name: thanos-compact
  namespace: monitor
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/component: database-compactor
      app.kubernetes.io/instance: thanos-compact
      app.kubernetes.io/name: thanos-compact
  serviceName: thanos-compact
  template:
    metadata:
      labels:
        app.kubernetes.io/component: database-compactor
        app.kubernetes.io/instance: thanos-compact
        app.kubernetes.io/name: thanos-compact
        app.kubernetes.io/version: v0.12.2
    spec:
      containers:
      - args:
        - compact
        - --wait
        - --objstore.config=$(OBJSTORE_CONFIG)
        - --data-dir=/var/thanos/compact
        - --debug.accept-malformed-index
        env:
        - name: OBJSTORE_CONFIG
          valueFrom:
            secretKeyRef:
              key: thanos.yaml
              name: thanos-objectstorage
        image: quay.io/thanos/thanos:v0.12.2
        livenessProbe:
          failureThreshold: 4
          httpGet:
            path: /-/healthy
            port: 10902
            scheme: HTTP
          periodSeconds: 30
        name: thanos-compact
        ports:
        - containerPort: 10902
          name: http
        readinessProbe:
          failureThreshold: 20
          httpGet:
            path: /-/ready
            port: 10902
            scheme: HTTP
          periodSeconds: 5
        terminationMessagePolicy: FallbackToLogsOnError
        volumeMounts:
        - mountPath: /var/thanos/compact
          name: data
          readOnly: false
      terminationGracePeriodSeconds: 120
      volumes:
      - name: data
        emptyDir: {}
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/component: database-compactor
    app.kubernetes.io/instance: thanos-compact
    app.kubernetes.io/name: thanos-compact
    app.kubernetes.io/version: v0.12.2
  name: thanos-compact
  namespace: monitor
spec:
  ports:
  - name: http
    port: 10902
    targetPort: http
  selector:
    app.kubernetes.io/component: database-compactor
    app.kubernetes.io/instance: thanos-compact
    app.kubernetes.io/name: thanos-compact
Create the Ingress: thanos-ingress.yaml
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: thanos-ingress
  namespace: monitor
spec:
  rules:
  - host: thanos.joey.com
    http:
      paths:
      - backend:
          serviceName: thanos-query
          servicePort: 9090
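Apply the YAML files and add an entry to the local hosts file (under C:\Windows\System32\drivers\etc). Entering the domain in a browser now reaches the service, and the deployment is complete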
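Besides the browser, the deployment can be checked from a script, because Thanos Query serves the Prometheus HTTP API. The sketch below builds an instant-query URL and fetches it with the standard library. The helper names are hypothetical, and it assumes the Ingress host thanos.joey.com resolves from the machine running it.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(base_url, promql, dedup=True):
    """Build an instant-query URL for the Prometheus-compatible API."""
    params = urllib.parse.urlencode({
        "query": promql,
        # Ask Thanos Query to merge replicas (assumed default behaviour).
        "dedup": "true" if dedup else "false",
    })
    return f"{base_url.rstrip('/')}/api/v1/query?{params}"


def instant_query(base_url, promql, dedup=True, timeout=10):
    """Run the query and return the decoded JSON response."""
    url = build_query_url(base_url, promql, dedup)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

Calling `instant_query("http://thanos.joey.com", "up")` should return a JSON document whose `status` field is `success` once the Sidecar or Receiver is connected. Running the same query with `dedup=False` is a quick way to see the individual replicas.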
2: Deploy Thanos (with Receiver)
Create the Secret that holds the object storage configuration: thanos-storage-minio.yaml
apiVersion: v1
kind: Secret
metadata:
  name: thanos-objectstorage
  namespace: monitor
type: Opaque
stringData:
  thanos.yaml: |-
    type: s3
    config:
      bucket: thanos
      endpoint: minio:9000
      insecure: true
      access_key: minio
      secret_key: minio123
Deploy Receiver
thanos-receive-statefulSet.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    app.kubernetes.io/component: database-write-hashring
    app.kubernetes.io/instance: thanos-receive
    app.kubernetes.io/name: thanos-receive
    app.kubernetes.io/version: v0.12.2
  name: thanos-receive
  namespace: monitor
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/component: database-write-hashring
      app.kubernetes.io/instance: thanos-receive
      app.kubernetes.io/name: thanos-receive
  serviceName: thanos-receive
  template:
    metadata:
      labels:
        app.kubernetes.io/component: database-write-hashring
        app.kubernetes.io/instance: thanos-receive
        app.kubernetes.io/name: thanos-receive
        app.kubernetes.io/version: v0.12.2
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app.kubernetes.io/instance
                  operator: In
                  values:
                  - thanos-receive
              namespaces:
              - monitor
              topologyKey: kubernetes.io/hostname
            weight: 100
      containers:
      - args:
        - receive
        - --grpc-address=0.0.0.0:10901
        - --http-address=0.0.0.0:10902
        - --remote-write.address=0.0.0.0:19291
        - --receive.replication-factor=1
        - --objstore.config=$(OBJSTORE_CONFIG)
        - --tsdb.path=/var/thanos/receive
        - --label=replica="$(NAME)"
        - --label=receive="true"
        - --receive.local-endpoint=$(NAME).thanos-receive.$(NAMESPACE).svc.cluster.local:10901
        env:
        - name: NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: OBJSTORE_CONFIG
          valueFrom:
            secretKeyRef:
              key: thanos.yaml
              name: thanos-objectstorage
        image: quay.io/thanos/thanos:v0.12.2
        livenessProbe:
          failureThreshold: 8
          httpGet:
            path: /-/healthy
            port: 10902
            scheme: HTTP
          periodSeconds: 30
        name: thanos-receive
        ports:
        - containerPort: 10901
          name: grpc
        - containerPort: 10902
          name: http
        - containerPort: 19291
          name: remote-write
        readinessProbe:
          failureThreshold: 20
          httpGet:
            path: /-/ready
            port: 10902
            scheme: HTTP
          periodSeconds: 5
        terminationMessagePolicy: FallbackToLogsOnError
        volumeMounts:
        - mountPath: /var/thanos/receive
          name: data
          readOnly: false
      terminationGracePeriodSeconds: 120
      volumes:
      - name: data
        emptyDir: {}
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/component: database-write-hashring
    app.kubernetes.io/instance: thanos-receive
    app.kubernetes.io/name: thanos-receive
    app.kubernetes.io/version: v0.12.2
  name: thanos-receive
  namespace: monitor
spec:
  clusterIP: None
  ports:
  - name: grpc
    port: 10901
    targetPort: 10901
  - name: http
    port: 10902
    targetPort: 10902
  - name: remote-write
    port: 19291
    targetPort: 19291
  selector:
    app.kubernetes.io/component: database-write-hashring
    app.kubernetes.io/instance: thanos-receive
    app.kubernetes.io/name: thanos-receive
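Modify prometheus-operator
I could not find the remote write address format for Prometheus on the official site, so I read the source code instead:
From the source, the endpoint is ip:port/api/v1/receive. Use the Service address: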
remoteWrite:
  - url: http://thanos-receive:19291/api/v1/receive
helm upgrade prometheus -n monitor .
Install Query
thanos-query.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app.kubernetes.io/component: query-layer
    app.kubernetes.io/instance: thanos-query
    app.kubernetes.io/name: thanos-query
    app.kubernetes.io/version: v0.12.2
  name: thanos-query
  namespace: monitor
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/component: query-layer
      app.kubernetes.io/instance: thanos-query
      app.kubernetes.io/name: thanos-query
  template:
    metadata:
      labels:
        app.kubernetes.io/component: query-layer
        app.kubernetes.io/instance: thanos-query
        app.kubernetes.io/name: thanos-query
        app.kubernetes.io/version: v0.12.2
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app.kubernetes.io/name
                  operator: In
                  values:
                  - thanos-query
              namespaces:
              - monitor
              topologyKey: kubernetes.io/hostname
            weight: 100
      containers:
      - args:
        - query
        - --log.level=debug
        - --grpc-address=0.0.0.0:10901
        - --http-address=0.0.0.0:9090
        - --query.replica-label=prometheus_replica
        - --query.replica-label=rule_replica
        - --store=dnssrv+_grpc._tcp.thanos-receive.monitor.svc.cluster.local
        - --store=dnssrv+_grpc._tcp.thanos-rule.monitor.svc.cluster.local
        - --store=dnssrv+_grpc._tcp.thanos-store.monitor.svc.cluster.local
        image: quay.io/thanos/thanos:v0.12.2
        livenessProbe:
          failureThreshold: 4
          httpGet:
            path: /-/healthy
            port: 9090
            scheme: HTTP
          periodSeconds: 30
        name: thanos-query
        ports:
        - containerPort: 10901
          name: grpc
        - containerPort: 9090
          name: http
        readinessProbe:
          failureThreshold: 20
          httpGet:
            path: /-/ready
            port: 9090
            scheme: HTTP
          periodSeconds: 5
        terminationMessagePolicy: FallbackToLogsOnError
      terminationGracePeriodSeconds: 120
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/component: query-layer
    app.kubernetes.io/instance: thanos-query
    app.kubernetes.io/name: thanos-query
    app.kubernetes.io/version: v0.12.2
  name: thanos-query
  namespace: monitor
spec:
  ports:
  - name: grpc
    port: 10901
    targetPort: grpc
  - name: http
    port: 9090
    targetPort: http
  selector:
    app.kubernetes.io/component: query-layer
    app.kubernetes.io/instance: thanos-query
    app.kubernetes.io/name: thanos-query
Deploy store
thanos-store.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    app.kubernetes.io/component: object-store-gateway
    app.kubernetes.io/instance: thanos-store
    app.kubernetes.io/name: thanos-store
    app.kubernetes.io/version: v0.12.2
  name: thanos-store
  namespace: monitor
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/component: object-store-gateway
      app.kubernetes.io/instance: thanos-store
      app.kubernetes.io/name: thanos-store
  serviceName: thanos-store
  template:
    metadata:
      labels:
        app.kubernetes.io/component: object-store-gateway
        app.kubernetes.io/instance: thanos-store
        app.kubernetes.io/name: thanos-store
        app.kubernetes.io/version: v0.12.2
    spec:
      containers:
      - args:
        - store
        - --data-dir=/var/thanos/store
        - --grpc-address=0.0.0.0:10901
        - --http-address=0.0.0.0:10902
        - --objstore.config=$(OBJSTORE_CONFIG)
        env:
        - name: OBJSTORE_CONFIG
          valueFrom:
            secretKeyRef:
              key: thanos.yaml
              name: thanos-objectstorage
        image: quay.io/thanos/thanos:v0.12.2
        livenessProbe:
          failureThreshold: 8
          httpGet:
            path: /-/healthy
            port: 10902
            scheme: HTTP
          periodSeconds: 30
        name: thanos-store
        ports:
        - containerPort: 10901
          name: grpc
        - containerPort: 10902
          name: http
        readinessProbe:
          failureThreshold: 20
          httpGet:
            path: /-/ready
            port: 10902
            scheme: HTTP
          periodSeconds: 5
        terminationMessagePolicy: FallbackToLogsOnError
        volumeMounts:
        - mountPath: /var/thanos/store
          name: data
          readOnly: false
      terminationGracePeriodSeconds: 120
      volumes:
      - name: data
        emptyDir: {}
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/component: object-store-gateway
    app.kubernetes.io/instance: thanos-store
    app.kubernetes.io/name: thanos-store
    app.kubernetes.io/version: v0.12.2
  name: thanos-store
  namespace: monitor
spec:
  clusterIP: None
  ports:
  - name: grpc
    port: 10901
    targetPort: 10901
  - name: http
    port: 10902
    targetPort: 10902
  selector:
    app.kubernetes.io/component: object-store-gateway
    app.kubernetes.io/instance: thanos-store
    app.kubernetes.io/name: thanos-store
Deploy rule
thanos-rule.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    app.kubernetes.io/component: rule-evaluation-engine
    app.kubernetes.io/instance: thanos-rule
    app.kubernetes.io/name: thanos-rule
    app.kubernetes.io/version: v0.12.2
  name: thanos-rule
  namespace: monitor
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/component: rule-evaluation-engine
      app.kubernetes.io/instance: thanos-rule
      app.kubernetes.io/name: thanos-rule
  serviceName: thanos-rule
  template:
    metadata:
      labels:
        app.kubernetes.io/component: rule-evaluation-engine
        app.kubernetes.io/instance: thanos-rule
        app.kubernetes.io/name: thanos-rule
        app.kubernetes.io/version: v0.12.2
    spec:
      containers:
      - args:
        - rule
        - --grpc-address=0.0.0.0:10901
        - --http-address=0.0.0.0:10902
        - --objstore.config=$(OBJSTORE_CONFIG)
        - --data-dir=/var/thanos/rule
        - --label=rule_replica="$(NAME)"
        - --alert.label-drop="rule_replica"
        - --query=dnssrv+_http._tcp.thanos-query.monitor.svc.cluster.local
        env:
        - name: NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: OBJSTORE_CONFIG
          valueFrom:
            secretKeyRef:
              key: thanos.yaml
              name: thanos-objectstorage
        image: quay.io/thanos/thanos:v0.12.2
        livenessProbe:
          failureThreshold: 24
          httpGet:
            path: /-/healthy
            port: 10902
            scheme: HTTP
          periodSeconds: 5
        name: thanos-rule
        ports:
        - containerPort: 10901
          name: grpc
        - containerPort: 10902
          name: http
        readinessProbe:
          failureThreshold: 18
          httpGet:
            path: /-/ready
            port: 10902
            scheme: HTTP
          initialDelaySeconds: 10
          periodSeconds: 5
        terminationMessagePolicy: FallbackToLogsOnError
        volumeMounts:
        - mountPath: /var/thanos/rule
          name: data
          readOnly: false
      volumes:
      - name: data
        emptyDir: {}
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/component: rule-evaluation-engine
    app.kubernetes.io/instance: thanos-rule
    app.kubernetes.io/name: thanos-rule
    app.kubernetes.io/version: v0.12.2
  name: thanos-rule
  namespace: monitor
spec:
  clusterIP: None
  ports:
  - name: grpc
    port: 10901
    targetPort: grpc
  - name: http
    port: 10902
    targetPort: http
  selector:
    app.kubernetes.io/component: rule-evaluation-engine
    app.kubernetes.io/instance: thanos-rule
    app.kubernetes.io/name: thanos-rule
Deploy bucket
thanos-bucket.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app.kubernetes.io/component: object-store-bucket-debugging
    app.kubernetes.io/instance: thanos-bucket
    app.kubernetes.io/name: thanos-bucket
    app.kubernetes.io/version: v0.12.2
  name: thanos-bucket
  namespace: monitor
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/component: object-store-bucket-debugging
      app.kubernetes.io/instance: thanos-bucket
      app.kubernetes.io/name: thanos-bucket
  template:
    metadata:
      labels:
        app.kubernetes.io/component: object-store-bucket-debugging
        app.kubernetes.io/instance: thanos-bucket
        app.kubernetes.io/name: thanos-bucket
        app.kubernetes.io/version: v0.12.2
    spec:
      containers:
      - args:
        - bucket
        - web
        - --objstore.config=$(OBJSTORE_CONFIG)
        env:
        - name: OBJSTORE_CONFIG
          valueFrom:
            secretKeyRef:
              key: thanos.yaml
              name: thanos-objectstorage
        image: quay.io/thanos/thanos:v0.12.2
        livenessProbe:
          failureThreshold: 4
          httpGet:
            path: /-/healthy
            port: 10902
            scheme: HTTP
          periodSeconds: 30
        name: thanos-bucket
        ports:
        - containerPort: 10902
          name: http
        readinessProbe:
          failureThreshold: 20
          httpGet:
            path: /-/ready
            port: 10902
            scheme: HTTP
          periodSeconds: 5
        terminationMessagePolicy: FallbackToLogsOnError
      terminationGracePeriodSeconds: 120
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/component: object-store-bucket-debugging
    app.kubernetes.io/instance: thanos-bucket
    app.kubernetes.io/name: thanos-bucket
    app.kubernetes.io/version: v0.12.2
  name: thanos-bucket
  namespace: monitor
spec:
  ports:
  - name: http
    port: 10902
    targetPort: http
  selector:
    app.kubernetes.io/component: object-store-bucket-debugging
    app.kubernetes.io/instance: thanos-bucket
    app.kubernetes.io/name: thanos-bucket
Deploy compact
thanos-compact.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    app.kubernetes.io/component: database-compactor
    app.kubernetes.io/instance: thanos-compact
    app.kubernetes.io/name: thanos-compact
    app.kubernetes.io/version: v0.12.2
  name: thanos-compact
  namespace: monitor
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/component: database-compactor
      app.kubernetes.io/instance: thanos-compact
      app.kubernetes.io/name: thanos-compact
  serviceName: thanos-compact
  template:
    metadata:
      labels:
        app.kubernetes.io/component: database-compactor
        app.kubernetes.io/instance: thanos-compact
        app.kubernetes.io/name: thanos-compact
        app.kubernetes.io/version: v0.12.2
    spec:
      containers:
      - args:
        - compact
        - --wait
        - --objstore.config=$(OBJSTORE_CONFIG)
        - --data-dir=/var/thanos/compact
        - --debug.accept-malformed-index
        env:
        - name: OBJSTORE_CONFIG
          valueFrom:
            secretKeyRef:
              key: thanos.yaml
              name: thanos-objectstorage
        image: quay.io/thanos/thanos:v0.12.2
        livenessProbe:
          failureThreshold: 4
          httpGet:
            path: /-/healthy
            port: 10902
            scheme: HTTP
          periodSeconds: 30
        name: thanos-compact
        ports:
        - containerPort: 10902
          name: http
        readinessProbe:
          failureThreshold: 20
          httpGet:
            path: /-/ready
            port: 10902
            scheme: HTTP
          periodSeconds: 5
        terminationMessagePolicy: FallbackToLogsOnError
        volumeMounts:
        - mountPath: /var/thanos/compact
          name: data
          readOnly: false
      terminationGracePeriodSeconds: 120
      volumes:
      - name: data
        emptyDir: {}
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/component: database-compactor
    app.kubernetes.io/instance: thanos-compact
    app.kubernetes.io/name: thanos-compact
    app.kubernetes.io/version: v0.12.2
  name: thanos-compact
  namespace: monitor
spec:
  ports:
  - name: http
    port: 10902
    targetPort: http
  selector:
    app.kubernetes.io/component: database-compactor
    app.kubernetes.io/instance: thanos-compact
    app.kubernetes.io/name: thanos-compact
Create the Ingress
thanos-ingress.yaml
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: thanos-ingress
  namespace: monitor
spec:
  rules:
  - host: thanos.joey.com
    http:
      paths:
      - backend:
          serviceName: thanos-query
          servicePort: 9090
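Across both deployments, the bucket, compact, rule, and store manifests are identical; the query manifest differs only in its store addresses
Open the Thanos UI to see the receive information
Both deployment methods are now complete
The parameters will be explained later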