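Preface

Prometheus does not support clustered deployment. At large scale, the performance and storage of a single Prometheus instance are limited: under heavy concurrency, IO, memory, and CPU are easily exhausted. The usual workarounds, such as lowering the scrape rate, dropping unimportant metrics, or shortening the retention period, create extra work for operations and also put the stability of the service itself at risk.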


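Solutions:

(1) Scrape services separately: deploy multiple Prometheus instances, each scraping and storing the metrics of only one service or one subset of services, e.g. Pa1 ---> Sa1, Pa2 ---> Sa2

(2) Shard a single service: split the service into multiple groups and let each Prometheus instance scrape only one group of that service, e.g. Pa1 ---> Group1(Sa1), Pa2 ---> Group2(Sa1)

(3) Manage with Thanos: deploy multiple replicas of the same Prometheus (each with a Sidecar), and let Thanos Query fetch data from all Sidecars (detailed below)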

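Option 1: federation style. Multiple Prometheus instances each monitor different services. This is enough at ordinary scale, but it is not suitable when a single service exposes too many metrics and runs many replicas.


The architecture is as follows:

[Figure: pro-rw]

Independent storage can also solve the data-sharing problem, for example a time-series database that supports clustered deployment such as OpenTSDB or InfluxDB (independent storage works well, but queries that go directly to it cannot use PromQL; they use the database's own query language)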

remote_write:
  - url: http://server:8888/write

Query syntax examples:

InfluxDB: SELECT mean("value") FROM "disk_io_time" WHERE $timeFilter GROUP BY time($interval), "instance" fill(null)


Prometheus: disk_io_time


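Option 2: addresses the very large scale problem above, but deployment and maintenance are still complex, and retention of cold data is also limited (detailed below)


The architecture is as follows:

[Figure: pro-rw]

Option 3: Thanos simplifies the deployment and management of distributed Prometheus and provides advanced features such as a global view, long-term storage, and high availability (Cortex is a comparable project)

Option 1 is simple to deploy and easy to understand, so it is not covered further here. Options 2 and 3 are explained in detail below.


Sharded scraping of a service with Prometheus

Ways to split a service into multiple groups:

1: Use the Kubernetes EndpointSlice feature for service discovery and sharding (still in beta)


2: Skip Kubernetes service discovery. Use a sharding algorithm to assign each node to a group and register it in Consul (the earlier article on API-based automatic service discovery for Prometheus touched on this), then use consul_sd_configs to specify which group each Prometheus instance scrapes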

  - job_name: 'group-1'
    consul_sd_configs:
      - server: server:8500
        services:
          - group-1
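
The article does not show the sharding algorithm behind option 2, so the following is only a minimal sketch of one way to do it. It assumes a stable hash of the node name picks the group and that the result is sent as the JSON body of Consul's agent service registration endpoint (PUT /v1/agent/service/register). The function names, the "-exporter" ID suffix, and the example address and port are hypothetical.

```python
import hashlib
import json


def pick_group(node_name: str, group_count: int) -> str:
    """Map a node name to a stable group name such as 'group-1'."""
    digest = hashlib.sha256(node_name.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % group_count
    return f"group-{index + 1}"


def consul_registration(node_name: str, address: str, port: int, group_count: int) -> dict:
    """Build the JSON body for registering the node under its group's service name."""
    return {
        "ID": f"{node_name}-exporter",  # hypothetical ID scheme
        "Name": pick_group(node_name, group_count),  # matched by `services:` in consul_sd_configs
        "Address": address,
        "Port": port,
    }


if __name__ == "__main__":
    body = consul_registration("node-a", "10.0.0.11", 9100, group_count=4)
    print(json.dumps(body, indent=2))
```

Because the group name doubles as the Consul service name, each Prometheus instance only needs its own group listed under services, as in the job above.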

3: Use Kubernetes node service discovery, then use hashmod in the Prometheus relabel configuration to shard the nodes, so that each Prometheus instance scrapes only one shard:

- job_name: 'group-1'
   metrics_path: /metrics/cadvisor
   scheme: https
   bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
   kubernetes_sd_configs:
   - role: node
   tls_config:
     insecure_skip_verify: true
   relabel_configs:
   - source_labels: [__address__]
     modulus:       4 
     target_label:  __tmp_hash
     action:        hashmod
   - source_labels: [__tmp_hash]
     regex:         ^2$ 
     action:        keep

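Parameter notes:

bearer_token_file: the ServiceAccount token sent to the kubelet; in webhook mode the kubelet delegates the RBAC check for this token to the apiserver
insecure_skip_verify: do not verify the kubelet's server certificate
modulus: shard the nodes into 4 groups
regex: scrape only the nodes in the third group (hash value 2, counting from 0)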
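
To see how the hashmod and keep rules above divide the nodes, here is a small sketch. It assumes, based on my reading of the Prometheus relabeling code, that hashmod takes the MD5 of the source label value and reduces the low 8 bytes modulo modulus. The helper names and the sample addresses are made up for illustration.

```python
import hashlib


def hashmod(value: str, modulus: int) -> int:
    """Approximate Prometheus hashmod: MD5 digest, low 8 bytes, modulo `modulus`."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[8:], "big") % modulus


def kept_by_shard(addresses, modulus, shard):
    """Return the addresses a Prometheus with `regex: ^<shard>$` would keep."""
    return [a for a in addresses if hashmod(a, modulus) == shard]


if __name__ == "__main__":
    nodes = [f"10.0.0.{i}:10250" for i in range(1, 21)]
    for shard in range(4):
        print(shard, kept_by_shard(nodes, 4, shard))
```

Every address lands in exactly one shard, so four Prometheus instances with regex values ^0$ to ^3$ together cover all nodes without overlap.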

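Deploying and managing distributed Prometheus with Thanos

Architecture diagram (the official diagrams are all rather blurry):

[Figure: pro-rw]


Core components:

Thanos Query: implements the Prometheus API, aggregates the data provided by downstream components, and returns the result to the querying client (such as Grafana)
Thanos Sidecar: connects to Prometheus and either serves its data to Thanos Query or uploads it to object storage for long-term retention
Thanos Store Gateway: exposes the data in object storage to Thanos Query
Thanos Ruler: evaluates rules and alerts against the monitoring data, can also compute new series, and serves those new series to Thanos Query and/or uploads them to object storage for long-term retention
Thanos Compact: compacts and downsamples the data in object storage to speed up queries over long time ranges

Architecture diagram:

[Figure: querier]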


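Query and Sidecar

Workflow:

1: Because Thanos Query implements the Prometheus API, it can be queried with PromQL
2: Thanos Query fetches data from the multiple downstream components that hold it, then aggregates and deduplicates the results before returning them to the client
3: A Sidecar is deployed next to every Prometheus instance and serves that instance's data to Thanos Query, or uploads it for storage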
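
The deduplication in step 2 can be pictured with a simplified sketch. It assumes replicas are distinguished only by a replica label (prometheus_replica here, matching the --query.replica-label flag used later) and merges samples by timestamp. This is an illustration of the idea, not Thanos's actual deduplication algorithm, and the data layout is invented for the example.

```python
from collections import defaultdict


def dedup(series_list, replica_label="prometheus_replica"):
    """Merge series that differ only in the replica label.

    series_list: iterable of (labels_dict, [(timestamp, value), ...]).
    Returns {label_tuple_without_replica: sorted [(timestamp, value), ...]}.
    """
    merged = defaultdict(dict)
    for labels, samples in series_list:
        key = tuple(sorted((k, v) for k, v in labels.items() if k != replica_label))
        for ts, value in samples:
            # Keep the first value seen for a timestamp; replicas should agree.
            merged[key].setdefault(ts, value)
    return {key: sorted(points.items()) for key, points in merged.items()}


if __name__ == "__main__":
    r0 = ({"__name__": "up", "instance": "a", "prometheus_replica": "r0"}, [(10, 1.0), (20, 1.0)])
    r1 = ({"__name__": "up", "instance": "a", "prometheus_replica": "r1"}, [(20, 1.0), (30, 1.0)])
    print(dedup([r0, r1]))
```

Two replicas that each miss part of the time range still yield one continuous series, which is why running Prometheus in replicated pairs behind Thanos Query gives high availability.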

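[Figure: storage]


Store Gateway

Thanos Query reads data from object storage through the Store API implemented by Store Gateway

Workflow:

[Figure: storagegw]

Thanos Store Gateway also contains optimizations that speed up data retrieval: it caches the TSDB index, and it optimizes requests to object storage (fetching all required data with as few requests as possible). Two index cache types are supported: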

1: In-memory (configured either with --index-cache.config-file, which references a configuration file, or with --index-cache.config, which takes the YAML inline)

type: IN-MEMORY
config:
  max_size: 10MiB
  max_item_size: 1MiB

max_size: maximum total size of the cache
max_item_size: maximum size of a single cached item

2: Memcached (uses a memcached backend; configured with the same --index-cache.config-file or --index-cache.config flags)

type: MEMCACHED
config:
  addresses:
  - "dnssrv+_client._tcp.<MEMCACHED_SERVICE>.thanos.svc.cluster.local"
  dns_provider_update_interval: 10s
  max_async_buffer_size: 10000
  max_async_concurrency: 20
  max_get_multi_batch_size: 0
  max_get_multi_concurrency: 100
  max_idle_connections: 100
  max_item_size: 1MiB
  timeout: 500ms

timeout: connection timeout
max_idle_connections: maximum number of idle connections
max_async_concurrency: maximum number of concurrent asynchronous operations
max_async_buffer_size: maximum buffer size for asynchronous operations
max_get_multi_concurrency: maximum number of concurrent connections when fetching keys; 0 means unlimited concurrency
max_get_multi_batch_size: maximum number of keys a single operation should fetch; if more keys are requested, they are split internally into batches that are fetched concurrently
max_item_size: maximum size of an item stored in memcached. Set this to the same value as the memcached -I flag (1MB by default) to avoid wasting network round trips on items larger than memcached accepts; 0 means the item size is unlimited
dns_provider_update_interval: interval between DNS discovery updates
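
To make the two in-memory settings concrete, here is a minimal sketch of a byte-bounded LRU cache in which max_size caps the total bytes held and max_item_size rejects oversized entries. It only illustrates the semantics of the parameters under those assumptions; the class is hypothetical and is not how Thanos implements its index cache.

```python
from collections import OrderedDict


class ByteBoundedLRU:
    """LRU cache limited by total bytes (max_size) and per-item bytes (max_item_size)."""

    def __init__(self, max_size: int, max_item_size: int):
        self.max_size = max_size
        self.max_item_size = max_item_size
        self.current_size = 0
        self._items = OrderedDict()

    def set(self, key: str, value: bytes) -> bool:
        if len(value) > self.max_item_size:
            return False  # item too large to cache at all
        if key in self._items:
            self.current_size -= len(self._items.pop(key))
        # Evict least recently used entries until the new value fits.
        while self._items and self.current_size + len(value) > self.max_size:
            _, evicted = self._items.popitem(last=False)
            self.current_size -= len(evicted)
        if self.current_size + len(value) > self.max_size:
            return False  # only reachable when max_item_size exceeds max_size
        self._items[key] = value
        self.current_size += len(value)
        return True

    def get(self, key: str):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]
```

With max_size=10 and max_item_size=4, storing three 4-byte entries evicts whichever of the first two was used least recently, and a 5-byte entry is refused outright.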

Thanos also supports a caching bucket (currently memcached only, and still experimental)

backend: memcached
backend_config:
  addresses:
    - localhost:11211
caching_config:
  chunk_subrange_size: 16000
  max_chunks_get_range_requests: 3
  chunk_object_size_ttl: 24h
  chunk_subrange_ttl: 24h

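Parameter notes:

backend_config: for memcached, supports all the same settings as the memcached index cache
caching_config: the chunk cache configuration, which supports the following optional settings:
    chunk_subrange_size: size of the chunk object segments stored in the cache; this is the smallest unit the chunk cache works with
    max_chunks_get_range_requests: how many "get range" sub-requests may be issued to fetch missing subranges
    chunk_object_size_ttl: how long information about the length of a chunk file is kept in the cache
    chunk_subrange_ttl: how long individual subranges are kept in the cache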

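Ruler

Ruler obtains global data by querying Thanos Query, computes new series according to the configured rules and stores them, and also exposes that data to Thanos Query through the Store API


Workflow:

[Figure: storagegw]

Thanos Query and Thanos Ruler query each other: Thanos Ruler provides Thanos Query with the newly computed series, and Thanos Query provides Thanos Ruler with the global raw series needed to compute them


Compact

When the time range is large, the amount of data queried is also large, which makes queries very slow. Queries over older data usually do not need full detail. Thanos Compact reads the data in object storage, compacts and downsamples it, and uploads the result back to object storage. Queries over long time ranges can then read only the compacted, downsampled data, which greatly reduces the amount of data read and speeds up the query.

Workflow: [Figure: storagegw]

Compact merges the blocks of historical data into larger objects. Downsampling adds 2 more blocks alongside each raw block, so some extra storage space needs to be reserved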
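
Downsampling can be sketched as bucketing raw samples into fixed windows and keeping a few aggregates per window. The example below assumes 5-minute windows and keeps count, sum, min, and max; as far as I know Thanos stores a similar set of aggregates, but the function and data layout here are purely illustrative.

```python
def downsample(samples, window_ms=5 * 60 * 1000):
    """Aggregate (timestamp_ms, value) samples into fixed windows.

    Returns {window_start_ms: {"count", "sum", "min", "max"}} ordered by time.
    """
    buckets = {}
    for ts, value in samples:
        start = ts - ts % window_ms  # align to the start of the window
        agg = buckets.get(start)
        if agg is None:
            buckets[start] = {"count": 1, "sum": value, "min": value, "max": value}
        else:
            agg["count"] += 1
            agg["sum"] += value
            agg["min"] = min(agg["min"], value)
            agg["max"] = max(agg["max"], value)
    return dict(sorted(buckets.items()))


if __name__ == "__main__":
    raw = [(0, 1.0), (60_000, 3.0), (300_000, 5.0)]
    print(downsample(raw))
```

A query over a long range can then compute an average as sum / count per window instead of reading every raw sample.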


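Bucket

A command for inspecting the data in object storage. It is usually run as a standalone command to help with troubleshooting, and it supports viewing the current contents of the bucket through a web UI


Check

Thanos check can be used to check and validate that Prometheus rules are correct

Considerations:

With the Sidecar approach, the most recent monitoring data lives on the local disk of each Prometheus, and Query gets it by calling the Store API of every Sidecar. If there are very many Sidecars, calling all of them on every query consumes a lot of resources and is slow, and dashboards cannot update in real time

To address this, the Thanos project developed the Thanos Receiver component. It implements the Prometheus remote write API, so every Prometheus instance can push its data to Thanos Receiver in real time and Thanos Query only has to query Thanos Receiver. Receiver also implements consistent hashing and supports clustered deployment, which solves the load problem (also covered below)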
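
The consistent hashing mentioned above can be illustrated with a generic hash ring. This sketch assumes virtual nodes and SHA-256, and the endpoint names are placeholders. It shows the general technique only and makes no claim about how Thanos Receiver builds its hashring internally.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Stable 64-bit hash of a string."""
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Generic consistent-hash ring with virtual nodes."""

    def __init__(self, endpoints, vnodes=64):
        self._ring = sorted(
            (_hash(f"{endpoint}#{i}"), endpoint)
            for endpoint in endpoints
            for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def lookup(self, series_key: str) -> str:
        """Return the endpoint that owns the first ring point after the key's hash."""
        index = bisect.bisect_right(self._points, _hash(series_key)) % len(self._ring)
        return self._ring[index][1]


if __name__ == "__main__":
    ring = HashRing(["receive-0", "receive-1", "receive-2"])
    print(ring.lookup('up{instance="a"}'))
```

The useful property is that adding an endpoint only moves the keys that now belong to the new endpoint; all other series stay where they were.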

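Architecture diagram:

[Figure: Receiver]


Two ways to deploy Thanos

Prometheus was deployed and managed with prometheus-operator earlier, so the CRD objects already exist. To make things easier to follow, the Thanos components are deployed manually this time


Prerequisite: object storage configuration

Local filesystem storage is supported as well (experimental)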

type: FILESYSTEM
config:
  directory: ""

Deploy MinIO

helm repo add azure http://mirror.azure.cn/kubernetes/charts/
helm repo update
helm pull azure/minio

Change the keys:

accessKey: "minio"
secretKey: "minio123"

Do not use a PVC:

persistence:
  enabled: false

Configure the bucket:

buckets:
  - name: thanos
    policy: none
    purge: false

helm install minio -n monitor .


1: Deploy Thanos (without Receiver)

Enable the Sidecar (edit the values.yaml file of the prometheus-operator chart)

    thanos:
      baseImage: quay.io/thanos/thanos
      version: v0.12.2
      objectStorageConfig:
        key: thanos.yaml
        name: thanos-objectstorage

Add the Sidecar port

Edit templates/prometheus/service.yaml in the prometheus-operator chart and add the following

  ports:
  - name: {{ .Values.prometheus.thanos.portName }}
    port: {{ .Values.prometheus.thanos.port }}
    targetPort: {{ .Values.prometheus.thanos.targetPort }}

Then set the values under the prometheus field in values.yaml

  thanos:
    portName: sidecar
    port: 10901
    targetPort: 10901

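Mind the leading spaces (left in on purpose so the snippet can be copied and pasted)

Create the Secret that holds the object storage configuration for MinIO: thanos-storage-minio.yaml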

apiVersion: v1
kind: Secret
metadata:
  name: thanos-objectstorage
  namespace: monitor
type: Opaque
stringData:
  thanos.yaml: |-
    type: s3
    config:
      bucket: thanos
      endpoint: minio:9000
      insecure: true
      access_key: minio
      secret_key: minio123    

Upgrade Prometheus: helm upgrade prometheus -n monitor .

At this point the Sidecar is deployed


Install Query

thanos-query.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app.kubernetes.io/component: query-layer
    app.kubernetes.io/instance: thanos-query
    app.kubernetes.io/name: thanos-query
    app.kubernetes.io/version: v0.12.2
  name: thanos-query
  namespace: monitor
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/component: query-layer
      app.kubernetes.io/instance: thanos-query
      app.kubernetes.io/name: thanos-query
  template:
    metadata:
      labels:
        app.kubernetes.io/component: query-layer
        app.kubernetes.io/instance: thanos-query
        app.kubernetes.io/name: thanos-query
        app.kubernetes.io/version: v0.12.2
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app.kubernetes.io/name
                  operator: In
                  values:
                  - thanos-query
              namespaces:
              - monitor
              topologyKey: kubernetes.io/hostname
            weight: 100
      containers:
      - args:
        - query
        - --log.level=debug
        - --grpc-address=0.0.0.0:10901
        - --http-address=0.0.0.0:9090
        - --query.replica-label=prometheus_replica
        - --query.replica-label=rule_replica
        - --store=prometheus-prometheus-oper-prometheus:10901
        - --store=dnssrv+_grpc._tcp.thanos-rule.monitor.svc.cluster.local
        - --store=dnssrv+_grpc._tcp.thanos-store.monitor.svc.cluster.local
        image: quay.io/thanos/thanos:v0.12.2
        livenessProbe:
          failureThreshold: 4
          httpGet:
            path: /-/healthy
            port: 9090
            scheme: HTTP
          periodSeconds: 30
        name: thanos-query
        ports:
        - containerPort: 10901
          name: grpc
        - containerPort: 9090
          name: http
        readinessProbe:
          failureThreshold: 20
          httpGet:
            path: /-/ready
            port: 9090
            scheme: HTTP
          periodSeconds: 5
        terminationMessagePolicy: FallbackToLogsOnError
      terminationGracePeriodSeconds: 120
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/component: query-layer
    app.kubernetes.io/instance: thanos-query
    app.kubernetes.io/name: thanos-query
    app.kubernetes.io/version: v0.12.2
  name: thanos-query
  namespace: monitor
spec:
  ports:
  - name: grpc
    port: 10901
    targetPort: grpc
  - name: http
    port: 9090
    targetPort: http
  selector:
    app.kubernetes.io/component: query-layer
    app.kubernetes.io/instance: thanos-query
    app.kubernetes.io/name: thanos-query

Deploy Store

thanos-store.yaml

apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    app.kubernetes.io/component: object-store-gateway
    app.kubernetes.io/instance: thanos-store
    app.kubernetes.io/name: thanos-store
    app.kubernetes.io/version: v0.12.2
  name: thanos-store
  namespace: monitor
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/component: object-store-gateway
      app.kubernetes.io/instance: thanos-store
      app.kubernetes.io/name: thanos-store
  serviceName: thanos-store
  template:
    metadata:
      labels:
        app.kubernetes.io/component: object-store-gateway
        app.kubernetes.io/instance: thanos-store
        app.kubernetes.io/name: thanos-store
        app.kubernetes.io/version: v0.12.2
    spec:
      containers:
      - args:
        - store
        - --data-dir=/var/thanos/store
        - --grpc-address=0.0.0.0:10901
        - --http-address=0.0.0.0:10902
        - --objstore.config=$(OBJSTORE_CONFIG)
        env:
        - name: OBJSTORE_CONFIG
          valueFrom:
            secretKeyRef:
              key: thanos.yaml
              name: thanos-objectstorage
        image: quay.io/thanos/thanos:v0.12.2
        livenessProbe:
          failureThreshold: 8
          httpGet:
            path: /-/healthy
            port: 10902
            scheme: HTTP
          periodSeconds: 30
        name: thanos-store
        ports:
        - containerPort: 10901
          name: grpc
        - containerPort: 10902
          name: http
        readinessProbe:
          failureThreshold: 20
          httpGet:
            path: /-/ready
            port: 10902
            scheme: HTTP
          periodSeconds: 5
        terminationMessagePolicy: FallbackToLogsOnError
        volumeMounts:
        - mountPath: /var/thanos/store
          name: data
          readOnly: false
      terminationGracePeriodSeconds: 120
      volumes: 
      - name: data
        emptyDir: {}

---
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/component: object-store-gateway
    app.kubernetes.io/instance: thanos-store
    app.kubernetes.io/name: thanos-store
    app.kubernetes.io/version: v0.12.2
  name: thanos-store
  namespace: monitor
spec:
  clusterIP: None
  ports:
  - name: grpc
    port: 10901
    targetPort: 10901
  - name: http
    port: 10902
    targetPort: 10902
  selector:
    app.kubernetes.io/component: object-store-gateway
    app.kubernetes.io/instance: thanos-store
    app.kubernetes.io/name: thanos-store

Deploy Rule

thanos-rule.yaml

apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    app.kubernetes.io/component: rule-evaluation-engine
    app.kubernetes.io/instance: thanos-rule
    app.kubernetes.io/name: thanos-rule
    app.kubernetes.io/version: v0.12.2
  name: thanos-rule
  namespace: monitor
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/component: rule-evaluation-engine
      app.kubernetes.io/instance: thanos-rule
      app.kubernetes.io/name: thanos-rule
  serviceName: thanos-rule
  template:
    metadata:
      labels:
        app.kubernetes.io/component: rule-evaluation-engine
        app.kubernetes.io/instance: thanos-rule
        app.kubernetes.io/name: thanos-rule
        app.kubernetes.io/version: v0.12.2
    spec:
      containers:
      - args:
        - rule
        - --grpc-address=0.0.0.0:10901
        - --http-address=0.0.0.0:10902
        - --objstore.config=$(OBJSTORE_CONFIG)
        - --data-dir=/var/thanos/rule
        - --label=rule_replica="$(NAME)"
        - --alert.label-drop="rule_replica"
        - --query=dnssrv+_http._tcp.thanos-query.monitor.svc.cluster.local
        env:
        - name: NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: OBJSTORE_CONFIG
          valueFrom:
            secretKeyRef:
              key: thanos.yaml
              name: thanos-objectstorage
        image: quay.io/thanos/thanos:v0.12.2
        livenessProbe:
          failureThreshold: 24
          httpGet:
            path: /-/healthy
            port: 10902
            scheme: HTTP
          periodSeconds: 5
        name: thanos-rule
        ports:
        - containerPort: 10901
          name: grpc
        - containerPort: 10902
          name: http
        readinessProbe:
          failureThreshold: 18
          httpGet:
            path: /-/ready
            port: 10902
            scheme: HTTP
          initialDelaySeconds: 10
          periodSeconds: 5
        terminationMessagePolicy: FallbackToLogsOnError
        volumeMounts:
        - mountPath: /var/thanos/rule
          name: data
          readOnly: false
      volumes: 
      - name: data
        emptyDir: {}
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/component: rule-evaluation-engine
    app.kubernetes.io/instance: thanos-rule
    app.kubernetes.io/name: thanos-rule
    app.kubernetes.io/version: v0.12.2
  name: thanos-rule
  namespace: monitor
spec:
  clusterIP: None
  ports:
  - name: grpc
    port: 10901
    targetPort: grpc
  - name: http
    port: 10902
    targetPort: http
  selector:
    app.kubernetes.io/component: rule-evaluation-engine
    app.kubernetes.io/instance: thanos-rule
    app.kubernetes.io/name: thanos-rule

Deploy Bucket

thanos-bucket.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app.kubernetes.io/component: object-store-bucket-debugging
    app.kubernetes.io/instance: thanos-bucket
    app.kubernetes.io/name: thanos-bucket
    app.kubernetes.io/version: v0.12.2
  name: thanos-bucket
  namespace: monitor
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/component: object-store-bucket-debugging
      app.kubernetes.io/instance: thanos-bucket
      app.kubernetes.io/name: thanos-bucket
  template:
    metadata:
      labels:
        app.kubernetes.io/component: object-store-bucket-debugging
        app.kubernetes.io/instance: thanos-bucket
        app.kubernetes.io/name: thanos-bucket
        app.kubernetes.io/version: v0.12.2
    spec:
      containers:
      - args:
        - bucket
        - web
        - --objstore.config=$(OBJSTORE_CONFIG)
        env:
        - name: OBJSTORE_CONFIG
          valueFrom:
            secretKeyRef:
              key: thanos.yaml
              name: thanos-objectstorage
        image: quay.io/thanos/thanos:v0.12.2
        livenessProbe:
          failureThreshold: 4
          httpGet:
            path: /-/healthy
            port: 10902
            scheme: HTTP
          periodSeconds: 30
        name: thanos-bucket
        ports:
        - containerPort: 10902
          name: http
        readinessProbe:
          failureThreshold: 20
          httpGet:
            path: /-/ready
            port: 10902
            scheme: HTTP
          periodSeconds: 5
        terminationMessagePolicy: FallbackToLogsOnError
      terminationGracePeriodSeconds: 120
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/component: object-store-bucket-debugging
    app.kubernetes.io/instance: thanos-bucket
    app.kubernetes.io/name: thanos-bucket
    app.kubernetes.io/version: v0.12.2
  name: thanos-bucket
  namespace: monitor
spec:
  ports:
  - name: http
    port: 10902
    targetPort: http
  selector:
    app.kubernetes.io/component: object-store-bucket-debugging
    app.kubernetes.io/instance: thanos-bucket
    app.kubernetes.io/name: thanos-bucket

Deploy Compact

thanos-compact.yaml

apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    app.kubernetes.io/component: database-compactor
    app.kubernetes.io/instance: thanos-compact
    app.kubernetes.io/name: thanos-compact
    app.kubernetes.io/version: v0.12.2
  name: thanos-compact
  namespace: monitor
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/component: database-compactor
      app.kubernetes.io/instance: thanos-compact
      app.kubernetes.io/name: thanos-compact
  serviceName: thanos-compact
  template:
    metadata:
      labels:
        app.kubernetes.io/component: database-compactor
        app.kubernetes.io/instance: thanos-compact
        app.kubernetes.io/name: thanos-compact
        app.kubernetes.io/version: v0.12.2
    spec:
      containers:
      - args:
        - compact
        - --wait
        - --objstore.config=$(OBJSTORE_CONFIG)
        - --data-dir=/var/thanos/compact
        - --debug.accept-malformed-index
        env:
        - name: OBJSTORE_CONFIG
          valueFrom:
            secretKeyRef:
              key: thanos.yaml
              name: thanos-objectstorage
        image: quay.io/thanos/thanos:v0.12.2
        livenessProbe:
          failureThreshold: 4
          httpGet:
            path: /-/healthy
            port: 10902
            scheme: HTTP
          periodSeconds: 30
        name: thanos-compact
        ports:
        - containerPort: 10902
          name: http
        readinessProbe:
          failureThreshold: 20
          httpGet:
            path: /-/ready
            port: 10902
            scheme: HTTP
          periodSeconds: 5
        terminationMessagePolicy: FallbackToLogsOnError
        volumeMounts:
        - mountPath: /var/thanos/compact
          name: data
          readOnly: false
      terminationGracePeriodSeconds: 120
      volumes: 
      - name: data
        emptyDir: {}
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/component: database-compactor
    app.kubernetes.io/instance: thanos-compact
    app.kubernetes.io/name: thanos-compact
    app.kubernetes.io/version: v0.12.2
  name: thanos-compact
  namespace: monitor
spec:
  ports:
  - name: http
    port: 10902
    targetPort: http
  selector:
    app.kubernetes.io/component: database-compactor
    app.kubernetes.io/instance: thanos-compact
    app.kubernetes.io/name: thanos-compact

Create the Ingress: thanos-ingress.yaml

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: thanos-ingress
  namespace: monitor
spec:
  rules:
  - host: thanos.joey.com
    http:
      paths:
      - backend:
          serviceName: thanos-query
          servicePort: 9090

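Apply the YAML and add an entry to the local hosts file (C:\Windows\System32\drivers\etc). The service can then be reached by entering the domain name in a browser, which completes the deployment.

[Figure: dashboard]



2: Deploy Thanos (with Receiver)

Create the Secret that holds the object storage configuration: thanos-storage-minio.yaml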

apiVersion: v1
kind: Secret
metadata:
  name: thanos-objectstorage
  namespace: monitor
type: Opaque
stringData:
  thanos.yaml: |-
    type: s3
    config:
      bucket: thanos
      endpoint: minio:9000
      insecure: true
      access_key: minio
      secret_key: minio123    

Deploy Receiver

thanos-receive-statefulSet.yaml

apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    app.kubernetes.io/component: database-write-hashring
    app.kubernetes.io/instance: thanos-receive
    app.kubernetes.io/name: thanos-receive
    app.kubernetes.io/version: v0.12.2
  name: thanos-receive
  namespace: monitor
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/component: database-write-hashring
      app.kubernetes.io/instance: thanos-receive
      app.kubernetes.io/name: thanos-receive
  serviceName: thanos-receive
  template:
    metadata:
      labels:
        app.kubernetes.io/component: database-write-hashring
        app.kubernetes.io/instance: thanos-receive
        app.kubernetes.io/name: thanos-receive
        app.kubernetes.io/version: v0.12.2
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app.kubernetes.io/instance
                  operator: In
                  values:
                  - thanos-receive
              namespaces:
              - monitor
              topologyKey: kubernetes.io/hostname
            weight: 100
      containers:
      - args:
        - receive
        - --grpc-address=0.0.0.0:10901
        - --http-address=0.0.0.0:10902
        - --remote-write.address=0.0.0.0:19291
        - --receive.replication-factor=1
        - --objstore.config=$(OBJSTORE_CONFIG)
        - --tsdb.path=/var/thanos/receive
        - --label=replica="$(NAME)"
        - --label=receive="true"
        - --receive.local-endpoint=$(NAME).thanos-receive.$(NAMESPACE).svc.cluster.local:10901
        env:
        - name: NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: OBJSTORE_CONFIG
          valueFrom:
            secretKeyRef:
              key: thanos.yaml
              name: thanos-objectstorage
        image: quay.io/thanos/thanos:v0.12.2
        livenessProbe:
          failureThreshold: 8
          httpGet:
            path: /-/healthy
            port: 10902
            scheme: HTTP
          periodSeconds: 30
        name: thanos-receive
        ports:
        - containerPort: 10901
          name: grpc
        - containerPort: 10902
          name: http
        - containerPort: 19291
          name: remote-write
        readinessProbe:
          failureThreshold: 20
          httpGet:
            path: /-/ready
            port: 10902
            scheme: HTTP
          periodSeconds: 5
        terminationMessagePolicy: FallbackToLogsOnError
        volumeMounts:
        - mountPath: /var/thanos/receive
          name: data
          readOnly: false
      terminationGracePeriodSeconds: 120
      volumes: 
      - name: data
        emptyDir: {}
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/component: database-write-hashring
    app.kubernetes.io/instance: thanos-receive
    app.kubernetes.io/name: thanos-receive
    app.kubernetes.io/version: v0.12.2
  name: thanos-receive
  namespace: monitor
spec:
  clusterIP: None
  ports:
  - name: grpc
    port: 10901
    targetPort: 10901
  - name: http
    port: 10902
    targetPort: 10902
  - name: remote-write
    port: 19291
    targetPort: 19291
  selector:
    app.kubernetes.io/component: database-write-hashring
    app.kubernetes.io/instance: thanos-receive
    app.kubernetes.io/name: thanos-receive

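Modify prometheus-operator

The official site did not seem to document the URL format for Prometheus remote write, so I read the source instead:

[Figure: source]

From this, the endpoint is ip:port/api/v1/receive. Use the Service address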

 remoteWrite:
    - url: http://thanos-receive:19291/api/v1/receive

helm upgrade prometheus -n monitor .


Install Query

thanos-query.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app.kubernetes.io/component: query-layer
    app.kubernetes.io/instance: thanos-query
    app.kubernetes.io/name: thanos-query
    app.kubernetes.io/version: v0.12.2
  name: thanos-query
  namespace: monitor
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/component: query-layer
      app.kubernetes.io/instance: thanos-query
      app.kubernetes.io/name: thanos-query
  template:
    metadata:
      labels:
        app.kubernetes.io/component: query-layer
        app.kubernetes.io/instance: thanos-query
        app.kubernetes.io/name: thanos-query
        app.kubernetes.io/version: v0.12.2
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app.kubernetes.io/name
                  operator: In
                  values:
                  - thanos-query
              namespaces:
              - monitor
              topologyKey: kubernetes.io/hostname
            weight: 100
      containers:
      - args:
        - query
        - --log.level=debug
        - --grpc-address=0.0.0.0:10901
        - --http-address=0.0.0.0:9090
        - --query.replica-label=prometheus_replica
        - --query.replica-label=rule_replica
        - --store=dnssrv+_grpc._tcp.thanos-receive.monitor.svc.cluster.local
        - --store=dnssrv+_grpc._tcp.thanos-rule.monitor.svc.cluster.local
        - --store=dnssrv+_grpc._tcp.thanos-store.monitor.svc.cluster.local
        image: quay.io/thanos/thanos:v0.12.2
        livenessProbe:
          failureThreshold: 4
          httpGet:
            path: /-/healthy
            port: 9090
            scheme: HTTP
          periodSeconds: 30
        name: thanos-query
        ports:
        - containerPort: 10901
          name: grpc
        - containerPort: 9090
          name: http
        readinessProbe:
          failureThreshold: 20
          httpGet:
            path: /-/ready
            port: 9090
            scheme: HTTP
          periodSeconds: 5
        terminationMessagePolicy: FallbackToLogsOnError
      terminationGracePeriodSeconds: 120
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/component: query-layer
    app.kubernetes.io/instance: thanos-query
    app.kubernetes.io/name: thanos-query
    app.kubernetes.io/version: v0.12.2
  name: thanos-query
  namespace: monitor
spec:
  ports:
  - name: grpc
    port: 10901
    targetPort: grpc
  - name: http
    port: 9090
    targetPort: http
  selector:
    app.kubernetes.io/component: query-layer
    app.kubernetes.io/instance: thanos-query
    app.kubernetes.io/name: thanos-query

Deploy Store

thanos-store.yaml

apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    app.kubernetes.io/component: object-store-gateway
    app.kubernetes.io/instance: thanos-store
    app.kubernetes.io/name: thanos-store
    app.kubernetes.io/version: v0.12.2
  name: thanos-store
  namespace: monitor
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/component: object-store-gateway
      app.kubernetes.io/instance: thanos-store
      app.kubernetes.io/name: thanos-store
  serviceName: thanos-store
  template:
    metadata:
      labels:
        app.kubernetes.io/component: object-store-gateway
        app.kubernetes.io/instance: thanos-store
        app.kubernetes.io/name: thanos-store
        app.kubernetes.io/version: v0.12.2
    spec:
      containers:
      - args:
        - store
        - --data-dir=/var/thanos/store
        - --grpc-address=0.0.0.0:10901
        - --http-address=0.0.0.0:10902
        - --objstore.config=$(OBJSTORE_CONFIG)
        env:
        - name: OBJSTORE_CONFIG
          valueFrom:
            secretKeyRef:
              key: thanos.yaml
              name: thanos-objectstorage
        image: quay.io/thanos/thanos:v0.12.2
        livenessProbe:
          failureThreshold: 8
          httpGet:
            path: /-/healthy
            port: 10902
            scheme: HTTP
          periodSeconds: 30
        name: thanos-store
        ports:
        - containerPort: 10901
          name: grpc
        - containerPort: 10902
          name: http
        readinessProbe:
          failureThreshold: 20
          httpGet:
            path: /-/ready
            port: 10902
            scheme: HTTP
          periodSeconds: 5
        terminationMessagePolicy: FallbackToLogsOnError
        volumeMounts:
        - mountPath: /var/thanos/store
          name: data
          readOnly: false
      terminationGracePeriodSeconds: 120
      volumes: 
      - name: data
        emptyDir: {}

---
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/component: object-store-gateway
    app.kubernetes.io/instance: thanos-store
    app.kubernetes.io/name: thanos-store
    app.kubernetes.io/version: v0.12.2
  name: thanos-store
  namespace: monitor
spec:
  clusterIP: None
  ports:
  - name: grpc
    port: 10901
    targetPort: 10901
  - name: http
    port: 10902
    targetPort: 10902
  selector:
    app.kubernetes.io/component: object-store-gateway
    app.kubernetes.io/instance: thanos-store
    app.kubernetes.io/name: thanos-store

Deploy Rule

thanos-rule.yaml

apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    app.kubernetes.io/component: rule-evaluation-engine
    app.kubernetes.io/instance: thanos-rule
    app.kubernetes.io/name: thanos-rule
    app.kubernetes.io/version: v0.12.2
  name: thanos-rule
  namespace: monitor
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/component: rule-evaluation-engine
      app.kubernetes.io/instance: thanos-rule
      app.kubernetes.io/name: thanos-rule
  serviceName: thanos-rule
  template:
    metadata:
      labels:
        app.kubernetes.io/component: rule-evaluation-engine
        app.kubernetes.io/instance: thanos-rule
        app.kubernetes.io/name: thanos-rule
        app.kubernetes.io/version: v0.12.2
    spec:
      containers:
      - args:
        - rule
        - --grpc-address=0.0.0.0:10901
        - --http-address=0.0.0.0:10902
        - --objstore.config=$(OBJSTORE_CONFIG)
        - --data-dir=/var/thanos/rule
        - --label=rule_replica="$(NAME)"
        - --alert.label-drop="rule_replica"
        - --query=dnssrv+_http._tcp.thanos-query.monitor.svc.cluster.local
        env:
        - name: NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: OBJSTORE_CONFIG
          valueFrom:
            secretKeyRef:
              key: thanos.yaml
              name: thanos-objectstorage
        image: quay.io/thanos/thanos:v0.12.2
        livenessProbe:
          failureThreshold: 24
          httpGet:
            path: /-/healthy
            port: 10902
            scheme: HTTP
          periodSeconds: 5
        name: thanos-rule
        ports:
        - containerPort: 10901
          name: grpc
        - containerPort: 10902
          name: http
        readinessProbe:
          failureThreshold: 18
          httpGet:
            path: /-/ready
            port: 10902
            scheme: HTTP
          initialDelaySeconds: 10
          periodSeconds: 5
        terminationMessagePolicy: FallbackToLogsOnError
        volumeMounts:
        - mountPath: /var/thanos/rule
          name: data
          readOnly: false
      volumes: 
      - name: data
        emptyDir: {}
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/component: rule-evaluation-engine
    app.kubernetes.io/instance: thanos-rule
    app.kubernetes.io/name: thanos-rule
    app.kubernetes.io/version: v0.12.2
  name: thanos-rule
  namespace: monitor
spec:
  clusterIP: None
  ports:
  - name: grpc
    port: 10901
    targetPort: grpc
  - name: http
    port: 10902
    targetPort: http
  selector:
    app.kubernetes.io/component: rule-evaluation-engine
    app.kubernetes.io/instance: thanos-rule
    app.kubernetes.io/name: thanos-rule

Deploy bucket

thanos-bucket.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app.kubernetes.io/component: object-store-bucket-debugging
    app.kubernetes.io/instance: thanos-bucket
    app.kubernetes.io/name: thanos-bucket
    app.kubernetes.io/version: v0.12.2
  name: thanos-bucket
  namespace: monitor
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/component: object-store-bucket-debugging
      app.kubernetes.io/instance: thanos-bucket
      app.kubernetes.io/name: thanos-bucket
  template:
    metadata:
      labels:
        app.kubernetes.io/component: object-store-bucket-debugging
        app.kubernetes.io/instance: thanos-bucket
        app.kubernetes.io/name: thanos-bucket
        app.kubernetes.io/version: v0.12.2
    spec:
      containers:
      - args:
        - bucket
        - web
        - --objstore.config=$(OBJSTORE_CONFIG)
        env:
        - name: OBJSTORE_CONFIG
          valueFrom:
            secretKeyRef:
              key: thanos.yaml
              name: thanos-objectstorage
        image: quay.io/thanos/thanos:v0.12.2
        livenessProbe:
          failureThreshold: 4
          httpGet:
            path: /-/healthy
            port: 10902
            scheme: HTTP
          periodSeconds: 30
        name: thanos-bucket
        ports:
        - containerPort: 10902
          name: http
        readinessProbe:
          failureThreshold: 20
          httpGet:
            path: /-/ready
            port: 10902
            scheme: HTTP
          periodSeconds: 5
        terminationMessagePolicy: FallbackToLogsOnError
      terminationGracePeriodSeconds: 120
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/component: object-store-bucket-debugging
    app.kubernetes.io/instance: thanos-bucket
    app.kubernetes.io/name: thanos-bucket
    app.kubernetes.io/version: v0.12.2
  name: thanos-bucket
  namespace: monitor
spec:
  ports:
  - name: http
    port: 10902
    targetPort: http
  selector:
    app.kubernetes.io/component: object-store-bucket-debugging
    app.kubernetes.io/instance: thanos-bucket
    app.kubernetes.io/name: thanos-bucket

Deploy compact

thanos-compact.yaml

apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    app.kubernetes.io/component: database-compactor
    app.kubernetes.io/instance: thanos-compact
    app.kubernetes.io/name: thanos-compact
    app.kubernetes.io/version: v0.12.2
  name: thanos-compact
  namespace: monitor
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/component: database-compactor
      app.kubernetes.io/instance: thanos-compact
      app.kubernetes.io/name: thanos-compact
  serviceName: thanos-compact
  template:
    metadata:
      labels:
        app.kubernetes.io/component: database-compactor
        app.kubernetes.io/instance: thanos-compact
        app.kubernetes.io/name: thanos-compact
        app.kubernetes.io/version: v0.12.2
    spec:
      containers:
      - args:
        - compact
        - --wait
        - --objstore.config=$(OBJSTORE_CONFIG)
        - --data-dir=/var/thanos/compact
        - --debug.accept-malformed-index
        env:
        - name: OBJSTORE_CONFIG
          valueFrom:
            secretKeyRef:
              key: thanos.yaml
              name: thanos-objectstorage
        image: quay.io/thanos/thanos:v0.12.2
        livenessProbe:
          failureThreshold: 4
          httpGet:
            path: /-/healthy
            port: 10902
            scheme: HTTP
          periodSeconds: 30
        name: thanos-compact
        ports:
        - containerPort: 10902
          name: http
        readinessProbe:
          failureThreshold: 20
          httpGet:
            path: /-/ready
            port: 10902
            scheme: HTTP
          periodSeconds: 5
        terminationMessagePolicy: FallbackToLogsOnError
        volumeMounts:
        - mountPath: /var/thanos/compact
          name: data
          readOnly: false
      terminationGracePeriodSeconds: 120
      volumes: 
      - name: data
        emptyDir: {}
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/component: database-compactor
    app.kubernetes.io/instance: thanos-compact
    app.kubernetes.io/name: thanos-compact
    app.kubernetes.io/version: v0.12.2
  name: thanos-compact
  namespace: monitor
spec:
  ports:
  - name: http
    port: 10902
    targetPort: http
  selector:
    app.kubernetes.io/component: database-compactor
    app.kubernetes.io/instance: thanos-compact
    app.kubernetes.io/name: thanos-compact

Create the ingress

thanos-ingress.yaml

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: thanos-ingress
  namespace: monitor
spec:
  rules:
  - host: thanos.joey.com
    http:
      paths:
      - backend:
          serviceName: thanos-query
          servicePort: 9090
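
The manifest above uses the extensions/v1beta1 Ingress API, which was removed in Kubernetes 1.22. On newer clusters, a rough equivalent under networking.k8s.io/v1 might look like the sketch below. The path "/" and pathType "Prefix" are assumptions (the original rule sets no path), and some ingress controllers also require spec.ingressClassName, which is left out here.

```yaml
# Sketch only: the same rule expressed with the networking.k8s.io/v1 schema.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: thanos-ingress
  namespace: monitor
spec:
  rules:
  - host: thanos.joey.com
    http:
      paths:
      - path: /              # assumed; the original rule matches every path
        pathType: Prefix     # required field in networking.k8s.io/v1
        backend:
          service:
            name: thanos-query
            port:
              number: 9090
```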

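The manifests for the bucket, compact, rule, and store components are identical in both deployment approaches; for query, only the store addresses differ, and everything else is the same.

Open the Thanos UI, where the receive information is visible.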
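
To illustrate where the query component differs between the two approaches, here is a minimal sketch of the thanos-query container args. Only the thanos-rule Service name comes from the manifests in this section. The Service names thanos-store, prometheus-headless, and thanos-receive, and the listen ports, are assumptions and should be replaced with whatever your own manifests define.

```yaml
# Sketch only: args for the thanos-query container (fragment, not a full manifest).
args:
- query
- --http-address=0.0.0.0:9090      # assumed to match the port exposed through the ingress
- --grpc-address=0.0.0.0:10901
# Shared by both approaches: the store gateway (assumed name) and the rule component.
- --store=dnssrv+_grpc._tcp.thanos-store.monitor.svc.cluster.local
- --store=dnssrv+_grpc._tcp.thanos-rule.monitor.svc.cluster.local
# Sidecar approach: point at the headless Service in front of the Prometheus sidecars (assumed name).
- --store=dnssrv+_grpc._tcp.prometheus-headless.monitor.svc.cluster.local
# Receive approach: use the receive Service instead of the sidecar line above (assumed name).
# - --store=dnssrv+_grpc._tcp.thanos-receive.monitor.svc.cluster.local
```

The dnssrv+ prefix resolves SRV records, so each target Service needs a port named grpc, as the thanos-rule Service does.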

thanos-receive

At this point, both deployment approaches are complete.

The parameters are explained later.