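Basic Concepts

LitmusChaos is a cloud-native chaos engineering tool focused on Kubernetes. It injects simulated faults into a cluster to expose weak points in the cluster and in the applications running on it, so that both can be made more robust.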

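Components

ChaosExperiment: a CRD resource that describes an experiment: which operations it supports, which parameters it accepts, and which kinds of objects it can target.
                 Experiments usually fall into three categories: generic (memory, disk, pod deletion, and so on), application-specific (for example, Kafka), and platform-specific (for a particular cloud platform: AWS, Azure, GCP).
ChaosEngine: a namespaced custom resource that binds the functionality defined by a ChaosExperiment to specific applications in a specific namespace.
Chaos Operator: manages the Litmus CRDs and watches ChaosEngine resources.
Chaos Portal: a web UI for viewing and managing experiments (still in beta at the time of writing).

LitmusChaos can inject faults into Pod and node memory, CPU, network, and disk I/O, into Kubernetes components and services (CoreDNS, kubelet, Docker), and it provides dedicated experiments for specific applications (such as OpenEBS, Kafka, and Prometheus).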

Installation

Deploy with Helm

Add the LitmusChaos Helm repository

helm repo add litmuschaos https://litmuschaos.github.io/litmus-helm/

Install LitmusChaos

kubectl create ns litmus
helm install chaos litmuschaos/litmus --namespace=litmus

Check that the CRDs were installed

kubectl get crds | grep chaos
chaosengines.litmuschaos.io                           2021-03-16T09:43:45Z
chaosexperiments.litmuschaos.io                       2021-03-16T09:43:45Z
chaosresults.litmuschaos.io                           2021-03-16T09:43:45Z

kubectl api-resources | grep chaos
chaosengines                                   litmuschaos.io                 true         ChaosEngine
chaosexperiments                               litmuschaos.io                 true         ChaosExperiment
chaosresults                                   litmuschaos.io                 true         ChaosResult
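
The same check can be scripted so that later steps fail early when a CRD is missing. This is only a sketch: the function name check_litmus_crds is made up for illustration, and it assumes kubectl is already pointed at the cluster used above.

```shell
# Succeeds only if the three LitmusChaos CRDs listed above all exist.
check_litmus_crds() {
  for crd in chaosengines chaosexperiments chaosresults; do
    if ! kubectl get crd "${crd}.litmuschaos.io" >/dev/null 2>&1; then
      echo "missing CRD: ${crd}.litmuschaos.io" >&2
      return 1
    fi
  done
  echo "all LitmusChaos CRDs present"
}

# Usage: check_litmus_crds
```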

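Install the Chaos Experiments you need

Litmus provides many experiments for chaos testing: generic ones, plus ones aimed at Kafka, CoreDNS, OpenEBS, and so on. The generic experiments are installed here into the test namespace, where the application under test runs (create the namespace first with kubectl create ns test if it does not exist yet). The URL is quoted so that the shell does not try to expand the ? character.

kubectl apply -f "https://hub.litmuschaos.io/api/chaos/1.13.2?file=charts/generic/experiments.yaml" -n test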
kubectl get chaosexperiments -n test
NAME                      AGE
container-kill            29h
disk-fill                 29h
disk-loss                 29h
docker-service-kill       29h
k8-pod-delete             29h
k8-service-kill           29h
kubelet-service-kill      29h
node-cpu-hog              29h
node-drain                29h
node-io-stress            29h
node-memory-hog           29h
node-poweroff             29h
node-restart              29h
node-taint                29h
pod-autoscaler            29h
pod-cpu-hog               29h
pod-delete                29h
pod-io-stress             29h
pod-memory-hog            29h
pod-network-corruption    29h
pod-network-duplication   29h
pod-network-latency       29h
pod-network-loss          29h

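As shown, the generic chart contains the experiments listed above. To run experiments against Kafka or other applications, install the corresponding experiment resources as well. Deployment is now complete; the hands-on part follows.

Hands-On

Prepare an nginx Deployment and Service to test against: nginx-litmus.yaml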

apiVersion: apps/v1
kind: Deployment 
metadata: 
  name: nginx
  namespace: test
  labels:
    app: nginx
spec: 
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 2
  minReadySeconds: 120
  selector:
    matchLabels:
      app: nginx
  template: 
    metadata: 
      labels: 
        app: nginx 
    spec: 
      containers: 
        - name: nginx 
          image: nginx:alpine 
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 80
              name: http
          resources:
            limits:
              cpu: 1000m
              memory: 500Mi
            requests:
              cpu: 0.5
              memory: 250Mi 
---
apiVersion: v1
kind: Service
metadata: 
  name: nginx 
  namespace: test
  labels:
    app: nginx
spec: 
  ports: 
    - port: 80
      name: http
      targetPort: 80
      protocol: TCP 
  selector: 
    app: nginx

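Pod failure tests (the nginx pods are deleted or killed)

Prepare the ChaosEngine and RBAC files. An experiment can only be run if the corresponding ChaosExperiment has been installed.

Create the ServiceAccount file: chaos-rbac.yaml

Create a ServiceAccount with enough permissions to run the experiment. You can also create one shared ServiceAccount and reference it in the chaosServiceAccount field.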

---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: pod-delete-sa
  namespace: test
  labels:
    name: pod-delete-sa
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-delete-sa
  namespace: test
  labels:
    name: pod-delete-sa
rules:
- apiGroups: [""]
  resources: ["pods","pods/exec","pods/log","events","replicationcontrollers"]
  verbs: ["create","list","get","patch","update","delete","deletecollection"]
- apiGroups: ["batch"]
  resources: ["jobs"]
  verbs: ["create","list","get","delete","deletecollection"]
- apiGroups: ["apps"]
  resources: ["deployments","statefulsets","daemonsets","replicasets"]
  verbs: ["list","get"]
- apiGroups: ["apps.openshift.io"]
  resources: ["deploymentconfigs"]
  verbs: ["list","get"]
- apiGroups: ["argoproj.io"]
  resources: ["rollouts"]
  verbs: ["list","get"]
- apiGroups: ["litmuschaos.io"]
  resources: ["chaosengines","chaosexperiments","chaosresults"]
  verbs: ["create","list","get","patch","update"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: pod-delete-sa
  namespace: test
  labels:
    name: pod-delete-sa
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: pod-delete-sa
subjects:
- kind: ServiceAccount
  name: pod-delete-sa
  namespace: test

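To reduce the blast radius (so that the experiment does not affect more applications than intended), annotate the target application. When annotationCheck is 'true', the Chaos Operator checks for this annotation before running the experiment.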

kubectl annotate deploy/nginx litmuschaos.io/chaos="true" -n test
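
To confirm the annotation is in place before starting an experiment, it can be read back with a JSONPath query. A minimal sketch; chaos_annotation is a hypothetical helper name, and the dots inside the annotation key are escaped with a backslash as kubectl's JSONPath syntax requires.

```shell
# Prints the value of the litmuschaos.io/chaos annotation on a Deployment.
# Empty output means the annotation is not set.
chaos_annotation() {
  # $1 = deployment name, $2 = namespace
  kubectl get deploy "$1" -n "$2" \
    -o jsonpath='{.metadata.annotations.litmuschaos\.io/chaos}'
}

# Usage: chaos_annotation nginx test
```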

The ChaosEngine connects the application instance to a Chaos Experiment

apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: nginx-chaos
  namespace: test
spec:
  appinfo:
    appns: 'test'
    applabel: 'app=nginx'
    appkind: 'deployment'
  annotationCheck: 'true'
  engineState: 'active'
  auxiliaryAppInfo: ''
  chaosServiceAccount: pod-delete-sa
  monitoring: false
  jobCleanUpPolicy: 'delete'
  experiments:
    - name: pod-delete # or container-kill
      spec:
        components:
          env:
            - name: TOTAL_CHAOS_DURATION
              value: '30'
            - name: CHAOS_INTERVAL
              value: '10'
            - name: FORCE
              value: 'false'
            - name: PODS_AFFECTED_PERC
              value: '1'
            - name: RAMP_TIME
              value: '30'
            - name: SEQUENCE
              value: 'parallel'            

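Field descriptions:

appns: namespace of the target application
experiments: names of the experiments to run (for example network latency or pod deletion); list the available ones with kubectl get chaosexperiments -n test
chaosServiceAccount: the ServiceAccount to use
jobCleanUpPolicy: whether to keep the job that ran the experiment; delete or retain
annotationCheck: whether to check for the annotation; if it is false, every pod matched by applabel can be targeted; true or false
engineState: state of the experiment; active or stop
TOTAL_CHAOS_DURATION: how long chaos is injected, in seconds (default 15)
CHAOS_INTERVAL: interval between injections, in seconds (default 5)
FORCE: whether pods are deleted with the --force option
TARGET_CONTAINER: the container inside the pod to target (defaults to the first container)
PODS_AFFECTED_PERC: percentage of the matching pods to target (default 0, which corresponds to 1 replica)
RAMP_TIME: time to wait before and after injecting chaos
SEQUENCE: execution order, serial or parallel (default parallel)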
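
The ChaosEngine manifests in this walkthrough share the same skeleton and differ mainly in the experiment name, the ServiceAccount, and the env entries. As a rough sketch under that assumption, a small shell function can render the manifest from those three inputs. The function name render_engine is hypothetical, and the namespace, label, and annotationCheck value are hard-coded to match the nginx example.

```shell
# Renders a ChaosEngine manifest for the nginx Deployment in the test namespace.
render_engine() {
  # $1 = engine name, $2 = experiment name, $3 = ServiceAccount,
  # remaining arguments = NAME=value pairs passed to the experiment as env
  engine=$1; experiment=$2; sa=$3
  shift 3
  cat <<EOF
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: ${engine}
  namespace: test
spec:
  appinfo:
    appns: 'test'
    applabel: 'app=nginx'
    appkind: 'deployment'
  annotationCheck: 'false'
  engineState: 'active'
  chaosServiceAccount: ${sa}
  monitoring: false
  jobCleanUpPolicy: 'delete'
  experiments:
    - name: ${experiment}
      spec:
        components:
          env:
EOF
  # One name/value entry per NAME=value argument; values are quoted as strings.
  for pair in "$@"; do
    printf "            - name: %s\n" "${pair%%=*}"
    printf "              value: '%s'\n" "${pair#*=}"
  done
}

# Usage:
#   render_engine nginx-chaos pod-delete pod-delete-sa \
#     TOTAL_CHAOS_DURATION=30 CHAOS_INTERVAL=10 FORCE=false | kubectl apply -f -
```

Rendering to stdout keeps the function free of side effects, so the output can be reviewed or diffed before it is piped to kubectl apply.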

Apply the ChaosEngine manifest with kubectl apply -f, then watch what happens in the test namespace: watch -n 1 kubectl get pods -n test

kubectl get pods -n test
NAME                        READY   STATUS              RESTARTS   AGE
nginx-6bf575b8fd-7xl65      1/1     Running             0          12m
nginx-chaos-runner          0/1     ContainerCreating   0          6s
nginx-dm-6bf575b8fd-4ljkr   1/1     Running             0          18m

NAME                        READY   STATUS              RESTARTS   AGE
nginx-6bf575b8fd-7xl65      1/1     Running             0          14m
nginx-chaos-runner          1/1     Running             0          100s
nginx-dm-6bf575b8fd-4ljkr   1/1     Running             0          20m
pod-delete-pptfhg-skhll     0/1     ContainerCreating   0          81s

NAME                        READY   STATUS    RESTARTS   AGE
nginx-6bf575b8fd-7xl65      1/1     Running   0          14m
nginx-chaos-runner          1/1     Running   0          2m18s
nginx-dm-6bf575b8fd-4ljkr   1/1     Running   0          20m
pod-delete-pptfhg-skhll     1/1     Running   0          119s

NAME                        READY   STATUS              RESTARTS   AGE
nginx-6bf575b8fd-7xl65      1/1     Running             0          14m
nginx-chaos-runner          1/1     Running             0          2m20s
nginx-dm-6bf575b8fd-4ljkr   1/1     Terminating         0          20m
nginx-dm-6bf575b8fd-kzqnc   0/1     ContainerCreating   0          2s
pod-delete-pptfhg-skhll     1/1     Running             0          2m1s

NAME                        READY   STATUS      RESTARTS   AGE
nginx-6bf575b8fd-7xl65      1/1     Running     0          15m
nginx-chaos-runner          1/1     Running     0          3m4s
nginx-dm-6bf575b8fd-jx9jw   1/1     Running     0          30s
pod-delete-pptfhg-skhll     0/1     Completed   0          2m45s

NAME                        READY   STATUS    RESTARTS   AGE
nginx-6bf575b8fd-7xl65      1/1     Running   0          15m
nginx-dm-6bf575b8fd-jx9jw   1/1     Running   0          31s

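The pod listings show the sequence: a runner pod is created first to drive the experiment, it starts a pod-delete pod that performs the deletion, and then the annotated nginx pod is deleted and recreated. Next, inspect the ChaosResult CR for the outcome: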

kubectl describe chaosresult nginx-chaos-pod-delete -n test
Name:         nginx-chaos-pod-delete
Namespace:    test
Labels:       app.kubernetes.io/component=experiment-job
              app.kubernetes.io/part-of=litmus
              app.kubernetes.io/version=1.13.2
              chaosUID=d5a6cccf-cfae-4e20-aeec-e37d86188fc8
              controller-uid=c052d5ce-c4b7-4033-bf37-3a662ad29ba4
              job-name=pod-delete-pptfhg
              name=nginx-chaos-pod-delete
Annotations:  <none>
API Version:  litmuschaos.io/v1alpha1
Kind:         ChaosResult
Metadata:
  Creation Timestamp:  2021-03-17T03:48:42Z
  Generation:          2
  Managed Fields:
    API Version:  litmuschaos.io/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:labels:
          .:
          f:app.kubernetes.io/component:
          f:app.kubernetes.io/part-of:
          f:app.kubernetes.io/version:
          f:chaosUID:
          f:controller-uid:
          f:job-name:
          f:name:
      f:spec:
        .:
        f:engine:
        f:experiment:
      f:status:
        .:
        f:experimentstatus:
        f:history:
    Manager:         experiments
    Operation:       Update
    Time:            2021-03-17T03:48:42Z
  Resource Version:  1628905
  Self Link:         /apis/litmuschaos.io/v1alpha1/namespaces/test/chaosresults/nginx-chaos-pod-delete
  UID:               e8824eec-f249-4398-9cca-42f6b873bc95
Spec:
  Engine:      nginx-chaos
  Experiment:  pod-delete
Status:
  Experimentstatus:
    Fail Step:                 N/A
    Phase:                     Completed
    Probe Success Percentage:  100
    Verdict:                   Pass
  History:
    Failed Runs:   0
    Passed Runs:   1
    Stopped Runs:  0
Events:
  Type    Reason   Age    From                     Message
  ----    ------   ----   ----                     -------
  Normal  Awaited  6m53s  pod-delete-pptfhg-skhll  experiment: pod-delete, Result: Awaited
  Normal  Pass     6m17s  pod-delete-pptfhg-skhll  experiment: pod-delete, Result: Pass

The Verdict is Pass, and the events report the same result.
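
For scripted runs, the verdict can be read from the ChaosResult instead of being picked out of the describe output. A minimal sketch: the field path status.experimentstatus.verdict is taken from the managed fields shown above for version 1.13.2 and may be spelled differently in other Litmus versions; the function names are hypothetical.

```shell
# Prints the verdict stored in a ChaosResult (for example Awaited or Pass).
chaos_verdict() {
  # $1 = ChaosResult name, $2 = namespace
  kubectl get chaosresult "$1" -n "$2" \
    -o jsonpath='{.status.experimentstatus.verdict}'
}

# Polls until the verdict is neither empty nor Awaited.
# Returns 0 on Pass, 1 on any other final verdict, 2 if attempts run out.
wait_for_verdict() {
  # $1 = name, $2 = namespace, $3 = max attempts, $4 = seconds between attempts
  attempt=0
  while [ "$attempt" -lt "$3" ]; do
    verdict=$(chaos_verdict "$1" "$2" 2>/dev/null)
    case "$verdict" in
      ""|Awaited)
        sleep "$4"
        ;;
      *)
        echo "$verdict"
        if [ "$verdict" = "Pass" ]; then return 0; else return 1; fi
        ;;
    esac
    attempt=$((attempt + 1))
  done
  echo "timed out waiting for verdict" >&2
  return 2
}

# Usage: wait_for_verdict nginx-chaos-pod-delete test 30 10
```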

Stopping or restarting the experiment

Stop: kubectl patch chaosengine nginx-chaos -n test --type merge --patch '{"spec":{"engineState":"stop"}}'
Restart: kubectl patch chaosengine nginx-chaos -n test --type merge --patch '{"spec":{"engineState":"active"}}'
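
Since the two commands differ only in the engineState value, they can be wrapped in one helper that rejects anything other than active or stop. This is just a convenience sketch; set_engine_state is a made-up name.

```shell
# Patches spec.engineState on a ChaosEngine after validating the requested state.
set_engine_state() {
  # $1 = ChaosEngine name, $2 = namespace, $3 = active | stop
  case "$3" in
    active|stop) ;;
    *) echo "engineState must be active or stop, got: $3" >&2; return 1 ;;
  esac
  kubectl patch chaosengine "$1" -n "$2" --type merge \
    --patch "{\"spec\":{\"engineState\":\"$3\"}}"
}

# Usage: set_engine_state nginx-chaos test stop
```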

Network tests

Create the ChaosEngine manifest (reusing the ServiceAccount created above, out of laziness, rather than creating a separate one): network-chaosengine.yaml

apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: nginx-network-chaos
  namespace: test
spec: 
  jobCleanUpPolicy: 'delete'
  annotationCheck: 'false'
  engineState: 'active'
  monitoring: false
  appinfo: 
    appns: 'test'
    applabel: 'app=nginx'
    appkind: 'deployment'
  chaosServiceAccount: pod-delete-sa
  experiments:
    - name: pod-network-latency # alternatives: pod-network-loss (packet loss), pod-network-corruption (packet corruption)
      spec:
        components:
          env:
            - name: NETWORK_INTERFACE
              value: 'eth0'     
            - name: NETWORK_LATENCY
              value: '2000'
            - name: TOTAL_CHAOS_DURATION
              value: '60'
            - name: CONTAINER_RUNTIME
              value: 'docker'
            - name: SOCKET_PATH
              value: '/var/run/docker.sock'

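Field descriptions:

NETWORK_INTERFACE: network interface inside the container
NETWORK_LATENCY: network latency to inject, in milliseconds
CONTAINER_RUNTIME: container runtime; docker, containerd, and crio are supported (the pumba lib supports only docker)
NETWORK_PACKET_LOSS_PERCENTAGE: percentage of packets to drop (default 100%)
DESTINATION_IPS: IP addresses of the target services or pods (by default selected at random by label)
NETWORK_PACKET_CORRUPTION_PERCENTAGE: percentage of packets to corrupt (default 100%)

Ping the IP of an affected pod: some pings fail or show very high latency (you can also run the ping from inside a pod).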

ping 10.10.175.251
PING 10.10.175.251 (10.10.175.251) 56(84) bytes of data.
From 10.10.175.192 icmp_seq=1 Destination Host Unreachable
From 10.10.175.192 icmp_seq=2 Destination Host Unreachable
From 10.10.175.192 icmp_seq=3 Destination Host Unreachable
From 10.10.175.192 icmp_seq=4 Destination Host Unreachable

Pod CPU and memory tests

Create the ChaosEngine manifest: cpu-chaosengine.yaml

apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: nginx-chaos
  namespace: test
spec:
  annotationCheck: 'false'
  engineState: 'active'
  appinfo:
    appns: 'test'
    applabel: 'app=nginx'
    appkind: 'deployment'
  chaosServiceAccount: pod-delete-sa
  monitoring: false
  jobCleanUpPolicy: 'delete'
  experiments:
    - name: pod-cpu-hog
      spec:
        components:
          env:
            - name: CPU_CORES
              value: '1'
            - name: TOTAL_CHAOS_DURATION
              value: '60' 

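Field descriptions:

CPU_CORES: number of CPU cores to stress
CHAOS_INJECT_COMMAND: command used to generate CPU load (default md5sum /dev/zero)
CHAOS_KILL_COMMAND: command used to stop the load (default kill $(find /proc -name exe -lname '*/md5sum' 2>&1 | grep -v 'Permission denied' | awk -F/ '{print $(NF-1)}'))

Check the pod's CPU load through monitoring or any other means available.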
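
If metrics-server is installed (an assumption; kubectl top depends on it), one quick way to see the effect is to list the pods whose CPU usage is at or above a threshold. A sketch only: pods_over_cpu is a hypothetical helper, and it assumes kubectl top reports CPU in millicores with an m suffix.

```shell
# Lists pods whose CPU usage reported by kubectl top is at or above a threshold.
pods_over_cpu() {
  # $1 = namespace, $2 = label selector, $3 = threshold in millicores
  kubectl top pods -n "$1" -l "$2" --no-headers |
    awk -v limit="$3" '{
      cpu = $2
      sub(/m$/, "", cpu)                  # "980m" -> "980"
      if (cpu + 0 >= limit + 0) print $1, $2
    }'
}

# Usage: pods_over_cpu test app=nginx 900
```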

Create the ChaosEngine manifest: mem-chaosengine.yaml

apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: nginx-chaos
  namespace: test
spec:
  annotationCheck: 'false'
  engineState: 'active'
  appinfo:
    appns: 'test'
    applabel: 'app=nginx'
    appkind: 'deployment'
  chaosServiceAccount: pod-delete-sa
  monitoring: false
  jobCleanUpPolicy: 'delete'
  experiments:
    - name: pod-memory-hog
      spec:
        components:
          env:
            - name: MEMORY_CONSUMPTION
              value: '500'
            - name: TOTAL_CHAOS_DURATION
              value: '60'

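Field descriptions:

MEMORY_CONSUMPTION: amount of memory to consume in the pod (default 500M, maximum 2G)

Disk tests

Before testing, it is best to limit the pod's disk size (for example with an ephemeral-storage limit on the container).

Create the ChaosEngine manifest that fills up the disk: disk-chaosengine.yaml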

apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: nginx-chaos
  namespace: test
spec:
  annotationCheck: 'false'
  engineState: 'active'
  auxiliaryAppInfo: ''
  appinfo:
    appns: 'test'
    applabel: 'app=nginx'
    appkind: 'deployment'
  chaosServiceAccount: pod-delete-sa
  monitoring: false
  jobCleanUpPolicy: 'delete'
  experiments:
    - name: disk-fill
      spec:
        components:
          env:
            - name: FILL_PERCENTAGE
              value: '80'
            - name: TARGET_CONTAINER
              value: 'nginx'             

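Notes: if the experiment is aborted, the filled space is not reclaimed and has to be deleted manually; when the experiment runs to completion, the space is reclaimed automatically.

FILL_PERCENTAGE: percentage of the storage limit to fill (for example, 80 fills the disk until usage reaches 80% of the limit)

Node CPU and memory tests

Create the RBAC file for the node CPU experiment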

---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: node-cpu-hog-sa
  namespace: test
  labels:
    name: node-cpu-hog-sa
    app.kubernetes.io/part-of: litmus
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: node-cpu-hog-sa
  labels:
    name: node-cpu-hog-sa
    app.kubernetes.io/part-of: litmus
rules:
- apiGroups: [""]
  resources: ["pods","pods/exec","pods/log","events"]
  verbs: ["create","list","get","patch","update","delete","deletecollection"]
- apiGroups: ["batch"]
  resources: ["jobs"]
  verbs: ["create","list","get","delete","deletecollection"]
- apiGroups: ["litmuschaos.io"]
  resources: ["chaosengines","chaosexperiments","chaosresults"]
  verbs: ["create","list","get","patch","update"]
- apiGroups: [""]
  resources: ["nodes"]
  verbs: ["get","list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: node-cpu-hog-sa
  labels:
    name: node-cpu-hog-sa
    app.kubernetes.io/part-of: litmus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: node-cpu-hog-sa
subjects:
- kind: ServiceAccount
  name: node-cpu-hog-sa
  namespace: test

Create the node CPU ChaosEngine manifest: nodecpu-chaosengine.yaml

apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: nginx-chaos
  namespace: test
spec:
  annotationCheck: 'false'
  engineState: 'active'
  auxiliaryAppInfo: ''
  appinfo:
    appns: 'test'
    applabel: 'app=nginx'
    appkind: 'deployment'
  chaosServiceAccount: node-cpu-hog-sa
  monitoring: false
  jobCleanUpPolicy: 'delete'
  experiments:
    - name: node-cpu-hog
      spec:
        components:
          env:
            - name: TOTAL_CHAOS_DURATION
              value: '60'
            - name: NODE_CPU_CORE
              value: '1'
            - name: TARGET_NODES
              value: 'k8s-node-5'

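Field descriptions:

TARGET_NODES: list of Kubernetes nodes to target
NODE_CPU_CORE: number of node CPU cores to consume (default 2)
NODES_AFFECTED_PERC: percentage of the total nodes to target (default 0, which corresponds to 1 node)

kubectl top nodes shows that the node's CPU is saturated: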

NAME           CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%   
k8s-master-4   335m         16%    1252Mi          72%       
k8s-node-5     2001m        100%   1145Mi          31%       
k8s-node-6     162m         8%     1362Mi          37%       

Create the node memory ChaosEngine manifest: nodemem-chaosengine.yaml

apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: nginx-chaos
  namespace: test
spec:
  annotationCheck: 'false'
  engineState: 'active'
  auxiliaryAppInfo: ''
  appinfo:
    appns: 'test'
    applabel: 'app=nginx'
    appkind: 'deployment'
  chaosServiceAccount: node-cpu-hog-sa
  monitoring: false
  jobCleanUpPolicy: 'delete'
  experiments:
    - name: node-memory-hog
      spec:
        components:
          env:
            - name: TOTAL_CHAOS_DURATION
              value: '120'
            - name: MEMORY_CONSUMPTION_PERCENTAGE
              value: '30'
            - name: TARGET_NODES
              value: 'k8s-node-5'

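Field descriptions:

MEMORY_CONSUMPTION_PERCENTAGE: percentage of the node's total memory to consume (default 30)
MEMORY_CONSUMPTION_MEBIBYTES: amount of memory to consume, in MiB

kubectl top nodes shows the node's memory usage shooting up.

Create the RBAC file for node drain: nodedrain-rbac.yaml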

---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: node-drain-sa
  namespace: test
  labels:
    name: node-drain-sa
    app.kubernetes.io/part-of: litmus
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: node-drain-sa
  labels:
    name: node-drain-sa
    app.kubernetes.io/part-of: litmus
rules:
- apiGroups: [""]
  resources: ["pods","pods/exec","pods/log","events","pods/eviction"]
  verbs: ["create","list","get","patch","update","delete","deletecollection"]
- apiGroups: ["batch"]
  resources: ["jobs"]
  verbs: ["create","list","get","delete","deletecollection"]
- apiGroups: ["apps"]
  resources: ["daemonsets"]
  verbs: ["list","get","delete"]
- apiGroups: ["litmuschaos.io"]
  resources: ["chaosengines","chaosexperiments","chaosresults"]
  verbs: ["create","list","get","patch","update"]
- apiGroups: [""]
  resources: ["nodes"]
  verbs: ["patch","get","list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: node-drain-sa
  labels:
    name: node-drain-sa
    app.kubernetes.io/part-of: litmus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: node-drain-sa
subjects:
- kind: ServiceAccount
  name: node-drain-sa
  namespace: test

Create the node drain ChaosEngine manifest: nodedrain-chaosengine.yaml

apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: nginx-chaos
  namespace: test
spec:
  annotationCheck: 'false'
  engineState: 'active'
  auxiliaryAppInfo: ''
  appinfo:
    appns: 'test'
    applabel: 'app=nginx'
    appkind: 'deployment'
  chaosServiceAccount: node-drain-sa
  monitoring: false
  jobCleanUpPolicy: 'delete'
  experiments:
    - name: node-drain
      spec:
        components:
          nodeSelector:
            # the value must be a plain hostname; pick a node other than the one being drained
            kubernetes.io/hostname: 'k8s-node-6'
          env:
            - name: TARGET_NODE
              value: 'k8s-node-5'

Watch the pods and nodes with watch kubectl get pods,nodes -n test -o wide:

kubectl get pods,nodes -n test -o wide
NAME                          READY   STATUS        RESTARTS   AGE     IP              NODE         NOMINATED NODE   READINESS GATES
pod/nginx-6bf575b8fd-7ckvq    0/1     Pending       0          3m9s    <none>          <none>       <none>           <none>
pod/nginx-6bf575b8fd-cnrpr    0/1     Pending       0          3m10s   <none>          <none>       <none>           <none>
pod/nginx-6bf575b8fd-dpxzc    1/1     Running       0          107m    10.10.175.228   k8s-node-6   <none>           <none>
pod/nginx-chaos-runner        1/1     Terminating   0          3m41s   10.10.86.140    k8s-node-5   <none>           <none>
pod/node-drain-rbo5cy-k7qxl   0/1     Pending       0          3m10s   <none>          <none>       <none>           <none>

NAME                STATUS                     ROLES    AGE     VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                KERNEL-VERSION           CONTAINER-RUNTIME
node/k8s-master-4   Ready                      master   7d23h   v1.19.8   192.168.8.4   <none>        CentOS Linux 7 (Core)   3.10.0-1160.el7.x86_64   docker://18.9.9
node/k8s-node-5     Ready,SchedulingDisabled   <none>   7d23h   v1.19.8   192.168.8.5   <none>        CentOS Linux 7 (Core)   3.10.0-1160.el7.x86_64   docker://18.9.9
node/k8s-node-6     Ready                      <none>   7d23h   v1.19.8   192.168.8.6   <none>        CentOS Linux 7 (Core)   3.10.0-1160.el7.x86_64   docker://18.9.9

k8s-node-5 becomes unschedulable (SchedulingDisabled), and its pods are being moved elsewhere.
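
The SchedulingDisabled status corresponds to spec.unschedulable being true on the node object, so the cordon state can also be checked from a script, for instance to confirm the node is schedulable again once the experiment has finished. A small sketch; node_is_cordoned is a hypothetical helper name.

```shell
# Succeeds when the given node is marked unschedulable (cordoned).
node_is_cordoned() {
  # $1 = node name
  [ "$(kubectl get node "$1" -o jsonpath='{.spec.unschedulable}')" = "true" ]
}

# Usage: node_is_cordoned k8s-node-5 && echo "k8s-node-5 is still cordoned"
```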

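That's all for today. Litmus has more concepts that are not covered here, such as ChaosResult and Litmus Probes; there are quite a few of them, and I'll fill them in over time.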