Deploying a Kubernetes 1.18.15 Cluster with kubeadm
Environment
kubeadm has been GA since Kubernetes 1.13; this guide uses it to deploy the cluster.
Cluster node and network plan:
192.168.10.11 k8s-master1
192.168.10.12 k8s-node1
192.168.10.13 k8s-node2
192.168.10.14 k8s-node3
Pod CIDR: 10.10.0.0/16    Service (ClusterIP) CIDR: 172.17.0.0/16 (the Docker bridge is moved to 172.20.0.1/16 to avoid overlapping with the service range)
Deploying the Cluster
Install Docker
Install the required system utilities:
yum install -y yum-utils device-mapper-persistent-data lvm2
Add the package repositories:
Docker: yum-config-manager --add-repo https://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo
Kubernetes: create /etc/yum.repos.d/kubernetes.repo with the following content:
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64/
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
Refresh the metadata cache and install:
yum makecache fast
yum -y install docker-ce
Configure Docker
Create the Docker directories:
mkdir /etc/docker /data/docker -p
Create the /etc/docker/daemon.json file:
{
"bip": "172.20.0.1/16",
"exec-opts": ["native.cgroupdriver=systemd"],
"registry-mirrors": ["https://zfcwod7i.mirror.aliyuncs.com"],
"data-root": "/data/docker",
"storage-driver": "overlay2",
"storage-opts": [
"overlay2.override_kernel_check=true"
],
"log-driver": "json-file",
"log-opts": {
"max-size": "100m",
"max-file": "5"
},
"dns-search": ["default.svc.cluster.local", "svc.cluster.local", "localdomain"],
"dns-opts": ["ndots:2", "timeout:2", "attempts:2"]
}
Enable and start Docker:
systemctl enable docker
systemctl start docker
systemctl status docker
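As a quick sanity check (a minimal sketch, assuming Docker started successfully), confirm that the daemon picked up the settings from daemon.json:
docker info | grep -E -i 'cgroup driver|docker root dir|storage driver'
The output should show the systemd cgroup driver, /data/docker as the root dir, and overlay2 as the storage driver.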
System Configuration
Disable the firewall and SELinux:
systemctl stop firewalld
systemctl disable firewalld
setenforce 0
sed -ri 's#(SELINUX=).*#\1disabled#' /etc/selinux/config
Disable swap:
swapoff -a    # or comment out the swap entry in /etc/fstab
Set the hostname on each node and add hosts entries on every node (see the hosts snippet after the commands below):
hostnamectl --static set-hostname k8s-master1
hostnamectl --static set-hostname k8s-node1
hostnamectl --static set-hostname k8s-node2
hostnamectl --static set-hostname k8s-node3
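The matching /etc/hosts entries (taken from the node plan above) should be appended on every node; a minimal sketch:
cat >> /etc/hosts << 'EOF'
192.168.10.11 k8s-master1
192.168.10.12 k8s-node1
192.168.10.13 k8s-node2
192.168.10.14 k8s-node3
EOF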
Enable kernel user namespace support:
grubby --args="user_namespace.enable=1" --update-kernel="$(grubby --default-kernel)"
Adjust kernel parameters; create /etc/sysctl.d/docker.conf:
net.ipv4.ip_forward=1
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-arptables = 1
vm.swappiness=0
sysctl --system && sysctl -p
Kubernetes-related kernel tuning; create /etc/sysctl.d/kubernetes.conf:
net.netfilter.nf_conntrack_max = 10485760
net.core.netdev_max_backlog = 10000
net.ipv4.neigh.default.gc_thresh1 = 80000
net.ipv4.neigh.default.gc_thresh2 = 90000
net.ipv4.neigh.default.gc_thresh3 = 100000
sysctl --system && sysctl -p
Parameter notes:
net.netfilter.nf_conntrack_max: maximum number of tracked connections (conntrack entries)
net.core.netdev_max_backlog: maximum number of packets allowed to queue on the input side
net.ipv4.neigh.default.gc_thresh1: minimum number of ARP cache entries kept (no garbage collection below this)
net.ipv4.neigh.default.gc_thresh2: soft limit on the number of ARP cache entries
net.ipv4.neigh.default.gc_thresh3: hard limit on the number of ARP cache entries
Configure the IPVS modules; create /etc/sysconfig/modules/ipvs.modules:
#!/bin/bash
modprobe -- ip_vs
modprobe -- ip_vs_rr
modprobe -- ip_vs_wrr
modprobe -- ip_vs_sh
modprobe -- nf_conntrack_ipv4
chmod 755 /etc/sysconfig/modules/ipvs.modules
bash /etc/sysconfig/modules/ipvs.modules
lsmod | grep -e ip_vs -e nf_conntrack_ipv4
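On systemd hosts you can also make the module loading persistent across reboots via modules-load.d (an optional sketch; the file name is arbitrary):
cat > /etc/modules-load.d/ipvs.conf << 'EOF'
ip_vs
ip_vs_rr
ip_vs_wrr
ip_vs_sh
nf_conntrack_ipv4
EOF
systemctl restart systemd-modules-load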
Deploy the Master Node
Install and configure the tooling
Install on all nodes:
yum -y install tc ipvsadm ipset kubeadm-1.18.15 kubectl-1.18.15 kubelet-1.18.15
Enable kubelet on boot:
systemctl enable kubelet
Enable kubectl command completion:
Install bash-completion:
yum -y install bash-completion
Add the following to /etc/bashrc:
vim /etc/bashrc
source /usr/share/bash-completion/bash_completion
source <(kubectl completion bash)
. /etc/bashrc
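Optionally, a short alias with completion support can be added to the same file (a sketch following the upstream kubectl completion docs):
echo 'alias k=kubectl' >> /etc/bashrc
echo 'complete -o default -F __start_kubectl k' >> /etc/bashrc
. /etc/bashrc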
Extend the Certificate Lifetime
Download the Kubernetes source:
wget https://github.com/kubernetes/kubernetes/archive/v1.18.15.tar.gz
Pull the build image and start a container in which to compile kubeadm:
docker run -itd zhuyuhua/kube-cross:v1.15.2-1
Copy the Kubernetes source tree into the container:
docker cp kubernetes 4e:/go/
Adjust the validity periods (see the sed sketch after this list):
staging/src/k8s.io/client-go/util/cert/cert.go: NotAfter: now.Add(duration365d * 10).UTC() (the CA is already 10 years here)
cmd/kubeadm/app/constants/constants.go: change to CertificateValidity = time.Hour * 24 * 365 * 10
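Inside the container, the constants.go change can be scripted; a sketch, assuming the constant currently reads CertificateValidity = time.Hour * 24 * 365 in this release:
cd /go/kubernetes
sed -i 's#CertificateValidity = time.Hour \* 24 \* 365#CertificateValidity = time.Hour * 24 * 365 * 10#' cmd/kubeadm/app/constants/constants.go
grep -n 'CertificateValidity =' cmd/kubeadm/app/constants/constants.go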
Build kubeadm inside the container:
make all WHAT=cmd/kubeadm GOFLAGS=-v
Copy the rebuilt kubeadm binary out of the container and replace kubeadm on every host:
docker cp 4e:/go/kubernetes/_output/local/bin/linux/amd64/kubeadm /usr/bin/kubeadm
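Then distribute the rebuilt binary to the remaining hosts and verify the version (a sketch using the node names from the plan above; adjust for your SSH access):
for h in k8s-node1 k8s-node2 k8s-node3; do
  scp /usr/bin/kubeadm ${h}:/usr/bin/kubeadm
done
kubeadm version -o short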
Adjust the kubeadm Configuration
Inspect the default kubeadm init configuration:
kubeadm config print init-defaults --component-configs KubeletConfiguration
kubeadm config print init-defaults --component-configs KubeProxyConfiguration
Export the default configuration and edit it:
kubeadm config print init-defaults > kubeadm-init.yaml
apiVersion: kubeadm.k8s.io/v1beta2
bootstrapTokens:
- groups:
- system:bootstrappers:kubeadm:default-node-token
token: abcdef.0123456789abcdef
ttl: 24h0m0s
usages:
- signing
- authentication
kind: InitConfiguration
localAPIEndpoint:
advertiseAddress: 192.168.10.11
bindPort: 6443
nodeRegistration:
criSocket: /var/run/dockershim.sock
name: k8s-master1
taints:
- effect: NoSchedule
key: node-role.kubernetes.io/master
---
apiServer:
extraArgs:
audit-log-maxage: "20"
audit-log-maxbackup: "10"
audit-log-maxsize: "100"
audit-log-path: "/var/log/kube-audit/audit.log"
audit-policy-file: "/etc/kubernetes/audit-policy.yaml"
audit-log-format: json
extraVolumes:
- name: "audit-config"
hostPath: "/etc/kubernetes/audit-policy.yaml"
mountPath: "/etc/kubernetes/audit-policy.yaml"
readOnly: true
pathType: "File"
- name: "audit-log"
hostPath: "/var/log/kube-audit"
mountPath: "/var/log/kube-audit"
pathType: "DirectoryOrCreate"
timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta2
certificatesDir: /etc/kubernetes/pki
clusterName: local-k8sclus
controlPlaneEndpoint: "192.168.10.11:6443"
controllerManager: {}
dns:
type: CoreDNS
etcd:
local:
dataDir: /data/etcd
imageRepository: registry.cn-hangzhou.aliyuncs.com/google_containers
kind: ClusterConfiguration
kubernetesVersion: v1.18.15
networking:
dnsDomain: cluster.local
podSubnet: 10.10.0.0/16
serviceSubnet: 172.17.0.0/16
scheduler: {}
---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
clusterDNS:
- 172.17.0.10
clusterDomain: cluster.local
---
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: "ipvs"
ipvs:
minSyncPeriod: 5s
syncPeriod: 5s
scheduler: "lc"
Parameter notes:
advertiseAddress: this node's IP address
controlPlaneEndpoint: the master IP (use the VIP if the control plane is an HA cluster)
scheduler (under ipvs): the IPVS scheduling algorithm, e.g. rr, wrr, lc
If you use an external etcd cluster, configure it instead as:
etcd:
external:
endpoints:
- https://ETCD_0_IP:2379
- https://ETCD_1_IP:2379
- https://ETCD_2_IP:2379
caFile: /etc/kubernetes/pki/etcd/ca.crt
certFile: /etc/kubernetes/pki/apiserver-etcd-client.crt
keyFile: /etc/kubernetes/pki/apiserver-etcd-client.key
Create the audit policy file (/etc/kubernetes/audit-policy.yaml):
apiVersion: audit.k8s.io/v1
kind: Policy
omitStages:
- "RequestReceived"
rules:
- level: RequestResponse
resources:
- group: ""
resources: ["pods"]
- level: Metadata
resources:
- group: ""
resources: ["pods/log", "pods/status"]
- level: None
resources:
- group: ""
resources: ["configmaps"]
resourceNames: ["controller-leader"]
- level: None
users: ["system:kube-proxy"]
verbs: ["watch"]
resources:
- group: ""
resources: ["endpoints", "services"]
- level: None
userGroups: ["system:authenticated"]
nonResourceURLs:
- "/api*"
- "/version"
- level: Request
resources:
- group: ""
resources: ["configmaps"]
namespaces: ["kube-system"]
- level: Metadata
resources:
- group: ""
resources: ["secrets", "configmaps"]
- level: Request
resources:
- group: ""
- group: "extensions"
- level: Metadata
omitStages:
- "RequestReceived"
Initialize the Cluster
Run the init with the prepared configuration:
kubeadm init --config kubeadm-init.yaml --upload-certs
Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
You should now deploy a pod network to the cluster.
You can now join any number of the control-plane node running the following command on each as root:
kubeadm join 192.168.10.11:6443 --token abcdef.0123456789abcdef \
--discovery-token-ca-cert-hash sha256:32d68bf3b10297fdfccdc305884db221917f44a5923291b0c9a18aaff8930d9c \
--control-plane --certificate-key 53f41e0118810ce3a2746b018b27a8fb0b1cd77005876d27e74c938c6f53695f
Please note that the certificate-key gives access to cluster sensitive data, keep it secret!
As a safeguard, uploaded-certs will be deleted in two hours; If necessary, you can use
"kubeadm init phase upload-certs --upload-certs" to reload certs afterward.
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join 192.168.10.11:6443 --token abcdef.0123456789abcdef \
--discovery-token-ca-cert-hash sha256:32d68bf3b10297fdfccdc305884db221917f44a5923291b0c9a18aaff8930d9c
--upload-certs: uploads the certificates that are shared between control-plane instances to the cluster (otherwise you would have to copy them manually).
If kubelet errors like the following appear during init, align the Docker and kubelet cgroup drivers:
error: failed to run Kubelet: failed to create kubelet:
misconfiguration: kubelet cgroup driver: "systemd" is different from docker cgroup driver: "cgroupfs"
To change the kubelet driver, do not edit kubelet's own system files (/etc/sysconfig/kubelet or /usr/lib/systemd/system/kubelet.service), because the --cgroup-driver flag is deprecated. If init has already been run, the setting lives in /var/lib/kubelet/config.yaml and is also uploaded to a ConfigMap named kubelet-config-1.X; change the field below (the same field applies in the init config file) and restart kubelet:
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cgroupDriver: systemd
A second option: kubeadm also writes an environment file to /var/lib/kubelet/kubeadm-flags.env, which can be adjusted instead.
Copy the kubeconfig file as instructed by the init output:
mkdir -p $HOME/.kube
cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
chown $(id -u):$(id -g) $HOME/.kube/config
Check the cluster component status:
kubectl get cs
NAME STATUS MESSAGE ERROR
controller-manager Healthy ok
scheduler Healthy ok
etcd-0 Healthy {"health":"true"}
If yours shows Unhealthy / connection refused, it is because newer versions close the default insecure ports (e.g. 10251) for security; this does not affect the cluster. To re-enable them, comment out the --port=0 flag in the relevant component manifests under /etc/kubernetes/manifests/ and the components will restart automatically.
Check the other components in kube-system:
kubectl get po -n kube-system
NAME READY STATUS RESTARTS AGE
coredns-546565776c-j5862 0/1 Pending 0 17m
coredns-546565776c-qrkn7 0/1 Pending 0 17m
etcd-k8s-master1 1/1 Running 0 17m
kube-apiserver-k8s-master1 1/1 Running 0 17m
kube-controller-manager-k8s-master1 1/1 Running 0 17m
kube-proxy-jnzj4 1/1 Running 0 17m
kube-scheduler-k8s-master1 1/1 Running 0 17m
Because no network plugin is installed yet, coredns stays Pending and the nodes remain NotReady.
Verify that the apiserver responds:
curl -k https://192.168.10.11:6443
{
"kind": "Status",
"apiVersion": "v1",
"metadata": {
},
"status": "Failure",
"message": "forbidden: User \"system:anonymous\" cannot get path \"/\"",
"reason": "Forbidden",
"details": {
},
"code": 403
}
The single-master control plane is now up. For an HA control plane, note the following:
1: Copy audit-policy.yaml to the same path on every master.
2: Change the apiserver address in kubeadm-init.yaml to the VIP of the master cluster.
3: The --control-plane join command printed by kubeadm init is for joining the master cluster; its token expires, so generate a new one with kubeadm token create --print-join-command (see the sketch after this list).
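If the token or the certificate key has expired, both can be regenerated on an existing master (a short sketch using the commands already mentioned above):
# print a fresh worker join command (creates a new token)
kubeadm token create --print-join-command
# re-upload the control-plane certificates and print a new certificate key
kubeadm init phase upload-certs --upload-certs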
Join additional masters with:
kubeadm join 192.168.10.11:6443 --token abcdef.0123456789abcdef \
--discovery-token-ca-cert-hash sha256:32d68bf3b10297fdfccdc305884db221917f44a5923291b0c9a18aaff8930d9c \
--control-plane --certificate-key 53f41e0118810ce3a2746b018b27a8fb0b1cd77005876d27e74c938c6f53695f \
--apiserver-advertise-address 192.168.10.11 \
--apiserver-bind-port 6443
Then copy the kubeconfig file in the same way.
Deploy the Worker Nodes
On each worker node, join the cluster:
kubeadm join 192.168.10.11:6443 --token abcdef.0123456789abcdef \
--discovery-token-ca-cert-hash sha256:32d68bf3b10297fdfccdc305884db221917f44a5923291b0c9a18aaff8930d9c
This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.
Run 'kubectl get nodes' on the control-plane to see this node join the cluster.
Check the nodes:
kubectl get nodes
NAME STATUS ROLES AGE VERSION
k8s-master1 NotReady master 27m v1.18.15
k8s-node1 NotReady <none> 43s v1.18.15
k8s-node2 NotReady <none> 17s v1.18.15
k8s-node3 NotReady <none> 15s v1.18.15
Check certificate expiration:
kubeadm alpha certs check-expiration
To renew the certificates later, run on every master:
kubeadm alpha certs renew all
Deploy the Network Plugin
We use flannel; download its manifest:
wget https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
Modify the configuration:
---
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
name: psp.flannel.unprivileged
annotations:
seccomp.security.alpha.kubernetes.io/allowedProfileNames: docker/default
seccomp.security.alpha.kubernetes.io/defaultProfileName: docker/default
apparmor.security.beta.kubernetes.io/allowedProfileNames: runtime/default
apparmor.security.beta.kubernetes.io/defaultProfileName: runtime/default
spec:
privileged: false
volumes:
- configMap
- secret
- emptyDir
- hostPath
allowedHostPaths:
- pathPrefix: "/etc/cni/net.d"
- pathPrefix: "/etc/kube-flannel"
- pathPrefix: "/run/flannel"
readOnlyRootFilesystem: false
runAsUser:
rule: RunAsAny
supplementalGroups:
rule: RunAsAny
fsGroup:
rule: RunAsAny
allowPrivilegeEscalation: false
defaultAllowPrivilegeEscalation: false
allowedCapabilities: ['NET_ADMIN']
defaultAddCapabilities: []
requiredDropCapabilities: []
hostPID: false
hostIPC: false
hostNetwork: true
hostPorts:
- min: 0
max: 65535
seLinux:
rule: 'RunAsAny'
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
name: flannel
rules:
- apiGroups: ['extensions']
resources: ['podsecuritypolicies']
verbs: ['use']
resourceNames: ['psp.flannel.unprivileged']
- apiGroups:
- ""
resources:
- pods
verbs:
- get
- apiGroups:
- ""
resources:
- nodes
verbs:
- list
- watch
- apiGroups:
- ""
resources:
- nodes/status
verbs:
- patch
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
name: flannel
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: flannel
subjects:
- kind: ServiceAccount
name: flannel
namespace: kube-system
---
apiVersion: v1
kind: ServiceAccount
metadata:
name: flannel
namespace: kube-system
---
kind: ConfigMap
apiVersion: v1
metadata:
name: kube-flannel-cfg
namespace: kube-system
labels:
tier: node
app: flannel
data:
cni-conf.json: |
{
"name": "cbr0",
"cniVersion": "0.3.1",
"plugins": [
{
"type": "flannel",
"delegate": {
"hairpinMode": true,
"isDefaultGateway": true
}
},
{
"type": "portmap",
"capabilities": {
"portMappings": true
}
}
]
}
net-conf.json: |
{
"Network": "10.10.0.0/16",
"Backend": {
"Type": "host-gw"
}
}
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: kube-flannel-ds-amd64
namespace: kube-system
labels:
tier: node
app: flannel
spec:
selector:
matchLabels:
app: flannel
template:
metadata:
labels:
tier: node
app: flannel
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: kubernetes.io/os
operator: In
values:
- linux
- key: kubernetes.io/arch
operator: In
values:
- amd64
hostNetwork: true
tolerations:
- operator: Exists
effect: NoSchedule
serviceAccountName: flannel
initContainers:
- name: install-cni
image: quay.mirrors.ustc.edu.cn/coreos/flannel:v0.12.0-amd64
command:
- cp
args:
- -f
- /etc/kube-flannel/cni-conf.json
- /etc/cni/net.d/10-flannel.conflist
volumeMounts:
- name: cni
mountPath: /etc/cni/net.d
- name: flannel-cfg
mountPath: /etc/kube-flannel/
containers:
- name: kube-flannel
image: quay.mirrors.ustc.edu.cn/coreos/flannel:v0.12.0-amd64
command:
- /opt/bin/flanneld
args:
- --ip-masq
- --kube-subnet-mgr
resources:
requests:
cpu: "100m"
memory: "50Mi"
limits:
cpu: "100m"
memory: "50Mi"
securityContext:
privileged: false
capabilities:
add: ["NET_ADMIN"]
env:
- name: POD_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
- name: POD_NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
volumeMounts:
- name: run
mountPath: /run/flannel
- name: flannel-cfg
mountPath: /etc/kube-flannel/
volumes:
- name: run
hostPath:
path: /run/flannel
- name: cni
hostPath:
path: /etc/cni/net.d
- name: flannel-cfg
configMap:
name: kube-flannel-cfg
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: kube-flannel-ds-arm64
namespace: kube-system
labels:
tier: node
app: flannel
spec:
selector:
matchLabels:
app: flannel
template:
metadata:
labels:
tier: node
app: flannel
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: kubernetes.io/os
operator: In
values:
- linux
- key: kubernetes.io/arch
operator: In
values:
- arm64
hostNetwork: true
tolerations:
- operator: Exists
effect: NoSchedule
serviceAccountName: flannel
initContainers:
- name: install-cni
image: quay.mirrors.ustc.edu.cn/coreos/flannel:v0.12.0-arm64
command:
- cp
args:
- -f
- /etc/kube-flannel/cni-conf.json
- /etc/cni/net.d/10-flannel.conflist
volumeMounts:
- name: cni
mountPath: /etc/cni/net.d
- name: flannel-cfg
mountPath: /etc/kube-flannel/
containers:
- name: kube-flannel
image: quay.mirrors.ustc.edu.cn/coreos/flannel:v0.12.0-arm64
command:
- /opt/bin/flanneld
args:
- --ip-masq
- --kube-subnet-mgr
resources:
requests:
cpu: "100m"
memory: "50Mi"
limits:
cpu: "100m"
memory: "50Mi"
securityContext:
privileged: false
capabilities:
add: ["NET_ADMIN"]
env:
- name: POD_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
- name: POD_NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
volumeMounts:
- name: run
mountPath: /run/flannel
- name: flannel-cfg
mountPath: /etc/kube-flannel/
volumes:
- name: run
hostPath:
path: /run/flannel
- name: cni
hostPath:
path: /etc/cni/net.d
- name: flannel-cfg
configMap:
name: kube-flannel-cfg
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: kube-flannel-ds-arm
namespace: kube-system
labels:
tier: node
app: flannel
spec:
selector:
matchLabels:
app: flannel
template:
metadata:
labels:
tier: node
app: flannel
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: kubernetes.io/os
operator: In
values:
- linux
- key: kubernetes.io/arch
operator: In
values:
- arm
hostNetwork: true
tolerations:
- operator: Exists
effect: NoSchedule
serviceAccountName: flannel
initContainers:
- name: install-cni
image: quay.mirrors.ustc.edu.cn/coreos/flannel:v0.12.0-arm
command:
- cp
args:
- -f
- /etc/kube-flannel/cni-conf.json
- /etc/cni/net.d/10-flannel.conflist
volumeMounts:
- name: cni
mountPath: /etc/cni/net.d
- name: flannel-cfg
mountPath: /etc/kube-flannel/
containers:
- name: kube-flannel
image: quay.mirrors.ustc.edu.cn/coreos/flannel:v0.12.0-arm
command:
- /opt/bin/flanneld
args:
- --ip-masq
- --kube-subnet-mgr
resources:
requests:
cpu: "100m"
memory: "50Mi"
limits:
cpu: "100m"
memory: "50Mi"
securityContext:
privileged: false
capabilities:
add: ["NET_ADMIN"]
env:
- name: POD_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
- name: POD_NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
volumeMounts:
- name: run
mountPath: /run/flannel
- name: flannel-cfg
mountPath: /etc/kube-flannel/
volumes:
- name: run
hostPath:
path: /run/flannel
- name: cni
hostPath:
path: /etc/cni/net.d
- name: flannel-cfg
configMap:
name: kube-flannel-cfg
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: kube-flannel-ds-ppc64le
namespace: kube-system
labels:
tier: node
app: flannel
spec:
selector:
matchLabels:
app: flannel
template:
metadata:
labels:
tier: node
app: flannel
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: kubernetes.io/os
operator: In
values:
- linux
- key: kubernetes.io/arch
operator: In
values:
- ppc64le
hostNetwork: true
tolerations:
- operator: Exists
effect: NoSchedule
serviceAccountName: flannel
initContainers:
- name: install-cni
image: quay.mirrors.ustc.edu.cn/coreos/flannel:v0.12.0-ppc64le
command:
- cp
args:
- -f
- /etc/kube-flannel/cni-conf.json
- /etc/cni/net.d/10-flannel.conflist
volumeMounts:
- name: cni
mountPath: /etc/cni/net.d
- name: flannel-cfg
mountPath: /etc/kube-flannel/
containers:
- name: kube-flannel
image: quay.mirrors.ustc.edu.cn/coreos/flannel:v0.12.0-ppc64le
command:
- /opt/bin/flanneld
args:
- --ip-masq
- --kube-subnet-mgr
resources:
requests:
cpu: "100m"
memory: "50Mi"
limits:
cpu: "100m"
memory: "50Mi"
securityContext:
privileged: false
capabilities:
add: ["NET_ADMIN"]
env:
- name: POD_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
- name: POD_NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
volumeMounts:
- name: run
mountPath: /run/flannel
- name: flannel-cfg
mountPath: /etc/kube-flannel/
volumes:
- name: run
hostPath:
path: /run/flannel
- name: cni
hostPath:
path: /etc/cni/net.d
- name: flannel-cfg
configMap:
name: kube-flannel-cfg
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: kube-flannel-ds-s390x
namespace: kube-system
labels:
tier: node
app: flannel
spec:
selector:
matchLabels:
app: flannel
template:
metadata:
labels:
tier: node
app: flannel
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: kubernetes.io/os
operator: In
values:
- linux
- key: kubernetes.io/arch
operator: In
values:
- s390x
hostNetwork: true
tolerations:
- operator: Exists
effect: NoSchedule
serviceAccountName: flannel
initContainers:
- name: install-cni
image: quay.mirrors.ustc.edu.cn/coreos/flannel:v0.12.0-s390x
command:
- cp
args:
- -f
- /etc/kube-flannel/cni-conf.json
- /etc/cni/net.d/10-flannel.conflist
volumeMounts:
- name: cni
mountPath: /etc/cni/net.d
- name: flannel-cfg
mountPath: /etc/kube-flannel/
containers:
- name: kube-flannel
image: quay.mirrors.ustc.edu.cn/coreos/flannel:v0.12.0-s390x
command:
- /opt/bin/flanneld
args:
- --ip-masq
- --kube-subnet-mgr
resources:
requests:
cpu: "100m"
memory: "50Mi"
limits:
cpu: "100m"
memory: "50Mi"
securityContext:
privileged: false
capabilities:
add: ["NET_ADMIN"]
env:
- name: POD_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
- name: POD_NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
volumeMounts:
- name: run
mountPath: /run/flannel
- name: flannel-cfg
mountPath: /etc/kube-flannel/
volumes:
- name: run
hostPath:
path: /run/flannel
- name: cni
hostPath:
path: /etc/cni/net.d
- name: flannel-cfg
configMap:
name: kube-flannel-cfg
Changes made to the manifest:
1: The pod network CIDR and the backend mode (host-gw) were changed.
2: The image registry was changed (quay.io is slow; the USTC mirror is used instead: docker pull quay.mirrors.ustc.edu.cn/...). A sed sketch for both changes follows.
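The two edits can be applied with sed; a sketch, assuming the upstream manifest still ships quay.io/coreos/flannel images, a 10.244.0.0/16 Network, and the vxlan backend:
sed -i \
  -e 's#quay.io/coreos/flannel#quay.mirrors.ustc.edu.cn/coreos/flannel#g' \
  -e 's#10.244.0.0/16#10.10.0.0/16#' \
  -e 's#"Type": "vxlan"#"Type": "host-gw"#' \
  kube-flannel.yml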
Apply the manifest:
kubectl apply -f kube-flannel.yml
podsecuritypolicy.policy/psp.flannel.unprivileged created
clusterrole.rbac.authorization.k8s.io/flannel created
clusterrolebinding.rbac.authorization.k8s.io/flannel created
serviceaccount/flannel created
configmap/kube-flannel-cfg created
daemonset.apps/kube-flannel-ds-amd64 created
daemonset.apps/kube-flannel-ds-arm64 created
daemonset.apps/kube-flannel-ds-arm created
daemonset.apps/kube-flannel-ds-ppc64le created
daemonset.apps/kube-flannel-ds-s390x created
Check that flannel is coming up:
kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
coredns-546565776c-2pjwn 0/1 Pending 0 5m21s
coredns-546565776c-hl74s 0/1 Pending 0 5m21s
etcd-k8s-master1 1/1 Running 0 5m30s
kube-apiserver-k8s-master1 1/1 Running 0 5m30s
kube-controller-manager-k8s-master1 1/1 Running 0 5m30s
kube-flannel-ds-amd64-cq9tt 0/1 Init:0/1 0 29s
kube-flannel-ds-amd64-vlzbc 0/1 Init:0/1 0 29s
kube-flannel-ds-amd64-xvqf7 0/1 Init:0/1 0 29s
kube-flannel-ds-amd64-z2k2g 0/1 Init:0/1 0 29s
kube-proxy-46hvp 1/1 Running 0 5m21s
kube-proxy-48vbq 1/1 Running 0 2m56s
kube-proxy-ghgg2 1/1 Running 0 2m55s
kube-proxy-whsrr 1/1 Running 0 2m59s
kube-scheduler-k8s-master1 1/1 Running 0 5m30s
The flannel pods are still initializing; wait for them to finish.
All nodes now report Ready:
kubectl get nodes
NAME STATUS ROLES AGE VERSION
k8s-master1 Ready master 53m v1.18.15
k8s-node1 Ready <none> 27m v1.18.15
k8s-node2 Ready <none> 26m v1.18.15
k8s-node3 Ready <none> 26m v1.18.15
Note: if anything goes wrong during installation, you can run kubeadm reset on any node (master or worker) to reset it,
but it does not clean up the CNI configuration, iptables/IPVS rules, or kubeconfig files (see the cleanup sketch below).
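A cleanup sketch for what kubeadm reset leaves behind (run per node; destructive, so double-check before use):
kubeadm reset -f
ipvsadm --clear
iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X
rm -rf /etc/cni/net.d
rm -rf $HOME/.kube/config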
Optimize DNS
NodeLocal DNSCache (see the official documentation):
1: Improves cluster DNS performance by running a DNS caching agent as a DaemonSet on every node.
2: Pods query the caching agent running on the same node, avoiding iptables DNAT rules and connection tracking; on a cache miss for cluster hostnames (cluster.local suffix by default), the local agent forwards the query to the kube-dns service.
Deploy NodeLocal DNSCache:
wget https://raw.githubusercontent.com/kubernetes/kubernetes/master/cluster/addons/dns/nodelocaldns/nodelocaldns.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
name: node-local-dns
namespace: kube-system
labels:
kubernetes.io/cluster-service: "true"
addonmanager.kubernetes.io/mode: Reconcile
---
apiVersion: v1
kind: Service
metadata:
name: kube-dns-upstream
namespace: kube-system
labels:
k8s-app: kube-dns
kubernetes.io/cluster-service: "true"
addonmanager.kubernetes.io/mode: Reconcile
kubernetes.io/name: "KubeDNSUpstream"
spec:
ports:
- name: dns
port: 53
protocol: UDP
targetPort: 53
- name: dns-tcp
port: 53
protocol: TCP
targetPort: 53
selector:
k8s-app: kube-dns
---
apiVersion: v1
kind: ConfigMap
metadata:
name: node-local-dns
namespace: kube-system
labels:
addonmanager.kubernetes.io/mode: Reconcile
data:
Corefile: |
cluster.local:53 {
errors
cache {
success 9984 30
denial 9984 5
}
reload
loop
bind 172.17.0.10 169.254.20.10
forward . __PILLAR__CLUSTER__DNS__ {
force_tcp
}
prometheus :9253
health 172.17.0.10:8080
}
in-addr.arpa:53 {
errors
cache 30
reload
loop
bind 172.17.0.10 169.254.20.10
forward . __PILLAR__CLUSTER__DNS__ {
force_tcp
}
prometheus :9253
}
ip6.arpa:53 {
errors
cache 30
reload
loop
bind 172.17.0.10 169.254.20.10
forward . __PILLAR__CLUSTER__DNS__ {
force_tcp
}
prometheus :9253
}
.:53 {
errors
cache 30
reload
loop
bind 172.17.0.10 169.254.20.10
forward . __PILLAR__UPSTREAM__SERVERS__ {
force_tcp
}
prometheus :9253
}
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: node-local-dns
namespace: kube-system
labels:
k8s-app: node-local-dns
kubernetes.io/cluster-service: "true"
addonmanager.kubernetes.io/mode: Reconcile
spec:
updateStrategy:
rollingUpdate:
maxUnavailable: 10%
selector:
matchLabels:
k8s-app: node-local-dns
template:
metadata:
labels:
k8s-app: node-local-dns
annotations:
prometheus.io/port: "9253"
prometheus.io/scrape: "true"
spec:
priorityClassName: system-node-critical
serviceAccountName: node-local-dns
hostNetwork: true
dnsPolicy: Default # Don't use cluster DNS.
tolerations:
- key: "CriticalAddonsOnly"
operator: "Exists"
- effect: "NoExecute"
operator: "Exists"
- effect: "NoSchedule"
operator: "Exists"
containers:
- name: node-cache
image: sqeven/k8s-dns-node-cache-amd64:1.15.10
resources:
requests:
cpu: 25m
memory: 5Mi
args: [ "-localip", "172.17.0.10,169.254.20.10", "-conf", "/etc/Corefile", "-upstreamsvc", "kube-dns-upstream" ]
securityContext:
privileged: true
ports:
- containerPort: 53
name: dns
protocol: UDP
- containerPort: 53
name: dns-tcp
protocol: TCP
- containerPort: 9253
name: metrics
protocol: TCP
livenessProbe:
httpGet:
host: 172.17.0.10
path: /health
port: 8080
initialDelaySeconds: 60
timeoutSeconds: 5
volumeMounts:
- mountPath: /run/xtables.lock
name: xtables-lock
readOnly: false
- name: config-volume
mountPath: /etc/coredns
- name: kube-dns-config
mountPath: /etc/kube-dns
volumes:
- name: xtables-lock
hostPath:
path: /run/xtables.lock
type: FileOrCreate
- name: kube-dns-config
configMap:
name: kube-dns
optional: true
- name: config-volume
configMap:
name: node-local-dns
items:
- key: Corefile
path: Corefile.base
Placeholder notes:
__PILLAR__DNS__SERVER__: set to the IP of the coredns (kube-dns) Service
__PILLAR__LOCAL__DNS__: set to the link-local IP (169.254.20.10 by default)
__PILLAR__DNS__DOMAIN__: set to the cluster domain (cluster.local by default)
__PILLAR__CLUSTER__DNS__: filled in from the kube-dns ConfigMap (leave as is)
__PILLAR__UPSTREAM__SERVERS__: taken from the upstream server configuration (leave as is)
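Following the upstream NodeLocal DNSCache instructions, the placeholders that do need substituting can be filled in with sed before applying (a sketch; the manifest above already has these values baked in):
kubedns=$(kubectl get svc kube-dns -n kube-system -o jsonpath='{.spec.clusterIP}')
localdns=169.254.20.10
domain=cluster.local
sed -i "s/__PILLAR__LOCAL__DNS__/$localdns/g; s/__PILLAR__DNS__DOMAIN__/$domain/g; s/__PILLAR__DNS__SERVER__/$kubedns/g" nodelocaldns.yaml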
This DaemonSet runs node-local-dns with hostNetwork=true, so it occupies port 8080 on each host. Create the resources:
kubectl apply -f nodelocaldns.yaml
kubectl get pods -n kube-system |grep node-local-dns
node-local-dns-8m7th 1/1 Running 0 3m48s
node-local-dns-kzhtt 1/1 Running 0 3m48s
node-local-dns-l9bpj 1/1 Running 0 3m47s
node-local-dns-q8cck 1/1 Running 0 3m47s
Point the kubelet clusterDNS setting at the local cache:
vim /var/lib/kubelet/config.yaml
clusterDNS:
- 169.254.20.10
Restart kubelet:
systemctl daemon-reload && systemctl restart kubelet
Working with the Cluster
Check cluster status
kubectl get nodes
NAME STATUS ROLES AGE VERSION
k8s-master1 Ready master 119m v1.18.15
k8s-node1 Ready <none> 93m v1.18.15
k8s-node2 Ready <none> 92m v1.18.15
k8s-node3 Ready <none> 92m v1.18.15
Check etcd status
Install etcdctl on the host, or run it directly inside the etcd container:
export ETCDCTL_API=3
etcdctl -w table \
--endpoints=https://k8s-master1:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key \
endpoint status
+--------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+--------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://k8s-master1:2379 | 6571fb7574e87dba | 3.4.3 | 2.7 MB | true | false | 2 | 21290 | 21290 | |
+--------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
Back up and restore etcd data
Single-node backup:
ETCDCTL_API=3 etcdctl -w table --endpoints=https://k8s-master1:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key snapshot save /data/backup/etcd/snapshot.db
Single-node restore (see the sketch after this list):
1: Stop kube-apiserver on all masters.
2: Stop etcd.
3: Restore the data: ETCDCTL_API=3 etcdctl snapshot restore /data/backup/etcd/snapshot.db --name etcd-k8s-master1 --data-dir=/data/etcd
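Because apiserver and etcd run here as static pods, "stopping" them means moving their manifests out of /etc/kubernetes/manifests; a restore sketch under that assumption:
mv /etc/kubernetes/manifests/kube-apiserver.yaml /etc/kubernetes/manifests/etcd.yaml /tmp/
mv /data/etcd /data/etcd.bak
ETCDCTL_API=3 etcdctl snapshot restore /data/backup/etcd/snapshot.db --name etcd-k8s-master1 --data-dir=/data/etcd
mv /tmp/etcd.yaml /tmp/kube-apiserver.yaml /etc/kubernetes/manifests/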
Check IPVS state
ipvsadm -L -n
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
-> RemoteAddress:Port Forward Weight ActiveConn InActConn
TCP 172.17.0.1:443 wrr
-> 192.168.10.11:6443 Masq 1 3 0
TCP 172.17.0.10:53 wrr
-> 10.10.0.2:53 Masq 1 0 0
TCP 172.17.0.10:9153 wrr
-> 10.10.0.2:9153 Masq 1 0 0
TCP 172.17.73.66:53 wrr
-> 10.10.0.2:53 Masq 1 0 0
-> 10.10.0.3:53 Masq 1 0 0
UDP 172.17.0.10:53 wrr
-> 10.10.0.2:53 Masq 1 0 0
UDP 172.17.73.66:53 wrr
-> 10.10.0.2:53 Masq 1 0 0
-> 10.10.0.3:53 Masq 1 0 0
Check all running pods
kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system coredns-546565776c-j5862 1/1 Running 0 125m
kube-system coredns-546565776c-qrkn7 1/1 Running 0 125m
kube-system etcd-k8s-master1 1/1 Running 0 125m
kube-system kube-apiserver-k8s-master1 1/1 Running 0 125m
kube-system kube-controller-manager-k8s-master1 1/1 Running 0 125m
kube-system kube-flannel-ds-amd64-5c42t 1/1 Running 0 75m
kube-system kube-flannel-ds-amd64-ch4pz 1/1 Running 0 75m
kube-system kube-flannel-ds-amd64-drbjj 1/1 Running 0 75m
kube-system kube-flannel-ds-amd64-dtnlc 1/1 Running 0 75m
kube-system kube-proxy-5sw65 1/1 Running 0 98m
kube-system kube-proxy-bgkk2 1/1 Running 0 98m
kube-system kube-proxy-dk6xt 1/1 Running 0 98m
kube-system kube-proxy-jnzj4 1/1 Running 0 125m
kube-system kube-scheduler-k8s-master1 1/1 Running 0 125m
kube-system node-local-dns-8m7th 1/1 Running 0 17m
kube-system node-local-dns-kzhtt 1/1 Running 0 17m
kube-system node-local-dns-l9bpj 1/1 Running 0 17m
kube-system node-local-dns-q8cck 1/1 Running 0 17m
Test that the cluster works
Create a test manifest for nginx, nginx-test.yml:
apiVersion: apps/v1
kind: Deployment
metadata:
name: nginx-dm
labels:
app: nginx
spec:
replicas: 1
strategy:
type: RollingUpdate
rollingUpdate:
maxSurge: 1
maxUnavailable: 2
minReadySeconds: 120
selector:
matchLabels:
app: nginx
template:
metadata:
labels:
app: nginx
spec:
containers:
- name: nginx
image: nginx:alpine
imagePullPolicy: IfNotPresent
ports:
- containerPort: 80
name: http
resources:
limits:
cpu: 1000m
memory: 500Mi
requests:
cpu: 0.5
memory: 250Mi
volumeMounts:
- name: tz-config
mountPath: /etc/localtime
readOnly: true
lifecycle:
postStart:
exec:
command: ["/bin/sh", "-c", "echo Hello from the postStart handler > /usr/share/message"]
preStop:
exec:
command: ["/usr/sbin/nginx","-s","quit"]
readinessProbe:
tcpSocket:
port: 80
initialDelaySeconds: 5
periodSeconds: 10
successThreshold: 1
failureThreshold: 1
livenessProbe:
httpGet:
path: /
port: 80
initialDelaySeconds: 15
periodSeconds: 20
successThreshold: 1
failureThreshold: 3
volumes:
- name: tz-config
hostPath:
path: /etc/localtime
---
apiVersion: v1
kind: Service
metadata:
name: nginx-svc
labels:
app: nginx
spec:
ports:
- port: 80
name: http
targetPort: 80
protocol: TCP
selector:
app: nginx
Parameter notes (see the official documentation for more):
RollingUpdate: configures the rolling update strategy
maxSurge: how many extra pods may be created above the desired count during an update before old ones are removed
maxUnavailable: the maximum number of pods allowed to be unavailable during the update
minReadySeconds: how long a new pod must stay ready before it is treated as available and old pods are removed
readinessProbe: readiness probe; if it fails, the pod is removed from the Endpoints list (without being restarted) and stops receiving Service traffic
livenessProbe: liveness probe; if it fails, the container is killed and restarted. In general it should be more lenient than the readinessProbe (e.g. a pod with a transient network issue recovers shortly and does not need to be killed)
initialDelaySeconds: seconds after container start before probing begins
periodSeconds: how often to probe; default 10s, minimum 1s
timeoutSeconds: probe timeout; default 1s, minimum 1s
successThreshold: minimum consecutive successes after a failure for the probe to be considered successful; default 1, must be 1 for liveness, minimum 1
failureThreshold: minimum consecutive failures after a success for the probe to be considered failed; default 3, minimum 1
lifecycle: pod hooks run by the kubelet before the container's process starts or just before it terminates (e.g. fetch configuration on start, or dump stack traces before an OOM restart)
resources: resource constraints; requests are what the pod asks for, limits are the caps (exceeding them gets the container killed). CPU is compressible, memory is not; one core equals 1000m, so 0.5 equals 500m regardless of how many cores the machine has
Apply the deployment and check its status
kubectl apply -f nginx-test.yml
kubectl get pods
NAME READY STATUS RESTARTS AGE
nginx-dm-bb6d5c8f9-qsg2t 1/1 Running 0 91s
You can now see the message written to /usr/share/message inside the pod:
kubectl exec -it nginx-dm-bb6d5c8f9-8649t -- cat /usr/share/message
Hello from the postStart handler
Check the nginx Service:
kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 172.17.0.1 <none> 443/TCP 3h9m
nginx-svc ClusterIP 172.17.154.27 <none> 80/TCP 3m19s
Access the Service address:
curl 172.17.154.27
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
body {
width: 35em;
margin: 0 auto;
font-family: Tahoma, Verdana, Arial, sans-serif;
}
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>
<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>
<p><em>Thank you for using nginx.</em></p>
</body>
</html>
Check and verify DNS:
kubectl exec -it nginx-dm-bb6d5c8f9-8649t -- ping nginx-svc
PING nginx-svc (172.17.154.27): 56 data bytes
64 bytes from 172.17.154.27: seq=0 ttl=64 time=0.064 ms
64 bytes from 172.17.154.27: seq=1 ttl=64 time=0.147 ms
--- nginx-svc ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max = 0.064/0.105/0.147 ms
[root@k8s-master1 kubernetes]# kubectl exec -it nginx-dm-bb6d5c8f9-8649t -- cat /etc/resolv.conf
nameserver 169.254.20.10
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
Pinging the nginx Service by name works, and the nameserver is now 169.254.20.10. The cluster deployment is complete. The manifests for apiserver, etcd, and the other control-plane components live in /etc/kubernetes/manifests; to change their parameters, edit the YAML there and the changes are applied automatically.
Deploy Metrics Server
Download the manifest:
wget https://github.com/kubernetes-sigs/metrics-server/releases/download/v0.4.1/components.yaml
Change the image to registry.cn-hangzhou.aliyuncs.com/google_containers/metrics-server-amd64:v0.4.1. The full, modified manifest:
---
apiVersion: v1
kind: ServiceAccount
metadata:
labels:
k8s-app: metrics-server
name: metrics-server
namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
labels:
k8s-app: metrics-server
rbac.authorization.k8s.io/aggregate-to-admin: "true"
rbac.authorization.k8s.io/aggregate-to-edit: "true"
rbac.authorization.k8s.io/aggregate-to-view: "true"
name: system:aggregated-metrics-reader
rules:
- apiGroups:
- metrics.k8s.io
resources:
- pods
- nodes
verbs:
- get
- list
- watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
labels:
k8s-app: metrics-server
name: system:metrics-server
rules:
- apiGroups:
- ""
resources:
- pods
- nodes
- nodes/stats
- namespaces
- configmaps
verbs:
- get
- list
- watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
labels:
k8s-app: metrics-server
name: metrics-server-auth-reader
namespace: kube-system
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: Role
name: extension-apiserver-authentication-reader
subjects:
- kind: ServiceAccount
name: metrics-server
namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
labels:
k8s-app: metrics-server
name: metrics-server:system:auth-delegator
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: system:auth-delegator
subjects:
- kind: ServiceAccount
name: metrics-server
namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
labels:
k8s-app: metrics-server
name: system:metrics-server
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: system:metrics-server
subjects:
- kind: ServiceAccount
name: metrics-server
namespace: kube-system
---
apiVersion: v1
kind: Service
metadata:
labels:
k8s-app: metrics-server
name: metrics-server
namespace: kube-system
spec:
ports:
- name: https
port: 443
protocol: TCP
targetPort: https
selector:
k8s-app: metrics-server
---
apiVersion: apps/v1
kind: Deployment
metadata:
labels:
k8s-app: metrics-server
name: metrics-server
namespace: kube-system
spec:
selector:
matchLabels:
k8s-app: metrics-server
strategy:
rollingUpdate:
maxUnavailable: 0
template:
metadata:
labels:
k8s-app: metrics-server
spec:
containers:
- args:
- --cert-dir=/tmp
- --secure-port=4443
- --kubelet-use-node-status-port
- --kubelet-insecure-tls
- --kubelet-preferred-address-types=InternalDNS,InternalIP,ExternalDNS,ExternalIP,Hostname
image: registry.cn-hangzhou.aliyuncs.com/google_containers/metrics-server-amd64:v0.4.1
imagePullPolicy: IfNotPresent
livenessProbe:
failureThreshold: 3
httpGet:
path: /livez
port: https
scheme: HTTPS
periodSeconds: 10
name: metrics-server
ports:
- containerPort: 4443
name: https
protocol: TCP
readinessProbe:
failureThreshold: 3
httpGet:
path: /readyz
port: https
scheme: HTTPS
periodSeconds: 10
securityContext:
readOnlyRootFilesystem: true
runAsNonRoot: true
runAsUser: 1000
volumeMounts:
- mountPath: /tmp
name: tmp-dir
nodeSelector:
kubernetes.io/os: linux
priorityClassName: system-cluster-critical
serviceAccountName: metrics-server
volumes:
- emptyDir: {}
name: tmp-dir
---
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
labels:
k8s-app: metrics-server
name: v1beta1.metrics.k8s.io
spec:
group: metrics.k8s.io
groupPriorityMinimum: 100
insecureSkipTLSVerify: true
service:
name: metrics-server
namespace: kube-system
version: v1beta1
versionPriority: 100
Parameter notes: the following options must be added under args:
1: --kubelet-insecure-tls skips TLS verification of the kubelet's certificate (10250 is an HTTPS port that would otherwise require a trusted certificate).
2: --kubelet-preferred-address-types=ExternalIP,InternalIP,Hostname,InternalDNS,ExternalDNS
specifies which address types metrics-server prefers when connecting to the kubelet (external IP, internal IP, hostname, internal DNS, external DNS).
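Once the metrics-server pod is Running, metrics should be served within a minute or so; verify with:
kubectl get apiservice v1beta1.metrics.k8s.io
kubectl top nodes
kubectl top pods --all-namespaces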
Deploy Helm
Use Helm 3 directly (Helm 2 requires RBAC setup, Tiller, and other extra steps).
All that is needed is to download the [binary release](https://get.helm.sh/helm-v3.5.1-linux-amd64.tar.gz), extract it, and put it in a bin directory:
wget https://get.helm.sh/helm-v3.5.1-linux-amd64.tar.gz
tar -xzf helm-v3.5.1-linux-amd64.tar.gz
cp -ra linux-amd64/helm /usr/local/bin
Note: helm needs the same kubeconfig/permissions as kubectl in order to work.
Deploy Traefik v2
For other Traefik options, refer to the official documentation.
Add the Helm repository:
helm repo add traefik https://helm.traefik.io/traefik
Pull the chart:
helm pull traefik/traefik
Modify the chart values
Edit values.yaml:
additionalArguments:
- "--log.level=WARN"
service:
enabled: true
type: ClusterIP
logs:
general:
level: WARN
access:
enabled: true
Change the web and websecure entrypoint ports to 80 and 443.
Add the following arguments in the chart's deployment.yaml:
- "--entryPoints.metrics.address=:7070"
- "--metrics.prometheus.entryPoint=metrics"
- "--entrypoints.web.http.redirections.entryPoint.to=websecure"
- "--metrics.prometheus=true"
This opens the metrics port for Prometheus scraping and forces HTTP requests to be redirected to HTTPS.
Install Traefik and test it
Create the HTTPS certificate secret:
kubectl create secret tls traefik-certs --cert=podsbook.com.pem --key=podsbook.com.key -n traefik
Install the chart into the traefik namespace with Helm:
helm install traefik -n traefik .
Test that the IngressRoute works; create ingress-nginx.yaml:
apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
name: qa-nginx-ingressroute
namespace: default
spec:
entryPoints:
- websecure
routes:
- kind: Rule
match: Host(`podsbook.com`)
services:
- name: nginx-svc
port: 80
tls:
secretName: traefik-certs
Apply it and access the host:
kubectl apply -f ingress-nginx.yaml
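A quick in-cluster test from a node (a sketch; it assumes the Helm release created a ClusterIP Service named traefik in the traefik namespace, as the install command above suggests):
TRAEFIK_IP=$(kubectl get svc traefik -n traefik -o jsonpath='{.spec.clusterIP}')
curl -vk --resolve podsbook.com:443:${TRAEFIK_IP} https://podsbook.com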
Routine Operations
Cordon (and uncordon) a node:
kubectl cordon k8s-node-1
kubectl uncordon k8s-node-1
Drain the pods from a node with a 120-second grace period:
kubectl drain k8s-node-1 --grace-period=120 --ignore-daemonsets --delete-local-data
Label a node (a nodeSelector sketch follows):
kubectl label node k8s-node-1 service=java
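A minimal sketch of consuming that label from a workload via nodeSelector (deployment pod-template excerpt):
spec:
  template:
    spec:
      nodeSelector:
        service: java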
Expose pod metadata (name, namespace, IP) as environment variables:
env:
- name: POD_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
- name: POD_NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
- name: POD_IP
valueFrom:
fieldRef:
fieldPath: status.podIP
Add the variables above to your workload's deployment.yaml, and the values become available inside the pod as environment variables with those names.
Force-deleting a namespace
Sometimes, for odd reasons, a namespace gets stuck and cannot be deleted normally; handle it as follows.
1: Export the namespace definition:
kubectl get namespace argocd -o json > argocd.json
{
"apiVersion": "v1",
"kind": "Namespace",
"metadata": {
"creationTimestamp": "2020-09-25T06:20:12Z",
"deletionTimestamp": "2020-10-29T09:12:46Z",
"managedFields": [
{
"apiVersion": "v1",
"fieldsType": "FieldsV1",
"fieldsV1": {
"f:status": {
"f:phase": {}
}
},
"manager": "kubectl",
"operation": "Update",
"time": "2020-09-25T06:20:12Z"
},
{
"apiVersion": "v1",
"fieldsType": "FieldsV1",
"fieldsV1": {
"f:status": {
"f:conditions": {
".": {},
"k:{\"type\":\"NamespaceContentRemaining\"}": {
".": {},
"f:lastTransitionTime": {},
"f:message": {},
"f:reason": {},
"f:status": {},
"f:type": {}
},
"k:{\"type\":\"NamespaceDeletionContentFailure\"}": {
".": {},
"f:lastTransitionTime": {},
"f:message": {},
"f:reason": {},
"f:status": {},
"f:type": {}
},
"k:{\"type\":\"NamespaceDeletionDiscoveryFailure\"}": {
".": {},
"f:lastTransitionTime": {},
"f:message": {},
"f:reason": {},
"f:status": {},
"f:type": {}
},
"k:{\"type\":\"NamespaceDeletionGroupVersionParsingFailure\"}": {
".": {},
"f:lastTransitionTime": {},
"f:message": {},
"f:reason": {},
"f:status": {},
"f:type": {}
},
"k:{\"type\":\"NamespaceFinalizersRemaining\"}": {
".": {},
"f:lastTransitionTime": {},
"f:message": {},
"f:reason": {},
"f:status": {},
"f:type": {}
}
}
}
},
"manager": "kube-controller-manager",
"operation": "Update",
"time": "2021-04-06T09:36:42Z"
}
],
"name": "argocd",
"resourceVersion": "50511231",
"selfLink": "/api/v1/namespaces/argocd",
"uid": "67ec7089-2cc0-4e7a-be73-e8a0a6e4d3cd"
},
"spec": {
"finalizers": [
"kubernetes"
]
},
"status": {
"conditions": [
{
"lastTransitionTime": "2021-04-06T09:36:42Z",
"message": "All resources successfully discovered",
"reason": "ResourcesDiscovered",
"status": "False",
"type": "NamespaceDeletionDiscoveryFailure"
},
{
"lastTransitionTime": "2020-10-29T09:12:52Z",
"message": "All legacy kube types successfully parsed",
"reason": "ParsedGroupVersions",
"status": "False",
"type": "NamespaceDeletionGroupVersionParsingFailure"
},
{
"lastTransitionTime": "2020-10-29T09:12:52Z",
"message": "All content successfully deleted, may be waiting on finalization",
"reason": "ContentDeleted",
"status": "False",
"type": "NamespaceDeletionContentFailure"
},
{
"lastTransitionTime": "2020-10-29T09:12:52Z",
"message": "Some resources are remaining: applications.argoproj.io has 2 resource instances",
"reason": "SomeResourcesRemain",
"status": "True",
"type": "NamespaceContentRemaining"
},
{
"lastTransitionTime": "2020-10-29T09:12:52Z",
"message": "Some content in the namespace has finalizers remaining: resources-finalizer.argocd.argoproj.io in 2 resource instances",
"reason": "SomeFinalizersRemain",
"status": "True",
"type": "NamespaceFinalizersRemaining"
}
],
"phase": "Terminating"
}
}
2: Delete the following three lines from the exported JSON and keep everything else:
"finalizers": [
"kubernetes"
]
3: Start kubectl proxy and force the deletion through the API:
kubectl proxy
curl -k -H "Content-Type: application/json" -X PUT --data-binary @argocd.json http://127.0.0.1:8001/api/v1/namespaces/argocd/finalize
You can then check with kubectl get ns that the namespace is gone. To force-delete other kinds of stuck resources, use kubectl patch -p '{"metadata":{"finalizers":null}}' on them.
…… to be continued