Getting Started with Kubernetes

This article is a getting-started guide for k8s. k8s's usability still leaves something to be desired: every major version change breaks existing tutorials, so this article reorganizes the process based on our past experience and practice.
The deployment below is based on k8s version 1.28.
References:
k8s docs: https://kubernetes.io/docs/home/
GeekTime: https://time.geekbang.org/column/article/39712
Aliyun mirror: https://developer.aliyun.com/mirror/kubernetes
Also, k8s pulls a large number of images from registry.k8s.io, Docker Hub, and ghcr. You are best served by setting up your own Nexus to solve this once and for all; for how, see our notes on Nexus configuration practices.

1. Installation

Follow the Aliyun mirror instructions: https://developer.aliyun.com/mirror/kubernetes
The following is based on Rocky Linux 9 and applies essentially unchanged to AlmaLinux 9, CentOS Stream 9, and RHEL 9; for other operating systems, refer to the Aliyun notes.

1.1. Update the operating system

# Add the EPEL repo
$ dnf install -y https://mirrors.aliyun.com/epel/epel-release-latest-9.noarch.rpm;
# Point the EPEL repo at the Aliyun mirror
$ sed -i 's|^#baseurl=https://download.example/pub|baseurl=https://mirrors.aliyun.com|' /etc/yum.repos.d/epel*;
$ sed -i 's|^metalink|#metalink|' /etc/yum.repos.d/epel*;
# If Docker was previously installed, remove it
$ dnf remove -y docker-ce docker-ce-cli containerd.io cri-o kubelet kubeadm kubectl cri-tools kubernetes-cni;rm -rf /etc/yum.repos.d/docker-ce.repo
# Update and install base packages
$ dnf clean all;dnf makecache -y;dnf update -y;dnf groupinstall -y 'development tools';dnf install -y go nfs-utils;

Also, make sure /etc/hosts correctly maps each machine's hostname to its IP, for example as sketched below.
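A minimal sketch; 10.0.1.170 matches the master IP used in the kubeadm init example later, while the worker names and IPs are placeholders for your own machines:

# /etc/hosts -- every node should resolve every other node by name
10.0.1.170  k8s-master
10.0.1.171  k8s-node1
10.0.1.172  k8s-node2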

1.2. Preparation

Ideally you have your own Nexus proxying the Docker registries; here we assume your private Docker registry is docker.test.com.

# k8s - preparation
# Disable swap; ideally also vim /etc/fstab and comment out the swap line
$ swapoff -a;
# Enable the CRB repo and set SELinux to permissive for the current boot
$ /usr/bin/crb enable;setenforce 0;
# Disable the firewall and make the SELinux change persistent
$ systemctl stop firewalld.service;systemctl disable firewalld.service;sed -i 's/^SELINUX=enforcing$/SELINUX=permissive/' /etc/selinux/config;
# k8s - load the required kernel modules
$ tee /etc/modules-load.d/k8s.conf <<-'EOF'
overlay
br_netfilter
EOF

$ modprobe overlay;modprobe br_netfilter;

# sysctl params required by setup, params persist across reboots
$ tee /etc/sysctl.d/k8s.conf <<-'EOF'
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward = 1
EOF
$ sysctl -p /etc/sysctl.d/k8s.conf;sysctl --system;
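A quick verification that the modules are loaded and the sysctl values are set (the same check the upstream container-runtime docs suggest):

$ lsmod | grep -e overlay -e br_netfilter
$ sysctl net.bridge.bridge-nf-call-iptables net.bridge.bridge-nf-call-ip6tables net.ipv4.ip_forward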

1.2.1. Container runtime: CRI-O

We use CRI-O here; see the official instructions: https://github.com/cri-o/cri-o/blob/main/install.md#installation-instructions
On choosing a container runtime, see the official docs: https://v1-28.docs.kubernetes.io/docs/setup/production-environment/container-runtimes

#cri-o
$ curl -L -o /etc/yum.repos.d/devel:kubic:libcontainers:stable.repo https://download.opensuse.org/repositories/devel:/kubic:/libcontainers:/stable/CentOS_9_Stream/devel:kubic:libcontainers:stable.repo
$ curl -L -o /etc/yum.repos.d/devel:kubic:libcontainers:stable:cri-o:1.28.2.repo https://download.opensuse.org/repositories/devel:kubic:libcontainers:stable:cri-o:1.28:1.28.2/CentOS_9_Stream/devel:kubic:libcontainers:stable:cri-o:1.28:1.28.2.repo
$ dnf install -y cri-o;
# Set pause_image to docker.test.com/pause:3.9 and uncomment it
$ vim /etc/crio/crio.conf
# Add your private registry to unqualified-search-registries
$ vim /etc/containers/registries.conf

$ systemctl enable crio && systemctl start crio;
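For reference, a minimal sketch of those two edits (docker.test.com stands in for your own registry, as elsewhere in this guide):

# /etc/crio/crio.conf
[crio.image]
pause_image = "docker.test.com/pause:3.9"

# /etc/containers/registries.conf
unqualified-search-registries = ["docker.test.com"]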

We are not using containerd or Docker in this setup; crictl's everyday commands are basically the same as Docker's:

# equivalent of docker ps -a
$ crictl ps -a
# equivalent of docker pull xxxxxx
$ crictl pull xxxxxx

1.2.2. [Recommended] Container runtime: containerd

See the official instructions: https://github.com/containerd/containerd/blob/main/docs/getting-started.md
Note: do not install it through the package manager, or you may end up installing the Docker shim.

# Download containerd; see the official releases: https://github.com/containerd/containerd/releases
$ curl -LO "https://github.com/containerd/containerd/releases/download/v1.7.15/containerd-1.7.15-linux-amd64.tar.gz"
# Download the CNI plugins; see the official releases: https://github.com/containernetworking/plugins/releases
$ curl -LO "https://github.com/containernetworking/plugins/releases/download/v1.4.1/cni-plugins-linux-amd64-v1.4.1.tgz"
# Download runc; see the official releases: https://github.com/opencontainers/runc/releases
$ curl -LO "https://github.com/opencontainers/runc/releases/download/v1.1.12/runc.amd64"
# Unpack containerd
$ tar Cxzvf /usr/local/ ./containerd-1.7.15-linux-amd64.tar.gz
# Install runc
$ install -m 755 runc.amd64 /usr/local/sbin/runc
# Install the CNI plugins
$ mkdir -p /opt/cni/bin;tar Cxzvf /opt/cni/bin cni-plugins-linux-amd64-v1.4.1.tgz
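A quick sanity check that the three pieces landed where expected:

$ /usr/local/bin/containerd --version
$ /usr/local/sbin/runc --version
$ ls /opt/cni/bin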

Register containerd as a systemd service:

$ vim /etc/systemd/system/containerd.service
# paste the following content
[Unit]
Description=containerd container runtime
Documentation=https://containerd.io
After=network.target local-fs.target

[Service]
ExecStartPre=-/sbin/modprobe overlay
ExecStart=/usr/local/bin/containerd

Type=notify
Delegate=yes
KillMode=process
Restart=always
RestartSec=5

# Having non-zero Limit*s causes performance problems due to accounting overhead
# in the kernel. We recommend using cgroups to do container-local accounting.
LimitNPROC=infinity
LimitCORE=infinity

# Comment TasksMax if your systemd version does not support it.
# Only systemd 226 and above support this option.
TasksMax=infinity
OOMScoreAdjust=-999

[Install]
WantedBy=multi-user.target

$ systemctl daemon-reload;systemctl enable --now containerd

Switch the cgroup driver to systemd:

$ mkdir /etc/containerd/;containerd config default > /etc/containerd/config.toml;sed -i 's/SystemdCgroup = false/SystemdCgroup = true/g' /etc/containerd/config.toml;sed -i 's/registry.k8s.io\/pause:3.8/docker.test.com\/pause:3.9/g' /etc/containerd/config.toml;
$ systemctl restart containerd;
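To confirm the edits took effect and the service came back up:

$ grep SystemdCgroup /etc/containerd/config.toml
$ systemctl is-active containerd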

1.3. Install Kubernetes

# k8s - add the Kubernetes repo
$ tee /etc/yum.repos.d/kubernetes.repo <<-'EOF'
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes-new/core/stable/v1.28/rpm/
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://mirrors.aliyun.com/kubernetes-new/core/stable/v1.28/rpm/repodata/repomd.xml.key
# exclude keeps these components from being upgraded by a routine dnf update; the initial install still needs this repo, hence the --disableexcludes flag below
exclude=kubelet kubeadm kubectl cri-tools kubernetes-cni
EOF
# k8s - install the core components, bypassing the repo's exclude list
$ dnf install -y --disableexcludes=kubernetes kubeadm kubelet kubectl ;
# Point crictl at containerd (not needed with CRI-O)
$ crictl config runtime-endpoint unix:///var/run/containerd/containerd.sock;crictl config image-endpoint unix:///var/run/containerd/containerd.sock;
# k8s - enable the kubelet service
$ systemctl enable --now kubelet;

2. Initialization

As before, ideally you have your own Nexus proxying the Docker registries; we assume your private Docker registry is docker.test.com.

2.1. Things to know

Check which images initialization needs:

# optional - list the default images
$ kubeadm config images list
# pull the images
$ kubeadm config images pull
# optional - list the images according to a config file
$ kubeadm config images list --config kubeadm.yaml
# optional - pull the images according to a config file, to check whether you hit any problems
$ kubeadm config images pull --config kubeadm.yaml

Also, if anything goes wrong, just reset:

$ kubeadm reset -f;rm -rf /etc/cni /var/lib/cni/ /etc/kubernetes /var/lib/dockershim /var/lib/etcd /var/lib/kubelet /var/run/kubernetes ~/.kube/*;mkdir -p /etc/cni/net.d/;

Reference: https://stackoverflow.com/questions/44698283/how-to-completely-uninstall-kubernetes

2.2. Initialize the master node

kubeadm init --control-plane-endpoint 10.0.1.170:6443 --apiserver-advertise-address=10.0.1.170 --image-repository=docker.test.com --pod-network-cidr=10.244.0.0/16 --v=5 

Notes:

  1. control-plane-endpoint and apiserver-advertise-address both use the local IP; again, /etc/hosts must map the hostname to this IP
  2. pod-network-cidr is the pod network range; 10.244.0.0/16 is the flannel plugin's default, so leave it as-is
  3. --image-repository must be set: k8s pulls its images from this repo (default: https://registry.k8s.io); set up a proxy for registry.k8s.io in your Nexus and put your Nexus address here (a config-file equivalent is sketched after this list)
  4. --v=5 sets the log verbosity
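If you prefer driving kubeadm from a config file (the kubeadm.yaml referenced in section 2.1), here is a minimal sketch equivalent to the command above; the kubernetesVersion pin is an assumption, everything else reuses values from this guide:

# kubeadm.yaml
apiVersion: kubeadm.k8s.io/v1beta3
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: "10.0.1.170"
---
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
kubernetesVersion: "1.28.0" # assumed patch version; pin to the one you installed
controlPlaneEndpoint: "10.0.1.170:6443"
imageRepository: "docker.test.com"
networking:
  podSubnet: "10.244.0.0/16"

$ kubeadm init --config kubeadm.yaml --v=5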

2.3. After initialization

2.3.1. Certificates

The cluster's certificates are valid for 1 year by default. Check their expiry:

# check cluster certificate expiry
$ kubeadm certs check-expiration
CERTIFICATE EXPIRES RESIDUAL TIME CERTIFICATE AUTHORITY EXTERNALLY MANAGED
admin.conf Feb 18, 2025 07:25 UTC 364d ca no
apiserver Feb 18, 2025 07:25 UTC 364d ca no
apiserver-etcd-client Feb 18, 2025 07:25 UTC 364d etcd-ca no
apiserver-kubelet-client Feb 18, 2025 07:25 UTC 364d ca no
controller-manager.conf Feb 18, 2025 07:25 UTC 364d ca no
etcd-healthcheck-client Feb 18, 2025 07:25 UTC 364d etcd-ca no
etcd-peer Feb 18, 2025 07:25 UTC 364d etcd-ca no
etcd-server Feb 18, 2025 07:25 UTC 364d etcd-ca no
front-proxy-client Feb 18, 2025 07:25 UTC 364d front-proxy-ca no
scheduler.conf Feb 18, 2025 07:25 UTC 364d ca no

CERTIFICATE AUTHORITY EXPIRES RESIDUAL TIME EXTERNALLY MANAGED
ca Feb 16, 2034 07:06 UTC 9y no
etcd-ca Feb 16, 2034 07:06 UTC 9y no
front-proxy-ca Feb 16, 2034 07:06 UTC 9y no

You can renew with kubeadm certs renew all, but then the operation has to be repeated every year.
Instead, the following script re-signs 9-year certificates in one pass; see: https://github.com/yuyicai/update-kube-cert/blob/master/update-kubeadm-cert.sh
Usage:

chmod 755 ./update-kubeadm-cert.sh;./update-kubeadm-cert.sh all --cri containerd

It automatically re-signs the certificates and restarts kube-apiserver, kube-controller-manager, kube-scheduler, and etcd.

2.3.2. Add the cluster config

The init log prints the following hints; run each of them:

Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

mkdir -p $HOME/.kube
cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
chown $(id -u):$(id -g) $HOME/.kube/config

Add the environment variable:

Alternatively, if you are the root user, you can run:

export KUBECONFIG=/etc/kubernetes/admin.conf

Next it tells you how to join additional nodes; keep this command and run it later on each new worker node:

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join 192.168.1.11:6443 --token ennal9.tobyb8of2c8yeppx --discovery-token-ca-cert-hash sha256:cde119d31cbda65b693cd84cee70580764f09714e43e228d05d4e6cc1b50c8b1

It also prompts you to deploy a pod network; we handle that in the next step:

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/

2.3.3. Install the network plugin: flannel

Reference: https://gist.github.com/rkaramandi/44c7cea91501e735ea99e356e9ae7883#configure-kubernetes-master

# If your network is unreliable, download the manifest manually
$ curl -L https://github.com/coreos/flannel/raw/master/Documentation/kube-flannel.yml -O
# Point the image references in the manifest at your private registry
$ sed -i 's/docker.io/docker.test.com/g' kube-flannel.yml
$ kubectl apply -f ./kube-flannel.yml
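Then check that the flannel pods come up; the namespace differs across manifest versions, so search all of them:

$ kubectl get pods -A | grep flannel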

2.3.4. Install metrics-server

$ curl -L -o metrics.yml https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
$ sed -i 's/registry.k8s.io/docker.test.com/g' ./metrics.yml
# additionally append --kubelet-insecure-tls in the following section
spec:
  containers:
  - args:
    - --cert-dir=/tmp
    - --secure-port=10250
    - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
    - --kubelet-use-node-status-port
    - --metric-resolution=15s
    - --kubelet-insecure-tls # added
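After editing, apply the manifest:

$ kubectl apply -f ./metrics.yml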

Verify:

# check node load
$ kubectl top node
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
k8s-master 232m 5% 1708Mi 46%
k8s-slave1 29m 1% 594Mi 34%
k8s-slave2 25m 1% 556Mi 32%
# check pod load
$ kubectl top pod

2.3.5. Inspect the cluster

When everything is in Running state, the cluster is ready:

$ kubectl get nodes --show-labels 
NAME STATUS ROLES AGE VERSION
centos7 Ready control-plane,master 53s v1.22.4
# list all pods
$ kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system coredns-59c77d78dd-ghjnk 1/1 Running 0 52s
kube-system coredns-59c77d78dd-sb2lc 1/1 Running 0 52s
# show details of a specific pod
$ kubectl describe pod coredns-59c77d78dd-ghjnk -n kube-system

3. Node management

3.1. Inspect

# list nodes
$ kubectl get nodes --show-labels
# add a deploy-type label to a node
$ kubectl label nodes test-node-03 deploy-type=dynamic-node
# list pods
$ kubectl get pods --all-namespaces
# describe a pod
$ kubectl describe pod coredns-59c77d78dd-fh47c -n kube-system

3.2. Add a node

$ kubeadm token create --print-join-command
$ kubeadm join 192.168.1.11:6443 --token ennal9.tobyb8of2c8yeppx --discovery-token-ca-cert-hash sha256:cde119d31cbda65b693cd84cee70580764f09714e43e228d05d4e6cc1b50c8b1
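Back on the master, confirm the node registered (it may show NotReady until flannel is scheduled onto it):

$ kubectl get nodes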

4. Dashboard

4.1. Installation

Official repo: https://github.com/kubernetes/dashboard
Here we expose it through ingress-nginx.

$ wget -e "https_proxy=http://your-proxy" -O dashboard.yml https://raw.githubusercontent.com/kubernetes/dashboard/v2.7.0/aio/deploy/recommended.yaml
$ sed -i 's/image: /image: docker.test.com\//g' dashboard.yml
# expose port 80
spec:
  ports:
    - port: 443
      targetPort: 8443
      name: https # added
    - port: 80 # added
      targetPort: 9090 # added
      name: http # added
# switch the liveness probe to HTTP
livenessProbe:
  httpGet:
    scheme: HTTP # changed from HTTPS
    path: /
    port: 9090 # changed from 8443
# add the container port
ports:
  - containerPort: 8443
    protocol: TCP
  - containerPort: 9090 # added
    protocol: TCP # added
args:
  #- --auto-generate-certificates # commented out to disable SSL

# adjust permissions
kind: ClusterRoleBinding
metadata:
  name: kubernetes-dashboard
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin # changed from kubernetes-dashboard; note this makes anonymous access an admin, so you may skip it
$ kubectl apply -f dashboard.yml

Add an ingress proxy: ingress-dashboard.yml

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ingress-dashboard
  namespace: kubernetes-dashboard
spec:
  ingressClassName: nginx
  rules:
    - host: "k8s-dashboard.dev.com" # change to your dashboard domain
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: kubernetes-dashboard
                port:
                  number: 80 # the service's HTTP port added above (its targetPort is the dashboard's 9090)

Finally, run kubectl apply -f ingress-dashboard.yml and the dashboard is reachable at k8s-dashboard.dev.com:30080.
Make sure the domain resolves to your cluster. 30080 is the default HTTP port exposed by the ingress controller here; adjust it to whatever your actual ingress exposes.

4.2. Login

For the permission model, see the official docs: https://github.com/kubernetes/dashboard#create-an-authentication-token-rbac
This section also draws on: https://kuboard.cn/install/install-k8s-dashboard.html#访问
Create an auth.yml:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: admin-user
  namespace: kubernetes-dashboard

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: admin-user
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
  - kind: ServiceAccount
    name: admin-user
    namespace: kubernetes-dashboard

Apply it:

# apply the file
$ kubectl apply -f ./auth.yml
# read the token
$ kubectl -n kubernetes-dashboard describe secret $(kubectl -n kubernetes-dashboard get secret | grep admin-user | awk '{print $1}')
# the output looks roughly like this
Name: admin-user-token-p25dh
Namespace: kubernetes-dashboard
Labels: <none>
Annotations: kubernetes.io/service-account.name: admin-user
kubernetes.io/service-account.uid: 8b6ed1d6-05d4-44e5-b6be-1403f8e86d41

Type: kubernetes.io/service-account-token

Data
====
namespace: 20 bytes
token: eyJhbGciOiJSUzI1NiIsImtpZCI6IndSZjhsaGZKeENYcmZrTlRnd29zbkFrdHNRdGVpYUJtQmFyYzdjajNqTWcifQ.eyJpc3MiOiJrdWJlcm5ldGdWIiOiJzeXN0ZW06c2VydmljZWFjY291bnQ6a3ViZXJuZXRlcy1kYXNoYm9hcmQ6YWRtaW4tdXNlciJ9.1_7Zy7qHb26YhntIqQNBeAorxuu7hxEGXFBZUN4CjdZrJCKTdESt2QLR3BWh0EIoQHCLmCmqoRZQO-ti4BCsV1Gb_oC25iLyTW817HzGeUcfPRkmIc2KPYZrGZGj6Sp_zYKUgAxSAVkn4VsLSDkIaCW6n3yCfuzGM477qs4W4ziPWvFsSdzUbQy42cNcNuAv9YRqUQU7V5lOHw7ry6ort-X48De2fX1Z2_ZrJbIoeeH-c7V50le_Czy97gDCvysKsgQ3EqlZGgFZVIU5pC-ghM3YH99FGaL7avAyFnXkks6zQSaoH4Kbf_8qOWQ9uoS_N97AUp8VtByW6bcQwloT8w
ca.crt: 1099 bytes
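Note: since k8s 1.24, a ServiceAccount no longer gets a long-lived token Secret created automatically, so on a 1.28 cluster the grep above may find nothing. In that case, issue a token directly:

$ kubectl -n kubernetes-dashboard create token admin-user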

5. StorageClass

NFS is the recommended default storage. Reference: https://cloud.tencent.com/developer/article/2365976
Prepare an ordinary server:

$ dnf install nfs-utils -y ;
$ mkdir -p /data/k8s;
$ vim /etc/exports;
/data/k8s *(rw,no_root_squash)
$ systemctl enable nfs-server --now
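A quick check that the share is actually exported (showmount ships with nfs-utils):

$ showmount -e localhost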

Integrate it with k8s:
helm official GitHub: https://github.com/helm/helm
nfs provisioner official repo: https://github.com/kubernetes-sigs/nfs-subdir-external-provisioner

# add a helm proxy for https://kubernetes-sigs.github.io/nfs-subdir-external-provisioner/ to your nexus
$ helm repo add nfs-subdir-external-provisioner https://your-nexus/repository/helm-nfs/;
$ kubectl create ns nfs-system;
# Note: don't helm install directly; the image registry inside the chart needs changing first
$ helm pull nfs-subdir-external-provisioner/nfs-subdir-external-provisioner;
$ tar xvf nfs-subdir-external-provisioner-4.0.18.tgz;
$ vim nfs-subdir-external-provisioner/values.yaml;
# the main changes:
image:
  repository: registry.k8s.io/sig-storage/nfs-subdir-external-provisioner # probably unreachable by default; replace with your own or another reachable registry
  tag: v4.0.2 # default tag, replace as needed
nfs:
  server: 192.168.9.81 # your NFS server address
  path: /data/k8s # the exported NFS share
storageClass:
  defaultClass: true # whether to make this the default StorageClass; set to true here
  name: nfs-sc
$ helm install nfs-subdir-external-provisioner nfs-subdir-external-provisioner/nfs-subdir-external-provisioner -f nfs-subdir-external-provisioner/values.yaml -n nfs-system
# prepare a test volume claim:
$ vim zadig-minio-pv.yml
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: zadig-minio-pv
spec:
  storageClassName: nfs-sc
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 40Gi
$ kubectl apply -f zadig-minio-pv.yml -n nfs-system

Check the results:

# check the cluster's default storageclass
$ kubectl get storageclass
$ kubectl get sc nfs-sc -o wide;
$ kubectl get deployment -n nfs-system -o wide;
$ kubectl get pod -n nfs-system -o wide;
$ kubectl get pvc -n nfs-system -o wide;

6. CoreDNS

If you use a private DNS, CoreDNS will not forward queries to it correctly by default:

# change the 'forward . /etc/resolv.conf' part to your DNS server's IP
$ kubectl -n kube-system edit cm coredns
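A minimal sketch of the relevant Corefile snippet after the edit; 10.0.1.2 is a placeholder for your private DNS server:

# inside the coredns ConfigMap's Corefile
forward . 10.0.1.2 {
    max_concurrent 1000
}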

7. Deployment

7.1. Images

# create a namespace
$ kubectl create namespace java-qa
# create an image pull secret (a docker-registry secret, so it can be referenced from imagePullSecrets)
$ kubectl create secret docker-registry my-pull-secret -n java-qa --docker-server=docker.test.com --docker-username=my_account --docker-password=my_password
# list all pods
$ kubectl get pods -A
# list pods in a given namespace
$ kubectl get pod -n java-qa -o wide
# show deployments
$ kubectl get deploy -n java-qa -o wide
# describe a specific pod
$ kubectl describe pod coredns-59c77d78dd-fh47c -n java-qa
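To close the loop, a minimal sketch of a workload that actually uses the pull secret; the app name and image tag are hypothetical, while docker.test.com, java-qa, and my-pull-secret come from the steps above:

# demo-app.yml -- hypothetical deployment pulling from the private registry
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-app
  namespace: java-qa
spec:
  replicas: 1
  selector:
    matchLabels:
      app: demo-app
  template:
    metadata:
      labels:
        app: demo-app
    spec:
      imagePullSecrets:
        - name: my-pull-secret # the secret created above
      containers:
        - name: demo-app
          image: docker.test.com/demo/demo-app:1.0.0 # hypothetical image
          ports:
            - containerPort: 8080

$ kubectl apply -f demo-app.yml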