Server resources

IP address | OS version | Role |
---|---|---|
192.168.13.247 | Ubuntu 22.04 | k8s-master |
192.168.13.16 | Ubuntu 22.04 | k8s-worker-1 |
192.168.13.119 | Ubuntu 22.04 | k8s-worker-2 |

k8s versions

Component | Version |
---|---|
kubeadm | 1.23.5-00 |
kubelet | 1.23.5-00 |
kubectl | 1.23.5-00 |
flannel | v0.22.3 |
Environment preparation

Set a static IP

If this is a cloud server, or the IP is already fixed, skip this step. The setup here uses local virtual machines to simulate a cluster.

Check the NIC information

# If the tool is missing, install it with: sudo apt-get install net-tools
ifconfig
# The key output:
enp0s3: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
        inet 192.168.13.247 netmask 255.255.248.0 broadcast 192.168.15.255
# NIC name: enp0s3
# Private IP of enp0s3: 192.168.13.247

Check the netmask length

# Show routing information
ip route show
# The netmask here is /21; if this is configured wrong, neither the LAN nor the Internet will be reachable
192.168.8.0/21 dev enp0s3 proto kernel scope link src 192.168.13.247
Configure the gateway IP and DNS

Host machine information

The VM here runs in bridged mode, so the gateway and DNS must be read from the host machine. Skip this step if you are not on a VM.

# Switch to the host, open a cmd window, and run ipconfig /all
ipconfig /all
# Key output:
Default Gateway . . . . . . . . . : 192.168.11.1
DNS Servers  . . . . . . . . . . : 192.168.11.1
VM configuration

# Switch to the VM and edit its netplan config
# Back up the original config file first
sudo cp /etc/netplan/00-installer-config.yaml 00-installer-config.yaml.origin
# Write the following content. Note: > overwrites the existing content, >> appends to it;
# since this replaces the whole file, > is used here
sudo bash -c "cat > /etc/netplan/00-installer-config.yaml" << \EOF
# This is the network config written by 'subiquity'
network:
  ethernets:
    # Edit the entry for your NIC; here there is only one, enp0s3
    enp0s3:
      # Set to false so no dynamic IP is assigned
      dhcp4: false
      # The static IP address with its netmask length (check the length with: ip route show)
      addresses:
        - 192.168.13.247/21
      routes:
        # The gateway address (for a bridged VM, use the host's gateway)
        - to: default
          via: 192.168.11.1
      nameservers:
        addresses:
          # The DNS address (for a bridged VM, use the host's DNS)
          - 192.168.11.1
  version: 2
EOF
# Apply the configuration
sudo netplan apply
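After `netplan apply`, it is worth confirming the address, route, and external connectivity before moving on. A minimal check, using this lab's addresses (adjust to your own; 223.5.5.5 is a public DNS address chosen here just as a reachability target):

```shell
ip -4 addr show enp0s3      # expect 192.168.13.247/21 in the output
ip route show default       # expect: default via 192.168.11.1 dev enp0s3
ping -c 2 223.5.5.5         # confirms the Internet is reachable through the gateway
```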
Change the apt mirrors

Back up the sources.list configuration file

sudo cp /etc/apt/sources.list /etc/apt/sources.list.bak

Replace the mirrors

# Either edit the file manually:
sudo vim /etc/apt/sources.list
# Or replace the whole file in one go. Note: > overwrites the existing content, >> appends to it
sudo bash -c "cat > /etc/apt/sources.list" << \EOF
# Replace everything with the Aliyun mirrors (jammy = Ubuntu 22.04)
# mirrors begin #
deb http://mirrors.aliyun.com/ubuntu/ jammy main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ jammy main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ jammy-security main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ jammy-security main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ jammy-updates main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ jammy-updates main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ jammy-proposed main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ jammy-proposed main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ jammy-backports main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ jammy-backports main restricted universe multiverse
# k8s repository
deb https://mirrors.aliyun.com/kubernetes/apt kubernetes-xenial main
# docker repository
deb https://mirrors.aliyun.com/docker-ce/linux/ubuntu jammy stable
# mirrors end
EOF
# Refresh the package lists
sudo apt-get update
# If apt reports NO_PUBKEY xxx, import that key and refresh again;
# xxx is the actual key ID, e.g.: key is not available: NO_PUBKEY B53DC80D13EDEF05
sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys B53DC80D13EDEF05
sudo apt-get update
# Then upgrade the installed packages
sudo apt-get upgrade
Other

For vi/vim usage, see: Linux vi/vim | 菜鸟教程 (runoob.com)

Disable swap

sudo swapoff -a
# Edit /etc/fstab and comment out the swap line so the change survives reboots
sudo vim /etc/fstab
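Instead of editing /etc/fstab by hand, the swap line can be commented out non-interactively. A sketch, assuming the swap entry is an uncommented line containing the word `swap`:

```shell
# Back up fstab, then comment out any active swap entry
sudo cp /etc/fstab /etc/fstab.bak
sudo sed -ri 's|^([^#].*\sswap\s.*)$|# \1|' /etc/fstab
# Confirm: this should print nothing
grep -E '^[^#].*\sswap\s' /etc/fstab
```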
Set the timezone

sudo timedatectl set-timezone Asia/Shanghai
# Restart rsyslog so log timestamps pick up the new timezone
sudo systemctl restart rsyslog
# Install the ntpdate tool
sudo apt-get install ntpdate
# Sync the system time from a network time source
sudo ntpdate cn.pool.ntp.org
# Write the system time to the hardware clock
sudo hwclock --systohc
Set kernel parameters

# First make sure the br_netfilter module is loaded; it is not loaded by default.
# Install bridge-utils first for the bridge management tools
sudo apt-get install -y bridge-utils
# Then load the module with modprobe
sudo modprobe br_netfilter
# lsmod should now show br_netfilter; then check that net.bridge.bridge-nf-call-iptables is 1
sudo lsmod | grep br_netfilter
# On Ubuntu 20.04 Server this value is already 1; if your system differs, set it as follows.
# These sysctl settings persist across reboots (note: the redirect must run as root, hence tee)
cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward = 1
EOF
sudo sysctl --system
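A quick check that all three parameters are actually active after `sysctl --system`:

```shell
# Each of these should report "= 1"
sysctl net.bridge.bridge-nf-call-iptables
sysctl net.bridge.bridge-nf-call-ip6tables
sysctl net.ipv4.ip_forward
```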
Install base software

Install docker and containerd

# Uninstall docker first (skip if it was never installed); reference: https://blog.csdn.net/qq_45495857/article/details/113743109
sudo apt-get autoremove docker docker-ce docker-engine docker.io containerd runc docker-ce-cli docker-compose-plugin docker-buildx-plugin
# Purge leftover package configuration
dpkg -l | grep ^rc | awk '{print $2}' | sudo xargs dpkg -P
# Remove docker's configuration files and directories
sudo rm -rf /etc/systemd/system/docker.service.d
sudo rm -rf /var/lib/docker
# Verify docker is fully removed
dpkg -l | grep docker
# Refresh the package index and install the dependencies needed to add a new HTTPS repository
sudo apt install apt-transport-https ca-certificates curl gnupg-agent software-properties-common
# Import the repository's GPG key (the sudo belongs on apt-key, not on curl)
curl -fsSL https://mirrors.aliyun.com/docker-ce/linux/ubuntu/gpg | sudo apt-key add -
# Add the Docker APT repository; check the arch value with the arch command
sudo add-apt-repository "deb [arch=amd64] https://mirrors.aliyun.com/docker-ce/linux/ubuntu $(lsb_release -cs) stable"
sudo apt-get -y update
# Install docker
sudo apt-get install docker-ce docker-ce-cli containerd.io
# Start it and enable it at boot
sudo systemctl start docker
sudo systemctl enable docker
# Check the docker version
docker --version
# Output:
Docker version 20.10.25, build 20.10.25-0ubuntu1~22.04.1
Configure containerd

# Default config file location: /etc/containerd/config.toml
# Generate the default config (the redirect must run as root, hence tee)
sudo mkdir -p /etc/containerd
containerd config default | sudo tee /etc/containerd/config.toml
# Change the sandbox image to the Aliyun mirror (around line 61 of the generated file)
sandbox_image = "registry.aliyuncs.com/google_containers/pause:3.6"
sudo systemctl daemon-reload
sudo service containerd restart
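One more containerd setting worth checking here: the kubelet configured later in this guide uses `cgroupDriver: systemd`, and containerd's runc runtime should use the same cgroup driver, otherwise pods can crash-loop. A sketch that flips the default value in the generated config:

```shell
# The generated default config sets SystemdCgroup = false under the runc options; flip it to true
sudo sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /etc/containerd/config.toml
sudo systemctl restart containerd
```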
Install kubeadm, kubelet, kubectl

# Option 1: install pinned versions
sudo apt install -y kubeadm=1.23.5-00 kubelet=1.23.5-00 kubectl=1.23.5-00
# Option 2: without a version, the latest is installed; to pin a version, use pkg=version as in option 1
sudo apt-get install -y kubectl kubeadm kubelet
# Enable kubelet at boot
sudo systemctl enable kubelet
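To keep a routine `apt upgrade` from moving these components to a newer version unexpectedly, they can be held (and released again later with `apt-mark unhold`):

```shell
sudo apt-mark hold kubeadm kubelet kubectl
```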
Build a cluster

Build the cluster with the kubeadm init command

Master node setup

Run the following on the master node.

Flag-based approach

# apiserver-advertise-address is the master node's IP; the version matches the installed 1.23.5
sudo kubeadm init --apiserver-advertise-address=192.168.13.247 \
  --kubernetes-version=v1.23.5 \
  --service-cidr=10.96.0.0/12 \
  --pod-network-cidr=10.244.0.0/16 \
  --v=5 --ignore-preflight-errors=all
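Before running `kubeadm init`, the control-plane images can be pre-pulled so registry problems surface early. Repository and version below match the ones used elsewhere in this guide:

```shell
sudo kubeadm config images pull \
  --image-repository registry.cn-hangzhou.aliyuncs.com/google_containers \
  --kubernetes-version v1.23.5
```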
Config-file approach

# Create a folder to hold the config
sudo mkdir -p /etc/kubernetes/1-config
sudo touch /etc/kubernetes/1-config/kubeadm-master.config
sudo chmod 666 /etc/kubernetes/1-config/kubeadm-master.config
# Write the following into the file
sudo bash -c "cat > /etc/kubernetes/1-config/kubeadm-master.config" << \EOF
# Master node init configuration
apiVersion: kubeadm.k8s.io/v1beta3
kind: InitConfiguration
bootstrapTokens:
  - token: abcdef.abcdef0123456789
    description: token used when worker nodes join the master
    ttl: 24h0m0s
nodeRegistration:
  name: k8s-master
  # The CRI socket for the installed container runtime; see: https://kubernetes.io/zh-cn/docs/setup/production-environment/container-runtimes/
  criSocket: unix:///run/containerd/containerd.sock
  taints: null
  imagePullPolicy: IfNotPresent
localAPIEndpoint:
  advertiseAddress: 192.168.13.247
  bindPort: 6443
---
# Cluster configuration
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
networking:
  # Matches --service-cidr in the flag-based approach
  serviceSubnet: 10.96.0.0/12
  # Matches --pod-network-cidr, which flannel requires to be 10.244.0.0/16
  podSubnet: 10.244.0.0/16
  dnsDomain: k8s.cluster.local
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
imageRepository: registry.cn-hangzhou.aliyuncs.com/google_containers
# Matches the installed kubeadm/kubelet/kubectl version
kubernetesVersion: 1.23.5
controllerManager: {}
dns: {}
etcd:
  local:
    dataDir: /var/lib/etcd
---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cgroupDriver: systemd
#cgroupDriver: cgroupfs
EOF
# Initialize the cluster
sudo kubeadm init --config /etc/kubernetes/1-config/kubeadm-master.config
kubectl access setup

Non-root users

After running the commands below, the user can use kubectl:

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

Root user

export KUBECONFIG=/etc/kubernetes/admin.conf

Reset an initialized cluster

Run this when initialization fails, to restore the configuration:

# Run on the master node
sudo kubeadm reset
rm -rf $HOME/.kube/config
Worker node setup

Join the cluster

Run the following on each worker node.

# Join the master (use your master's IP; the token and hash come from your own kubeadm init output)
sudo kubeadm join 192.168.13.247:6443 --token abcdef.abcdef0123456789 \
    --discovery-token-ca-cert-hash sha256:35c8485b344fd3e5262d521df1b3a8c3953146c7b95504299bf14d622f9d024f
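The sha256 value above is specific to each cluster. If the original `kubeadm init` output has been lost, the discovery hash can be recomputed on the master from the cluster CA certificate:

```shell
openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt \
  | openssl rsa -pubin -outform der 2>/dev/null \
  | openssl dgst -sha256 -hex \
  | sed 's/^.* //'
```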
Check the cluster nodes

kubectl get nodes
# The nodes have now joined the cluster but are in NotReady state until a network plugin is installed

Install the network plugin

The network plugin used here is flannel. Run the following on the master node.

Get the flannel config file

Download it directly

wget https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml

Write it by hand

Version v0.22.3 is used here; adjust to your situation.
sudo bash -c " cat >> /home/k8s/kube-flannel.yml" << \EOF
---
kind: Namespace
apiVersion: v1
metadata:
name: kube-flannel
labels:
k8s-app: flannel
pod-security.kubernetes.io/enforce: privileged
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
labels:
k8s-app: flannel
name: flannel
rules:
- apiGroups:
- ""
resources:
- pods
verbs:
- get
- apiGroups:
- ""
resources:
- nodes
verbs:
- get
- list
- watch
- apiGroups:
- ""
resources:
- nodes/status
verbs:
- patch
- apiGroups:
- networking.k8s.io
resources:
- clustercidrs
verbs:
- list
- watch
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
labels:
k8s-app: flannel
name: flannel
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: flannel
subjects:
- kind: ServiceAccount
name: flannel
namespace: kube-flannel
---
apiVersion: v1
kind: ServiceAccount
metadata:
labels:
k8s-app: flannel
name: flannel
namespace: kube-flannel
---
kind: ConfigMap
apiVersion: v1
metadata:
name: kube-flannel-cfg
namespace: kube-flannel
labels:
tier: node
k8s-app: flannel
app: flannel
data:
cni-conf.json: |
{
"name": "cbr0",
"cniVersion": "0.3.1",
"plugins": [
{
"type": "flannel",
"delegate": {
"hairpinMode": true,
"isDefaultGateway": true
}
},
{
"type": "portmap",
"capabilities": {
"portMappings": true
}
}
]
}
net-conf.json: |
{
"Network": "10.244.0.0/16",
"Backend": {
"Type": "vxlan"
}
}
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: kube-flannel-ds
namespace: kube-flannel
labels:
tier: node
app: flannel
k8s-app: flannel
spec:
selector:
matchLabels:
app: flannel
template:
metadata:
labels:
tier: node
app: flannel
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: kubernetes.io/os
operator: In
values:
- linux
hostNetwork: true
priorityClassName: system-node-critical
tolerations:
- operator: Exists
effect: NoSchedule
serviceAccountName: flannel
initContainers:
- name: install-cni-plugin
image: docker.io/flannel/flannel-cni-plugin:v1.2.0
command:
- cp
args:
- -f
- /flannel
- /opt/cni/bin/flannel
volumeMounts:
- name: cni-plugin
mountPath: /opt/cni/bin
- name: install-cni
image: docker.io/flannel/flannel:v0.22.3
command:
- cp
args:
- -f
- /etc/kube-flannel/cni-conf.json
- /etc/cni/net.d/10-flannel.conflist
volumeMounts:
- name: cni
mountPath: /etc/cni/net.d
- name: flannel-cfg
mountPath: /etc/kube-flannel/
containers:
- name: kube-flannel
image: docker.io/flannel/flannel:v0.22.3
command:
- /opt/bin/flanneld
args:
- --ip-masq
- --kube-subnet-mgr
resources:
requests:
cpu: "100m"
memory: "50Mi"
securityContext:
privileged: false
capabilities:
add: ["NET_ADMIN", "NET_RAW"]
env:
- name: POD_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
- name: POD_NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
- name: EVENT_QUEUE_DEPTH
value: "5000"
volumeMounts:
- name: run
mountPath: /run/flannel
- name: flannel-cfg
mountPath: /etc/kube-flannel/
- name: xtables-lock
mountPath: /run/xtables.lock
volumes:
- name: run
hostPath:
path: /run/flannel
- name: cni-plugin
hostPath:
path: /opt/cni/bin
- name: cni
hostPath:
path: /etc/cni/net.d
- name: flannel-cfg
configMap:
name: kube-flannel-cfg
- name: xtables-lock
hostPath:
path: /run/xtables.lock
type: FileOrCreate
EOF
Run the install command

# For flannel, kubeadm init must have been run with --pod-network-cidr=10.244.0.0/16
kubectl apply -f kube-flannel.yml
# Check the nodes again; they should now be Ready
kubectl get nodes
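Once all nodes report Ready, a quick end-to-end smoke test can confirm pod networking works (the pod name `smoke` and the nginx image are arbitrary choices for this check, not part of the original setup):

```shell
kubectl run smoke --image=nginx --restart=Never
kubectl get pod smoke -o wide    # should reach Running, with a pod IP from 10.244.0.0/16
kubectl delete pod smoke
```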
Other

Show the join command generated at init time

# Run on the master node
kubeadm token create --print-join-command

Uninstall k8s

sudo kubeadm reset -f
sudo apt-get purge --auto-remove kubernetes-cni
sudo apt-get purge --auto-remove kubeadm
sudo apt-get purge --auto-remove kubectl
sudo apt-get purge --auto-remove kubelet
sudo modprobe -r ipip
sudo rm -rf ~/.kube/
sudo rm -rf /etc/kubernetes/
sudo rm -rf /etc/systemd/system/kubelet.service.d
sudo rm -rf /etc/systemd/system/kubelet.service
sudo rm -rf /usr/bin/kube*
sudo rm -rf /etc/cni
sudo rm -rf /opt/cni
sudo rm -rf /var/lib/etcd
sudo rm -rf /var/etcd
sudo apt clean
sudo apt remove 'kube*'
# Alternative purge form:
sudo apt-get remove --purge kubernetes-cni
sudo apt-get remove --purge kubeadm
sudo apt-get remove --purge kubectl
sudo apt-get remove --purge kubelet
# Check that nothing is left behind
sudo find / -name 'kube*'
Other issues

Long network check at server boot

# Ubuntu 22.04 LTS hangs at boot on "A start job is running for wait for network to be Configured"
cd /etc/systemd/system/network-online.target.wants/
sudo vi systemd-networkd-wait-online.service
# Add the following line (under the [Service] section)
TimeoutStartSec=2sec
# Save with :wq; after a reboot, the check is skipped once the timeout is exceeded
K8s setup reports "/proc/sys/net/ipv4/ip_forward contents are not set to 1"

# Reset the value to 1
# Temporary fixes:
# Option 1 (note: `sudo echo "1" > file` would NOT work, because the redirect runs unprivileged)
echo "1" | sudo tee /proc/sys/net/ipv4/ip_forward
# Option 2
sudo bash -c "cat > /proc/sys/net/ipv4/ip_forward" << \EOF
1
EOF
# Option 3
sudo sysctl -w net.ipv4.ip_forward=1
# Permanent fix
sudo vim /etc/sysctl.conf
# change net.ipv4.ip_forward=0 to net.ipv4.ip_forward=1
# Reload the sysctl settings
sudo sysctl -p
root user cannot log in over ssh

# Install ssh if needed: sudo apt-get install openssh-server
# Open the config:
sudo vim /etc/ssh/sshd_config
# Change the following settings, then restart the ssh service
PermitRootLogin yes
StrictModes yes
sudo service ssh restart
With flannel installed, pod creation fails with "network: open /run/flannel/subnet.env: no such file or directory"

# Check whether the node has the subnet.env file; create it if missing
sudo vim /run/flannel/subnet.env
# Write the content below; FLANNEL_NETWORK must match the pod-network-cidr used at cluster init (10.244.0.0/16), and FLANNEL_SUBNET is that node's per-node subnet
FLANNEL_NETWORK=10.244.0.0/16
FLANNEL_SUBNET=10.244.0.1/24
FLANNEL_MTU=1450
FLANNEL_IPMASQ=true
Disk size unchanged after partitioning with gparted

Reference: https://blog.csdn.net/jacky128256/article/details/123442981

# Extend the LV ubuntu--vg-ubuntu--lv with all remaining free space
sudo lvextend -l +100%FREE /dev/mapper/ubuntu--vg-ubuntu--lv
# Grow the filesystem to match
sudo resize2fs /dev/mapper/ubuntu--vg-ubuntu--lv
Batch docker operations

docker ps                         # list running containers
docker stop containerId           # containerId is the container's ID
docker ps -a                      # list all containers
docker ps -a -q                   # list all container IDs
docker stop $(docker ps -a -q)    # stop all containers
docker rm $(docker ps -a -q)      # remove all containers
Clear the docker image cache

# Show cache usage
docker system df
# Clear the cache
docker system prune -a --force
Configure docker log size and rotation

sudo vim /etc/docker/daemon.json
{
    "log-driver": "json-file",
    "log-opts": {
        "max-size": "30m",
        "max-file": "2"
    }
}
sudo systemctl daemon-reload
sudo service docker restart
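To confirm the daemon picked up the new defaults after the restart (the container name `my-app` below is only a placeholder):

```shell
docker info --format '{{.LoggingDriver}}'                     # expect: json-file
docker inspect --format '{{.HostConfig.LogConfig}}' my-app    # shows the log options for one container
```

Note that log options apply only to containers created after the restart; existing containers keep their old settings.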
Add a registry credential for k8s image pulls

# Syntax: kubectl --namespace <k8s namespace> create secret docker-registry regcred --docker-server=<registry address> --docker-username=<user> --docker-password=<password>
# Example:
kubectl --namespace demo-user-dev create secret docker-registry regcred --docker-server=192.168.11.11:7030 --docker-username=docker --docker-password=123456789
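For a pod to actually use the credential, the secret name must be referenced in its spec. A minimal sketch (the pod name, container name, and image path are illustrative; only the namespace and secret name come from the example above):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: demo
  namespace: demo-user-dev
spec:
  imagePullSecrets:
    - name: regcred          # the secret created above
  containers:
    - name: app
      image: 192.168.11.11:7030/demo/app:latest   # hypothetical image in the private registry
```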
Use --registry temporarily (npm/pnpm)

pnpm --registry https://registry.npm.taobao.org install
ssh cannot connect after switching to a static IP

# First revert the netplan config to a dynamic IP (DHCP) and apply it
sudo netplan apply
# Then switch back to the static IP config and apply again
sudo netplan apply