Setting Up a Small K3s Cluster

K3s node design

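Node   OS                   Resources                                      Role                           IP
exxk   Ubuntu Server 20.04  4 cores / 8 GB RAM / 100 GB disk / 4 GB VRAM   worker + control plane + CUDA  172.16.80.80
node1  fedora-coreos-41     2 cores / 4 GB RAM / 20 GB disk                worker + control plane         172.16.80.163
node2  fedora-coreos-41     2 cores / 4 GB RAM / 20 GB disk                worker + control plane         172.16.80.179
nfs    alpine3.20           1 core / 512 MB RAM + 1 GB swap / 100 GB disk  NFS server                     172.16.80.144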

Installing K3s on each node

# ----------------------- Node1 VM -----------------------
core@node1:~$ sudo hostnamectl set-hostname node1
core@node1:~$ sudo reboot
core@node1:~$ sudo yum install -y curl
core@node1:~$ curl -sfL https://get.k3s.io | K3S_TOKEN=EXXK_SECRET sh -s - server --cluster-init
core@node1:~$ sudo reboot
core@node1:~$ sudo kubectl get nodes
NAME STATUS ROLES AGE VERSION
node1 Ready control-plane,etcd,master 119s v1.30.6+k3s1
# If volume mounts fail inside containers, install the NFS client tools
core@node1:~$ sudo rpm-ostree install nfs-utils
# ----------------------- Node2 VM -----------------------
core@node2:~$ sudo hostnamectl set-hostname node2
core@node2:~$ sudo reboot
core@node2:~$ sudo yum install -y curl
core@node2:~$ curl -sfL https://get.k3s.io | K3S_TOKEN=EXXK_SECRET sh -s - server --server https://172.16.80.163:6443
core@node2:~$ sudo reboot
core@node2:~$ sudo kubectl get nodes
NAME STATUS ROLES AGE VERSION
node1 Ready control-plane,etcd,master 17m v1.30.6+k3s1
node2 Ready control-plane,etcd,master 34s v1.30.6+k3s1
# If volume mounts fail inside containers, install the NFS client tools
core@node2:~$ sudo rpm-ostree install nfs-utils
# ----------------------- exxk VM -----------------------
exxk@exxk:~$ curl -sfL https://get.k3s.io | K3S_TOKEN=EXXK_SECRET sh -s - server --server https://172.16.80.163:6443
exxk@exxk:~$ sudo reboot
exxk@exxk:~$ sudo kubectl get nodes
[sudo] password for exxk:
NAME STATUS ROLES AGE VERSION
exxk Ready control-plane,etcd,master 3m4s v1.30.6+k3s1
node1 Ready control-plane,etcd,master 25m v1.30.6+k3s1
node2 Ready control-plane,etcd,master 8m19s v1.30.6+k3s1
# If volume mounts fail inside containers, install the NFS client tools
exxk@exxk:~$ sudo apt install nfs-common
# Install the GPU driver; the version to install is already known here. Details: https://www.iexxk.com/2024/11/19/ubuntu-intall-cuda/
exxk@exxk:~$ sudo add-apt-repository ppa:graphics-drivers/ppa
exxk@exxk:~$ sudo apt update
exxk@exxk:~$ sudo apt install -y nvidia-driver-560 --no-install-recommends
exxk@exxk:~$ sudo reboot
exxk@exxk:~$ nvidia-smi # verify the driver
# Install the NVIDIA Container Toolkit; see https://www.iexxk.com/2024/11/19/ubuntu-intall-cuda/
# Note: the next command must target containerd as the runtime
exxk@exxk:~$ sudo nvidia-ctk runtime configure --runtime=containerd
exxk@exxk:~$ sudo reboot # reboot
# Verify
exxk@exxk:~$ sudo ctr run --rm --gpus 0 docker.io/nvidia/cuda:11.8.0-base-ubuntu20.04 bash
nvidia-smi # run inside the container
Fri Dec 13 10:05:41 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.35.03 Driver Version: 560.35.03 CUDA Version: 12.6 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 3050 ... Off | 00000000:00:10.0 Off | N/A |
| N/A 45C P8 3W / 60W | 2MiB / 4096MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
exit # leave the container
# To install the k8s-device-plugin, see: https://www.iexxk.com/2024/12/17/k8s-use-cuda/#方式一:helm命令方式-成功
# ----------------------- nfs VM -----------------------
# Installation reference: https://wiki.alpinelinux.org/wiki/Setting_up_an_NFS_server
nfs:~# apk add nfs-utils
nfs:~# mkdir /nfsdata
nfs:/nfs# vi /etc/exports
# Add the following line
/nfsdata 172.16.80.0/24(rw,nohide,no_subtree_check,no_root_squash)
nfs:~# rc-update add nfs
nfs:~# rc-service nfs start
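
The manual `vi /etc/exports` step above can also be scripted. The following is only a sketch: the function names `nfs_export_line` and `add_nfs_export` are hypothetical helpers, not part of nfs-utils, and the option list simply mirrors the entry shown above. It appends the export only when the directory is not listed yet, so it is safe to run more than once.

```shell
#!/bin/sh
# Hypothetical helpers (not part of nfs-utils) for scripting the /etc/exports edit.

# Print one exports entry for a directory and a client subnet.
# The options are copied from the entry used in this post.
nfs_export_line() {
    dir="$1"; subnet="$2"
    printf '%s %s(rw,nohide,no_subtree_check,no_root_squash)\n' "$dir" "$subnet"
}

# Append the entry to the exports file unless the directory is already exported.
# The third argument is optional and defaults to /etc/exports.
add_nfs_export() {
    dir="$1"; subnet="$2"; file="${3:-/etc/exports}"
    if ! grep -q "^${dir}[[:space:]]" "$file" 2>/dev/null; then
        nfs_export_line "$dir" "$subnet" >> "$file"
    fi
}
```

On the nfs VM this would be used as `add_nfs_export /nfsdata 172.16.80.0/24`, followed by `exportfs -ra` to reload the export table. From any node with the NFS client tools installed, `showmount -e 172.16.80.144` should then list `/nfsdata`.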

Client usage

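To get the kubeconfig, copy or download /etc/rancher/k3s/k3s.yaml from any of the server nodes (sudo cat /etc/rancher/k3s/k3s.yaml), change the server IP in it to an IP of that node that is reachable from your client machine, and save the file.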
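
The server address can be changed with sed instead of by hand. This is a minimal sketch, assuming the copied file still contains the default K3s entry `server: https://127.0.0.1:6443`; the function name `set_kubeconfig_server` is a hypothetical helper introduced here for illustration.

```shell
#!/bin/sh
# Hypothetical helper: point a copied k3s kubeconfig at a reachable node IP.
# Usage: set_kubeconfig_server <kubeconfig-file> <node-ip>
set_kubeconfig_server() {
    cfg="$1"; ip="$2"
    # Replace only the host part of the "server:" URL and keep the port.
    # -i.bak edits in place and leaves a backup at <file>.bak
    # (this form works with both GNU sed and the BSD sed shipped with macOS).
    sed -i.bak "s#\(server: https://\)[^:/]*\(:[0-9]*\)#\1${ip}\2#" "$cfg"
}
```

For example, after copying the file to the client as `./k3s.yaml`, run `set_kubeconfig_server ./k3s.yaml 172.16.80.163` and then check the connection with `kubectl --kubeconfig ./k3s.yaml get nodes`.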

Lens

  1. Download the macOS version.
  2. Open the Lens client, click the + above Local KubeConfigs in the left-hand menu, and import the kubeconfig.

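Kuboard

  1. I reused the Kuboard instance from another cluster, so I did not install it here; the new cluster can be imported directly.
  2. Log in, go to Home Page -> Add kubernetes, and paste in the kubeconfig contents.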