
Requirement: the machine needs to connect to two Check Point VPNs at the same time, but the desktop client only supports one connection.

The second VPN is therefore deployed with snx inside a container, and snell proxies the tunnel to Surge for access, so an image combining snx and snell has to be built.

snell was chosen because it is simple; the alternatives (strongSwan, ss) are more complicated.

Build source: iexxk/checkpoint-snx-snell

Usage

# Start the container
docker run --name snx-vpn --cap-add=ALL -p 500:500 -v /lib/modules:/lib/modules -d exxk/checkpoint-snx-snell:latest
# Enter the container
docker exec -it snx-vpn bash
# Log in to the VPN; the command prompts for a password, then asks whether to accept, answer y
snx -s <VPN server IP> -u <username>
# Show the VPN routes
route
Destination Gateway Genmask Flags Metric Ref Use Iface
192.16.192.0 0.0.0.0 255.255.224.0 U 0 0 0 tunsnx
192.16.250.1 0.0.0.0 255.255.255.255 UH 0 0 0 tunsnx
# The U flag corresponds to the SRC-IP rule type
# The UH flag corresponds to the IP-CIDR rule type; 255.255.224.0 corresponds to 192.16.192.0/19, see: Subnet mask calculation
# Each 255 covers 8 bits, so the two leading 255s give 2*8=16 bits; the third octet 224 is 11100000 in binary, which adds 3 more bits, hence /19
# Disconnect snx
snx -d
# Stop the container
docker stop snx-vpn
# Delete the container
docker rm snx-vpn

Customizing the configuration

To customize, edit the /etc/snell/snell-server.conf file inside the container; you can change the psk (be sure to change it if the port is exposed publicly) and the port.
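
A hedged sketch of such an edit done from the host; the [snell-server] section with listen, psk and obfs keys follows snell's documented config format, but check it against the file that actually ships in the image, and the PSK value here is a placeholder:

docker exec -i snx-vpn sh -c 'cat > /etc/snell/snell-server.conf <<EOF
[snell-server]
listen = 0.0.0.0:500
psk = replace-with-your-own-psk
obfs = tls
EOF'
# restart so the new settings take effect
docker restart snx-vpn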

Client configuration

Configure the proxy

In Surge, go to Proxies-->Policies-->Rule-Based-->create a new proxy

Settings

  • Server address: the container's host address (127.0.0.1 if the container runs locally)
  • Port: 500 by default
  • PSK: 2iPRYDZyOVfjRwt9 by default
  • Obfuscation: TLS

Then route whichever destinations should go through the VPN to this proxy.
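
For a text-based Surge profile, the equivalent proxy entry would look roughly like this (a sketch based on Surge's Snell proxy syntax; verify the exact fields against your Surge version, and snx-vpn is just an example policy name):

cat <<'EOF' >> surge.conf
[Proxy]
snx-vpn = snell, 127.0.0.1, 500, psk=2iPRYDZyOVfjRwt9, obfs=tls
EOF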

Configure routing rules

In Surge, go to Proxies-->Rules-->create a new rule, and add the routes seen inside the container to the rule list.
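
Mapping the two routes seen inside the container earlier, the rule entries could look like this (illustrative only; the policy name must match the proxy created above):

cat <<'EOF' >> surge.conf
[Rule]
IP-CIDR,192.16.192.0/19,snx-vpn
IP-CIDR,192.16.250.1/32,snx-vpn
EOF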

Using the proxy in a terminal

In the current terminal run export https_proxy=http://127.0.0.1:6152;export http_proxy=http://127.0.0.1:6152;export all_proxy=socks5://127.0.0.1:6153

The settings are lost when the terminal exits. Do not test with ping: ping uses ICMP, not HTTP or SOCKS5; use curl -vv https://www.google.com instead.

Using an SSH proxy in a terminal

ssh -o "ProxyCommand=nc -X 5 -x 127.0.0.1:6153 %h %p" root@10.1.1.10

Using the proxy in Royal TSX

Right-click the connection-->Properties-->Advanced-->SSH-->Additional SSH Options, and add -o "ProxyCommand=nc -X 5 -x 127.0.0.1:6153 %h %p"

Common problems

  1. SNX: Virtual Network Adapter initialization and configuration failed. Try to reconnect.

    Fix: apt-get install kmod

  2. SNX: Routing table configuration failed. Try to reconnect.

    Fix: add --cap-add=ALL, i.e. docker run --cap-add=ALL -v /lib/modules:/lib/modules -t checkpoint-snx-snell:22.9.21

  3. SNX: Connection aborted.

    Fix: open the VPN server in a browser, download the SSL Network Extender (Linux) package, run docker cp snx_install.sh snx-vpn:/ and then, inside the container, chmod +x snx_install.sh && ./snx_install.sh

    Fix 2: the same error appeared after upgrading Docker Desktop (You're currently on version 4.11.1 (84025). The latest version is 4.13.1 (90346)); rolling back to 4.11.1 (84025) solved it.

  4. Another session of SNX is already running, aborting…

    Fix: run snx -d to disconnect, then reconnect.

参考

Subnet mask calculation

snx installation package

snx installation prerequisites

primovist/snell.sh

surge-networks/snell

Kedu-SCCL/docker-snx-checkpoint-vpn

Building a certificate-free IKEv2 VPN with strongSwan

Docker (Linux/Mac OS) network configuration issues

Docker for Mac network issues and workarounds

Requirement

Jenkins frequently leaves dead (errored) build containers behind, so a scheduled job is needed to clean them up.

Solution (adopted)

The underlying command: kubectl get pods -n kubesphere-devops-system |grep Error |awk '{print $1}' |xargs kubectl delete pod -n kubesphere-devops-system

  1. Cluster Management—>Configuration—>Service Accounts—>project [kubesphere-devops-system]—>Create [test]—>grant admin permissions

    Why a dedicated account: the default service account has no permission to delete pods. Creating an account produces a token whose permissions can be adjusted; pods automatically mount the default token, but since it lacks the permission, the custom token has to be mounted manually (see the sketch after this list).

  2. Cluster Management—>Application Workloads—>Jobs—>CronJobs [kubesphere-devops-system]—>Create [jenkins-agent-clean]

    Container image settings:

    Image: bitnami/kubectl:latest
    Command: sh
    Args: kubectl get pods -n kubesphere-devops-system |grep Error|awk '{print $1}' |xargs kubectl delete pod -n kubesphere-devops-system

    Volume mount settings:

    Secret: test-token-xxxx
    Mode: read-only
    Mount path: /var/run/secrets/kubernetes.io/serviceaccount
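
For reference, roughly the same service account and permission can be created with kubectl instead of the KubeSphere console (a hedged sketch; the account name test and the admin ClusterRole mirror the steps above, while the binding name is made up):

kubectl -n kubesphere-devops-system create serviceaccount test
kubectl -n kubesphere-devops-system create rolebinding test-admin \
  --clusterrole=admin --serviceaccount=kubesphere-devops-system:test
# the generated token secret (test-token-xxxx) is what gets mounted into the CronJob pod below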

The full generated configuration

apiVersion: batch/v1beta1
kind: CronJob
metadata:
  namespace: kubesphere-devops-system
  labels:
    app: jenkins-agent-clean
  name: jenkins-agent-clean
  annotations:
    kubesphere.io/description: periodically clean up dead Jenkins build containers
spec:
  concurrencyPolicy: Forbid
  jobTemplate:
    metadata:
      labels:
        app: jenkins-agent-clean
    spec:
      template:
        spec:
          containers:
            - name: container-suzpfl
              imagePullPolicy: IfNotPresent
              image: 'bitnami/kubectl:latest'
              command:
                - sh
              args:
                - '-c'
                - >-
                  kubectl get pods -n kubesphere-devops-system |grep Error|awk
                  '{print $1}' |xargs kubectl delete pod -n
                  kubesphere-devops-system
              volumeMounts:
                - name: volume-sjpdty
                  readOnly: true
                  mountPath: /var/run/secrets/kubernetes.io/serviceaccount
          restartPolicy: Never
          serviceAccount: default
          initContainers: []
          volumes:
            - name: volume-sjpdty
              secret:
                secretName: test-token-f2fxz
          imagePullSecrets: null
        metadata:
          annotations:
            logging.kubesphere.io/logsidecar-config: '{}'
  schedule: 0 * * * *

Old version installation

loki-stack-2.1.2 installs the StatefulSet loki and the DaemonSet loki-promtail.

helm repo add loki https://grafana.github.io/loki/charts
# By default this installs the loki StatefulSet and the loki-promtail DaemonSet; --set grafana.enabled=true additionally installs Grafana
helm upgrade --install loki loki/loki-stack --version 2.1.2 --set grafana.enabled=true --namespace=kubesphere-loki-system

Standalone Grafana installation

helm repo add grafana https://grafana.github.io/helm-charts
helm install my-grafana grafana/grafana

New version installation

loki-stack-2.6.5

helm upgrade --install loki grafana/loki-stack --set grafana.enabled=true
# Get the Grafana admin password
kubectl get secret --namespace <YOUR-NAMESPACE> loki-grafana -o jsonpath="{.data.admin-password}" | base64 --decode ; echo

Configuration

If Loki is deployed with emptyDir: {}, the data is not persisted and is lost on restart; mount an NFS directory (or another persistent volume) so the data survives restarts.
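
Alternatively, persistence can be switched on through the chart values at install time; a hedged sketch (the loki.persistence.* value names follow the loki-stack/loki chart, and the nfs-client storage class and 10Gi size are assumptions):

helm upgrade --install loki grafana/loki-stack \
  --set grafana.enabled=true \
  --set loki.persistence.enabled=true \
  --set loki.persistence.storageClassName=nfs-client \
  --set loki.persistence.size=10Gi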

Grafana Loki storage retention

Where to look: Platform--->Cluster--->Configuration--->Secrets--->[filter] kubesphere-loki-system--->loki--->click the eye icon on the right (otherwise the value is shown encrypted)

To modify: from that view, More--->Edit Secret--->loki.yaml--->edit

loki.yaml explained

auth_enabled: false
chunk_store_config:
  max_look_back_period: 0s # must be <= the table_manager retention so queries don't reach past it; set to 24h if retention is enabled
compactor:
  shared_store: filesystem
  working_directory: /data/loki/boltdb-shipper-compactor
ingester:
  chunk_block_size: 262144
  chunk_idle_period: 3m
  chunk_retain_period: 1m
  lifecycler:
    ring:
      kvstore:
        store: inmemory
      replication_factor: 1
  max_transfer_retries: 0
limits_config:
  enforce_metric_name: false
  reject_old_samples: true # whether to reject old samples
  reject_old_samples_max_age: 168h # samples older than 168h are rejected
schema_config:
  configs:
    - from: "2020-10-24"
      index:
        period: 24h
        prefix: index_
      object_store: filesystem
      schema: v11
      store: boltdb-shipper
server:
  http_listen_port: 3100
storage_config:
  boltdb_shipper:
    active_index_directory: /data/loki/boltdb-shipper-active
    cache_location: /data/loki/boltdb-shipper-cache
    cache_ttl: 24h
    shared_store: filesystem
  filesystem:
    directory: /data/loki/chunks # chunk storage path
table_manager: # table manager
  retention_deletes_enabled: false # whether deletion is enabled; set to true to enable it
  retention_period: 0s # retention period, 0s means no expiry; set to e.g. 24h if deletion is enabled

References

Loki custom deployment configuration

Common errors

  1. Error on startup

    level=error ts=2022-08-10T01:49:48.744952015Z caller=table.go:81 msg="failed to cleanup table" name=index_19214

Creating an NFS server

yum install rpcbind nfs-utils # install on every node; the mount is performed by the host itself
systemctl start rpcbind # start the service
systemctl start nfs-server # start the service
systemctl enable rpcbind # start on boot
systemctl enable nfs-server # start on boot
mkdir -p /share
vim /etc/exports
# Add the line below; rw means read-write; no_root_squash lets any user access the directory; 192.168.4.* is not supported and leads to "access denied" errors
/share 192.168.4.1(rw,no_root_squash)
# Reload the exports
exportfs -rv
# Test the mount
mount -t nfs 192.168.4.2:/share /root/testshare
# Remove the mount
umount /root/testshare
# Test the mount from macOS: in Finder press command+k and enter the address below
nfs://192.168.4.2/share
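
Because wildcard forms like 192.168.4.* are rejected, an entire subnet can be allowed with CIDR notation instead; a quick sketch (the subnet is an example):

# allow the whole 192.168.4.0/24 subnet instead of a single client IP
cat <<'EOF' >> /etc/exports
/share 192.168.4.0/24(rw,no_root_squash)
EOF
exportfs -rv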

Troubleshooting commands

showmount -e localhost # list this host's NFS exports
showmount -a localhost # list clients currently mounting exports from this host

Configuring an NFS storage class in Kubernetes with nfs-client-provisioner

Old version (does not support Kubernetes 1.20 and later)

helm repo add stable http://mirror.azure.cn/kubernetes/charts
helm install my-release --set nfs.server=192.168.4.2 --set nfs.path=/share stable/nfs-client-provisioner
helm delete my-release # uninstall

New version

helm repo add nfs-subdir-external-provisioner https://kubernetes-sigs.github.io/nfs-subdir-external-provisioner/
helm install nfs-subdir-external-provisioner nfs-subdir-external-provisioner/nfs-subdir-external-provisioner \
--set nfs.server=192.168.4.2 \
--set nfs.path=/share
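
After the install it is worth confirming that the provisioner pod is running and that its storage class exists; the chart's default class name is nfs-client, which is what the PVC below refers to (a quick check, not part of the original steps):

kubectl get pods | grep nfs-subdir-external-provisioner
kubectl get storageclass   # should list nfs-client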

Usage

Create the PVC: create a file nginx-pvc-nfs.yaml with the following content

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nginx-pvc
spec:
  storageClassName: nfs-client
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Mi

Run kubectl apply -f nginx-pvc-nfs.yaml and check that the PVC was created successfully.

Create the deployment: create a file nginx-deployment.yaml with the following content

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      volumes:
        - name: nginx-data
          persistentVolumeClaim:
            claimName: nginx-pvc
      containers:
        - name: nginx
          image: nginx:alpine
          ports:
            - containerPort: 80
          volumeMounts:
            - mountPath: "/usr/share/nginx/html"
              name: nginx-data

Run kubectl apply -f nginx-deployment.yaml

Summary

The nfs-client-provisioner container only talks to the NFS server while a PVC is being provisioned; once the PVC exists it no longer depends on nfs-client-provisioner, whether or not that pod keeps running.

The NFS mount itself is not done inside the container but by the host, so the host needs the NFS client dependencies installed.

Checking NFS mounts

Running df -h on the node where the pod is scheduled shows output similar to the following

10.25.207.176:/mnt/data/kubesphere-loki-system-loki-storage-pvc-dc2431fb-5352-4265-8a9d-0a11b69d8588  9.8G  1.8G  7.5G   19% /var/lib/kubelet/pods/4d276910-7529-4355-8d92-6cfa1da68825/volumes/kubernetes.io~nfs/pvc-dc2431fb-5352-4265-8a9d-0a11b69d8588

Removal

eip-nfs-nfs: lives in kube-system; it is created by a Kubernetes StorageClass and manages NFS storage.

local-path-provisioner: lives in kube-system; a lightweight storage provisioner that uses a node's local storage, typically used in small or single-node clusters.

Problems

  1. The nfs-client-provisioner container logs: unexpected error getting claim reference: selfLink was empty, can't make reference

    Cause: Kubernetes removed SelfLink in version 1.20, see kubernetes Deprecate and remove SelfLink

    Fix: use the new version (nfs-subdir-external-provisioner) instead of nfs-client-provisioner

  2. Image pull error: Back-off pulling image "k8s.gcr.io/sig-storage/nfs-subdir-external-provisioner:v4.0.2"

    Fix: use a Docker proxy/mirror to generate a reachable image address

  3. When deploying nginx with the mounted volume across several nodes, one of them reported the following error

    Warning	FailedMount	2 minutes ago
    (2 times in the last 4 minutes) kubelet Unable to attach or mount volumes: unmounted volumes=[nginx-data], unattached volumes=[nginx-data kube-api-access-vq8pp]: timed out waiting for the condition
    Warning FailedMount 1 second ago
    (11 times in the last 7 minutes) kubelet MountVolume.SetUp failed for volume "pvc-bb951005-5152-4ab8-ba6c-251d11af5c7a" : mount failed: exit status 32 Mounting command: mount Mounting arguments: -t nfs 192.168.4.2:/share/default-nginx-pvc-pvc-bb951005-5152-4ab8-ba6c-251d11af5c7a /var/lib/kubelet/pods/f274b225-5cec-432b-8b84-35f9355b0486/volumes/kubernetes.io~nfs/pvc-bb951005-5152-4ab8-ba6c-251d11af5c7a Output: mount.nfs: access denied by server while mounting 192.168.4.2:/share/default-nginx-pvc-pvc-bb951005-5152-4ab8-ba6c-251d11af5c7a

    Fix: edit /etc/exports and grant the client access

  4. When testing the mount from a client

    [root@xxx mnt]# mount -t nfs 10.255.7.6:/mnt/data /mnt/test
    mount: wrong fs type, bad option, bad superblock on 10.255.7.6:/mnt/data,
    missing codepage or helper program, or other error
    (for several filesystems (e.g. nfs, cifs) you might
    need a /sbin/mount.<type> helper program)

    In some cases useful info is found in syslog - try
    dmesg | tail or so.

    Fix: install the NFS client: yum install -y nfs-utils && systemctl start nfs-utils

  5. Error log:

    MountVolume.SetUp failed for volume "pvc-dc2431fb-5352-4265-8a9d-0a11b69d8588" : mount failed: exit status 32 Mounting command: systemd-run Mounting arguments: --description=Kubernetes transient mount for /var/lib/kubelet/pods/bb3fdc10-38de-4437-b8df-81b207e57f1d/volumes/kubernetes.io~nfs/pvc-dc2431fb-5352-4265-8a9d-0a11b69d8588 --scope -- mount -t nfs 10.255.247.176:/mnt/data/kubesphere-loki-system-loki-storage-pvc-dc2431fb-5352-4265-8a9d-0a11b69d8588 /var/lib/kubelet/pods/bb3fdc10-38de-4437-b8df-81b207e57f1d/volumes/kubernetes.io~nfs/pvc-dc2431fb-5352-4265-8a9d-0a11b69d8588 Output: Running scope as unit run-62186.scope. mount: wrong fs type, bad option, bad superblock on 10.255.247.176:/mnt/data/kubesphere-loki-system-loki-storage-pvc-dc2431fb-5352-4265-8a9d-0a11b69d8588, missing codepage or helper program, or other error (for several filesystems (e.g. nfs, cifs) you might need a /sbin/mount.<type> helper program) In some cases useful info is found in syslog - try dmesg | tail or so.

    Fix: the short version is a mount exit status 32 error, which usually means the mount is busy or the host is missing nfs-utils. Install it on every node in the cluster with yum install -y nfs-utils, otherwise the error comes back as soon as the pod is scheduled onto a different node.
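
    A hedged convenience loop for installing the client package on every node over SSH (the node names are placeholders for your own inventory):

    for node in node1 node2 node3; do
      ssh root@"$node" 'yum install -y nfs-utils'
    done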

iOS native project configuration

  1. Import the Vue project into HBuilder
  2. Run npm install in HBuilder
  3. Export the resources: in HBuilder choose Publish ==> Native App - Local Packaging ==> Generate local packaging app resources; the log prints that project 'xxx-app' was exported successfully to /Users/xx/workspace/xxx-app/dist/resources/__UNI__xxx/www
  4. Download the SDK (latest iOS platform SDK download)
  5. Unzip the SDK and double-click /Users/xxxx/3.4.18/SDK/HBuilder-Hello/HBuilder-Hello.xcodeproj to open it in Xcode
  6. Copy the __UNI__xxx directory generated in step 3 into HBuilder-Hello/HBuilder-Hello/Pandora/apps/ in the Xcode project, replacing the directory that is already there
  7. Edit the CFBundleDisplayName value in the Chinese and English InfoPlist files of the Xcode project; this is the app's display name
  8. Change the appid in the project's control file to the name of the packaged directory.
  9. In Xcode, General => TARGETS => HBuilder => Identity: make the four fields match the basic settings in HBuilder's manifest.json.
  10. Install the certificates: of the two certificate files you received, double-click the .p12 file
  11. In Xcode, Signing => TARGETS => HBuilder => release: fill in the distribution certificate details and untick automatic signing. (Check whether the certificate is a distribution or a development one and edit the matching configuration.)
  12. In Xcode, Info => TARGETS => HBuilder => Custom iOS Target Properties: set DCLOUD_AD_ID to the packaged directory name and dcloud_appkey to the appKey you applied for
  13. Archive via the Xcode menu Product => Archive; this requires selecting Any iOS Device as the build destination first
  14. During export, choose Distribute App => Enterprise => Next => select the certificate

References

How to configure a distribution certificate in Xcode

How to build an offline iOS package with Xcode

How to configure app icons in Xcode

How to import an HBuilder project into Xcode for packaging

Official Kubernetes documentation

Bootstrapping clusters with kubeadm

Two high-availability topologies

  • Stacked etcd topology: etcd and the control plane run on the same nodes

    (diagram: stacked etcd topology)

  • External etcd topology: etcd runs separately from the control plane

Objects: loosely speaking, the YAML files.

The four required fields of an object:

  • apiVersion - the version of the Kubernetes API used to create the object
  • kind - the kind of object to create
  • metadata - data that uniquely identifies the object, including a name string, a UID and an optional namespace
  • spec - the desired state of the object (not needed for Node)

Object kinds (kind)

Namespace

Kubernetes namespaces are used to partition a cluster.

Node

The components on a node include the kubelet, a container runtime, and kube-proxy.

Pod - the smallest unit

A group of containers; the smallest deployable unit of computing.

Deployment

A Deployment provides declarative updates for Pods and ReplicaSets.

ReplicaSet - replicas

Maintains a stable set of replica Pods; commonly used to guarantee the availability of a given number of identical Pods. Usually defined inside a Deployment.

StatefulSet - stateful

The workload API object for managing stateful applications; provides persistent storage (PVs) and persistent identifiers for its Pods.

DaemonSet - daemons

Ensures that all (or some) nodes run a copy of a Pod; the Pods share the node's lifecycle and are typically used for daemons.

Job - one-off tasks

A Job creates one or more Pods and keeps retrying them until a specified number terminate successfully. As Pods complete, the Job tracks the successful completions; once the target count is reached, the Job is done. Deleting a Job cleans up the Pods it created; suspending a Job deletes its active Pods until it is resumed.

CronJob - scheduled tasks

A CronJob performs periodic actions such as backups or report generation. Each of these tasks should be configured to recur (for example once a day/week/month); you define the interval at which it starts.

ReplicationController - replicas

A ReplicationController ensures that a specified number of Pod replicas are running at any one time. ReplicaSets and Deployments are now the recommended way to set up replication.

Service - exposing services

An abstract way to expose an application running on a set of Pods as a network service.

Ingress - routing

Ingress is an API object that manages external access to the services in a cluster, typically HTTP.

Ingress may provide load balancing, SSL termination and name-based virtual hosting.

EndpointSlice

EndpointSlices offer a simple way of tracking network endpoints within a Kubernetes cluster. They are a scalable and extensible alternative to Endpoints.

PersistentVolumeClaim - PVC

PersistentVolume - PV

StorageClass - storage classes

A StorageClass gives administrators a way to describe the "classes" of storage they offer. Different classes might map to quality-of-service levels, backup policies, or arbitrary policies determined by the cluster administrators.

VolumeSnapshot - volume snapshots

Each VolumeSnapshot contains a spec and a status.

VolumeSnapshotContent - pre-provisioned snapshots

VolumeSnapshotClass - volume snapshot classes

VolumeSnapshotClass provides a way to describe the "classes" of storage when provisioning a volume snapshot.

KubeSchedulerConfiguration - scheduler configuration

Pod (abbreviated po)

Concept: a Pod is similar to a set of Docker containers that share namespaces and filesystem volumes.
Simple deployment

File nginx-pod.yaml

apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels: # labels
    app: nginx # this label is used by the Service selector later
spec:
  containers:
    - name: nginx
      image: nginx:alpine
      ports:
        - containerPort: 80

Basic operations:

kubectl apply -f nginx-pod.yaml # create the pod
kubectl get po # list pods
kubectl describe pods nginx # show pod details
kubectl delete -f nginx-pod.yaml # delete

Pods are best not created directly but through workload resources, because Pods have a definite lifecycle: if the node a Pod runs on is restarted, the Pod has to be recreated.

Mounting volumes

Kubernetes volume mounts

Service

Concept: an abstract way to expose an application running on a set of Pods as a network service.
Simple deployment

File nginx-service.yaml

apiVersion: v1
kind: Service
metadata:
  name: nginx-service
spec:
  type: NodePort
  selector: # selector
    app: nginx # selects the pods labelled app=nginx
  ports:
    - protocol: TCP
      port: 9376 # service port inside the cluster; by default set it to match the container port
      targetPort: 80 # container port
      nodePort: 30000 # externally exposed port, used for access from outside the cluster

Basic operations

kubectl apply -f nginx-service.yaml # create the service
kubectl get service # list services
kubectl describe service nginx-service # show service details
kubectl delete -f nginx-service.yaml # delete the service

Deployment (workload) - abbreviated deploy

Concept: a Deployment provides declarative updates for Pods and ReplicaSets.
Simple deployment

File nginx-Deployment.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 2 # number of replicas, defaults to 1
  selector: # label selector, immutable after creation
    matchLabels:
      app: nginx # label
  template: # the pod template
    metadata:
      labels: # must be set explicitly and match the selector above
        app: nginx
    spec:
      volumes:
        - name: nginx-data
          persistentVolumeClaim:
            claimName: nginx-pvc
      containers:
        - name: nginx
          image: nginx:alpine
          ports:
            - containerPort: 80
          volumeMounts:
            - mountPath: "/usr/share/nginx/html"
              name: nginx-data

Basic operations

kubectl apply -f nginx-deployment.yaml # deploy
kubectl get deploy # list deployments (or kubectl get deployment)
kubectl describe deploy nginx-deployment # show deployment details
kubectl delete -f nginx-deployment.yaml # delete the deployment
kubectl edit deploy/nginx-deployment # edit the deployment
kubectl rollout status deployment/nginx-deployment # check the rollout status
kubectl rollout restart deploy/nginx-deployment # restart the rollout
kubectl get rs # compare desired and actual replicas
kubectl rollout history deployment/nginx-deployment # list revisions
kubectl rollout history deployment/nginx-deployment --revision=2 # show details of revision 2
kubectl rollout undo deployment/nginx-deployment # undo the current rollout
kubectl rollout undo deployment/nginx-deployment --to-revision=2 # roll back to a specific revision
kubectl scale deployment/nginx-deployment --replicas=10 # scale the replicas

CRD

Concept: a Custom Resource is an extension of the Kubernetes API that stores structured data.

Operator

etcd (Chinese docs)

etcd is a strongly consistent, distributed key-value store that provides a reliable way to store data that needs to be accessed by a distributed system or cluster of machines. It gracefully handles leader elections during network partitions and tolerates machine failure, even of the leader node.

# Store a key
etcdctl put keyName "I am value"
# Read a key
etcdctl get keyName
# Environment file of an etcd installed by kk
cat /etc/etcd.env
# Temporary alias to simplify the command; endpoints must list every cluster IP to check the whole cluster's health, otherwise only the listed IPs are checked
alias etcdctl='etcdctl --endpoints=https://192.168.14.16:2379,https://192.168.14.17:2379,https://192.168.14.18:2379 --cacert=/etc/ssl/etcd/ssl/ca.pem --cert=/etc/ssl/etcd/ssl/admin-node1.pem --key=/etc/ssl/etcd/ssl/admin-node1-key.pem'
# List members
etcdctl member list
# Remove a member
/usr/local/bin/etcdctl --endpoints=https://192.168.14.16:2379,https://192.168.14.17:2379 member list
/usr/local/bin/etcdctl --endpoints=https://192.168.14.16:2379,https://192.168.14.17:2379 member remove cab52e8fded2eefe
# Check cluster health
etcdctl endpoint health
# The cluster-health command kk uses
export ETCDCTL_API=2;
export ETCDCTL_CERT_FILE='/etc/ssl/etcd/ssl/admin-node1.pem';
export ETCDCTL_KEY_FILE='/etc/ssl/etcd/ssl/admin-node1-key.pem';
export ETCDCTL_CA_FILE='/etc/ssl/etcd/ssl/ca.pem';
/usr/local/bin/etcdctl --endpoints=https://192.168.14.16:2379,https://192.168.14.17:2379 cluster-health

To replace a broken etcd member, remove it first, then install the new node with kk.

etcd in containers vs. on the master VMs vs. on dedicated VMs

etcd backup

# Find the certificate paths
[root@master home]# cat /etc/systemd/system/etcd.service
EnvironmentFile=/etc/etcd.env   # this line points at the environment file
[root@master home]# cat /etc/etcd.env
# CLI settings
ETCDCTL_ENDPOINTS=https://127.0.0.1:2379
ETCDCTL_CACERT=/etc/ssl/etcd/ssl/ca.pem
ETCDCTL_KEY=/etc/ssl/etcd/ssl/admin-master-key.pem
ETCDCTL_CERT=/etc/ssl/etcd/ssl/admin-master.pem
# Take a snapshot, passing the certificates explicitly
[root@master home]# ETCDCTL_API=3 etcdctl snapshot save /home/etcd-snapshot.db \
--endpoints=https://127.0.0.1:2379 \
--cert=/etc/ssl/etcd/ssl/admin-master.pem \
--key=/etc/ssl/etcd/ssl/admin-master-key.pem \
--cacert=/etc/ssl/etcd/ssl/ca.pem
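
A companion sketch for restoring that snapshot with the standard etcdctl snapshot restore command (the data directory is an example, and the restored directory still has to be wired back into the etcd service configuration):

ETCDCTL_API=3 etcdctl snapshot restore /home/etcd-snapshot.db \
  --data-dir /var/lib/etcd-restore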

macOS installation

pip3 install scrapy # install scrapy
scrapy startproject testSpider # create a project
scrapy crawl name # run a spider

Multi-node installation with kk

Preparation

  1. Prepare an auto scaling group to manage the ECS virtual machines

  2. Configure the ECS instance template in the scaling group (spot instances are a cheap option for learning)

    E.g. ecs.hfc6.large (ecs.c7a.large, AMD, also works now), spot instance, 2 vCPU + 4 GiB, CentOS 7.9 64-bit; the nodes can attach the same shared data disk for configuration data

    Set up a certificate and log in with the .cer certificate

  3. Port requirements: opened via the security group (all nodes share one security group with intra-group traffic allowed, so nothing extra needs to be configured)

    Service                                Port / Protocol
    ssh                                    22 TCP
    etcd                                   2379-2380 TCP
    apiserver                              6443 TCP
    calico                                 9099-9100 TCP
    bgp                                    179 TCP
    nodeport                               30000-32767 TCP
    master                                 10250-10258 TCP
    dns                                    53 TCP/UDP
    local-registry (offline environments)  5000 TCP
    local-apt (offline environments)       5080 TCP
    rpcbind (required when using NFS)      111 TCP
    ipip (Calico uses the IPIP protocol)   IPENCAP / IPIP
    metrics-server                         8443 TCP

Installing Kubernetes only

yum update -y
yum install -y socat conntrack ebtables ipset
export KKZONE=cn
# Download kk manually and upload it to the Alibaba Cloud host (downloading from the host directly does not work)
./kk create config # generates the config file config-sample.yaml
# Edit config-sample.yaml: fill in the node information, use certificate login instead of passwords, set address
# Upload the certificate and restrict it to read-only (400)
chmod 400 k8s.cer
export KKZONE=cn
# Create the Kubernetes cluster
./kk create cluster -f config-kubernetes-1.23.7.yaml | tee kk.log

Installing Kubernetes + KubeSphere

yum update -y
./kk create config --with-kubesphere # generates the config file config-sample.yaml
export KKZONE=cn && ./kk create cluster -f config-kubesphere3.3.0-kubernetes1.23.7.yaml | tee kk.log

Adding/removing nodes

# Without the original deployment file, generate one with the command below; the output is sample.yaml, renamed here to add-node.yaml
./kk create config --from-cluster
# Edit and complete the file (mainly the node IPs, load balancer and etcd sections), adding the new node's entry
# Install the dependencies on the new node
yum install -y socat conntrack ebtables ipset
# Add the node
./kk add nodes -f add-node.yaml
# Delete the node named node4
./kk delete node node4 -f add-node.yaml
# High-availability scenario
#1. The node3 VM was removed
#2. A node4 VM was added
#3. In add-node.yaml the node list becomes node1, node2, node4 (node3's entry removed); remember to update etcd and master to the latest node names as well
#4. Remove the broken etcd member
export ETCDCTL_API=2;
export ETCDCTL_CERT_FILE='/etc/ssl/etcd/ssl/admin-node1.pem';
export ETCDCTL_KEY_FILE='/etc/ssl/etcd/ssl/admin-node1-key.pem';
export ETCDCTL_CA_FILE='/etc/ssl/etcd/ssl/ca.pem';
# List members
/usr/local/bin/etcdctl --endpoints=https://192.168.14.16:2379,https://192.168.14.17:2379 member list
# Remove the member by its ID
/usr/local/bin/etcdctl --endpoints=https://192.168.14.16:2379,https://192.168.14.17:2379 member remove cab52e8fded2eefe
#5. Re-run the add-nodes command
./kk add nodes -f add-node.yaml
#6. Clean up the unreachable node; since it cannot be contacted, add --delete-emptydir-data --ignore-daemonsets
kubectl cordon node3
kubectl drain node3 --force --ignore-daemonsets --delete-emptydir-data
kubectl delete node node3
# Verify that the broken node is gone
kubectl get nodes

Adding/removing taints

# Add a taint to node1 with key key1, value value1 and effect NoSchedule: only Pods with a matching toleration can be scheduled onto node1
kubectl taint nodes node1 key1=value1:NoSchedule
# Remove the taint
kubectl taint nodes node1 key1=value1:NoSchedule-

Notes

  • Prefer editing the Deployment; only edit the Pod directly when the Deployment cannot be changed, because changes made to a Pod disappear once it is restarted.

Common problems

  1. When installing a multi-node cluster with kk, the following error appears:

    [wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s [kubelet-check] Initial timeout of 40s passed.

    Fix: the AMD host was the culprit; switching to an Intel host worked. A newer fix is to run yum update -y before installing.

  2. After one node went down and a new node was added to replace it, the kk add-nodes command failed with:

    etcd health check failed: Failed to exec command: sudo -E /bin/bash -c "export ETCDCTL_API=2;export ETCDCTL_CERT_FILE='/etc/ssl/etcd/ssl/admin-node1.pem';export ETCDCTL_KEY_FILE='/etc/ssl/etcd/ssl/admin-node1-key.pem';export ETCDCTL_CA_FILE='/etc/ssl/etcd/ssl/ca.pem';/usr/local/bin/etcdctl --endpoints=https://192.168.14.16:2379,https://192.168.14.17:2379,https://192.168.14.20:2379 cluster-health | grep -q 'cluster is healthy'

    Fix: remove the broken etcd member first, then re-run kk to add the etcd/master node

  3. Running kubectl drain node3 --force --ignore-daemonsets --delete-emptydir-data to remove the node reports:

    I0705 16:41:20.004286 18301 request.go:665] Waited for 1.14877279s due to client-side throttling, not priority and fairness, request: GET:https://lb.kubesphere.local:6443/api/v1/namespaces/kubesphere-monitoring-system/pods/alertmanager-main-1

    Fix: cancel it and just run kubectl delete node node3

  4. After rebuilding the node, the new node cannot be scheduled, with the error:

    0/3 nodes are available: 1 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 2 Insufficient cpu.

    Fix: check the taints on the cluster nodes, then run kubectl taint nodes node4 node-role.kubernetes.io/master=:NoSchedule- to remove the taint

  5. After the node was rebuilt, the prometheus-k8s pod of the monitoring component shows this event:

    MountVolume.NewMounter initialization failed for volume "pvc-60891ee0-ba6c-4df4-b381-6e542b27d3a7" : path "/var/openebs/local/pvc-60891ee0-ba6c-4df4-b381-6e542b27d3a7" does not exist

    Fix: run the following on the master node. It did not actually solve the problem; still to be verified whether the storage volume is distributed.

    # In /etc/kubernetes/manifests/kube-apiserver.yaml
    # spec:
    #   containers:
    #   - command:
    #     - kube-apiserver
    #     - --feature-gates=RemoveSelfLink=false  # add this line
    vim /etc/kubernetes/manifests/kube-apiserver.yaml
    # Apply the configuration
    kubectl apply -f /etc/kubernetes/manifests/kube-apiserver.yaml
  6. Installing KubeSphere on an AMD host hangs at Please wait for the installation to complete:; checking the pod logs, the calico-node-4hgbb pod reports the following error:

      Type     Reason     Age                    From     Message
    ---- ------ ---- ---- -------
    Warning Unhealthy 3m18s (x440 over 66m) kubelet (combined from similar events): Readiness probe failed: 2022-07-06 02:39:53.164 [INFO][4974] confd/health.go 180: Number of node(s) with BGP peering established = 2
    calico/node is not ready: felix is not ready: readiness probe reporting 503

    Reference: kubernetes v1.24.0 install failed with calico node not ready #1282

    Fix: resolved by changing the calico version; calico probably needs to be upgraded from v3.20.0 to v3.23.0

    # Delete the calico-related resources
    kubectl -n kube-system get all |grep calico | awk '{print $1}' | xargs kubectl -n kube-system delete
    # Fetch the newer 3.23 manifest
    wget https://docs.projectcalico.org/archive/v3.23/manifests/calico.yaml
    # Reinstall calico
    kubectl apply -f calico.yaml
    # calico recovers, but re-running kk afterwards breaks it again; remember to edit the Deployment, not the Pod spec
    # The final fix is to run the following before installing the cluster
    yum update -y
  7. System Components -> Monitoring -> prometheus-k8s -> Events shows the error: 0/3 nodes are available: 3 Insufficient cpu.

    Fix: edit the workload: StatefulSets -> prometheus-k8s

    Summary: requests.cpu: 0.5 means half a CPU; 0.5 is equivalent to 500m, read as "500 millicpu" (five hundred millicores)

    Official docs: Resource units in Kubernetes

    # Has to be re-applied after every restart
    containers:
      - name: prometheus
        resources:
          limits:
            cpu: '4'
            memory: 16Gi
          requests:
            cpu: 200m # change to 20m
            memory: 400Mi # change to 40Mi
  8. kubectl top node reports error: Metrics API not available

    Fix: if KubeSphere is not installed yet, change its deployment config file; if it is already installed, open CRDs -> ClusterConfiguration -> ks-installer in the KubeSphere console and edit it.

    metrics_server:
      enabled: false # set to true
  9. calico/node is not ready: felix is not ready: readiness probe reporting 503

    After retrying, it changed to calico/node is not ready: felix is not ready: Get "http://localhost:9099/readiness": dial tcp [::1]:9099: connect: connection refused

  10. A case of Error from server (BadRequest): container "calico-node" in pod "calico-node-bfmgs" is waiting to start: PodInitializing

    Troubleshooting and fix (reference: Kubernetes Installation Tutorial: Kubespray):

    [root@master ~]# kubectl get nodes
    master Ready control-plane 158d v1.26.4
    node-102 NotReady control-plane 42h v1.26.4
    ....
    [root@master ~]# kubectl get pods -o wide -n kube-system
    calico-node-2zbtg 0/1 Init:CrashLoopBackOff
    [root@master ~]# kubectl describe pod -n kube-system calico-node-2zbtg
    Events:
    Type Reason Age From Message
    ---- ------ ---- ---- -------
    Warning BackOff 15s (x7 over 79s) kubelet Back-off restarting failed container install-cni in pod calico-node-2zbtg_kube-system(337004cf-9136-48ac-bc6b-eb897bd2c806)
    [root@master ~]# kubectl logs -n kube-system calico-node-2zbtg
    Defaulted container "calico-node" out of: calico-node, upgrade-ipam (init), install-cni (init), flexvol-driver (init)
    Error from server (BadRequest): container "calico-node" in pod "calico-node-2zbtg" is waiting to start: PodInitializing
    # --------- deleting the pod immediately spawns a replacement -----------------------------------------------------------
    [root@master ~]# kubectl delete pod -n kube-system calico-node-2zbtg
    # ----------- on the affected node, find the dead container by its timestamp ---------------------------------------------------
    [root@node-102 ~]# crictl ps -a
    fc19864603510 628dd70880410 About a minute ago Exited install-cni
    [root@node-102 ~]# crictl logs fc19864603510
    time="2024-01-14T12:26:57Z" level=info msg="Running as a Kubernetes pod" source="install.go:145"
    2024-01-14 12:26:57.761 [INFO][1] cni-installer/<nil> <nil>: File is already up to date, skipping file="/host/opt/cni/bin/bandwidth"
    .....
    2024-01-14 12:26:57.964 [INFO][1] cni-installer/<nil> <nil>: CNI plugin version: v3.24.5

    2024-01-14 12:26:57.964 [INFO][1] cni-installer/<nil> <nil>: /host/secondary-bin-dir is not writeable, skipping
    W0114 12:26:57.964140 1 client_config.go:617] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
    time="2024-01-14T12:26:57Z" level=info msg="Using CNI config template from CNI_NETWORK_CONFIG_FILE" source="install.go:340"
    time="2024-01-14T12:26:57Z" level=fatal msg="open /host/etc/cni/net.d/calico.conflist.template: no such file or directory" source="install.go:344"
    # ---- the precise error: open /host/etc/cni/net.d/calico.conflist.template: no such file or directory
    # the file is missing, so copy one over from the master node
    [root@node-102 ~]# cd /etc/cni/net.d/
    [root@node-102 net.d]# ls
    calico-kubeconfig
    # ----------------------------- after copying the file to the broken node, delete the pod to speed up the restart ----------------------------
    [root@master net.d]# kubectl delete pod calico-node-bfmgs -n kube-system
    #----------------------------- checking again, the file is there, the node joined successfully, and all pods are healthy ------------------------
    [root@node-102 net.d]# ls
    10-calico.conflist calico.conflist.template calico-kubeconfig
  11. A pod fails to start with Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "5bc537d4925604f98d12ec576b90eeee0534402c6fb32fc31920a763051e6589": plugin type="calico" failed (add): error getting ClusterInformation: connection is unauthorized: Unauthorized

    Fix: the server clock was wrong, which broke the authorization. Find the node whose time is off, re-sync the clock, then restart that node's calico pod.

    [root@node-102 ~]# timedatectl
    Local time: 一 2024-01-15 20:33:23 CST
    Universal time: 一 2024-01-15 12:33:23 UTC
    RTC time: 一 2024-01-15 06:43:54
    Time zone: Asia/Shanghai (CST, +0800)
    NTP enabled: yes
    NTP synchronized: no
    RTC in local TZ: no
    DST active: n/a
    [root@node-102 ~]# chronyc makestep
    200 OK
    [root@master ~]# kubectl get pods -o wide -n kube-system
    calico-node-rlxg5 1/1 Running 0 26h 172.16.10.192 node-192
    [root@master ~]# kubectl delete pod -n kube-system calico-node-rlxg5
  12. A case where services could not reach each other by service name: Caused by: java.net.UnknownHostException: system-business

    [root@master ~]# kubectl get pods -o wide -n kube-system
    nodelocaldns-g72zt 0/1 CrashLoopBackOff 17 (39s ago) 63m 172.16.10.4 master <none> <none>
    # check this pod's logs
    [root@master ~]# crictl ps -a | grep nodelocaldns
    7ae6e0fc97f0b 9eaf430eed843 46 seconds ago Exited node-cache 19 c8ccd1afaa13e nodelocaldns-g72zt
    [root@master ~]# crictl logs 7ae6e0fc97f0b
    2024/04/25 09:22:45 [INFO] Starting node-cache image: 1.22.18
    2024/04/25 09:22:45 [INFO] Using Corefile /etc/coredns/Corefile
    2024/04/25 09:22:45 [INFO] Using Pidfile
    2024/04/25 09:22:46 [ERROR] Failed to read node-cache coreFile /etc/coredns/Corefile.base - open /etc/coredns/Corefile.base: no such file or directory
    2024/04/25 09:22:46 [INFO] Skipping kube-dns configmap sync as no directory was specified
    .:53 on 169.254.25.10
    cluster.local.:53 on 169.254.25.10
    in-addr.arpa.:53 on 169.254.25.10
    ip6.arpa.:53 on 169.254.25.10
    [INFO] plugin/reload: Running configuration SHA512 = aa809f767f97014677c4e010f69a19281bea2a25fd44a8c9172f6f43db27a70080deb3a2add822c680f580337da221a7360acac898a1e8a8827a7bda80e00c2d
    CoreDNS-1.10.0
    linux/amd64, go1.18.10,
    [FATAL] plugin/loop: Loop (169.254.25.10:34977 -> 169.254.25.10:53) detected for zone ".", see https://coredns.io/plugins/loop#troubleshooting. Query: "HINFO 1324972288110025575.747110785225940346."
    # the other two nodes report different errors
    # node B
    .....
    CoreDNS-1.10.0
    linux/amd64, go1.18.10,
    [ERROR] plugin/errors: 2 helm.yangcoder.online. A: read udp 169.254.25.10:56963->169.254.25.10:53: i/o timeout
    # node C
    [ERROR] plugin/errors: 2 47.30.16.172.in-addr.arpa. PTR: read tcp 10.233.0.3:47246->10.233.0.3:53: i/o timeout
    [ERROR] plugin/errors: 2 47.30.16.172.in-addr.arpa. PTR: read tcp 10.233.0.3:47252->10.233.0.3:53: i/o timeout

    Analysis:

    1. Parts of the company network had been slow, and the 114.114.114.114 DNS had been unreachable at times, so the company network was suspected

      Initial investigation:

      The three nodes each have different DNS servers:
      [root@master ~]# cat /etc/resolv.conf
      # Generated by NetworkManager
      search default.svc.cluster.local svc.cluster.local
      nameserver 169.254.25.10
      nameserver 183.221.253.100 #ping Destination Host Unreachable
      nameserver 114.114.114.114 #ping Destination Host Unreachable
      [root@node-192 ~]# cat /etc/resolv.conf
      # Generated by NetworkManager
      search default.svc.cluster.local svc.cluster.local
      nameserver 169.254.25.10
      nameserver 183.221.253.100 #ping icmp_seq=1 Destination Host Unreachable
      nameserver 172.16.10.49 #ping works
      [root@node-102 ~]# cat /etc/resolv.conf
      # Generated by NetworkManager
      search default.svc.cluster.local svc.cluster.local
      nameserver 169.254.25.10
      nameserver 172.16.10.63 #ping works
      nameserver 192.168.100.254 #ping works

      Presumably none of the master node's upstream DNS servers are reachable, so nodelocaldns on the master cannot start properly and DNS between the nodes breaks down (to be verified)

    Fix:

Appendix

config-kubernetes-1.23.7.yaml deployment file

apiVersion: kubekey.kubesphere.io/v1alpha2
kind: Cluster
metadata:
  name: sample
spec:
  hosts:
    - {name: node1, address: 192.168.14.1, internalAddress: 192.168.14.1, user: root, privateKeyPath: "/exxk/k8s.cer"}
    - {name: node2, address: 192.168.14.2, internalAddress: 192.168.14.2, user: root, privateKeyPath: "/exxk/k8s.cer"}
    - {name: node3, address: 192.168.14.3, internalAddress: 192.168.14.3, user: root, privateKeyPath: "/exxk/k8s.cer"}
  roleGroups:
    etcd:
      - node[1:3]
    master:
      - node[1:3]
    worker:
      - node[1:3]
  controlPlaneEndpoint:
    ## Internal loadbalancer for apiservers
    # internalLoadbalancer: haproxy

    domain: lb.kubesphere.local
    address: "192.168.14.1"
    port: 6443
  kubernetes:
    version: v1.23.7
    clusterName: cluster.local
    autoRenewCerts: true
    containerManager: docker
  etcd:
    type: kubekey
  network:
    plugin: calico
    kubePodsCIDR: 10.233.64.0/18
    kubeServiceCIDR: 10.233.0.0/18
    ## multus support. https://github.com/k8snetworkplumbingwg/multus-cni
    multusCNI:
      enabled: false
  registry:
    privateRegistry: ""
    namespaceOverride: ""
    registryMirrors: []
    insecureRegistries: []
  addons: []

config-kubesphere3.3.0-kubernetes1.23.7.yaml deployment file

apiVersion: kubekey.kubesphere.io/v1alpha2
kind: Cluster
metadata:
  name: sample
spec:
  hosts:
    - {name: node1, address: 192.168.14.16, internalAddress: 192.168.14.16, user: root, privateKeyPath: "/exxk/k8s.cer"}
    - {name: node2, address: 192.168.14.17, internalAddress: 192.168.14.17, user: root, privateKeyPath: "/exxk/k8s.cer"}
    - {name: node3, address: 192.168.14.18, internalAddress: 192.168.14.18, user: root, privateKeyPath: "/exxk/k8s.cer"}
  roleGroups:
    etcd:
      - node[1:3]
    control-plane:
      - node[1:3]
    worker:
      - node[1:3]
  controlPlaneEndpoint:
    # Internal loadbalancer for apiservers
    # internalLoadbalancer: haproxy
    domain: lb.kubesphere.local
    address: "192.168.14.16"
    port: 6443
  kubernetes:
    version: v1.23.7
    clusterName: cluster.local
    autoRenewCerts: true
    containerManager: docker
  etcd:
    type: kubekey
  network:
    plugin: calico
    kubePodsCIDR: 10.233.64.0/18
    kubeServiceCIDR: 10.233.0.0/18
    ## multus support. https://github.com/k8snetworkplumbingwg/multus-cni
    multusCNI:
      enabled: false
  registry:
    privateRegistry: ""
    namespaceOverride: ""
    registryMirrors: []
    insecureRegistries: []
  addons: []

---
apiVersion: installer.kubesphere.io/v1alpha1
kind: ClusterConfiguration
metadata:
  name: ks-installer
  namespace: kubesphere-system
  labels:
    version: v3.3.0
spec:
  persistence:
    storageClass: ""
  authentication:
    jwtSecret: ""
  zone: ""
  local_registry: ""
  namespace_override: ""
  # dev_tag: ""
  etcd:
    monitoring: false
    endpointIps: localhost
    port: 2379
    tlsEnable: true
  common:
    core:
      console:
        enableMultiLogin: true
        port: 30880
        type: NodePort
    # apiserver:
    #   resources: {}
    # controllerManager:
    #   resources: {}
    redis:
      enabled: false
      volumeSize: 2Gi
    openldap:
      enabled: false
      volumeSize: 2Gi
    minio:
      volumeSize: 20Gi
    monitoring:
      # type: external
      endpoint: http://prometheus-operated.kubesphere-monitoring-system.svc:9090
      GPUMonitoring:
        enabled: false
    gpu:
      kinds:
        - resourceName: "nvidia.com/gpu"
          resourceType: "GPU"
          default: true
    es:
      # master:
      #   volumeSize: 4Gi
      #   replicas: 1
      #   resources: {}
      # data:
      #   volumeSize: 20Gi
      #   replicas: 1
      #   resources: {}
      logMaxAge: 7
      elkPrefix: logstash
      basicAuth:
        enabled: false
        username: ""
        password: ""
      externalElasticsearchHost: ""
      externalElasticsearchPort: ""
  alerting:
    enabled: false
    # thanosruler:
    #   replicas: 1
    #   resources: {}
  auditing:
    enabled: false
    # operator:
    #   resources: {}
    # webhook:
    #   resources: {}
  devops:
    enabled: false
    # resources: {}
    jenkinsMemoryLim: 2Gi
    jenkinsMemoryReq: 1500Mi
    jenkinsVolumeSize: 8Gi
    jenkinsJavaOpts_Xms: 1200m
    jenkinsJavaOpts_Xmx: 1600m
    jenkinsJavaOpts_MaxRAM: 2g
  events:
    enabled: false
    # operator:
    #   resources: {}
    # exporter:
    #   resources: {}
    # ruler:
    #   enabled: true
    #   replicas: 2
    #   resources: {}
  logging:
    enabled: false
    logsidecar:
      enabled: true
      replicas: 2
      # resources: {}
  metrics_server:
    enabled: false
  monitoring:
    storageClass: ""
    node_exporter:
      port: 9100
      # resources: {}
    # kube_rbac_proxy:
    #   resources: {}
    # kube_state_metrics:
    #   resources: {}
    # prometheus:
    #   replicas: 1
    #   volumeSize: 20Gi
    #   resources: {}
    # operator:
    #   resources: {}
    # alertmanager:
    #   replicas: 1
    #   resources: {}
    # notification_manager:
    #   resources: {}
    #   operator:
    #     resources: {}
    #   proxy:
    #     resources: {}
    gpu:
      nvidia_dcgm_exporter:
        enabled: false
        # resources: {}
  multicluster:
    clusterRole: none
  network:
    networkpolicy:
      enabled: false
    ippool:
      type: none
    topology:
      type: none
  openpitrix:
    store:
      enabled: false
  servicemesh:
    enabled: false
    istio:
      components:
        ingressGateways:
          - name: istio-ingressgateway
            enabled: false
        cni:
          enabled: false
  edgeruntime:
    enabled: false
    kubeedge:
      enabled: false
      cloudCore:
        cloudHub:
          advertiseAddress:
            - ""
        service:
          cloudhubNodePort: "30000"
          cloudhubQuicNodePort: "30001"
          cloudhubHttpsNodePort: "30002"
          cloudstreamNodePort: "30003"
          tunnelNodePort: "30004"
        # resources: {}
        # hostNetWork: false
      iptables-manager:
        enabled: true
        mode: "external"
        # resources: {}
      # edgeService:
      #   resources: {}
  terminal:
    timeout: 600

add-node.yaml

apiVersion: kubekey.kubesphere.io/v1alpha2
kind: Cluster
metadata:
  name: sample
spec:
  hosts:
    ## You should complete the ssh information of the hosts
    - {name: node1, address: 192.168.14.16, internalAddress: 192.168.14.16, user: root, privateKeyPath: "/exxk/k8s.cer"}
    - {name: node2, address: 192.168.14.17, internalAddress: 192.168.14.17, user: root, privateKeyPath: "/exxk/k8s.cer"}
    - {name: node3, address: 192.168.14.18, internalAddress: 192.168.14.18, user: root, privateKeyPath: "/exxk/k8s.cer"}
    - {name: node4, address: 192.168.14.19, internalAddress: 192.168.14.19, user: root, privateKeyPath: "/exxk/k8s.cer"}
  roleGroups:
    etcd:
      - node[1:3]
    master:
      - node1
      - node2
      - node3
    worker:
      - node1
      - node2
      - node3
      - node4
  controlPlaneEndpoint:
    ## Internal loadbalancer for apiservers
    # internalLoadbalancer: haproxy

    ## If the external loadbalancer was used, 'address' should be set to loadbalancer's ip.
    domain: lb.kubesphere.local
    address: "192.168.14.16"
    port: 6443
  kubernetes:
    version: v1.23.7
    clusterName: cluster.local
    proxyMode: ipvs
    masqueradeAll: false
    maxPods: 110
    nodeCidrMaskSize: 24
  network:
    plugin: calico
    kubePodsCIDR: 10.233.64.0/18
    kubeServiceCIDR: 10.233.0.0/18
  registry:
    privateRegistry: ""