I'm currently doing an internship to finish my bachelor's degree, and my project is about CI/CD pipelines, so I'm fairly new to DevOps; I'd appreciate anyone willing to spare some time to help me. After setting up 3 Ubuntu server instances on VMware with NAT networking, here are the commands I ran to set up a Kubernetes cluster with 1 master node and 2 worker nodes using kubeadm:

sudo apt install docker.io -y
sudo swapoff -a
nano /etc/fstab
# commented out the swap line in /etc/fstab here
sudo apt-get update
sudo apt-get install -y apt-transport-https ca-certificates curl gpg
sudo mkdir -p /etc/apt/keyrings
curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.29/deb/Release.key | sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg
echo 'deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.29/deb/ /' | sudo tee /etc/apt/sources.list.d/kubernetes.list
sudo apt-get update
sudo apt-get install -y kubelet kubeadm kubectl
sudo apt-mark hold kubelet kubeadm kubectl
sudo systemctl enable --now kubelet
# then on the master node I ran kubeadm init, copied the admin kubeconfig as instructed, and joined the worker nodes using the join command the master printed
sudo kubeadm init
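The "export config thing" after `kubeadm init` presumably refers to the standard kubeconfig setup that kubeadm prints at the end of init; a sketch of those steps:

```shell
# Standard post-init steps kubeadm prints (run as the regular user on the master):
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

# Verify the API server answers before joining the workers:
kubectl get nodes
```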

Everything worked fine for the first 10 minutes: I joined the nodes, installed Calico, and so on. Then whenever I try to run `kubectl get nodes`, I get this error:
The connection to the server 192.168.149.141:6443 was refused - did you specify the right host or port?

I tried restarting the docker and kubelet services, which fixes things for another 10 minutes before it crashes again. Another thing I noticed is that some pods keep restarting, as you can see from the CrashLoopBackOff status. It's not a performance issue: after I installed metrics, CPU and RAM usage were below 40% on all nodes.
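To narrow down which control-plane pod is crash-looping, the usual checks look like this (a generic sketch, not specific to this cluster; `<master-hostname>` and `<container-id>` are placeholders):

```shell
# While the API server is briefly up, spot the CrashLoopBackOff pods:
kubectl -n kube-system get pods -o wide

# Describe a failing pod and read its events and last exit code:
kubectl -n kube-system describe pod kube-apiserver-<master-hostname>

# When the API server is down, kubectl won't work; ask the container
# runtime directly instead (crictl works for containerd / CRI-O):
sudo crictl ps -a | grep -E 'kube-apiserver|etcd'
sudo crictl logs <container-id>

# kubelet's own view of why it keeps restarting static pods:
sudo journalctl -u kubelet --since "10 min ago" | tail -50
```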

Here are the kubelet logs:

My kube-apiserver.yaml:

apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubeadm.kubernetes.io/kube-apiserver.advertise-address.endpoint: 192.168.149.141:6443
  creationTimestamp: null
  labels:
    component: kube-apiserver
    tier: control-plane
  name: kube-apiserver
  namespace: kube-system
spec:
  containers:
  - command:
    - kube-apiserver
    - --advertise-address=192.168.149.141
    - --allow-privileged=true
    - --authorization-mode=Node,RBAC
    - --client-ca-file=/etc/kubernetes/pki/ca.crt
    - --enable-admission-plugins=NodeRestriction
    - --enable-bootstrap-token-auth=true
    - --etcd-cafile=/etc/kubernetes/pki/etcd/ca.crt
    - --etcd-certfile=/etc/kubernetes/pki/apiserver-etcd-client.crt
    - --etcd-keyfile=/etc/kubernetes/pki/apiserver-etcd-client.key
    - --etcd-servers=https://127.0.0.1:2379
    - --kubelet-client-certificate=/etc/kubernetes/pki/apiserver-kubelet-client.crt
    - --kubelet-client-key=/etc/kubernetes/pki/apiserver-kubelet-client.key
    - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
    - --proxy-client-cert-file=/etc/kubernetes/pki/front-proxy-client.crt
    - --proxy-client-key-file=/etc/kubernetes/pki/front-proxy-client.key
    - --requestheader-allowed-names=front-proxy-client
    - --requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.crt
    - --requestheader-extra-headers-prefix=X-Remote-Extra-
    - --requestheader-group-headers=X-Remote-Group
    - --requestheader-username-headers=X-Remote-User
    - --secure-port=6443
    - --service-account-issuer=https://kubernetes.default.svc.cluster.local
    - --service-account-key-file=/etc/kubernetes/pki/sa.pub
    - --service-account-signing-key-file=/etc/kubernetes/pki/sa.key
    - --service-cluster-ip-range=10.96.0.0/12
    - --tls-cert-file=/etc/kubernetes/pki/apiserver.crt
    - --tls-private-key-file=/etc/kubernetes/pki/apiserver.key
    image: registry.k8s.io/kube-apiserver:v1.29.3
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 8
      httpGet:
        host: 192.168.149.141
        path: /livez
        port: 6443
        scheme: HTTPS
      initialDelaySeconds: 10
      periodSeconds: 10
      timeoutSeconds: 15
    name: kube-apiserver
    readinessProbe:
      failureThreshold: 3
      httpGet:
        host: 192.168.149.141
        path: /readyz
        port: 6443
        scheme: HTTPS
      periodSeconds: 1
      timeoutSeconds: 15
    resources:
      requests:
        cpu: 250m
    startupProbe:
      failureThreshold: 24
      httpGet:
        host: 192.168.149.141
        path: /livez
        port: 6443
        scheme: HTTPS
      initialDelaySeconds: 10
      periodSeconds: 10
      timeoutSeconds: 15
    volumeMounts:
    - mountPath: /etc/ssl/certs
      name: ca-certs
      readOnly: true
    - mountPath: /etc/ca-certificates
      name: etc-ca-certificates
      readOnly: true
    - mountPath: /etc/pki
      name: etc-pki
      readOnly: true
    - mountPath: /etc/kubernetes/pki
      name: k8s-certs
      readOnly: true
    - mountPath: /usr/local/share/ca-certificates
      name: usr-local-share-ca-certificates
      readOnly: true
    - mountPath: /usr/share/ca-certificates
      name: usr-share-ca-certificates
      readOnly: true
  hostNetwork: true
  priority: 2000001000
  priorityClassName: system-node-critical
  securityContext:
    seccompProfile:
      type: RuntimeDefault
  volumes:
  - hostPath:
      path: /etc/ssl/certs
      type: DirectoryOrCreate
    name: ca-certs
  - hostPath:
      path: /etc/ca-certificates
      type: DirectoryOrCreate
    name: etc-ca-certificates
  - hostPath:
      path: /etc/pki
      type: DirectoryOrCreate
    name: etc-pki
  - hostPath:
      path: /etc/kubernetes/pki
      type: DirectoryOrCreate
    name: k8s-certs
  - hostPath:
      path: /usr/local/share/ca-certificates
      type: DirectoryOrCreate
    name: usr-local-share-ca-certificates
  - hostPath:
      path: /usr/share/ca-certificates
      type: DirectoryOrCreate
    name: usr-share-ca-certificates
status: {}

I'm happy to provide more information if needed.


2 Answers

After more research and checking, it turned out that Kubernetes removed dockershim, its built-in support for Docker as a container runtime, so I reinitialized my cluster with CRI-O and everything works fine now. I followed this blog for the installation:
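The blog's exact steps aren't shown here, but reinitializing against CRI-O generally amounts to pointing kubeadm at the CRI-O socket; a hedged sketch (package install steps omitted, socket path and pod CIDR may differ on your setup):

```shell
# Tear down the broken cluster first (run on every node):
sudo kubeadm reset -f

# With CRI-O installed, make sure it is running:
sudo systemctl enable --now crio

# Re-init the control plane against the CRI-O socket
# (192.168.0.0/16 is Calico's common default pod CIDR):
sudo kubeadm init \
  --cri-socket unix:///var/run/crio/crio.sock \
  --pod-network-cidr=192.168.0.0/16
```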

You can try the following steps to narrow down which component is causing the problem:

  1. Log in to the node (192.168.149.141) and try to connect to the API server from localhost:6443.

     This should help you verify whether the API server itself is not working, or whether it is network-related.

  2. Open an `nc -v -l 50000` process on the node (192.168.149.141), connect to it from another node via `nc -v 192.168.149.141 50000`, and try sending messages to see whether the remote node receives them.

     If it is network-related, this will help you determine whether it is NAT (VMware)-related or router-related.

  3. While you try to connect, you may want to look at the API server's logs to see whether they offer any useful information.
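The steps above can be sketched as commands (hypothetical, run on the nodes indicated):

```shell
# Step 1: on the master itself, bypass the network and hit the API server
# on localhost (-k because the serving cert is for the node IP/names):
curl -k https://localhost:6443/livez

# Step 2: raw TCP check between nodes to rule out the VMware NAT.
# On the master:
nc -v -l 50000
# On a worker, then type a message and see if it arrives on the master:
nc -v 192.168.149.141 50000

# Step 3: read the API server's logs via the runtime while kubectl is down:
sudo crictl ps -a | grep kube-apiserver
sudo crictl logs <apiserver-container-id>
```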

If you're impatient, you could try Rancher (or RKE) to deploy the cluster instead of kubeadm; it is more beginner-friendly.


  • Hi, thanks for trying to help me. After reading several articles I found the problem was the Docker runtime, since Kubernetes deprecated it. I tried cri-dockerd but it didn't help, so I migrated to CRI-O and everything seems to be back to normal. Thank you very much.

