
Deploying services in k8s: problems with pod-to-pod communication

My YAML file looks like this. The Docker version is 19.03.10, the Kubernetes version is 1.19.3, and the pod network is provided by Calico.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: amf
spec:
  selector:
    matchLabels:
      app: amf
  replicas: 1
  template:
    metadata:
      labels:
        app: amf
      annotations:
        cni.projectcalico.org/ipAddrs: "[\"10.100.200.132\"]"
    spec:
      nodeSelector:
        type: k8smec
      containers:
      - image: amf-k8s:v3
        imagePullPolicy: Never
        name: amf
        securityContext:
          privileged: true
        command: ["/bin/bash"]
        args: ["-c", "/openair-amf/bin/oai_amf -c /openair-amf/etc/amf.conf -o"]
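One caveat about the cni.projectcalico.org/ipAddrs annotation: Calico only honors a pinned address if it falls inside an enabled IP pool. A quick hedged check, using calicoctl (which is already in use later in this post):

sudo calicoctl get ippool -o wide   # the pinned 10.100.200.132 must fall inside an enabled pool's CIDR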

I wrote a YAML file like this for each microservice. After applying them, some of the microservices did not run properly; the pod logs showed:

what():  Cannot resolve a DNS name resolve: Host not found (authoritative)

At that point, the default /etc/resolv.conf inside the pod was:

nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
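This default points at the cluster DNS, i.e. CoreDNS behind the 10.96.0.10 Service. With options ndots:5, any name with fewer than five dots is first tried with the search suffixes, so both cluster-internal and external lookups go through CoreDNS. A quick hedged check from inside a failing pod (assuming nslookup is available in the image):

cat /etc/resolv.conf          # confirm which nameserver the pod actually uses
nslookup kubernetes.default   # cluster-internal resolution via 10.96.0.10
nslookup www.baidu.com        # forwarding to the upstream DNS through CoreDNS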

But when I add a section to the YAML that replaces /etc/resolv.conf, the microservices run normally. I don't quite understand why this is necessary. 192.168.10.x is the subnet my hosts are on.

        volumeMounts:
        - mountPath: /etc/resolv.conf
          name: amf-volume   
      volumes:
      - name: amf-volume
        hostPath:    
          path: /home/k8s-mec/resolv.conf  
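As a side note, overwriting /etc/resolv.conf with a hostPath mount works, but the first-class equivalent is dnsPolicy/dnsConfig in the pod spec; a minimal sketch with the same values:

      dnsPolicy: "None"
      dnsConfig:
        nameservers:
        - 192.168.10.1
        searches:
        - default.svc.cluster.local
        - svc.cluster.local
        - cluster.local
        options:
        - name: ndots
          value: "5"

Either way, these pods stop using cluster DNS, so Service names will only resolve if 192.168.10.1 can answer them; this may well be related to the registration failure described next.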

The contents of that file are:

nameserver 192.168.10.1
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5

With the workaround above in place, everything reaches Running, and my images are correct, but once the microservices are in k8s the pods cannot communicate properly: one microservice cannot register itself with another and gets no response. I don't know which part of the configuration went wrong when migrating the microservices into k8s.

[2022-05-09T10:35:25.789748] [spgwu] [spgwu_app] [info ] Send NF Instance Registration to NRF
[2022-05-09T10:35:25.890115] [spgwu] [spgwu_app] [warn ] Could not get response from NRF
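Since NF registration is an HTTP call to the NRF, it is worth checking how spgwu addresses the NRF in its config (hostname vs. IP) and whether that address resolves and connects from inside the pod. A hedged sketch; oai-nrf and <port> are hypothetical placeholders for the actual Service name and port:

kubectl get svc,endpoints -o wide                              # does an NRF Service exist with ready endpoints?
kubectl exec -it <spgwu-pod> -- nslookup oai-nrf               # does the NRF name resolve inside the pod?
kubectl exec -it <spgwu-pod> -- curl -v http://oai-nrf:<port>/ # can the pod open an HTTP connection at all?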

An update with some of my tests.
The following all look normal:

sudo kubectl get pods -n kube-system

NAME                                       READY   STATUS    RESTARTS   AGE
calico-kube-controllers-659bd7879c-46n8c   1/1     Running   0          9d
calico-node-44vcq                          1/1     Running   0          9d
calico-node-sz6vf                          1/1     Running   0          9d
calico-node-vq8mh                          1/1     Running   0          9d
coredns-f9fd979d6-6kcxk                    1/1     Running   0          96m
coredns-f9fd979d6-9rzbl                    1/1     Running   0          96m
etcd-k8smec                                1/1     Running   1          9d
kube-apiserver-k8smec                      1/1     Running   0          8d
kube-controller-manager-k8smec             1/1     Running   6          9d
kube-proxy-bwl58                           1/1     Running   1          9d
kube-proxy-fwpck                           1/1     Running   1          9d
kube-proxy-n7s5v                           1/1     Running   1          9d
kube-scheduler-k8smec                      1/1     Running   6          9d
metrics-server-v0.5.1-544f94d7bf-4m6w4     2/2     Running   3          8d
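CoreDNS pods being Running is necessary but not sufficient: the kube-dns Service at 10.96.0.10 also needs live endpoints. A quick check:

sudo kubectl get svc kube-dns -n kube-system         # CLUSTER-IP should be 10.96.0.10
sudo kubectl get endpoints kube-dns -n kube-system   # should list both coredns pod IPs on port 53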

Some of the k8s configuration files:

sudo cat /var/lib/kubelet/config.yaml

apiVersion: kubelet.config.k8s.io/v1beta1
authentication:
  anonymous:
    enabled: false
  webhook:
    cacheTTL: 0s
    enabled: true
  x509:
    clientCAFile: /etc/kubernetes/pki/ca.crt
authorization:
  mode: Webhook
  webhook:
    cacheAuthorizedTTL: 0s
    cacheUnauthorizedTTL: 0s
cgroupDriver: systemd
clusterDNS:
- 10.96.0.10
clusterDomain: cluster.local
cpuManagerReconcilePeriod: 0s
evictionPressureTransitionPeriod: 0s
fileCheckFrequency: 0s
healthzBindAddress: 127.0.0.1
healthzPort: 10248
httpCheckFrequency: 0s
imageMinimumGCAge: 0s
kind: KubeletConfiguration
logging: {}
nodeStatusReportFrequency: 0s
nodeStatusUpdateFrequency: 0s
resolvConf: /run/systemd/resolve/resolv.conf
rotateCertificates: true
runtimeRequestTimeout: 0s
staticPodPath: /etc/kubernetes/manifests
streamingConnectionIdleTimeout: 0s
syncFrequency: 0s
volumeStatsAggPeriod: 0s
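The resolvConf: /run/systemd/resolve/resolv.conf entry is what kubelet hands to pods with dnsPolicy Default, and it is typically also the upstream that CoreDNS forwards non-cluster names to (the forward . /etc/resolv.conf line in its Corefile). To confirm what upstream CoreDNS actually uses (standard kubeadm layout assumed):

sudo kubectl -n kube-system get configmap coredns -o yaml   # check the 'forward' line in the Corefile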

sudo cat /run/systemd/resolve/resolv.conf

# This file is managed by systemd-resolved(8). Do not edit.
#
# Third party programs must not access this file directly, but
# only through the symlink at /etc/resolv.conf. To manage
# resolv.conf(5) in a different way, replace the symlink by a
# static file or a different symlink.

nameserver 8.8.8.8
nameserver 114.114.114.114

sudo cat /etc/resolv.conf 

# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
#     DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
nameserver 8.8.8.8
nameserver 114.114.114.114

The state of my cluster:

sudo kubectl get nodes -o wide

NAME     STATUS   ROLES    AGE   VERSION   INTERNAL-IP      EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION       CONTAINER-RUNTIME
node2    Ready    <none>   9d    v1.19.3   192.168.10.11    <none>        Ubuntu 16.04.7 LTS   4.15.0-142-generic   docker://19.3.10
master   Ready    master   9d    v1.19.3   192.168.10.139   <none>        Ubuntu 16.04.7 LTS   4.15.0-142-generic   docker://19.3.10
node1    Ready    <none>   9d    v1.19.3   192.168.10.201   <none>        Ubuntu 16.04.6 LTS   4.15.0-142-generic   docker://19.3.10

Running this on the master node gives:

sudo calicoctl node status

Calico process is running.

IPv4 BGP status
+----------------+-------------------+-------+------------+-------------+
|  PEER ADDRESS  |     PEER TYPE     | STATE |   SINCE    |    INFO     |
+----------------+-------------------+-------+------------+-------------+
| 192.168.10.201 | node-to-node mesh | up    | 2022-05-09 | Established |
| 192.168.10.11  | node-to-node mesh | up    | 2022-05-12 | Established |
+----------------+-------------------+-------+------------+-------------+

IPv6 BGP status
No IPv6 peers found.

Running it on a worker node gives:

sudo calicoctl node status

Calico process is running.

IPv4 BGP status
+----------------+-------------------+-------+------------+-------------+
|  PEER ADDRESS  |     PEER TYPE     | STATE |   SINCE    |    INFO     |
+----------------+-------------------+-------+------------+-------------+
| 192.168.10.139 | node-to-node mesh | up    | 2022-05-12 | Established |
| 192.168.10.201 | node-to-node mesh | up    | 2022-05-12 | Established |
+----------------+-------------------+-------+------------+-------------+

IPv6 BGP status
No IPv6 peers found.
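Given that all the BGP sessions show Established, the next thing to verify (as the teacher also suggests below) is raw pod-to-pod reachability, same-node and cross-node. A hedged sketch, assuming ping exists in the images; substitute real pod names and IPs:

sudo kubectl get pods -o wide                                   # note each pod's IP and node
sudo kubectl exec -it <pod-a> -- ping -c 3 <same-node-pod-ip>
sudo kubectl exec -it <pod-a> -- ping -c 3 <other-node-pod-ip>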

[Three screenshots attached here: the Calico logs from master, node1, and node2 respectively; I could not spot the problem in them.]

Looking at the kube-proxy logs, I see a W(arning) line. Could the problem be here? I haven't found a way to fix this error yet. Teacher, could you take a look for me?

k8s-mec@k8smec:~$ sudo kubectl logs -f kube-proxy-fwpck  -n kube-system
I0509 07:15:49.186836       1 node.go:136] Successfully retrieved node IP: 192.168.10.139
I0509 07:15:49.187344       1 server_others.go:111] kube-proxy node IP is an IPv4 address (192.168.10.139), assume IPv4 operation
W0509 07:15:52.858142       1 server_others.go:579] Unknown proxy mode "", assuming iptables proxy
I0509 07:15:52.858279       1 server_others.go:186] Using iptables Proxier.
I0509 07:15:52.859119       1 server.go:650] Version: v1.19.3
I0509 07:15:52.859767       1 conntrack.go:100] Set sysctl 'net/netfilter/nf_conntrack_max' to 131072
I0509 07:15:52.859820       1 conntrack.go:52] Setting nf_conntrack_max to 131072
I0509 07:15:52.859927       1 conntrack.go:100] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_established' to 86400
I0509 07:15:52.860019       1 conntrack.go:100] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_close_wait' to 3600
I0509 07:15:52.861412       1 config.go:315] Starting service config controller
I0509 07:15:52.861442       1 shared_informer.go:240] Waiting for caches to sync for service config
I0509 07:15:52.861446       1 config.go:224] Starting endpoint slice config controller
I0509 07:15:52.861468       1 shared_informer.go:240] Waiting for caches to sync for endpoint slice config
I0509 07:15:52.961679       1 shared_informer.go:247] Caches are synced for service config 
I0509 07:15:52.961706       1 shared_informer.go:247] Caches are synced for endpoint slice config 
I0518 08:56:57.879820       1 trace.go:205] Trace[1789821911]: "iptables Monitor CANARY check" (18-May-2022 08:56:52.868) (total time: 5009ms):
Trace[1789821911]: [5.009166574s] [5.009166574s] END
W0518 08:56:57.879839       1 iptables.go:568] Could not check for iptables canary mangle/KUBE-PROXY-CANARY: exit status 4
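The W0509 "Unknown proxy mode" line is harmless: with an empty mode, kube-proxy simply defaults to iptables, as the next line confirms. The later canary warning (exit status 4) usually just means iptables was momentarily busy (xtables lock contention). To double-check the configured mode and the canary chain (hedged; standard kubeadm setup assumed):

sudo kubectl -n kube-system get configmap kube-proxy -o yaml | grep "mode:"   # empty means the iptables default
sudo iptables -t mangle -S KUBE-PROXY-CANARY   # run on the node; the chain should exist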

#########

Teacher, my Calico logs are too long, so I put them on my blog at the hyperlink below. Could you take a look? I couldn't spot the problem.


2 Answers

刘果国 2022-05-20 09:35:37

The Calico startup log is incomplete. You can kill the pod to trigger a restart and read the full log from the beginning. There should be a neighbor-discovery section; as I recall, it prints the IP list of all the neighbors.

  • OP qq_慕前端3164363 #1
    Teacher, I updated my question and put my logs at the end. Could you take a look?
    2022-05-21 16:48:08
  • 刘果国 replied to OP qq_慕前端3164363 #2
    The Calico startup logs look fine. Test whether pod IPs are reachable, both same-host and cross-host.
    2022-05-22 09:19:04
  • OP qq_慕前端3164363 replied to 刘果国 #3
    The pods can all ping each other, but DNS resolution works in some pods and not in others; I don't know why that happens.
    2022-06-22 14:45:16
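When DNS works in some pods but not others, the failures often cluster on one node (e.g. that node's path to 10.96.0.10 is broken). A hedged way to test node by node is to pin a throwaway pod with an override (node names taken from kubectl get nodes):

sudo kubectl run dns-node1 --rm -it --restart=Never --image=busybox:1.28 \
  --overrides='{"spec":{"nodeName":"node1"}}' -- nslookup kubernetes.default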
刘果国 2022-05-10 10:27:59

Check whether the DNS service is working properly, i.e. the service at 10.96.0.10. It first handles DNS for svc names; if a name cannot be found there, the query falls through to the DNS of the host it is on.

  • OP qq_慕前端3164363 #1
    Regarding "check whether the DNS service at 10.96.0.10 is working": how can I check that effectively?
    2022-05-18 17:33:13
  • OP qq_慕前端3164363 #2
    Teacher, I re-edited my question and added some tests. Please take a look; my error is still not solved.
    2022-05-18 17:48:58
  • 刘果国 replied to OP qq_慕前端3164363 #3
    You have two problems right now: pod-to-pod networking and DNS. Don't tackle them together; put your attention on pod-to-pod networking first. From the Calico status, something is definitely off: with three servers, every node should show a three-node mesh, but yours are all pairwise. Read the Calico logs carefully.
    2022-05-19 09:14:43
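For the "how do I check 10.96.0.10 effectively" question above, the standard check from the Kubernetes DNS-debugging docs is a throwaway lookup pod (busybox:1.28 is chosen deliberately; nslookup in newer busybox images is unreliable):

sudo kubectl run -it --rm dns-test --restart=Never --image=busybox:1.28 -- nslookup kubernetes.default
# a healthy reply shows "Server: 10.96.0.10" and resolves the name to the API server's ClusterIP (typically 10.96.0.1)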