Troubleshooting Windows containers
This chapter is about Windows containers troubleshooting.
SSH to Windows Node
When checking Windows container issues, a common step is RDP to nodes and check component status and logs. You could allocate a public IP to the Node or do a port forwarding from router. But a simpler way is via a RDP service (replace with your own node-ip):
# rdp.yaml apiVersion: v1 kind: Service metadata: name: rdp spec: type: LoadBalancer ports: - protocol: TCP port: 3389 targetPort: 3389 kind: Endpoints apiVersion: v1 metadata: name: rdp subsets: - addresses: - ip: <node-ip> ports: - port: 3389
$ kubectl create -f rdp.yaml $ kubectl get svc rdp NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE rdp LoadBalancer 10.0.99.149 220.127.116.11 3389:32008/TCP 5m
Then connect to the node via service rdp's external IP, e.g.
mstsc.exe -v 18.104.22.168.
Don't forget to delete the service after user:
kubectl delete -f rdp.yaml.
Windows Pod stuck in ContainerCreating
Besides reasons introduced in Troubleshooting Pod, there are also other causes including:
- the pause image is misconfigured
- the container image is not compatible with Windows. Note that containers on Windows Server 1709 should use images with 1709 tags, e.g.
Windows Pod failed to resolve DNS
This is a known issue. After Windows Node rebooted, HNS Policy need to be cleaned up (Should do this for each rebooting):
# On Windows Node Start-BitsTransfer -Source https://raw.githubusercontent.com/Microsoft/SDN/master/Kubernetes/windows/hns.psm1 Import-Module .\hns.psm1 Stop-Service kubeproxy Stop-Service kubelet Get-HnsNetwork | ? Name -eq l2Bridge | Remove-HnsNetwork Get-HnsPolicyList | Remove-HnsPolicyList Start-Service kubelet Start-Service kubeproxy
Even with this, kube-dns clusterIP may be still not working. A workaround is configure kube-dns Pod's IP address to normal Pods, e.g.
# In Windows container, e.g. kubectl exec -i -t <pod-name> powershell $adapter=Get-NetAdapter Set-DnsClientServerAddress -InterfaceIndex $adapter.ifIndex -ServerAddresses 10.244.0.2,10.244.0.3 Set-DnsClient -InterfaceIndex $adapter.ifIndex -ConnectionSpecificSuffix "default.svc.cluster.local"
The kube-dns Pod's IP could be got by
$ kubectl -n kube-system describe endpoints kube-dns Name: kube-dns Namespace: kube-system Labels: k8s-app=kube-dns kubernetes.io/cluster-service=true kubernetes.io/name=KubeDNS Annotations: <none> Subsets: Addresses: 10.244.0.2,10.244.0.3 NotReadyAddresses: <none> Ports: Name Port Protocol ---- ---- -------- dns 53 UDP dns-tcp 53 TCP Events: <none>
If your kubernetes cluster is deployed by acs-engine, then acs-engine#2378 could help to fix this issue (redeploy the cluster with this patch or change existing files according to it).
If kubernetes cluster is running on Azure and is using custom VNET, then the VNET should be attached with route table created by provisioning the cluster：
'..id') az network vnet subnet update -n KubernetesSubnet \ -g acs-custom-vnet \ --vnet-name KubernetesCustomVNET \ --route-table $rtrt=$(az network route-table list -g acs-custom-vnet -o json | jq -r
KubernetesSubnet is the name of the vnet subnet, and
KubernetesCustomVNET is the name of the custom VNET itself.
An example in bash form if the VNET is in a separate ResourceGroup:
'..id') az network vnet subnet update \ -g RESOURCE_GROUP_NAME_VNET \ --route-table $rt \ --ids "/subscriptions/SUBSCRIPTION_ID/resourceGroups/RESOURCE_GROUP_NAME_VNET/providers/Microsoft.Network/VirtualNetworks/KUBERNETES_CUSTOM_VNET/subnets/KUBERNETES_SUBNET"rt=$(az network route-table list -g RESOURCE_GROUP_NAME_KUBE -o json | jq -r
Remote endpoint creation failed: HNS failed with error: The switch-port was not found
This is an error happened in kube-proxy when provisioning load balancer rules for kubernetes services. KB4089848 should be installed to fix this issue:
Start-BitsTransfer http://download.windowsupdate.com/d/msdownload/update/software/updt/2018/03/windows10.0-kb4089848-x64_db7c5aad31c520c6983a937c3d53170e84372b11.msu wusa.exe windows10.0-kb4089848-x64_db7c5aad31c520c6983a937c3d53170e84372b11.msu Restart-Computer
After the Node rebooted, recheck the fix has been installed:
PS C:\k> Get-HotFix Source Description HotFixID InstalledBy InstalledOn ------ ----------- -------- ----------- ----------- 27171k8s9000 Update KB4087256 NT AUTHORITY\SYSTEM 3/22/2018 12:00:00 AM 27171k8s9000 Update KB4089848 NT AUTHORITY\SYSTEM 4/4/2018 12:00:00 AM
If there are still DNS resolve issues, the steps in previous steps should be applied, e.g. restart kubelet/kube-proxy and setup DnsClientServerAddress.
Windows Pod failed to get ServiceAccount Secret
This is also a known issue. There is no workaround for current windows yet, but its fix has been released in Windows 10 Insider and Windows Server Insider builds 17074+.
Windows Pod failed to access Kubernetes API
If you are using a Hyper-V virtual machine, ensure that MAC spoofing is enabled on the network adapter(s).
Windows node cannot access services clusterIP
This is a known limitation of the current networking stack on Windows. Only pods can refer to the Service ClusterIP.