NOMAD Prod Kubernetes Reinstall
By fawzi
Reinstalling the Kubernetes Production Cluster
The main reason for this is a critical Kubernetes bug. Here is a bit of a log of what I did (to hopefully help me next time). It is mostly taken from the Kubernetes docs (1, 2), plus tweaks I added.
Master node:
Basically the same as the Visualization Setup minus the tainting and coredns editing:
# disable swap:
swapoff -a
# comment out the swap lines (the sed below should do it, but I prefer a manual edit)
# sudo sed -i '/ swap / s/^\(.*\)$/#\1/g' /etc/fstab
vi /etc/fstab
# clean old installation, disable docker
kubeadm reset
systemctl stop kubelet
systemctl disable kubelet
systemctl stop docker
systemctl disable docker
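# kubeadm reset leaves the CNI configuration behind (it warns about this);
# to start really fresh it can also be removed by hand:
rm -rf /etc/cni/net.d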
# sources for new version of kubernetes
cat <<EOF > /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://packages.cloud.google.com/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://packages.cloud.google.com/yum/doc/yum-key.gpg https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg
exclude=kube*
EOF
# Set SELinux in permissive mode (effectively disabling it)
setenforce 0
sed -i 's/^SELINUX=enforcing$/SELINUX=permissive/' /etc/selinux/config
# ensure ip tables function
if ! sysctl --system | grep "net.bridge.bridge-nf-call-ip6tables = 1" > /dev/null ; then
cat <<EOF > /etc/sysctl.d/90-k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
EOF
fi
sysctl --system
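If the bridge sysctls above do not show up, the br_netfilter kernel module is probably not loaded; loading it now and at boot (per the Kubernetes docs) should fix that:
modprobe br_netfilter
echo br_netfilter > /etc/modules-load.d/br_netfilter.conf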
# update & install kubernetes
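# note: docker-ce comes from Docker's own repo, not the one above; here I
# assume it is already configured on the node, otherwise add it first with:
# yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo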
yum install -y docker-ce kubelet kubeadm kubectl --disableexcludes=kubernetes
kubeadm reset # new version might reset better
reboot
systemctl enable docker && systemctl start docker
systemctl enable kubelet && systemctl start kubelet
kubeadm init --pod-network-cidr=10.244.0.0/16
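To use kubectl from the master, the usual kubeconfig setup applies (this is the standard hint printed by kubeadm init):
mkdir -p $HOME/.kube
cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
chown $(id -u):$(id -g) $HOME/.kube/config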
Ideally do this after a node is up and added to the cluster; then coredns works too:
# get & install flannel
curl https://raw.githubusercontent.com/coreos/flannel/bc79dd1505b0c8681ece4de4c0d86c5cd2643275/Documentation/kube-flannel.yml > kube-flannel.yml
kubectl create -f kube-flannel.yml
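A quick sanity check that the network and coredns actually come up (nodes Ready, coredns Running):
kubectl get nodes
kubectl get pods -n kube-system -o wide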
Worker Nodes
For the normal worker nodes, the setup operations are:
# disable swap:
swapoff -a
# comment out the swap lines (the sed below should do it, but I prefer a manual edit)
# sudo sed -i '/ swap / s/^\(.*\)$/#\1/g' /etc/fstab
vi /etc/fstab
# clean old installation, disable docker
kubeadm reset
systemctl stop kubelet
systemctl disable kubelet
systemctl stop docker
systemctl disable docker
# sources for new version of kubernetes
cat <<EOF > /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://packages.cloud.google.com/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://packages.cloud.google.com/yum/doc/yum-key.gpg https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg
exclude=kube*
EOF
# Set SELinux in permissive mode (effectively disabling it)
setenforce 0
sed -i 's/^SELINUX=enforcing$/SELINUX=permissive/' /etc/selinux/config
# ensure ip tables function
if ! sysctl --system | grep "net.bridge.bridge-nf-call-ip6tables = 1" > /dev/null ; then
cat <<EOF > /etc/sysctl.d/90-k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
EOF
fi
sysctl --system
# update & install kubernetes
yum install -y docker-ce kubelet kubeadm kubectl --disableexcludes=kubernetes
kubeadm reset # new version might reset better
reboot
systemctl enable docker && systemctl start docker
systemctl enable kubelet && systemctl start kubelet
kubeadm join <serviceAddr> --token <token> --discovery-token-ca-cert-hash <hash>
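If the token or the ca-cert hash is no longer at hand, a fresh join command can be printed on the master:
kubeadm token create --print-join-command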
Important notes on the above:
- disable docker before the update, otherwise it might become unresponsive (prod2 node, cough, cough)
- a reboot is safer after kubeadm reset and the ip tables fixes (flink-03 node, I suspect)
- avoid yum update: it updates the kernel, and then we lose the gpfs kernel module and thus gpfs (ehm prod1, prod2, flink-01, flink-03); a possible safeguard is sketched after the log:
systemctl status -l gpfs.service
● gpfs.service - General Parallel File System
Loaded: loaded (/usr/lpp/mmfs/lib/systemd/gpfs.service; enabled; vendor preset: disabled)
Active: active (exited) since Mon 2018-12-17 16:05:33 CET; 21h ago
Main PID: 25305 (code=exited, status=0/SUCCESS)
Tasks: 36
Memory: 66.8M
CGroup: /system.slice/gpfs.service
├─ 6177 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15
├─25687 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15
├─26595 /usr/lpp/mmfs/bin/mmsdrserv 1191 10 10 /var/adm/ras/mmsdrserv.log 128 yes no
└─29399 python /usr/lpp/mmfs/bin/mmsysmon.py
Dec 17 16:05:30 nomad-toolkit-prod1 systemd[1]: Starting General Parallel File System...
Dec 17 16:05:33 nomad-toolkit-prod1 systemd[1]: Started General Parallel File System.
Dec 17 16:05:53 nomad-toolkit-prod1 mmsysmon[29399]: [I] Event raised: The IBM Spectrum Scale monitoring service has been started
Dec 17 16:05:55 nomad-toolkit-prod1 mmsysmon[29399]: [E] Event raised: The Spectrum Scale service process not running on this node. Normal operation cannot be done
Dec 17 16:05:55 nomad-toolkit-prod1 mmsysmon[29399]: [E] Event raised: The node is not able to form a quorum with the other available nodes.
Dec 17 16:05:55 nomad-toolkit-prod1 mmsysmon[29399]: [I] Event raised: All quorum nodes are reachable PC_QUORUM_NODES
Dec 17 16:05:56 nomad-toolkit-prod1 mmsysmon[29399]: [W] Event raised: The filesystem rzg_nomad1 is probably needed, but not mounted
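If an update cannot be avoided, one way to keep the kernel (and so the gpfs module) untouched is to exclude the kernel packages:
yum update --exclude=kernel*
# or permanently, in /etc/yum.conf:
# exclude=kernel*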
- When reinstalling gpfs: use the latest /usr/lpp. Removing the old packages and doing a fresh install
for p in $(rpm -qa | grep gpfs); do
  rpm -e $p
done
rpm -ihv *.rpm
might be better than a plain rpm -Uhv *.rpm, and /usr/lpp/mmfs/bin/mmbuildgpl rebuilds the kernel module. More info in IBM's doc. Thanks to Florian for installing the latest patches and making it work on flink-01.
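For the record, the rebuild-and-restart sequence on a node then looks roughly like this (mmgetstate just as a sanity check):
/usr/lpp/mmfs/bin/mmbuildgpl
systemctl start gpfs.service
/usr/lpp/mmfs/bin/mmgetstate -a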