NOMAD Dev Kubernetes Extension
By fawzi
Dev Kubernetes reinstall
The main reason for this was a critical Kubernetes bug. Here is a short log of what I did (to hopefully help me next time).
Development cluster extension
I installed a new cluster on labdev3, mostly following what I did last time. But labdev4 was having issues, so I asked for a clean image, and it took a while to get it up. Here is the process to add it to the Kubernetes cluster. Strangely, it came with CentOS 7.4 (all the other machines are 7.6).
The machine still had the volume for the device mapper thinpool, but no Docker. So I tried to
yum install docker
systemctl enable docker.service
systemctl start docker.service
and to re-establish the old /etc/docker/daemon.json:
{
"storage-driver": "devicemapper",
"storage-opts": [
"dm.fs=xfs",
"dm.thinpooldev=/dev/mapper/docker-thinpool",
"dm.use_deferred_removal=true",l
"dm.use_deferred_deletion=true",
"dm.basesize=15G"
],
"group": "dockerroot" # this last line added down below due to the ownership problem
}
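Note that the # comment in the listing above is just an annotation for this post; daemon.json is plain JSON and cannot contain comments, so it does not hurt to validate the file before restarting (a generic check, not from the original session):
python -m json.tool /etc/docker/daemon.json   # prints the parsed JSON, or a syntax error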
But starting Docker failed:
systemctl status -l docker.service
Dec 17 12:29:48 labdev4-nomad dockerd-current[21024]: unable to configure the Docker daemon with file /etc/docker/daemon.json: the following directives are specified both as a flag and in the configuration file: storage-driver: (from flag: overlay2, from file: devicemapper)
overlay2 should be more efficient, and we could probably try to use it in production now (starting with Docker 17.08, which should be Docker Engine 1.13), but I do not want to experiment now; maybe next time.
So I disabled (commented out) overlay2 in /etc/sysconfig/docker-storage and /etc/sysconfig/docker-storage-setup, roughly as sketched below.
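A sketch of what that edit amounts to (the exact pre-existing lines in those files may differ):
# /etc/sysconfig/docker-storage: comment out the overlay2 option
#DOCKER_STORAGE_OPTIONS="--storage-driver overlay2"
DOCKER_STORAGE_OPTIONS=""
# /etc/sysconfig/docker-storage-setup: comment out the driver selection
#STORAGE_DRIVER=overlay2
Restarting Docker then got further, but warned: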
Dec 17 12:43:17 labdev4-nomad dockerd-current[22079]: time="2018-12-17T12:43:17.220707453+01:00" level=warning msg="could not change group /var/run/docker.sock to docker: group docker not found"
add "group": "dockerroot"
to /etc/docker/daemon.json
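That is the "group": "dockerroot" line already shown in the daemon.json listing above. After adding it, restart the daemon and look at the newest log lines (the obvious commands, not copied from the original session):
sudo systemctl restart docker.service
sudo journalctl -u docker.service -e   # jump to the most recent entries
which surfaced the next error: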
Dec 17 12:43:18 labdev4-nomad dockerd-current[22079]: Error starting daemon: error initializing graphdriver: devmapper: Unable to take ownership of thin-pool (docker-thinpool) that already has used data blocks
The cure was to rebuild the LVM thinpool:
sudo lvremove docker/thinpool
sudo lvcreate --wipesignatures y -n thinpool docker -l 95%VG
sudo lvcreate --wipesignatures y -n thinpoolmeta docker -l 1%VG
sudo lvconvert -y --zero n -c 512K --thinpool docker/thinpool --poolmetadata docker/thinpoolmeta
… and finally docker works :)
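A quick way to double-check that Docker really picked up the devicemapper thinpool (not part of the original log, just the standard docker info check):
sudo docker info | grep -A 3 'Storage Driver'
# should report Storage Driver: devicemapper with Pool Name: docker-thinpool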
Following https://kubernetes.io/docs/setup/independent/install-kubeadm/
on the master: kubeadm token create
on the new node: kubeadm join 130.183.207.100:6443 --token <new_token> --discovery-token-ca-cert-hash <discovery_sha>
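If the token or hash are not at hand anymore, both can be regenerated on the master; these are the standard recipes from the kubeadm documentation, nothing specific to this cluster:
# print a complete join command, token and CA hash included
kubeadm token create --print-join-command
# or recompute just the discovery hash from the cluster CA
openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt \
  | openssl rsa -pubin -outform der 2>/dev/null \
  | openssl dgst -sha256 -hex | sed 's/^.* //'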
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
labdev3-nomad Ready master 9d v1.13.0
labdev4-nomad NotReady <none> 14s v1.13.1
Whoops, we waited too long and a new version of Kubernetes became available. Does a cluster with mixed versions create problems? Let's not find out:
kubectl drain labdev4-nomad --delete-local-data --force --ignore-daemonsets
kubectl delete node labdev4-nomad
on the new node:
yum remove kubelet kubeadm kubectl --disableexcludes=kubernetes
yum install kubelet-1.13.0 kubeadm-1.13.0 kubectl-1.13.0 --disableexcludes=kubernetes
kubeadm join 130.183.207.100:6443 --token <new_token> --discovery-token-ca-cert-hash <discovery_sha>
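To reduce the chance of this version drift next time, it helps to keep the kube* packages out of plain yum updates; the kubernetes.repo file from the install guide already carries an exclude line for this (which is why --disableexcludes=kubernetes is needed above). Roughly:
# /etc/yum.repos.d/kubernetes.repo (relevant line)
exclude=kubelet kubeadm kubectl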
After the join:
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
labdev3-nomad Ready master 9d v1.13.0
labdev4-nomad NotReady <none> 10s v1.13.0
and a bit later:
$ kubectl get node
NAME STATUS ROLES AGE VERSION
labdev3-nomad Ready master 10d v1.13.0
labdev4-nomad Ready <none> 44m v1.13.0
all is fine… or at least hopefully so