Describing the faulty pod (frappe-bench-erpnext-socketio-746c69bcd4-czjkl) gives:
Warning Unhealthy 6m28s (x3 over 6m48s) kubelet Liveness probe failed: dial tcp 172.16.0.84:9000: connect: connection refused
Normal Killing 6m3s kubelet Container socketio failed liveness probe, will be restarted
Warning Unhealthy 5m48s (x8 over 6m48s) kubelet Readiness probe failed: dial tcp 172.16.0.84:9000: connect: connection refused
Normal Started 5m47s (x2 over 6m58s) kubelet Started container socketio
Normal Pulled 5m38s (x3 over 6m59s) kubelet Container image "frappe/frappe-socketio:v14.13.0" already present on machine
Normal Created 5m38s (x3 over 6m59s) kubelet Created container socketio
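One way to tell whether the probe fails because socketio is just slow to start or because it never binds port 9000 is to probe the port by hand while the container is running. A sketch, using the pod name from the describe output above:

```shell
# Forward the socketio pod's port 9000 to the local machine.
kubectl -n erpnext port-forward \
  pod/frappe-bench-erpnext-socketio-746c69bcd4-czjkl 9000:9000 &
PF_PID=$!
sleep 2

# If this connection is also refused, the process inside the container
# is not listening (yet), which points at a slow or failed startup
# rather than a network-policy problem.
nc -vz 127.0.0.1 9000

kill "$PF_PID"
```

If the port only comes up after a while, increasing `initialDelaySeconds` on the liveness probe might stop the premature restarts.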
Other pods are also restarting and unhealthy at the moment, but I am not showing them here, because they are most likely failing as a consequence of the pod above.
And after waiting about half an hour, the “steady” state of the pods looks like this:
# kubectl logs -n erpnext -f frappe-bench-erpnext-conf-bench-20221103212650-rwjdx
Defaulted container "configure" out of: configure, frappe-bench-ownership (init)
failed to create fsnotify watcher: too many open files
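The “too many open files” message when creating an fsnotify watcher usually comes from the host node's inotify limits, not from the pod itself. A possible fix, assuming the node's defaults are low, is to raise the limits on each node (the values below are illustrative, not mandated):

```shell
# Inspect the current limits on the node (run on the host, not in the pod):
cat /proc/sys/fs/inotify/max_user_instances
cat /proc/sys/fs/inotify/max_user_watches

# Raise them temporarily (illustrative values):
sudo sysctl -w fs.inotify.max_user_instances=512
sudo sysctl -w fs.inotify.max_user_watches=524288

# Persist the change across reboots:
echo 'fs.inotify.max_user_instances=512'  | sudo tee -a /etc/sysctl.d/99-inotify.conf
echo 'fs.inotify.max_user_watches=524288' | sudo tee -a /etc/sysctl.d/99-inotify.conf
sudo sysctl --system
```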
I am not sure if the k3d environment behaves differently from vanilla Kubernetes on Hetzner.
To confirm that, I would probably need to deploy a completely fresh cluster and try to deploy ERPNext on it. To be honest, I don’t think this will make any difference, because my current environment is fully under my control and nothing unnecessary is deployed on it. Still, I will try a completely new K8s cluster and see if it works there.
All worker pods (the ones shown as failed) show the following, or something very similar:
root@control-plane-01:~# kubectl logs -n erpnext frappe-bench-erpnext-worker-s-d4f7c8fbb-h2sfq
Defaulted container "short" out of: short, populate-assets (init)
15:13:12 Worker rq:worker:0dea130994db435bb0946302f797d00c.frappe-bench-erpnext-worker-s-d4f7c8fbb-h2sfq.7.home-frappe-frappe-bench:short: started, version 1.10.1
15:13:12 Subscribing to channel rq:pubsub:0dea130994db435bb0946302f797d00c.frappe-bench-erpnext-worker-s-d4f7c8fbb-h2sfq.7.home-frappe-frappe-bench:short
15:13:12 *** Listening on home-frappe-frappe-bench:short...
15:13:12 Cleaning registries for queue: home-frappe-frappe-bench:short
The scheduler pod shows no logs at all.
The events from describing the scheduler show:
Normal Pulling 17m kubelet Pulling image "frappe/erpnext-worker:v14.5.1"
Normal Pulled 17m kubelet Successfully pulled image "frappe/erpnext-worker:v14.5.1" in 30.178706426s
Normal Created 17m (x3 over 17m) kubelet Created container scheduler
Normal Started 17m (x3 over 17m) kubelet Started container scheduler
Warning BackOff 14m (x9 over 17m) kubelet Back-off restarting failed container
Normal Pulled 4m30s (x34 over 17m) kubelet Container image "frappe/erpnext-worker:v14.5.1" already present on machine
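Since the current scheduler container prints nothing before it is restarted, it may help to pull the logs of the previous (crashed) instance and its exit status instead. A sketch, with `<scheduler-pod-name>` standing in for the actual pod name:

```shell
# Logs of the last terminated instance of the scheduler container:
kubectl -n erpnext logs <scheduler-pod-name> -c scheduler --previous

# Exit code and reason of the last termination:
kubectl -n erpnext get pod <scheduler-pod-name> \
  -o jsonpath='{.status.containerStatuses[0].lastState.terminated}'
```

An exit code of 1 with empty logs would suggest the process dies before it can even log, e.g. failing to reach Redis or the database.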
The events from describing one worker show:
Normal Pulled 19m kubelet Successfully pulled image "frappe/erpnext-nginx:v14.5.1" in 16.703634324s
Normal Created 19m kubelet Created container populate-assets
Normal Started 19m kubelet Started container populate-assets
Normal Created 19m (x2 over 19m) kubelet Created container short
Normal Started 18m (x2 over 19m) kubelet Started container short
Warning BackOff 18m kubelet Back-off restarting failed container
Normal Pulled 93s (x57 over 19m) kubelet Container image "frappe/erpnext-worker:v14.5.1" already present on machine
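The short worker's logs above actually look healthy (it subscribes and listens), so the earlier BackOff may have been a race against Redis coming up. One way to check Redis reachability from inside the namespace is a throwaway pod; a sketch, where the service name `frappe-bench-erpnext-redis-queue` is an assumption and may differ in your release:

```shell
kubectl -n erpnext run redis-check --rm -it --restart=Never \
  --image=redis:alpine -- \
  redis-cli -h frappe-bench-erpnext-redis-queue ping
# A PONG reply means the queue Redis is reachable from the namespace.
```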
I cannot see what the problem can be. @revant_one, am I missing anything?
I still have, hopefully, one final issue in this phase. When I try to access ERPNext over the ingress or via kubectl port-forward, I still get a 404 error. Do I need to do any further steps? So apparently I can reach the ingress controller, but there is no backend on the ERPNext side?
By the way, why is there an nginx pod in the erpnext namespace? Could that be the issue? Do I really need it, given that I am running my own ingress controller?
I tried to create the site as shown here:
but still have the same 404 error.
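A 404 from Frappe's frontend often means the request's Host header does not match any created site, so it may be worth verifying the ingress backend and then curling with the site name as the Host header. A sketch, where `erp.example.com` stands in for your actual site name and `svc/frappe-bench-erpnext` for your frontend service:

```shell
# Confirm the ingress routes to a service that actually has live endpoints:
kubectl -n erpnext get ingress
kubectl -n erpnext get endpoints

# Port-forward the frontend service and request it with the site's hostname,
# since Frappe selects which site to serve by the Host header:
kubectl -n erpnext port-forward svc/frappe-bench-erpnext 8080:8080 &
sleep 2
curl -H "Host: erp.example.com" http://127.0.0.1:8080/api/method/ping
```

If the curl with the correct Host header succeeds while a plain request 404s, the site exists and only the ingress host rule needs adjusting.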
I sent you the kubeconfig (admin.conf) as a message.
I am just curious to know whether it is “normal” that some pods keep restarting (and, after many restarts, get recreated) until at some point they succeed. Apparently these pods keep restarting until the other pods they depend on are ready.
This whole process, with its unnecessary restarts, takes about 5 minutes. Can’t these pods wait until the other pods are ready? Or is it not an issue at all that they are simply restarted/recreated until they eventually work?
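For what it's worth, this crash-loop-until-dependencies-are-ready pattern is ordinary Kubernetes behavior: CrashLoopBackOff with exponential back-off is how the kubelet retries. If the restarts are bothersome, a common workaround (not necessarily what the chart does) is an init container that blocks until the dependency answers; the command could be a loop like this, where the service name and port are placeholder assumptions:

```shell
# Intended as an initContainer command: block until the database
# service accepts TCP connections. mariadb/3306 are placeholder values.
until nc -z mariadb.erpnext.svc.cluster.local 3306; do
  echo "waiting for mariadb..."
  sleep 2
done
echo "mariadb is up"
```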