Frappe - ERPNext Setup - Helm Based - Created site disappeared after nodes restart

Some Data First

Deployment via Frappe Helm Chart (This is a non-production evaluation setup)

  • All steps were followed as per the documentation.
  • After this, I created a new site and was able to log in with the default credentials.

The next day, for unknown reasons, the pods were restarted and the new site is no longer available. I could also see that the NFS server pod was restarted.

I am unable to figure out why the site would disappear on a pod restart.
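One quick check is whether the site's files are still present on the shared sites volume. This is only a sketch: the deployment name is taken from the pod listing below, and the path assumes the standard frappe_docker bench layout.

```shell
# List site directories on the shared sites volume.
# Path assumes the standard frappe_docker layout; adjust the
# deployment name and namespace to match your release.
kubectl -n erpnext exec deploy/frappe-bench-erpnext-gunicorn -- \
  ls /home/frappe/frappe-bench/sites
```

If the new site's directory is missing here, the data on the backing volume was lost, not just the DNS/ingress wiring.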

Sharing more details of the cluster:

➜  frappe k -n nfs get pods
NAME                                  READY   STATUS    RESTARTS   AGE
in-cluster-nfs-server-provisioner-0   1/1     Running   0          29h
➜  frappe k -n erpnext get pods
NAME                                                   READY   STATUS      RESTARTS      AGE
frappe-bench-erpnext-conf-bench-20250212193545-vjmt5   0/1     Completed   0             38h
frappe-bench-erpnext-gunicorn-c4f7b7787-cqgjm          1/1     Running     0             30h
frappe-bench-erpnext-new-site-20250212213916-qhkd8     0/1     Completed   0             36h
frappe-bench-erpnext-nginx-65fc5ff858-bw6f5            1/1     Running     0             29h
frappe-bench-erpnext-scheduler-868745b846-67f6r        1/1     Running     0             7h10m
frappe-bench-erpnext-socketio-9bbcbfd4f-tvk8z          1/1     Running     1 (29h ago)   29h
frappe-bench-erpnext-worker-d-5bf86cdff-bxjpb          1/1     Running     0             3h5m
frappe-bench-erpnext-worker-l-797479899d-pnd72         1/1     Running     0             7h10m
frappe-bench-erpnext-worker-s-664d87976-77vn2          1/1     Running     2 (38h ago)   38h
frappe-bench-mariadb-0                                 1/1     Running     0             38h
frappe-bench-redis-cache-master-0                      1/1     Running     0             29h
frappe-bench-redis-queue-master-0                      1/1     Running     0             38h

The setup is 38 hours old as of this post, and the pods were restarted about 10 hours after setup.

You need to understand and know your way around Persistent Volume Claims (PVCs) and how to debug them.

There are some posts on the forum that can get you started, as well as the official Kubernetes documentation on PVCs.
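As a starting point, a few commands to inspect the claims backing the bench (namespaces taken from the output above). Note in particular that if the nfs-server-provisioner was installed with its own persistence disabled (the chart's default), its exports live on ephemeral pod storage, so a restart of that pod can wipe everything it was serving.

```shell
# Check that the PVCs in the release namespace are still Bound
kubectl -n erpnext get pvc

# Look at the backing PersistentVolumes and their reclaim policy
kubectl get pv

# The NFS provisioner itself also claims storage; if it has no
# persistent backing volume, its data does not survive a restart
kubectl -n nfs get pvc
kubectl -n nfs describe pod in-cluster-nfs-server-provisioner-0
```

Given that you saw the NFS server restart right before the site vanished, that provisioner's own persistence is the first thing I would verify.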

Let them restart; it means the healthchecks are working. As long as the service is not down and the restarts are not caused by a known bug, let them happen. Two restarts is fine for a test setup. Connect an uptime bot to monitor the site's uptime instead of watching restart counts.

If the restarts are due to heavy usage or unusual infrastructure, configure better healthchecks.

You can `kubectl describe` the pods to find out why they restarted.
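For example, something along these lines (pod names are placeholders, fill in the ones from your listing):

```shell
# "Last State" and "Events" in the output explain the last restart
kubectl -n erpnext describe pod <pod-name>

# Recent events across the namespace, oldest first
kubectl -n erpnext get events --sort-by=.metadata.creationTimestamp

# Logs from the previous (terminated) container instance
kubectl -n erpnext logs <pod-name> --previous
```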
