How I upgraded my sites hosted on Kubernetes to v13 using the Helm chart

My Setup

  • Separately hosted NFS server, MariaDB, and Redis in a private network, behind a firewall that allows access only from the IPs of the k8s nodes (see the firewall sketch after this list).
  • That also means SSH access into the above servers is only allowed from cluster IPs.
  • All application load runs on the cluster: erpnext-nginx, erpnext-python (gunicorn), workers, scheduler, and frappe-socketio.
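
For illustration, a minimal sketch of the kind of firewall rules involved, assuming ufw on the data server; the node IPs (10.0.0.11, 10.0.0.12) and single-port NFSv4 are placeholders/assumptions, not my exact setup.

# Hypothetical ufw rules on the NFS/MariaDB/Redis host,
# allowing traffic only from the k8s node IPs
ufw default deny incoming
ufw allow from 10.0.0.11 to any port 22 proto tcp    # SSH
ufw allow from 10.0.0.11 to any port 3306 proto tcp  # MariaDB
ufw allow from 10.0.0.11 to any port 6379 proto tcp  # Redis
ufw allow from 10.0.0.11 to any port 2049 proto tcp  # NFSv4
ufw allow from 10.0.0.12 to any port 22 proto tcp
ufw allow from 10.0.0.12 to any port 3306 proto tcp
ufw allow from 10.0.0.12 to any port 6379 proto tcp
ufw allow from 10.0.0.12 to any port 2049 proto tcp
ufw enable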

Automatic upgrade

The following may work:

helm upgrade erpnext-stable \
  --namespace erpnext frappe/erpnext \
  -f erpnext-stable-values.yaml \
  --set migrateJob.enable=true

Replace the namespace, values.yaml file, and release name to match your installation.

The above step takes site database backups.
The manual steps below do not take any backups; they depend on the backups taken by the command above.
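
To check what the automatic upgrade actually did, you can inspect the migrate job and the backups it left behind. This is only a sketch: the job name depends on the release, and the backups path assumes the standard sites/<site>/private/backups layout on the shared sites volume.

# List the jobs in the release namespace and read the migrate job's logs
kubectl get jobs -n erpnext
kubectl logs -n erpnext job/<migrate-job-name>

# Backups land under each site's private/backups directory on the sites volume
# (checked here directly from the NFS server)
ls /path/to/sites-volume/site.name.com/private/backups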

Out of my 10 sites, only 3 migrations failed, so the command may just work for someone else.

Manual steps to fix failed migration

  1. Installed a fresh v13 helm release. I moved from the deprecated nfs-client-provisioner helm chart to nfs-subdir-external-provisioner and used the new StorageClass in this release. Skip to step 4 if a new release is not going to be installed.
  2. Logged into the NFS server so I could have faster access to the files.
  3. Moved the individual site directory from the old volume location to the new one.
  4. Exec'd into the new erpnext-python container with a bash shell.
  5. Once inside the container, ran bench --site site.name.com migrate (from the sites directory itself).
    1. If the migration is successful, update the existing ingress to point to the new service (not required if a new helm release is not created).
    2. If the migration fails, run bench --site site.name.com console from the container; once things are fixed, repeat bench --site site.name.com migrate and hope for success.
  6. Once all sites are moved and migrated, delete the old helm release, OR set pause_scheduler and maintenance_mode to 0 in the common site config if a new helm release is not installed. A rough sketch of steps 3 to 6 follows this list.
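
A rough shell sketch of steps 3 to 6. The deployment/pod names, NFS export paths, site name, and the in-container bench path (/home/frappe/frappe-bench) are placeholders and assumptions from my setup and the default images, so adjust them before use.

# Step 3: on the NFS server, move the site directory to the new volume
mv /exports/<old-pvc-dir>/site.name.com /exports/<new-pvc-dir>/site.name.com

# Step 4: exec into the new erpnext-python pod
kubectl exec -it -n erpnext deploy/<new-release>-erpnext-python -- bash

# Step 5: inside the container, run the migration from the sites directory
cd /home/frappe/frappe-bench/sites
bench --site site.name.com migrate

# Step 5.2: if the migration fails, open a console, fix the data, then re-run migrate
bench --site site.name.com console

# Step 6 (only if no new release was installed): clear the maintenance flags in
# the common site config (or edit common_site_config.json directly)
bench set-config -g pause_scheduler 0
bench set-config -g maintenance_mode 0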

Patches that failed and fixes:

Issue with Customer migration:

There was an issue while migrating the DocType: Customer

console:

# Reset represents_company from an empty string to None on all Customers,
# then commit, so the Customer migration can be run again.
l = frappe.get_all("Customer", fields=["name", "represents_company"])

for i in l:
    if i.get("represents_company") == '':
        e = frappe.get_doc("Customer", i.get("name"))
        e.represents_company = None
        e.save()

frappe.db.commit()

Problem with the Therapy Session DocType from healthcare, during patch execution:

Executing erpnext.patches.v13_0.setup_patient_history_settings_for_standard_doctypes in abc.xyz.com (db_name)

console:

# Reload the healthcare DocTypes the patch depends on before re-running migrate
frappe.reload_doc("healthcare", "doctype", "Inpatient Medication Order")
frappe.reload_doc("healthcare", "doctype", "Therapy Session")
frappe.db.commit() # superstition

Paid ERPNext + Kubernetes hacking? castlecraft.in!


Hi @revant_one, I don't know if you have addressed this elsewhere, but while trying to install MariaDB, the wget command for the mariadb-prod values returns a 404 error.

From https://helm.erpnext.com/prepare-kubernetes/mariadb

What is the right configuration I should be using for frappe?

Values-prod doesn’t exist anymore

This exists: https://raw.githubusercontent.com/bitnami/charts/master/bitnami/mariadb/values.yaml

For Frappe-specific MariaDB configuration, refer to MariaDB conf for Frappe · frappe/bench Wiki · GitHub.
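
For context, the settings on that wiki page boil down to forcing utf8mb4 everywhere. A sketch of the core fragment, which you would then feed to the Bitnami chart (for example through its MariaDB configuration value in values.yaml; the exact key depends on the chart version):

# Write the Frappe-recommended MariaDB settings to a local file;
# how it gets into the chart depends on your values.yaml
cat > frappe-mariadb.cnf <<'EOF'
[mysqld]
character-set-client-handshake = FALSE
character-set-server = utf8mb4
collation-server = utf8mb4_unicode_ci

[mysql]
default-character-set = utf8mb4
EOF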

  1. When I have my current site on Ubuntu bare metal, how do I migrate the database and files into the newly created Kubernetes cluster?

  2. Have you been able to demonstrate zero-downtime updates of the cluster even with database changes?

I keep my data server separate from Kubernetes. I move the backups to the data server and restore them like a standard database and file restore.

In case you have everything in containers, you'll have to move the local files into the pods; refer to kubectl cp. You can also mount a volume containing the backups and restore from it. (See the sketch below.)
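
As a rough example of the kubectl cp route, assuming the backup files are in the current directory and the default erpnext-python pod layout (the pod name and in-container paths are placeholders):

# Copy the bare-metal backups into the erpnext-python pod
kubectl cp ./database.sql.gz erpnext/<erpnext-python-pod>:/home/frappe/frappe-bench/sites/database.sql.gz
kubectl cp ./files.tar erpnext/<erpnext-python-pod>:/home/frappe/frappe-bench/sites/files.tar
kubectl cp ./private-files.tar erpnext/<erpnext-python-pod>:/home/frappe/frappe-bench/sites/private-files.tar

# Then restore inside the pod
kubectl exec -it -n erpnext <erpnext-python-pod> -- bash
cd /home/frappe/frappe-bench/sites
bench --site site.name.com restore database.sql.gz \
  --with-public-files files.tar \
  --with-private-files private-files.tar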

Whenever there are patches to run, there will be downtime.

Thank you very much. Let me start the Kubernetes implementation and learning journey!

How is this going?

How is what going?

Sorry, I was responding to @mwogi .

@Obinna_Ukwueze I have had to go through some tutorials on Kubernetes and on building images with Docker.
The problems we are trying to solve are, first, to make our deployment easier and, second, to ensure that in production we are actually using resources across the nodes in our cluster (Proxmox).

So far so good
