How I upgraded my sites hosted on Kubernetes to v13 using the Helm chart

My Setup

  • Separately hosted NFS server, MariaDB and Redis on a private network, behind a firewall that only allows access from the IPs of the k8s nodes.
  • That also means SSH access into those servers is only allowed from the cluster IPs.
  • All application load runs on the cluster: erpnext nginx, erpnext python (gunicorn), workers, scheduler and frappe socketio.

Automatic upgrade

The following may work.

helm upgrade erpnext-stable \
  --namespace erpnext frappe/erpnext \
  -f erpnext-stable-values.yaml \
  --set migrateJob.enable=true

Replace the namespace, values.yaml file and release name with your own.

The above step takes site database backups.
The manual steps below do not take any backups of their own; they rely on the backups taken by the command above.

Out of my 10 sites, only 3 migrations failed, so the command may just work for someone else.
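
To check how the migrate job went and where the backups landed, something like this can be used. The job and deployment names below are assumptions based on my release name; take the real ones from the kubectl output.

# See whether the migrate job completed and read its logs
# (job name is an assumption; copy it from the get jobs output).
kubectl -n erpnext get jobs
kubectl -n erpnext logs job/erpnext-stable-migrate

# Backups are written to each site's private/backups directory on the sites volume.
kubectl -n erpnext exec -it deploy/erpnext-stable-erpnext-python -- \
  ls /home/frappe/frappe-bench/sites/site.name.com/private/backups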

Manual steps to fix failed migration

  1. Installed a fresh v13 helm release. I moved from the deprecated nfs-client-provisioner helm chart to nfs-subdir-external-provisioner and used the new StorageClass in this release. Skip to step 4 if a new release is not going to be installed.
  2. Logged into the NFS server so I could have faster access to the files.
  3. Moved the individual site directory from the old volume location to the new location.
  4. Exec'd into the new erpnext-python container with a bash shell (see the command sketch after this list).
  5. Once inside the container, ran bench --site site.name.com migrate (FROM THE sites DIRECTORY ITSELF).
    1. If the migration succeeds, update the existing ingress to point to the new service. (Not required if a new helm release is not created.)
    2. If the migration fails, run bench --site site.name.com console from the container; once things are fixed, repeat bench --site site.name.com migrate and hope for success.
  6. Once all sites are moved and migrated, delete the old helm release, OR set pause_scheduler and maintenance_mode to 0 in the common config if a new helm release is not installed.
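
Roughly, steps 4 to 6 look like this. The deployment name is an assumption based on my release names, and bench set-config -g writes to the common site config:

# Shell into the erpnext-python container of the new release
# (deployment name is an assumption; check kubectl get deploy -n erpnext).
kubectl -n erpnext exec -it deploy/erpnext-v13-erpnext-python -- bash

# Inside the container, from the sites directory itself:
cd /home/frappe/frappe-bench/sites
bench --site site.name.com migrate

# If no new release was installed, clear the flags once every site is migrated.
bench set-config -g maintenance_mode 0
bench set-config -g pause_scheduler 0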

Patches that failed and fixes:

Issue with Customer migration:

There was an issue while migrating the DocType: Customer

console:

# Find Customers where represents_company was saved as an empty string
# and reset the field to None, then commit.
l = frappe.get_all("Customer", fields=["name", "represents_company"])

for i in l:
    if i.get("represents_company") == '':
        e = frappe.get_doc("Customer", i.get("name"))
        e.represents_company = None
        e.save()

frappe.db.commit()

Problem with Therapy Session DocType from healthcare, during patch execution:

Executing erpnext.patches.v13_0.setup_patient_history_settings_for_standard_doctypes in abc.xyz.com (db_name)

console:

# Reload the DocTypes the patch depends on, then re-run bench migrate.
frappe.reload_doc("healthcare", "doctype", "Inpatient Medication Order")
frappe.reload_doc("healthcare", "doctype", "Therapy Session")
frappe.db.commit()  # superstition

Paid ERPNext + Kubernetes hacking? castlecraft.in!

Hi @revant_one, I don’t know if you have addressed this elsewhere, but while trying to install MariaDB, the wget command for the mariadb-prod values returns a 404 error.

From https://helm.erpnext.com/prepare-kubernetes/mariadb

What is the right configuration I should be using for frappe?

Values-prod doesn’t exist anymore

This exists: https://raw.githubusercontent.com/bitnami/charts/master/bitnami/mariadb/values.yaml

For Frappe-specific MariaDB configuration, refer to MariaDB conf for Frappe · frappe/bench Wiki · GitHub.
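
As a sketch, these are the Frappe-specific settings from that wiki page. With the bitnami chart they typically belong in the my.cnf block under primary.configuration in your values file; that key name is my assumption about the current chart layout, and the lines should be appended to the chart's default block rather than replacing it.

# Frappe needs utf8mb4 and no client charset handshake. Append these lines to the
# matching sections of the chart's default configuration block (copied from its values.yaml).
cat > frappe-mariadb.cnf <<'EOF'
[mysqld]
character-set-client-handshake = FALSE
character-set-server = utf8mb4
collation-server = utf8mb4_unicode_ci

[mysql]
default-character-set = utf8mb4
EOF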

  1. When I have my current site on Ubuntu bare metal, how do I migrate the database and files into the newly created Kubernetes cluster?

  2. Have you been able to demonstrate a zero-downtime update of the cluster, even with database changes?

I keep my data server separate from Kubernetes. I move the backups to the data server and restore them like a standard db and file restore.

In case you have everything in containers, you'll have to move the local files into the pods. Refer to kubectl cp. You can also mount a volume with the backups and restore from it.
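
A rough sketch of that, assuming the backups sit on your machine; the pod name and file names are placeholders, and you may need to pass the MariaDB root credentials flag your bench version expects:

# Copy the backup files into the pod's sites volume
# (find the pod with kubectl get pods -n erpnext).
kubectl -n erpnext cp ./db.sql.gz erpnext-python-pod:/home/frappe/frappe-bench/sites/
kubectl -n erpnext cp ./files.tar erpnext-python-pod:/home/frappe/frappe-bench/sites/
kubectl -n erpnext cp ./private-files.tar erpnext-python-pod:/home/frappe/frappe-bench/sites/

# Restore inside the pod.
kubectl -n erpnext exec -it erpnext-python-pod -- \
  bench --site site.name.com restore /home/frappe/frappe-bench/sites/db.sql.gz \
    --with-public-files /home/frappe/frappe-bench/sites/files.tar \
    --with-private-files /home/frappe/frappe-bench/sites/private-files.tar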

Whenever there are patches to run there will be downtime.

Thank you very much. Let me start the Kubernetes implementation and learning journey!

How is this going?

How is what going?

Sorry, I was responding to @mwogi .

@Obinna_Ukwueze I have had to go through some tutorials on Kubernetes and building images with Docker.
The problem we are trying to solve is, first, to make our deployment easier and, secondly, to ensure that in production we are actually using resources across the nodes in our cluster (Proxmox).

So far so good

@revant_one how do I add Healthcare to a site? I have frappe bench installed on Kubernetes and an ERPNext site.

Build a custom image with the additional apps.

@revant_one thanks for the answer,
but how can I run bench commands on Kubernetes?
I tried to add the image to custom-values.yaml like the code below, but it didn't work:

image:
  repository: frappe/health
  tag: v15.0.0
  pullPolicy: IfNotPresent

I got this error:
"frappe/health:v15.0.0": failed to resolve reference "docker.io/frappe/health:v15.0.0": pull access denied, repository does not exist or may require authorization

There is no such image. You have to build your custom image.

Run Jobs | Kubernetes to execute bench commands.
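
For one-off commands you can also exec into a bench container (as in the manual migration steps above); a Job is the declarative equivalent. A rough sketch, with the deployment name as an assumption:

# Run a bench command directly in a container that mounts the sites volume
# (check kubectl get deploy -n <namespace> for the real name).
kubectl -n erpnext exec -it deploy/erpnext-erpnext-worker-d -- \
  bench --site site.name.com list-apps

# For repeatable tasks, wrap the same command in a Kubernetes Job that mounts the
# sites volume; the helm chart itself uses Jobs for create-site and migrate.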

@revant_one
I tried to create a custom image:

FROM frappe/erpnext
RUN bench get-app healthcare

and pushed it to Docker Hub, then used it in the custom-values.yaml file:

image:
  repository: myimage
  tag: latest
  pullPolicy: IfNotPresent

Everything worked well, but when I created a new site I couldn't find the Healthcare app;
it is just a normal ERPNext site.

Build images using this documentation: frappe_docker/docs/custom-apps.md at main · frappe/frappe_docker · GitHub

Add the apps to be installed here when you create the site.
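
Roughly, following that documentation; the Containerfile path, build args, branches and URLs below are assumptions, so check custom-apps.md for the current ones:

# apps.json lists the apps to bake into the image.
cat > apps.json <<'EOF'
[
  {"url": "https://github.com/frappe/erpnext", "branch": "version-15"},
  {"url": "https://github.com/frappe/health", "branch": "version-15"}
]
EOF

# Pass it base64-encoded (GNU base64) to the frappe_docker build, then push the image
# and reference it in custom-values.yaml.
export APPS_JSON_BASE64=$(base64 -w 0 apps.json)
docker build \
  --build-arg=FRAPPE_BRANCH=version-15 \
  --build-arg=APPS_JSON_BASE64=$APPS_JSON_BASE64 \
  --tag=myregistry/myimage:latest \
  --file=images/layered/Containerfile .
docker push myregistry/myimage:latest

The image only ships the app code; the site still has to have the app installed when it is created (that is what "add apps to be installed here" points at), or afterwards with bench --site site.name.com install-app healthcare.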

Thanks @revant_one, it works well.
Now I'm trying to install the ERPNext 14 image:

image:
  repository: frappe/erpnext
  tag: v14
  pullPolicy: IfNotPresent

I'm facing this error in the socketio pod:

Error: connect ECONNREFUSED 127.0.0.1:12311
    at TCPConnectWrap.afterConnect [as oncomplete] (node:net:1278:16)
Emitted 'error' event on RedisClient instance at:
    at RedisClient.on_error (/home/frappe/frappe-bench/apps/frappe/node_modules/redis/index.js:342:14)
    at Socket.<anonymous> (/home/frappe/frappe-bench/apps/frappe/node_modules/redis/index.js:223:14)
    at Socket.emit (node:events:513:28)
    at emitErrorNT (node:internal/streams/destroy:157:8)
    at emitErrorCloseNT (node:internal/streams/destroy:122:3)
    at processTicksAndRejections (node:internal/process/task_queues:83:21) {
  errno: -111,
  code: 'ECONNREFUSED',
  syscall: 'connect',
  address: '127.0.0.1',
  port: 12311
}

any suggestions to solve this problem?

The configure-bench job should have completed successfully.
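
Socketio falling back to 127.0.0.1 usually means common_site_config.json was never populated with the redis URLs, which is what the configure job does. A rough way to check (job and deployment names are assumptions; adjust to your release):

# Confirm the configure job ran to completion.
kubectl -n erpnext get jobs

# Inspect the shared config from any container that mounts the sites volume;
# redis_socketio should point at the redis service, not 127.0.0.1.
kubectl -n erpnext exec -it deploy/erpnext-erpnext-socketio -- \
  cat /home/frappe/frappe-bench/sites/common_site_config.json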