ERPNext High Availability Reference Architecture

shareeef · April 7, 2021, 8:03pm

I haven’t found any reference documentation on ERPNext high availability. Is there any documentation on separating the components (Redis, MariaDB, web app, etc.) onto different nodes?

Please let me know if any of you are running multi-node setups. Thanks.

Shinzuco · April 7, 2021, 9:13pm

This reply might be too simplistic, but the options are implementing a primary/secondary setup for the DB, creating a load-balanced VM group, or Docker orchestration.

Personally, I prefer the load-balanced VM option (since I can’t get Docker orchestration to work with my preferred cloud provider). I will switch the day @revant_one writes a manual for Kubernetes on Azure.

shareeef · April 7, 2021, 9:19pm

I am experimenting with Docker locally, but I still feel it is flaky. I will try Kubernetes. I think MariaDB is the only challenge with respect to a container implementation. Maybe taking it out of the containers will make things more manageable.

Shinzuco · April 7, 2021, 9:25pm

It’s actually not a challenge with most providers, but with Azure it is: you need to change the ERPNext code base a bit, which I wasn’t able to figure out.

Shinzuco · April 7, 2021, 9:29pm

Docker itself works wonderfully well on localhost, but orchestration is a bit tricky and the documentation seems to be outdated. One of the key requirements (as per the docs) has been deprecated and, like I said, I hope @revant_one can find some time to update the docs.

Kudos to @revant_one because this is the first time I tried docker and it works so well for me.

revant_one · April 8, 2021, 8:09am

You need to understand how Frappe/ERPNext containers work in case debugging is required.

A container can use any DB host that is set in common_site_config.json or site_config.json.
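
As a rough illustration of that point, the sketch below writes a common_site_config.json that points the containers at external services. The keys db_host, db_port, redis_cache and redis_queue are standard Frappe settings; the helper name write_common_site_config and every hostname are placeholders, not values from this thread.

```shell
# Sketch only: write a common_site_config.json pointing at external services.
# write_common_site_config is a hypothetical helper, not part of bench.
write_common_site_config() {
  config_path="$1"      # e.g. sites/common_site_config.json
  db_host="$2"          # external MariaDB/MySQL host (placeholder)
  redis_cache_url="$3"  # e.g. redis://cache.example.internal:6379
  redis_queue_url="$4"  # e.g. redis://queue.example.internal:6379
  cat > "$config_path" <<EOF
{
  "db_host": "${db_host}",
  "db_port": 3306,
  "redis_cache": "${redis_cache_url}",
  "redis_queue": "${redis_queue_url}"
}
EOF
}

# Example call with made-up hostnames:
# write_common_site_config sites/common_site_config.json \
#   db.example.internal \
#   redis://cache.example.internal:6379 \
#   redis://queue.example.internal:6379
```

Any per-site override would go in that site's site_config.json instead.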

The frappe_docker/docker-compose.yml used with docker-compose up -d puts all the containers in a single file for ease of use. You don’t have to use that docker-compose.yml for advanced cases.

For AWS Aurora (MySQL) it’s the same case. Check this gist: I run a sed command to patch the ERPNext code as part of the job that does the site creation.
After site creation, everything seems to work with the existing code on AWS Aurora.

https://gist.github.com/revant/1328e3367c3042fd91c447859fb23dd1#file-create-site-yaml

AWS-EKS-Fargate-MySQLAurora-Elasticache.md

### Prerequisites

- EKS Fargate profiles for `<fargate-profile-namespace>`. [Guide](https://docs.aws.amazon.com/eks/latest/userguide/fargate-getting-started.html)
- EKS Fargate coredns profile for coredns pods to run (create profile for `kube-system` namespace, refer guide from previous step)
- EFS CSI Driver installed, mount points set and pv.yaml created. Refer [Guide](https://docs.aws.amazon.com/eks/latest/userguide/efs-csi.html)
- EKS ALB Controller, refer [Guide](https://docs.aws.amazon.com/eks/latest/userguide/alb-ingress.html).

### Install Helm

Refer [Helm Installation](https://helm.sh/docs/intro/install/) to install helm command

(Preview truncated; the full file is in the gist.)

ECRImagePullSecrets.md

Note: Use this in pipelines along with aws cli.

```shell
login_cmd=$(aws ecr get-login --no-include-email --region ap-southeast-1 | sed 's;https://;;g')
username=$(echo $login_cmd | cut -d " " -f 4)
password=$(echo $login_cmd | cut -d " " -f 6)
endpoint=$(echo $login_cmd | cut -d " " -f 7)

# Delete old secret
kubectl -n <fargate-profile-namespace> delete secret aws-ecr-registry

# ... (rest of the file is truncated in the gist preview)
```

create-site.yaml

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: create-new-site-<site.name.com>
spec:
  backoffLimit: 1
  template:
    spec:
      containers:
      - name: create-site
# ... (rest of the file is truncated in the gist preview)
```

(The gist contains more than three files.)

I’ll update the docs soon. Currently they mention the nfs-server Helm chart, which is deprecated.

Instead you need to use

Deploy it manually; the Helm chart will redirect you back to the deprecated chart.

Check the resources in the tests: helm/tests on the main branch of frappe/helm on GitHub.

They use only non-deprecated services. The tests run on k3d.

revant_one · April 8, 2021, 8:42am

updated docs

https://github.com/frappe/helm/pull/80

Shinzuco · April 8, 2021, 9:45am

Awesome. Checking out the new docs.

But with the Azure managed DB, there is another issue that pops up.

Azure requires the username in the connection string to be in the format username@hostname, while Frappe uses only the username part to create the default database and for further communication. I think the corrections need to be made in database.py, and then Azure DB for MariaDB could be used seamlessly.

revant_one · April 8, 2021, 9:47am

Did you check the create-site.yaml from the gist?

```shell
bench new-site $SITE_NAME --no-mariadb-socket --db-name $DB_NAME --db-password $DB_PASSWORD --mariadb-root-username $DB_ROOT_USER --mariadb-root-password $MYSQL_ROOT_PASSWORD --admin-password $ADMIN_PASSWORD --install-app erpnext;
```

Try substituting the environment variables with your custom username, root user, etc.

Note that the value of --db-name $DB_NAME is also used as the database username.
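
To make the substitution concrete, here is a minimal dry-run sketch that fills in the variables and prints the command instead of running it. The flags are the ones from the command above; every value and the helper name new_site_cmd are placeholders for illustration.

```shell
# Sketch: assemble the bench new-site call from environment variables.
# Every value below is a placeholder; none comes from this thread.
SITE_NAME="erp.example.com"
DB_NAME="erp_example"              # also becomes the database username
DB_PASSWORD="change-me"
DB_ROOT_USER="admin"
MYSQL_ROOT_PASSWORD="change-me-too"
ADMIN_PASSWORD="change-me-three"

# Print the command instead of running it, so it can be reviewed first.
new_site_cmd() {
  printf 'bench new-site %s --no-mariadb-socket --db-name %s --db-password %s --mariadb-root-username %s --mariadb-root-password %s --admin-password %s --install-app erpnext\n' \
    "$SITE_NAME" "$DB_NAME" "$DB_PASSWORD" "$DB_ROOT_USER" "$MYSQL_ROOT_PASSWORD" "$ADMIN_PASSWORD"
}

new_site_cmd
```

Because the printed line contains the passwords, this is only meant for local review, not for pipeline logs.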

Shinzuco · April 8, 2021, 10:03am

I haven’t tried this on AWS, but I did try it with a simple Azure DB. I also hope I have understood your code correctly.

It throws an error when Frappe or ERPNext tries to create a DB with the same name as the user, saying that a DB named xyz@abc cannot be created.

The Azure DB hostname is usually something like dumbhostnameyougave.microsoft.azure.something.
The user ID required is username@dumbhostnameyougave.

So the DB connection succeeds, but Frappe then proceeds to create a new database with the same name as the username, and this fails because of the @. I tried escaping it, but no go.
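
The mismatch can be sketched in a few lines of shell. This is not a fix for Frappe and does not touch database.py; it only shows the two values that would need to be kept apart, assuming the login has the username@servername shape described above. Both helper names are hypothetical.

```shell
# Sketch: split an Azure-style login (user@servername) and check whether a
# value is usable as a plain database name. Both helpers are hypothetical.
azure_user_part() {
  # Strip the trailing "@servername", so "erp@myserver" becomes "erp".
  printf '%s\n' "${1%@*}"
}

is_safe_db_name() {
  # Accept only letters, digits and underscores; "@" fails this check.
  case "$1" in
    ''|*[!A-Za-z0-9_]*) return 1 ;;
    *) return 0 ;;
  esac
}

login="erp@myserver"                 # placeholder login
db_name=$(azure_user_part "$login")  # what the database would be called
is_safe_db_name "$db_name" && echo "database name: $db_name, login: $login"
```

The idea is that the full login is needed only for connecting, while the part before the @ is what a database could be named after.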

JohnGwinner · March 14, 2022, 5:47pm

I’m interested in HA as well, including multiple front-end servers.

Getting DBs to work when they are not on the local machine does take some manual configuration, and you have to be careful with the GRANTs. We have that all worked out, but we recently had a server crash caused by a mistake, so we would like a clustered front end (“multiple runtimes”) so that doesn’t happen again.
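
For readers who have not set up remote access before, the grants involved usually look something like the sketch below. It only prints MariaDB statements for review; the helper name, the host pattern and the password are placeholders, and the exact privileges a given setup needs may differ.

```shell
# Sketch: print the SQL a remote app host would typically need.
# print_remote_grants is a hypothetical helper; review before running the SQL.
print_remote_grants() {
  db_name="$1"      # database name (also the user name for a Frappe site)
  app_host="$2"     # host or pattern the app servers connect from
  db_password="$3"  # placeholder; never hard-code a real password
  cat <<EOF
CREATE USER IF NOT EXISTS '${db_name}'@'${app_host}' IDENTIFIED BY '${db_password}';
GRANT ALL PRIVILEGES ON \`${db_name}\`.* TO '${db_name}'@'${app_host}';
EOF
}

print_remote_grants "erp_example" "10.0.0.%" "change-me"
```

The key detail is the host part of the account: a user created for localhost will not match connections coming from other front-end servers.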

== John ==

revant_one · March 14, 2022, 6:38pm

A general update on the topic.

The updated Helm chart has all the processes decoupled into separate pods. Even nginx and gunicorn are two separate pods now.

With the updated Helm chart you can schedule different types of deployments/pods on different types of node pools using affinity in values.yaml.
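
As a hedged illustration, the sketch below prints a standard Kubernetes nodeAffinity block for one node-pool label. The block structure is plain Kubernetes; where exactly it goes in values.yaml depends on the chart version, so check the chart's own values.yaml. The label key, label value and helper name are placeholders.

```shell
# Sketch: print a Kubernetes nodeAffinity block for one node-pool label.
# print_node_affinity is a hypothetical helper; the label is a placeholder.
print_node_affinity() {
  label_key="$1"     # node label key used by your node pools
  label_value="$2"   # pool that should run this type of pod
  cat <<EOF
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: ${label_key}
              operator: In
              values:
                - ${label_value}
EOF
}

print_node_affinity "pool" "workers"
```

Using a required rule keeps those pods off other pools entirely; a preferred rule would be the softer alternative.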