Frappe Docker Single Server or Docker Swarm or Kubernetes? Need Expert guidance

gsarunk · March 15, 2023, 3:56am

So far we have been using Frappe Docker Single Server, with multiple projects that depend on different custom apps.
We create regional servers when a customer asks for them (e.g. for GDPR compliance) and deploy the sites there.

So far we don’t see much of a problem with this kind of setup.
However, we are worried that we might hit a roadblock in the future as our customer base grows.

I have explored two other options: Docker Swarm and Kubernetes.

Now I need to make a decision on which path we should choose for our future infrastructure.

I need clarification from experts on the following, for both Docker Swarm and K8s:

Can we use our existing custom containers? Different projects currently use different custom containers.
Can our existing regional single servers be clustered?
Can we use the existing volumes of our single servers?
Can we use Traefik instead of nginx in the case of K8s?
What is the migration path…

I have a few more queries, which I can post based on the guidance I get.

Regards,
Arun

revant_one · March 15, 2023, 7:54am

Docker Swarm? Read dockerswarm.rocks. In brief, it is simple to get started with and to understand.

Kubernetes? It is the go-to orchestrator for cloud providers and large enterprises.

Yes, you can run the same images that are used on the single-server setup on any orchestrator. Read this section: frappe_docker/docs/single-compose-setup.md at main · frappe/frappe_docker · GitHub

You can use your existing servers for a Docker Swarm cluster or a self-hosted Kubernetes cluster, provided the servers are stopped and restarted with the new setup.

You cannot use existing servers with a managed Kubernetes offering. Generally you’ll have to provision nodes through the provider’s API to attach them to the managed cluster. There may be providers who only provide the Kubernetes control plane and allow you to attach servers from anywhere.

Yes, you can use Traefik. It is easy to replace ingress-nginx with any ingress controller. It works with an Istio VirtualService as well.

Backup and restore is the best path for data migration:

Take snapshots and keep them in a shared volume/PVC, or push them to S3 to restore later.
Pull the snapshots from S3 or the shared volume and restore them.
Use mariabackup for fast DB backups and restores. If it doesn’t work for you, use the standard bench backup, which internally uses mysqldump.
Use restic for file snapshots and restores.
Run containers / pods with the shared volumes mounted and with access to S3 to execute the restore commands.

Infra progression

Start with Single Server (1 VM)
Move to docker swarm (1 VM), gives you nice portainer ui and web hooks for gitops/ci/automated pipelines.
Upgrade Docker Swarm to multi-VM. Data moves to a managed shared file system and the database moves to a managed DB (MariaDB conf for Frappe · frappe/bench Wiki · GitHub, DBaaS).
Kubernetes (multi VM), Managed FS and DB. Loadbalancer for ingress.

If you are just using ERPNext, no customization, small setup, use single server.

If you are using development pipelines with staging, uat servers being auto-deployed use docker swarm + portainer.

If you have built a SaaS / PaaS app, or you need to scale any of the workers horizontally, use any Kubernetes.

If you have enterprise compliance requirements or checklists, use managed Kubernetes.
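
The rules of thumb above can be written down as a small decision helper. This is only a sketch: the function name, the parameter names and the order in which the rules are checked (the most demanding requirement wins) are assumptions for illustration, not part of frappe_docker.

```python
def recommend_setup(auto_deploy_pipelines=False,
                    saas_or_horizontal_scaling=False,
                    compliance_checklists=False):
    """Map the rules of thumb from this post to a setup name (hypothetical helper)."""
    # Assumption: when several rules apply, the most demanding one wins.
    if compliance_checklists:
        return "managed Kubernetes"
    if saas_or_horizontal_scaling:
        return "Kubernetes"
    if auto_deploy_pipelines:
        return "docker swarm + portainer"
    # Plain ERPNext, no customization, small setup.
    return "single server"
```

For example, `recommend_setup(auto_deploy_pipelines=True)` returns `"docker swarm + portainer"`, which matches the staging/UAT pipeline case.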

gsarunk · March 15, 2023, 11:08am

Thanks for the detailed clarification @revant_one. It was much needed at this time. Most importantly, I needed to know where to invest my time, as we don’t want to redo everything later.

Now I understand what we should focus on:

Move to docker swarm (1 VM), gives you nice portainer ui and web hooks for gitops/ci/automated pipelines.

Once we are familiar with the concepts, we will move on to Docker Swarm on multiple VMs.

We would like to move to Kubernetes at the earliest.

Thanks for your expert advice.

gsarunk · March 16, 2023, 3:36am

@revant_one you are right. Docker Swarm is simple and easy to get started.

I should have done this before. I was assuming it to be hard to implement. But your docs helped me a lot.

To everyone who is using a single server without Portainer and Swarm: I would sincerely advise you to move to this setup immediately. Otherwise you will regret all the ops effort spent on docker compose commands.

We have decided to use portainer and swarm.
Couple more questions:

I understand that bench commands can be executed as a stack. I can create individual stacks or play around with environment variables to execute the desired bench command. Are there other ways to perform bench operations?
We have decided to use GCP Cloud Storage for backups, as our existing platform is GCP. Can we configure custom_containers/erpnext-backup.yml at main · castlecraft/custom_containers · GitHub for GCP Cloud Storage? Please suggest the parameters to configure, both in GCP and in the yml.
Is there a restore yml, similar to backup.yml, that restores from the storage? I explored GitHub - castlecraft/custom_containers and GitHub - frappe/frappe_docker: Docker images for production and development setups of the Frappe framework and ERPNext, but I am unable to find a suitable example.

revant_one · March 16, 2023, 4:22am

Portainer also helps with GitOps: https://www.youtube.com/watch?v=IZss2CziUnI

For ease and a declarative setup, I add stack YAML files. You can also add simple single-container “Tasks”, but for those the Portainer interface has to be used instead of YAML, which makes it less declarative. Another way is to exec into the running container from the Portainer UI and execute any bench command, without any tracking in YAML or in a Portainer task. In any case, make sure you only use bench commands that don’t change the application code.

Check Interoperability with other storage providers | Cloud Storage | Google Cloud: you’ll need to generate HMAC keys and use it as an S3 endpoint (HMAC keys | Cloud Storage | Google Cloud). After making it S3 compatible it should work as S3; I have used it with py/boto3 before. Restic docs for Google Cloud Storage: https://restic.readthedocs.io/en/latest/030_preparing_a_new_repo.html#google-cloud-storage Check the other repository types as well, in case one of them can be used as an alternative.

There is no restore YAML. It depends on the case:

if you backup using mariabackup snapshots then restore snapshots
if you backup using mysqldump then restore the sql file backups
if you backup using bench command you can restore using bench command
to restore files from restic use restic restore latest --target . (check restic docs for more)
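
The restore matrix above can be sketched as a lookup table. This is an illustration only: the table name, the function and the placeholder values (site name, database name, paths) are hypothetical. Only the restic command is taken verbatim from this post; the others are the standard invocations of the respective tools. Note that `mariabackup --copy-back` expects the MariaDB server to be stopped and the data directory to be empty.

```python
import shlex

# Hypothetical lookup table: backup method -> restore command templates.
RESTORE_STEPS = {
    "mariabackup": [
        "mariabackup --prepare --target-dir={source}",
        # Requires a stopped MariaDB server and an empty data directory.
        "mariabackup --copy-back --target-dir={source}",
    ],
    "mysqldump": ["mysql -u root -p {database} < {source}"],
    "bench": ["bench --site {site} restore {source}"],
    "restic": ["restic restore latest --target {target}"],
}


def restore_commands(method, **values):
    """Return the restore commands for a backup method as shell strings."""
    if method not in RESTORE_STEPS:
        raise ValueError(f"unknown backup method: {method}")
    # Quote every value so paths with spaces stay a single shell argument.
    quoted = {key: shlex.quote(str(value)) for key, value in values.items()}
    return [step.format(**quoted) for step in RESTORE_STEPS[method]]
```

Calling `restore_commands("restic", target=".")` gives back the single restic command quoted in the post; the other methods need their own placeholder values filled in.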

gsarunk · March 16, 2023, 5:00am

Thanks @revant_one, as usual, for the swift response and clear direction.

I will explore your suggestions and get back to you if I get stuck.

I will keep using this thread to post further progress so that others with a similar need can benefit.

revant_one · March 16, 2023, 5:26am

If you are exploring the docker swarm alternative, then you can create many other posts with specific questions. Link this post there for reference if you wish.

gsarunk · March 17, 2023, 8:35pm

After reading all your links, and based on my understanding:
I created a bucket, my-bucket, in GCP
Created a service account
Created an HMAC key
In the backup.yaml I changed the environment values as below

environment:
      - RESTIC_REPOSITORY=s3:https://storage.googleapis.com/my-bucket
      - AWS_ACCESS_KEY_ID=HMAC Access key
      - AWS_SECRET_ACCESS_KEY=HMAC Secret
      - RESTIC_PASSWORD=somePassword

The job failed, citing a site_config.json error and being unable to reach s3:https://storage.googleapis.com/my-bucket. Should I enable public access in GCP?
Where am I going wrong?

revant_one · March 17, 2023, 9:44pm

If the S3 API doesn’t work, I think restic also has a Google Cloud specific config.
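
A sketch of the two variants side by side, for comparison. The restic documentation describes a native Google Cloud Storage backend that uses a `gs:bucket:/` repository together with the `GOOGLE_PROJECT_ID` and `GOOGLE_APPLICATION_CREDENTIALS` environment variables. The helper names below are hypothetical, the project id and key file path are placeholders, and this has not been tested against erpnext-backup.yml; in particular the service account JSON file would have to be mounted into the backup container.

```python
def restic_env_for_gcs(bucket, password, *, native=True, project_id=None,
                       credentials_file=None, hmac_access_key=None,
                       hmac_secret=None):
    """Build restic environment variables for a GCS bucket (hypothetical helper)."""
    env = {"RESTIC_PASSWORD": password}
    if native:
        # Native restic GCS backend: service account JSON key instead of HMAC.
        env["RESTIC_REPOSITORY"] = f"gs:{bucket}:/"
        env["GOOGLE_PROJECT_ID"] = project_id
        env["GOOGLE_APPLICATION_CREDENTIALS"] = credentials_file
    else:
        # S3 interoperability mode, as attempted earlier in this thread.
        env["RESTIC_REPOSITORY"] = f"s3:https://storage.googleapis.com/{bucket}"
        env["AWS_ACCESS_KEY_ID"] = hmac_access_key
        env["AWS_SECRET_ACCESS_KEY"] = hmac_secret
    missing = sorted(key for key, value in env.items() if not value)
    if missing:
        raise ValueError(f"missing values for: {', '.join(missing)}")
    return env


def as_compose_environment(env):
    """Render the variables as lines for a compose `environment:` list."""
    return [f"- {key}={value}" for key, value in env.items()]
```

The lines returned by `as_compose_environment` can be pasted under `environment:` in the backup stack, replacing the `AWS_*` entries when the native backend is used.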

revant_one · March 18, 2023, 8:15am

Are you looking for an HA cluster? Use Kubernetes.

Choose managed Kubernetes, managed FS, managed DB and a managed load balancer. You’ll achieve scale and HA with the help of the cloud provider. The target users are well-funded, large organisations such as MNCs. The business needs to be proven in order to fund the cloud resources.

Resources at cost are cheaper if you know how to build things from raw material. If you go with self-managed Kubernetes, be prepared to manage much more infrastructure. Managing the following infrastructure is out of scope for Frappe Framework and ERPNext:

rook.io or openebs.io or any such project for storage. It needs 4GB+ RAM per node and turns out to be expensive (management overhead and redundancy resources), though not as expensive as Google’s managed storage.
Install MariaDB Galera on labeled nodes (a dedicated part of the cluster is the Galera cluster).
For ingress, set up MetalLB, or configure a cloud LB if cloud VMs are used.
You may also need a control-plane LB for a multi-server (multi-master) setup.

Check k3s.rocks

gsarunk · March 18, 2023, 9:02am

Sure. Kubernetes is the way forward for us. If migrating from Docker Swarm to K8s is going to be easy, we would prefer to adopt K8s at a later stage.

Resources at cost are cheaper if you know how to build things from raw material

Rightly said. I am sure there is a lot of learning required.

How about microk8s?

revant_one · March 18, 2023, 9:11am

If it is not managed or if you don’t have OEM support, you can go for any distribution. I’ve used k3s in containers for testing the official helm charts. Check the official helm chart tests. I’ve used kind, k3d (k3s in docker), I tried microk8s as well. All are good.

gsarunk · July 14, 2023, 5:19am

In short: Portainer is not listing containers from other nodes.

I have been testing Docker Swarm with the Portainer UI. Based on my initial exploration and the setup from https://github.com/castlecraft/custom_containers/blob/main/docs/docker-swarm.md, I found Docker Swarm and Portainer to be a good choice.

Over the past few days I extended the swarm with a few more nodes and deployed the mariadb stack to a specific node using placement constraints. The deployment was successful, as expected.
The problem is that Portainer is not listing containers from the other nodes.
I can access stacks, networks and services, but not containers.

Is this a limitation of Portainer, or am I missing something?

revant_one · July 14, 2023, 6:22am

Check firewall for docker swarm related ports

The following ports must be available. On some systems, these ports are open by default.

 - Port 2377 TCP for communication with and between manager nodes
 - Port 7946 TCP/UDP for overlay network node discovery
 - Port 4789 UDP (configurable) for overlay network traffic
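
A minimal reachability check for the TCP ports in that list, to be run from one node against another. The function and constant names are hypothetical. It only covers TCP: the UDP side of 7946 and 4789 cannot be verified with a plain connect, and 2377 is only expected to answer on manager nodes.

```python
import socket

# TCP ports from the list above; the UDP ports are not checked here.
SWARM_TCP_PORTS = (2377, 7946)


def tcp_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def check_swarm_tcp_ports(host, ports=SWARM_TCP_PORTS):
    """Map each port to whether it is reachable on the given host."""
    return {port: tcp_port_open(host, port) for port in ports}
```

Running `check_swarm_tcp_ports("<manager private IP>")` from a worker shows which of the TCP ports get through the firewall rules.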

gsarunk · July 14, 2023, 10:24am

Thanks @revant_one. We use GCP. Initially I allowed these ports using firewall rules, but we still had the same issue. Now, for testing purposes, I have allowed all ports, with no improvement in Portainer. Is there something I need to do inside the VM?

revant_one · July 14, 2023, 10:53am

The manager node IP should not change. That is the IP used to join the swarm, the one specified in the command docker swarm init --advertise-addr.

From the VM, check ufw/firewall. It should not be a problem, as it is disabled by default.

What error do you face? Are there any logs in the Portainer containers?

gsarunk · July 14, 2023, 11:39am

The manager node IP hasn’t changed. It’s a static IP.

Below is the message from portainer agent running in the manager node

2023/07/14 11:32:53 http: TLS handshake error from 10.0.2.3:40724: EOF

The portainer agent from the worker node has no specific logs

revant_one · July 14, 2023, 11:54am

Are you able to manage swarm through cli from manager node?

Can you list nodes? docker node ls

Confirm if it’s a problem with swarm or portainer.

gsarunk · July 15, 2023, 5:19am

From the CLI I am able to list nodes, services and the tasks for the nodes.

I am not sure how to list the containers of a node from the manager and exec bash into them, as docker node ps <node_id> only lists the tasks.

In Portainer, all nodes, services and their tasks are listed. Please note that the console option is disabled for tasks running on the swarm2 node (a worker node).

As can be seen in the image below, the container of mariadb-amr_db is not listed, because it belongs to the worker node.

revant_one · July 15, 2023, 8:51am

I faced a similar situation when:

I connected to the wrong IP. With docker swarm init --advertise-addr I used the public static IP, and I used the private static IP when I did docker swarm join, or the other way around (private <-> public).
There was a firewall or a restriction on port access between the nodes.
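
The first case can be spotted with a rough heuristic. The sketch below assumes the `Swarm` section of `docker info` (for example the parsed output of `docker info --format '{{json .Swarm}}'` on a worker) exposes `NodeAddr` and a `RemoteManagers` list with `Addr` entries of the form `ip:port`, and that all addresses are IPv4. The function names are hypothetical, and a private/public mix is only a hint, not proof of a misconfiguration.

```python
import ipaddress


def addr_kind(host):
    """Classify an IPv4 address string as 'private' or 'public'."""
    return "private" if ipaddress.ip_address(host).is_private else "public"


def advertise_addr_mismatch(swarm_info):
    """Return True if this node and its managers mix private and public IPs."""
    node_kind = addr_kind(swarm_info["NodeAddr"])
    managers = swarm_info.get("RemoteManagers") or []
    # "10.0.2.2:2377" -> "10.0.2.2" (assumption: IPv4, no brackets).
    manager_kinds = {addr_kind(m["Addr"].rsplit(":", 1)[0]) for m in managers}
    return any(kind != node_kind for kind in manager_kinds)
```

If this returns True on a worker, it is worth re-checking which address was passed to `--advertise-addr` and which one was used in `docker swarm join`.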