Frappe Docker Single Server or Docker Swarm or Kubernetes? Need Expert guidance

Docker Swarm? Read dockerswarm.rocks. In brief: it is simple to get started with and to understand.

Kubernetes? It is the go-to orchestrator for cloud providers and large enterprises.

Yes, you can run the same images used in the single-server setup on any orchestrator. See frappe_docker/docs/single-compose-setup.md in the frappe/frappe_docker repository on GitHub.

You can reuse existing servers for a Docker Swarm cluster or a self-hosted Kubernetes cluster, provided the servers are stopped and restarted with the new setup.

You cannot use existing servers with a managed Kubernetes offering; generally you have to provision nodes through the provider’s API to attach them to the managed cluster. There may be providers who only run the Kubernetes control plane and allow you to attach servers from anywhere.

Yes, you can easily replace ingress-nginx with any other ingress controller. It works with an Istio VirtualService as well.

Backup and restore is the best path for data migration:

  • Take snapshots and keep them in a shared volume/PVC, or push them to S3 to restore later
  • Pull snapshots from S3 or shared volumes and restore them
  • Use mariabackup for fast DB backups and restores. If it doesn’t work for you, use the standard bench backup, which internally uses mysqldump.
  • Use restic for file snapshots/restores
  • Run containers/pods with the shared volumes mounted and access to S3 to execute the restore commands.
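The steps above can be sketched as a small script. The site name, paths, and restic repository below are illustrative placeholders, not values from this thread:

```shell
#!/bin/sh
# Hedged sketch of the backup/restore flow described above.
# SITE, BENCH_DIR and the restic repository are hypothetical placeholders.
SITE="erp.example.com"
BENCH_DIR="/home/frappe/frappe-bench"
BACKUP_DIR="$BENCH_DIR/sites/$SITE/private/backups"
export RESTIC_REPOSITORY="s3:https://s3.example.com/frappe-backups"
export RESTIC_PASSWORD="somePassword"

backup_site() {
  # Database + files backup via bench (uses mysqldump internally)
  bench --site "$SITE" backup --with-files
  # Push the generated backup files to the restic repository
  restic backup "$BACKUP_DIR"
}

restore_site() {
  # Pull the latest snapshot back into the sites tree
  restic restore latest --target "$BENCH_DIR/sites"
}

echo "$BACKUP_DIR"
```

Run backup_site from a container/pod that has the sites volume mounted and S3 access, as described in the last bullet.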

Infra progression

  • Start with a single server (1 VM)
  • Move to Docker Swarm (1 VM); it gives you a nice Portainer UI and webhooks for GitOps/CI/automated pipelines
  • Upgrade Docker Swarm to multiple VMs, with data moved to a managed shared file system and the database moved to a managed DB (see “MariaDB conf for Frappe” in the frappe/bench wiki on GitHub; DBaaS)
  • Kubernetes (multiple VMs), managed FS and DB, and a load balancer for ingress

If you are just using ERPNext, with no customization and a small setup, use a single server.

If you have development pipelines with staging and UAT servers being auto-deployed, use Docker Swarm + Portainer.

If you have built a SaaS/PaaS app, or you need to scale any of the workers horizontally, use any Kubernetes.

If you have enterprise/compliance checklists to satisfy, use managed Kubernetes.


Thanks for the detailed clarification, @revant_one. Much needed for me at this time, most importantly to know where to invest my time, as we don’t want to redo everything again.

Now I understand what we should focus on:

  • Move to Docker Swarm (1 VM); it gives you a nice Portainer UI and webhooks for GitOps/CI/automated pipelines

Once we are familiar with the concepts, we will move on to Docker Swarm with multiple VMs.

We want to move to Kubernetes at the earliest opportunity.

Thanks for your expert advice.


@revant_one you are right, Docker Swarm is simple and easy to get started with.

I should have done this earlier. I assumed it would be hard to implement, but your docs helped me a lot.

For everyone using a single server without Portainer and Swarm, I would sincerely advise moving to this setup immediately. Otherwise you will regret all the ops effort spent on docker compose commands.

We have decided to use Portainer and Swarm.
A couple more questions:

Portainer also helps with GitOps: https://www.youtube.com/watch?v=IZss2CziUnI

For an easy, declarative setup I’m adding stack YAMLs. You can also add simple single-container “tasks”, but then the Portainer interface has to be used instead of YAML, which makes it less declarative. Another way is to exec into a running container from the Portainer UI and execute any bench command, without any tracking in YAML or a Portainer task. In any case, make sure you only use bench commands that don’t change the application code.
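As an illustration of the “stack YAML over UI tasks” point, a minimal run-once stack that could be pasted into Portainer’s stack editor might look like this. The image tag, volume name, and command are placeholders, not values from this thread:

```yaml
# Hedged sketch: a minimal run-once stack for Portainer's stack editor.
# Image tag, volume name and command are illustrative placeholders.
version: "3.8"

services:
  backup-job:
    image: frappe/erpnext-worker:latest   # placeholder image/tag
    command: ["bench", "--site", "all", "backup"]
    volumes:
      - sites:/home/frappe/frappe-bench/sites
    deploy:
      restart_policy:
        condition: none   # run once; do not restart on exit

volumes:
  sites:
    external: true   # volume created by the main Frappe stack
```

Keeping even one-off jobs in a stack file like this keeps them tracked in YAML, unlike commands exec’d through the Portainer UI.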

Check “Interoperability with other storage providers | Cloud Storage | Google Cloud”: you’ll need to generate HMAC keys and use them with the S3 endpoint (“HMAC keys | Cloud Storage | Google Cloud”). After making it S3-compatible, it should work as S3; I had used it with Python/boto3 before. Restic docs for Google Cloud: https://restic.readthedocs.io/en/latest/030_preparing_a_new_repo.html#google-cloud-storage — check the other repository types to see if they can be used as an alternative.

There is no single YAML; it depends on the case:

  • If you back up using mariabackup snapshots, then restore the snapshots
  • If you back up using mysqldump, then restore the SQL file backups
  • If you back up using the bench command, you can restore using the bench command
  • To restore files from restic, use restic restore latest --target . (check the restic docs for more)
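For the mariabackup case in the first bullet, the round-trip looks roughly like this. The target directory and credentials are placeholders; this is a sketch, not a tested procedure:

```shell
#!/bin/sh
# Hedged sketch of a mariabackup round-trip. TARGET_DIR and the
# credentials are placeholders; adapt them to your setup.
TARGET_DIR="/var/backups/mariadb"

db_backup() {
  # Physical (file-level) backup, much faster than mysqldump on large DBs
  mariabackup --backup --target-dir="$TARGET_DIR" \
    --user=root --password="$MYSQL_ROOT_PASSWORD"
}

db_restore() {
  # A raw backup must be "prepared" (crash recovery applied) first
  mariabackup --prepare --target-dir="$TARGET_DIR"
  # copy-back requires MariaDB to be stopped and the datadir to be empty
  mariabackup --copy-back --target-dir="$TARGET_DIR"
}

echo "$TARGET_DIR"
```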

Thanks @revant_one, as usual, for the swift response and clear direction.

I will explore your suggestions and revert in case I am clueless.

I will update this thread with further progress so that others with similar needs can benefit.

If you are exploring the Docker Swarm alternative, you can create other posts with specific questions. Link this post there for reference if you wish.


After reading all your links, and with my understanding, here is what I did:

  • Created a bucket my-bucket in GCP
  • Created a service account
  • Created an HMAC key

In backup.yaml I changed the environment values as below:

environment:
  - RESTIC_REPOSITORY=s3:https://storage.googleapis.com/my-bucket
  - AWS_ACCESS_KEY_ID=HMAC Access key
  - AWS_SECRET_ACCESS_KEY=HMAC Secret
  - RESTIC_PASSWORD=somePassword

The job failed citing a site_config.json error and being unable to reach s3:https://storage.googleapis.com/my-bucket. Should I enable public access in GCP?
Where am I going wrong?

If the S3 API doesn’t work, restic also has Google-Cloud-specific configuration.
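For reference, the GCS-native backend from the restic docs linked above avoids the HMAC/S3 interop layer entirely. Roughly, with the project ID, key file, and bucket as placeholders:

```shell
#!/bin/sh
# Hedged sketch: restic's native Google Cloud Storage backend (gs:),
# an alternative to the s3: endpoint. All values are placeholders.
export GOOGLE_PROJECT_ID="my-gcp-project"
export GOOGLE_APPLICATION_CREDENTIALS="$HOME/gcs-service-account.json"
export RESTIC_PASSWORD="somePassword"
export RESTIC_REPOSITORY="gs:my-bucket:/"   # gs: scheme instead of s3:

init_repo() {
  # One-time repository initialisation; later backups reuse the repo
  restic init
}

echo "$RESTIC_REPOSITORY"
```

With this backend the service-account key is used directly, so no HMAC keys or public bucket access are needed.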

Looking for an HA cluster? Use Kubernetes.

Choose managed Kubernetes, managed FS, managed DB, and a managed load balancer. You’ll achieve scale and HA with the help of the cloud provider. The target users are rich, large MNCs; the business needs to be proven to fund the cloud resources.

Resources at cost are cheaper if you know how to build things from raw material. To go with self-managed Kubernetes, be prepared to manage much more infrastructure. Managing the following infrastructure is out of scope for Frappe Framework and ERPNext:

  • rook.io, openebs.io, or any such project for storage. Needs 4 GB+ RAM per node. It turns out to be expensive (management overhead and redundancy resources), though not as expensive as Google’s managed storage.
  • Install MariaDB Galera on labeled nodes (a dedicated part of the cluster becomes the Galera cluster).
  • For ingress, set up MetalLB, or configure a cloud LB if cloud VMs are used.
  • You may also need a control-plane LB for a multi-server (multi-master) setup.

Check k3s.rocks


Sure, Kubernetes is the way forward for us. If migrating from Docker Swarm to K8s is going to be easy, we would prefer to adopt K8s at a later period.

Resources at cost are cheaper if you know how to build things from raw material

Rightly said. I am sure there is a lot of learning required.

How about microk8s?

If it is not managed, or you don’t have OEM support, you can go with any distribution. I’ve used k3s in containers for testing the official helm charts (check the official helm chart tests). I’ve used kind and k3d (k3s in Docker), and I tried microk8s as well. All are good.

In short: Portainer is not listing containers from the other nodes.

I have been testing Docker Swarm with the Portainer UI. As per my initial exploration and the setup from https://github.com/castlecraft/custom_containers/blob/main/docs/docker-swarm.md, I found Docker Swarm and Portainer to be a good choice.

Over the past few days I upgraded the swarm with a few more nodes and deployed a MariaDB stack to a specific node using placement constraints. The deployment was successful, as expected.
The problem is that Portainer is not listing containers from the other nodes.
I can access stacks, networks, and services, but not containers.

Is this a limitation of Portainer, or am I missing something?

Check the firewall for the Docker-Swarm-related ports.

The following ports must be available. On some systems, these ports are open by default.

 - Port 2377 TCP for communication with and between manager nodes
 - Port 7946 TCP/UDP for overlay network node discovery
 - Port 4789 UDP (configurable) for overlay network traffic
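On GCP, these ports can be opened with a firewall rule along these lines. The rule name, network, and source range are placeholders for your VPC:

```shell
#!/bin/sh
# Hedged sketch: opening the Docker Swarm ports between nodes on GCP.
# Rule name, network and source range are placeholders for your VPC.
SWARM_PORTS="tcp:2377,tcp:7946,udp:7946,udp:4789"

open_swarm_ports() {
  gcloud compute firewall-rules create allow-docker-swarm \
    --network default \
    --source-ranges 10.128.0.0/9 \
    --allow "$SWARM_PORTS"
}

echo "$SWARM_PORTS"
```

Restricting --source-ranges to the VPC’s internal range keeps the swarm ports closed to the public internet.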

Thanks @revant_one. We use GCP. Initially I allowed these ports using firewall rules, but we still had the same issue. Now, for testing purposes, I have allowed all ports; there is no improvement in Portainer. Is there something I need to do inside the VM?

The manager node IP should not change: it is the one used to join the swarm, the one specified in the command docker swarm init --advertise-addr.

From the VM, check ufw/firewalld. It should not be a problem; it is disabled by default.

What error do you face? Are there any logs in the Portainer containers?

The manager node IP hasn’t changed; it’s a static IP.

Below is the message from the Portainer agent running on the manager node:

2023/07/14 11:32:53 http: TLS handshake error from 10.0.2.3:40724: EOF

The Portainer agent on the worker node has no relevant logs.

Are you able to manage the swarm through the CLI from the manager node?

Can you list the nodes? docker node ls

Confirm whether it’s a problem with Swarm or with Portainer.

From the CLI I am able to list nodes, services, and tasks for the nodes.

I am not sure how to list the containers for a node from the manager and exec bash into one, as docker node ps <node_id> only lists the tasks.

From Portainer, all nodes, services, and their tasks are listed. Please note that the console option is disabled for tasks running on the swarm2 node (a worker node).

One can observe that the containers of mariadb-amr_db are not listed, as they belong to the worker node.

I faced a similar situation when:

  1. I connected to the wrong IP: with docker swarm init --advertise-addr I used the public static IP and then used the private static IP for docker swarm join, or the other way around (private <-> public).
  2. There was a firewall or some restriction on port access between the nodes.
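To avoid case 1, use the same (private) address on both sides. A sketch with placeholder IPs:

```shell
#!/bin/sh
# Hedged sketch: keep the address consistent across init and join.
# MANAGER_IP is a placeholder for the manager's VPC-internal static IP.
MANAGER_IP="10.128.0.2"

init_manager() {
  # On the manager node: advertise the private IP
  docker swarm init --advertise-addr "$MANAGER_IP"
}

join_worker() {
  # On each worker: join via the SAME private IP, never the public one.
  # $1 is the worker join token printed by "docker swarm init".
  docker swarm join --token "$1" "$MANAGER_IP:2377"
}

echo "$MANAGER_IP:2377"
```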

You are right, it works when we use the internal IP, but only partially. The containers are now getting identified, but Portainer has difficulty getting the volumes, due to which the logs and console are not accessible.

Furthermore, Portainer becomes very slow as soon as we add a swarm node; with a single manager node it works blazingly fast.

There is some serious configuration issue with respect to the networks and firewall, or even the way I set up the volumes.

Has anyone been successful in setting up a cluster for VMs in GCP? Help with steps or articles would be appreciated.