I have an ERPNext 15.2 setup running on Docker with a custom image using Supervisor, and another running on Kubernetes with the official image frappe/erpnext:15.3.8. Both setups have similar VM configurations, and each uses 8 workers. When I run scheduled jobs on the long queue in Docker, they execute quickly, but in Kubernetes they run slowly, even with similar data.
I'm assuming the same MariaDB service is serving both the Docker and Kubernetes environments. Both need to use the same MariaDB to rule out the database as the bottleneck.
Kubernetes uses NFS volumes for shared sites. Anything that needs to read or write files on the sites (NFS) mount will be slower than on a volume located on the same VM.
- Use the fastest NAS storage possible.
- Design the app so there is no disk usage; do the processing in Redis, the DB, or pod/container memory.
Are you processing files in the background jobs?
Thanks @revant_one for the response.
No, we don't process any files in background jobs; we just use SELECT queries to fetch data from the database.
How is the DB set up for both environments?
The DB is running on a VM with a 24-core CPU and 128 GB of RAM.
An external SSD is also attached for storage.
One strange thing I've noticed: when we run approximately 100 scheduled jobs in a queue, the first few batches (2-3) finish very fast, taking around 30 minutes each. However, the completion time gradually increases, first to 35 minutes, and then continues to rise with subsequent batches (8 workers per batch).
This behavior occurs in both Kubernetes and Docker.
Is there a specific reason for this?
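One way to pin down the slowdown described above is to time each batch explicitly, so the trend is measured rather than eyeballed. This is a generic sketch (the job callables and batch size of 8 are assumptions), not Frappe's own scheduler code:

```python
import time

def run_batches(jobs, batch_size=8):
    """Run jobs in fixed-size batches and record each batch's wall-clock
    duration; a gradual slowdown shows up as a growing trend in the list."""
    durations = []
    for i in range(0, len(jobs), batch_size):
        start = time.perf_counter()
        for job in jobs[i:i + batch_size]:
            job()
        durations.append(time.perf_counter() - start)
    return durations
```

If `durations[-1]` is consistently larger than `durations[0]` on both platforms, that points to accumulating state (growing tables, queue depth, memory pressure) rather than to Docker or Kubernetes itself.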
- Do not scale the scheduler: FAQ | ERPNext Helm Chart
- Do not scale socketio: 400 Errors with Multiple socketio replicas - affinity? · Issue #229 · frappe/helm · GitHub
Configure Sentry, OpenTelemetry, or any other trace monitoring and try to understand the queries made by the application. Opentelemetry with Frappe framework? - #5 by revant_one
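If setting up full tracing takes time, a poor man's version of the same idea can be had with a timing decorator around suspect functions. This is only an illustrative stand-in for real Sentry/OpenTelemetry spans; the `fetch_report` function is a made-up example:

```python
import functools
import time

def traced(func):
    """Minimal stand-in for a tracing span: logs how long each call took.
    Real tracing (OpenTelemetry/Sentry) would also capture the DB queries
    issued inside the call, which is what you want to compare across
    the Docker and Kubernetes environments."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            elapsed = time.perf_counter() - start
            print(f"{func.__name__} took {elapsed:.3f}s")
    return wrapper

@traced
def fetch_report():
    time.sleep(0.01)  # placeholder for the real SELECT-heavy job
    return "ok"

fetch_report()
```

Comparing per-call timings from both environments for the same job is usually enough to tell whether the extra latency is in the DB round-trips or elsewhere.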