Frappe Docker health checks failing behind an AWS ALB

Hi Team,

We are currently working on a Frappe application running as containers on an AWS EC2 instance. An ALB is attached that routes traffic to a target group containing the EC2 instance.

Issue: The ALB health checks are failing. The requests are reaching the containers on the EC2 instance (visible in the frontend logs below), but they are getting 404 responses.

frontend-1      | 172.31.43.179 - - [06/Jun/2024:02:44:33 +0000] "GET /api/method/frappe.handler.ping HTTP/1.1" 404 114 "-" "ELB-HealthChecker/2.0"
frontend-1      | 172.31.5.152 - - [06/Jun/2024:02:44:33 +0000] "GET /api/method/frappe.handler.ping HTTP/1.1" 404 114 "-" "ELB-HealthChecker/2.0"
frontend-1      | 172.31.43.179 - - [06/Jun/2024:02:45:03 +0000] "GET /api/method/frappe.handler.ping HTTP/1.1" 404 114 "-" "ELB-HealthChecker/2.0"
frontend-1      | 172.31.5.152 - - [06/Jun/2024:02:45:03 +0000] "GET /api/method/frappe.handler.ping HTTP/1.1" 404 114 "-" "ELB-HealthChecker/2.0"
frontend-1      | 172.31.43.179 - - [06/Jun/2024:02:45:33 +0000] "GET /api/method/frappe.handler.ping HTTP/1.1" 404 114 "-" "ELB-HealthChecker/2.0"
frontend-1      | 172.31.5.152 - - [06/Jun/2024:02:45:33 +0000] "GET /api/method/frappe.handler.ping HTTP/1.1" 404 114 "-" "ELB-HealthChecker/2.0"

When I run the same health check request without the ALB, it returns a 200 response; through the ALB, it results in a 404. Because of the failed health checks, the application is not functioning correctly.

Observations: After SSHing into the instance and exec-ing into both the frontend and backend containers, the result is the same:

http://localhost:8080/api/method/frappe.handler.ping - 404
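
The nginx frontend resolves which Frappe site to serve from the Host header (FRAPPE_SITE_NAME_HEADER: $$host in the compose file below), and the ELB health checker sends the target's private IP as the Host. A quick way to check whether that is the cause (sketch; replace the placeholder with the actual site name):

# Host is "localhost", which matches no site -> 404
curl -i http://localhost:8080/api/method/frappe.handler.ping

# Pass the site name explicitly as Host -> should return 200 if the site exists
curl -i -H "Host: <your-site-name>" http://localhost:8080/api/method/frappe.handler.ping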

Frappe version - 14

Docker compose:

name: ajna
services:
  backend:
    depends_on:
      configurator:
        condition: service_completed_successfully
        required: true
    image: XXXXXX.dkr.ecr.ap-south-1.amazonaws.com/shodana:v1.0.0
    networks:
      default: null
    platform: linux/amd64
    volumes:
      - type: volume
        source: sites
        target: /home/frappe/frappe-bench/sites
        volume: {}

  configurator:
    command:
      - |
        ls -1 apps > sites/apps.txt;
        bench set-config -g db_host $$DB_HOST;
        bench set-config -gp db_port $$DB_PORT;
        bench set-config -g redis_cache "redis://$$REDIS_CACHE";
        bench set-config -g redis_queue "redis://$$REDIS_QUEUE";
        bench set-config -g redis_socketio "redis://$$REDIS_QUEUE";
        bench set-config -gp socketio_port $$SOCKETIO_PORT;
    depends_on:
      redis-cache:
        condition: service_started
        required: true
      redis-queue:
        condition: service_started
        required: true
    entrypoint:
      - bash
      - -c
    environment:
      DB_HOST: ajna-docker-rds.XXXXXX.amazonaws.com
      DB_PORT: "3306"
      REDIS_CACHE: redis-cache:6379
      REDIS_QUEUE: redis-queue:6379
      SOCKETIO_PORT: "9000"
    image: XXXXXX.dkr.ecr.ap-south-1.amazonaws.com/shodana:v1.0.0
    networks:
      default: null
    platform: linux/amd64
    volumes:
      - type: volume
        source: sites
        target: /home/frappe/frappe-bench/sites
        volume: {}

  frontend:
    command:
      - nginx-entrypoint.sh
    depends_on:
      backend:
        condition: service_started
        required: true
      websocket:
        condition: service_started
        required: true
    environment:
      BACKEND: backend:8000
      CLIENT_MAX_BODY_SIZE: 50m
      FRAPPE_SITE_NAME_HEADER: $$host
      PROXY_READ_TIMEOUT: "120"
      SOCKETIO: websocket:9000
      UPSTREAM_REAL_IP_ADDRESS: 127.0.0.1
      UPSTREAM_REAL_IP_HEADER: X-Forwarded-For
      UPSTREAM_REAL_IP_RECURSIVE: "off"
    image: XXXXXX.dkr.ecr.ap-south-1.amazonaws.com/shodana:v1.0.0
    networks:
      default: null
    platform: linux/amd64
    ports:
      - mode: ingress
        target: 8080
        published: "8080"
        protocol: tcp
    volumes:
      - type: volume
        source: sites
        target: /home/frappe/frappe-bench/sites
        volume: {}

  queue-long:
    command:
      - bench
      - worker
      - --queue
      - long,default,short
    depends_on:
      configurator:
        condition: service_completed_successfully
        required: true
    image: XXXXXX.dkr.ecr.ap-south-1.amazonaws.com/shodana:v1.0.0
    networks:
      default: null
    platform: linux/amd64
    volumes:
      - type: volume
        source: sites
        target: /home/frappe/frappe-bench/sites
        volume: {}

  queue-short:
    command:
      - bench
      - worker
      - --queue
      - short,default
    depends_on:
      configurator:
        condition: service_completed_successfully
        required: true
    image: XXXXXX.dkr.ecr.ap-south-1.amazonaws.com/shodana:v1.0.0
    networks:
      default: null
    platform: linux/amd64
    volumes:
      - type: volume
        source: sites
        target: /home/frappe/frappe-bench/sites
        volume: {}

  redis-cache:
    image: redis:6.2-alpine
    networks:
      default: null
    volumes:
      - type: volume
        source: redis-cache-data
        target: /data
        volume: {}

  redis-queue:
    image: redis:6.2-alpine
    networks:
      default: null
    volumes:
      - type: volume
        source: redis-queue-data
        target: /data
        volume: {}

  scheduler:
    command:
      - bench
      - schedule
    depends_on:
      configurator:
        condition: service_completed_successfully
        required: true
    image: XXXXXX.dkr.ecr.ap-south-1.amazonaws.com/shodana:v1.0.0
    networks:
      default: null
    platform: linux/amd64
    volumes:
      - type: volume
        source: sites
        target: /home/frappe/frappe-bench/sites
        volume: {}

  websocket:
    command:
      - node
      - /home/frappe/frappe-bench/apps/frappe/socketio.js
    depends_on:
      configurator:
        condition: service_completed_successfully
        required: true
    image: XXXXXX.dkr.ecr.ap-south-1.amazonaws.com/shodana:v1.0.0
    networks:
      default: null
    platform: linux/amd64
    volumes:
      - type: volume
        source: sites
        target: /home/frappe/frappe-bench/sites
        volume: {}

networks:
  default:
    name: ajna_default

volumes:
  redis-cache-data:
    name: ajna_redis-cache-data
  redis-queue-data:
    name: ajna_redis-queue-data
  sites:
    name: ajna_sites
x-backend-defaults:
  depends_on:
    configurator:
      condition: service_completed_successfully
  image: XXXXXX.dkr.ecr.ap-south-1.amazonaws.com/shodana:v1.0.0
  volumes:
    - sites:/home/frappe/frappe-bench/sites
x-customizable-image:
  image: XXXXXX.dkr.ecr.ap-south-1.amazonaws.com/shodana:v1.0.0
x-depends-on-configurator:
  depends_on:
    configurator:
      condition: service_completed_successfully

Any insights or suggestions to resolve this issue would be greatly appreciated!

@revant_one, could you please help?

Pass the Host header with the site name and it will serve the site.
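
For example, in the frontend service environment, replace the dynamic $$host with the actual site name (placeholder below):

frontend:
  environment:
    # Serve this specific site for every request; the ELB health checker
    # sends the target's IP as the Host header, which matches no site name.
    FRAPPE_SITE_NAME_HEADER: <your-site-name>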

Thank you for the response, @revant_one.

So I should change
FRAPPE_SITE_NAME_HEADER: $$host
to
FRAPPE_SITE_NAME_HEADER: paramesh.ajna.cloud

where the site name is paramesh.ajna.cloud. Is that correct?

Thank you @revant_one, it worked. I am able to serve the site through the ALB, with SSL via ACM and Route 53.

Hello @revant_one, I am having a similar problem, but my deployment is on EKS through Helm. I have created the ALB ingress controller and an ALB Ingress resource with the host set to the site name (erp.cluster.local), but the target is showing as unhealthy. Can you help me with this? I have been trying for 20+ days.
Everything in the deployment is running, as shown below:

C:\Users\toufi>k get ingress -n frappe
NAME             CLASS   HOSTS               ADDRESS                                                                  PORTS   AGE
frappe-ingress   alb     erp.cluster.local   k8s-frappe-frappein-********************.ap-south-1.elb.amazonaws.com   80      40s

C:\Users\toufi>k get all -n frappe
NAME                                                 READY   STATUS      RESTARTS      AGE
pod/frappe-erpnext-conf-bench-20250107171704-n52bh   0/1     Completed   0             33m
pod/frappe-erpnext-gunicorn-66cf7bbf57-scpwn         1/1     Running     0             36m
pod/frappe-erpnext-new-site-20250107171704-692cm     0/1     Completed   0             33m
pod/frappe-erpnext-nginx-6f75c946d-5795f             1/1     Running     0             36m
pod/frappe-erpnext-scheduler-5f7b9fdd87-zwbws        1/1     Running     0             33m
pod/frappe-erpnext-socketio-66b5dbbbdd-9q7xk         1/1     Running     2 (36m ago)   36m
pod/frappe-erpnext-worker-d-79f5dfd4b-ptd46          1/1     Running     0             33m
pod/frappe-erpnext-worker-l-6fd55bc959-7kwfx         1/1     Running     0             33m
pod/frappe-erpnext-worker-s-5769d7b79-8kf2q          1/1     Running     0             33m
pod/frappe-redis-cache-master-0                      1/1     Running     0             36m
pod/frappe-redis-queue-master-0                      1/1     Running     0             36m

NAME                                  TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
service/frappe-erpnext                ClusterIP   172.20.145.22    <none>        8080/TCP   36m
service/frappe-erpnext-gunicorn       ClusterIP   172.20.15.236    <none>        8000/TCP   36m
service/frappe-erpnext-socketio       ClusterIP   172.20.240.12    <none>        9000/TCP   36m
service/frappe-redis-cache-headless   ClusterIP   None             <none>        6379/TCP   36m
service/frappe-redis-cache-master     ClusterIP   172.20.162.240   <none>        6379/TCP   36m
service/frappe-redis-queue-headless   ClusterIP   None             <none>        6379/TCP   36m
service/frappe-redis-queue-master     ClusterIP   172.20.165.153   <none>        6379/TCP   36m

NAME                                       READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/frappe-erpnext-gunicorn    1/1     1            1           36m
deployment.apps/frappe-erpnext-nginx       1/1     1            1           36m
deployment.apps/frappe-erpnext-scheduler   1/1     1            1           36m
deployment.apps/frappe-erpnext-socketio    1/1     1            1           36m
deployment.apps/frappe-erpnext-worker-d    1/1     1            1           36m
deployment.apps/frappe-erpnext-worker-l    1/1     1            1           36m
deployment.apps/frappe-erpnext-worker-s    1/1     1            1           36m

NAME                                                  DESIRED   CURRENT   READY   AGE
replicaset.apps/frappe-erpnext-gunicorn-66cf7bbf57    1         1         1       36m
replicaset.apps/frappe-erpnext-nginx-6f75c946d        1         1         1       36m
replicaset.apps/frappe-erpnext-scheduler-565dbfcb85   0         0         0       36m
replicaset.apps/frappe-erpnext-scheduler-5f7b9fdd87   1         1         1       33m
replicaset.apps/frappe-erpnext-socketio-66b5dbbbdd    1         1         1       36m
replicaset.apps/frappe-erpnext-worker-d-6b7c46b8c6    0         0         0       36m
replicaset.apps/frappe-erpnext-worker-d-79f5dfd4b     1         1         1       33m
replicaset.apps/frappe-erpnext-worker-l-5fdc995557    0         0         0       36m
replicaset.apps/frappe-erpnext-worker-l-6fd55bc959    1         1         1       33m
replicaset.apps/frappe-erpnext-worker-s-5769d7b79     1         1         1       33m
replicaset.apps/frappe-erpnext-worker-s-7b754d6b9c    0         0         0       36m

NAME                                         READY   AGE
statefulset.apps/frappe-redis-cache-master   1/1     37m
statefulset.apps/frappe-redis-queue-master   1/1     37m

NAME                                                 STATUS     COMPLETIONS   DURATION   AGE
job.batch/frappe-erpnext-conf-bench-20250107171704   Complete   1/1           9s         33m
job.batch/frappe-erpnext-new-site-20250107171704     Complete   1/1           6s         33m

C:\Users\toufi>k get all -n nfs
NAME                                      READY   STATUS    RESTARTS   AGE
pod/in-cluster-nfs-server-provisioner-0   1/1     Running   0          50m


C:\Users\toufi>k get sc
NAME   PROVISIONER                                       RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
gp2    kubernetes.io/aws-ebs                             Delete          WaitForFirstConsumer   false                  133d
nfs    cluster.local/in-cluster-nfs-server-provisioner   Delete          Immediate              true                   51m
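
From the earlier replies I understand the fix should be the same idea on my side; this is a sketch of what I plan to try (the annotations are standard AWS Load Balancer Controller ones, and I am assuming the erpnext chart exposes the nginx frontend environment under nginx.environment):

# values.yaml override for the erpnext Helm chart (exact key is my assumption):
nginx:
  environment:
    frappeSiteNameHeader: erp.cluster.local   # pin the site instead of resolving from Host

# plus health-check annotations on the Ingress:
metadata:
  annotations:
    alb.ingress.kubernetes.io/healthcheck-path: /api/method/frappe.handler.ping
    alb.ingress.kubernetes.io/success-codes: "200"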

Please help me solve this problem. Thank you!