Error: signal only works in main thread of the main interpreter

I built a custom image based on the frappe_docker repo and am running it in a production environment.

Sometimes, when I run "docker compose down" and then "docker compose up" to restart the service, the above error appears.

CMD [
  "/home/frappe/frappe-bench/env/bin/gunicorn",
  "--chdir=/home/frappe/frappe-bench/sites",
  "--bind=0.0.0.0:8000",
  "--threads=4",
  "--workers=33",
  "--worker-class=gthread",
  "--worker-tmp-dir=/dev/shm",
  "--timeout=120",
  "--preload",
  "frappe.app:application"
]

This is the CMD setting for my custom image.
Can anyone tell me what I can try to avoid the above error when restarting?

Thank you.

I solved this problem.

The problem was that the main process was killed while a background job was running.

Redis Queue then tries to report the abandoned job back to Frappe as failed.

That step fails because, after the restart, the PID of the Frappe process that Redis Queue is trying to reach has changed.

So I run this command before shutting down the server:
bench disable-scheduler && bench execute frappe.core.doctype.rq_job.rq_job.remove_failed_job

You can avoid this conflict by executing remove_failed_job.
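
If you want to script this step, something like the following might work as a rough sketch. The two bench commands are copied as-is from my post above; the helper name run_pre_shutdown and its dry_run and cwd parameters are only illustrative, and it assumes bench is on the PATH and that cwd points at your bench directory.

```python
import subprocess

# Commands copied verbatim from the post above, in the order given.
PRE_SHUTDOWN_COMMANDS = [
    ["bench", "disable-scheduler"],
    ["bench", "execute", "frappe.core.doctype.rq_job.rq_job.remove_failed_job"],
]


def run_pre_shutdown(dry_run=True, cwd=None):
    """Run each command in order, stopping at the first failure (like &&).

    With dry_run=True nothing is executed; the command strings are only
    returned so you can check them first. `cwd` is a hypothetical parameter
    for the bench directory.
    """
    executed = []
    for cmd in PRE_SHUTDOWN_COMMANDS:
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit code,
            # so the second command never runs if the first one fails.
            subprocess.run(cmd, check=True, cwd=cwd)
        executed.append(" ".join(cmd))
    return executed
```

Calling run_pre_shutdown() only returns the command strings; run_pre_shutdown(dry_run=False) actually executes them, which you would do right before "docker compose down".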

docker exec -i ${redis queue container name} redis-cli FLUSHDB
docker exec -i ${redis queue container name} redis-cli FLUSHALL

You should also run these commands before restarting the server.

Hi @MinHyeong-Lee
I am also facing the same issue. It’s happening whenever a job fails (in my case, pulling emails from an email account fails), and I have to manually remove those failed jobs from the Redis queue every time.
Is there a stable solution that you may have figured out?

Hi @root
In my experience, the issue was in the code. I believe it was caused by a conflict with how RQ jobs operate.
When I tried to run a function that works asynchronously (such as one using async or threading) within an RQ job, I remember encountering an error like the one in the attached image.
It’s been a long time, so I don’t remember the details exactly. Sorry about that.

But what I do remember clearly is that, in my case, the problem was resolved when I reimplemented the function using a different approach that performed the same task.
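
For reference, the error in the title can be reproduced in plain Python without Frappe or RQ. This is only a minimal sketch of one way the message can come up (registering a signal handler from a thread that is not the main thread); I am not claiming it is exactly what happened inside my job. The function names are made up for the example.

```python
import signal
import threading


def register_handler():
    """Try to install a SIGTERM handler from whichever thread calls this.

    Returns None on success, or the error message if Python refuses.
    """
    try:
        signal.signal(signal.SIGTERM, signal.SIG_DFL)
        return None
    except ValueError as exc:
        # Python only allows signal.signal() in the main thread.
        return str(exc)


def run_in_worker_thread():
    """Call register_handler() from a non-main thread and return its result."""
    result = {}
    worker = threading.Thread(
        target=lambda: result.update(error=register_handler())
    )
    worker.start()
    worker.join()
    return result["error"]


if __name__ == "__main__":
    # Prints the ValueError message raised inside the worker thread.
    print(run_in_worker_thread())
```

On recent Python 3 versions the printed message should be the same text as the title of this thread. So if code running inside a job ends up touching signals from a secondary thread, rewriting it so that it does not do that is one thing to try.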