Bench Restart Socket.io Spawn Error

Self-hosted: Digital Ocean

Erpnext: 15.21.2
Frappe: 15.24.1
Bench: 5.22.6
Hrms: 15.20.0
Environment: Production
Setup: Manually (Forum Guide)

Referencing: Issue 19062 and Forum Discussion
because my node version is up to requirements

Node: v18.20.2
NPM: v10.5.0

Running all the commands from the frappe user I had created since initial installation.

ERPNext has been running flawlessly for about a month or two since its inception, recently I had to create a new app to override some core methods of manufacturing module.

Used

bench new-app custom_app

Everything builds successfully up until where bench restarts (bench restart stage), i get the following output at that point:

$ supervisorctl restart frappe-bench-web:
frappe-bench-web:frappe-bench-frappe-web: stopped
frappe-bench-web:frappe-bench-node-socketio: stopped
frappe-bench-web:frappe-bench-node-socketio: ERROR (spawn error)
frappe-bench-web:frappe-bench-frappe-web: started
WARN: restarting supervisor group `frappe-bench-web:` failed. Use `bench restart` to retry.
$ supervisorctl restart frappe-bench-workers:
frappe-bench-workers:frappe-bench-frappe-schedule: stopped
frappe-bench-workers:frappe-bench-frappe-short-worker-0: stopped
frappe-bench-workers:frappe-bench-frappe-long-worker-0: stopped
frappe-bench-workers:frappe-bench-frappe-schedule: started
frappe-bench-workers:frappe-bench-frappe-short-worker-0: started
frappe-bench-workers:frappe-bench-frappe-long-worker-0: started
INFO: A newer version of bench is available: 5.22.3 → 5.22.6

On manually restarting as instructed using bench restart, I get:

$ supervisorctl restart frappe-bench-web:
frappe-bench-web:frappe-bench-node-socketio: stopped
frappe-bench-web:frappe-bench-frappe-web: stopped
frappe-bench-web:frappe-bench-node-socketio: ERROR (spawn error)
frappe-bench-web:frappe-bench-frappe-web: started
ERROR: supervisorctl restart frappe-bench-web:
subprocess.CalledProcessError: Command 'supervisorctl restart frappe-bench-web:' returned non-zero exit status 7.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/bin/bench", line 8, in <module>
    sys.exit(cli())
  File "/usr/local/lib/python3.10/dist-packages/bench/cli.py", line 132, in cli
    bench_command()
  File "/usr/local/lib/python3.10/dist-packages/bench/commands/utils.py", line 41, in restart
    Bench(".").reload(web, supervisor, systemd)
  File "/usr/local/lib/python3.10/dist-packages/bench/utils/render.py", line 126, in wrapper_fn
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/bench/bench.py", line 152, in reload
    restart_supervisor_processes(bench_path=self.name, web_workers=web, _raise=_raise)
  File "/usr/local/lib/python3.10/dist-packages/bench/utils/bench.py", line 342, in restart_supervisor_processes
    failure = bench.run(f"{sudo}supervisorctl restart {group}", _raise=_raise)
  File "/usr/local/lib/python3.10/dist-packages/bench/bench.py", line 48, in run
    return exec_cmd(cmd, cwd=cwd or self.cwd, _raise=_raise)
  File "/usr/local/lib/python3.10/dist-packages/bench/utils/__init__.py", line 169, in exec_cmd
    raise CommandFailedError(cmd) from subprocess.CalledProcessError(return_code, cmd)
bench.exceptions.CommandFailedError: supervisorctl restart frappe-bench-web:

INFO: A newer version of bench is available: 5.22.3 → 5.22.6

It is worthwhile to mention that the app has been created, but I am awaiting clarity on the cause of the issue.

Kindly provide some pointers as to how to solve this. Also, I’ll be happy to provide logs as necessary.

@Ace_Bliss try the below commands

bench setup socketio

bench setup supervisor

bench setup redis

sudo supervisorctl reload

sudo service supervisor stop all

sudo service supervisor start all

sudo service supervisor restart

Ran all the commands

ended with Restarted supervisord
Didn’t throw any errors

Still on using

bench restart

I get the same traceback (2nd one on the original post)

Are you running in production mode?
if so, try doing the following:

FILE="/etc/supervisor/supervisord.conf"
SEARCH_PATTERN="chown=$USER:$USER"
sudo sed -i "5a $SEARCH_PATTERN" "$FILE"
sudo service supervisor restart
yes | sudo bench setup production $USER
bench scheduler enable
bench scheduler resume
bench setup socketio
yes | bench setup supervisor
bench setup redis
sudo supervisorctl reload

This may fix the supervisorctl restart frappe-bench-web error

Are you running in production mode?

Yeah

It printed a traceback on executing:

yes | sudo bench setup production $USER

The Traceback

$USER
Setting Up prerequisites...
Setting Up supervisor...
ERROR: While reading from '/etc/supervisor/supervisord.conf' [line  6]: option 'chown' in section 'unix_http_server' already exists
Traceback (most recent call last):
  File "/usr/local/bin/bench", line 8, in <module>
    sys.exit(cli())
  File "/usr/local/lib/python3.10/dist-packages/bench/cli.py", line 132, in cli
    bench_command()
  File "/usr/lib/python3/dist-packages/click/core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/usr/lib/python3/dist-packages/click/core.py", line 1659, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/lib/python3/dist-packages/click/core.py", line 1659, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/lib/python3/dist-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/lib/python3/dist-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/bench/commands/setup.py", line 110, in setup_production
    setup_production(user=user, yes=yes)
  File "/usr/local/lib/python3.10/dist-packages/bench/config/production_setup.py", line 51, in setup_production
    check_supervisord_config(user=user)
  File "/usr/local/lib/python3.10/dist-packages/bench/config/supervisor.py", line 140, in check_supervisord_config
    config.read(supervisord_conf)
  File "/usr/lib/python3.10/configparser.py", line 699, in read
    self._read(fp, filename)
  File "/usr/lib/python3.10/configparser.py", line 1098, in _read
    raise DuplicateOptionError(sectname, optname,
configparser.DuplicateOptionError: While reading from '/etc/supervisor/supervisord.conf' [line  6]: option 'chown' in section 'unix_http_server' already exists
INFO: A newer version of bench is available: 5.22.3 → 5.22.6

Still i moved ahead and executed the rest of commands, which went through flawlessly.

Furthermore, I had already added my frappe user in supervisord.conf while troubleshooting

Hello,

You have “frappe user in supervisord.conf” but you run command with “sudo”, so isn’t the sudo user that need to have access to the file ?
I’m not familiar with Digital Ocean hosting specification, but it just ring a bell to me by reading your post.

On the other hand, the error message tell “unix_http_server” already exists, may be by removing it manually, it will let “bench setup production” do his job by adding it.
Probably “bench setup production” don’t implement your configuration case and need a “fresh” file to complete his job

First and foremost, thanks for replying, I appreciate the responses.

You have “frappe user in supervisord.conf” but you run command with “sudo”, so isn’t the sudo user that need to have access to the file ?

I had given that user permissions while installing the ERPNext instance for the first time, following this forum guide as I mentioned on the original post. The following is the code snippet that I used.

sudo adduser [frappe-user]
usermod -aG sudo [frappe-user]
su [frappe-user] 
cd /home/[frappe-user]/

On the other hand, the error message tell “unix_http_server” already exists, may be by removing it manually, it will let “bench setup production” do his job by adding it.
Probably “bench setup production” don’t implement your configuration case and need a “fresh” file to complete his job

Originally it didn’t had the frappe user, so I added following this post while resolving Supervisorctl restart frappe error

Issue solved, cause unknown

In contrast to socket.io, redis was also not working

frappe-user@ubuntu-erp:~/frappe-bench$ supervisorctl status

frappe-bench-redis:frappe-bench-redis-cache               FATAL     Exited too quickly (process log may have details)
frappe-bench-redis:frappe-bench-redis-queue               FATAL     Exited too quickly (process log may have details)
frappe-bench-web:frappe-bench-frappe-web                  RUNNING   pid 1675110, uptime 0:00:01
frappe-bench-web:frappe-bench-node-socketio               BACKOFF   Exited too quickly (process log may have details)
frappe-bench-workers:frappe-bench-frappe-long-worker-0    RUNNING   pid 1439880, uptime 16:19:48
frappe-bench-workers:frappe-bench-frappe-schedule         RUNNING   pid 1439878, uptime 16:19:48
frappe-bench-workers:frappe-bench-frappe-short-worker-0   RUNNING   pid 1439879, uptime 16:19:48

Checked out each of their logs

less ~/frappe-bench/logs/redis-cache.log
less ~/frappe-bench/logs/redis-queue.log
less ~/frappe-bench/logs/node-socketio.log

Last two were for especial significance for resolving the issue for redis-cache

1440115:C 29 May 2024 22:19:51.660 # Redis version=6.0.16, bits=64, commit=00000000, modified=0, pid=1440115, just started
1440115:C 29 May 2024 22:19:51.660 # Configuration loaded
1440115:M 29 May 2024 22:19:51.661 * Increased maximum number of open files to 10032 (it was originally set to 1024).
1440115:M 29 May 2024 22:19:51.661 # Could not create server TCP listening socket 127.0.0.1:13000: bind: Address already in use
1440139:C 29 May 2024 22:20:01.059 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo

Something similar was for the case of redis-queue and socket.io , found out each of there ports.
Like in this case it is port 13000

sudo lsof -i :13000

listed all the processes on port 13000

Used the following kill command to switch off all the current running redis process

sudo kill -9 <PID>
redis-ser   32054 frappe-user    6u  IPv4   197500      0t0  TCP localhost:13000 (LISTEN)
redis-ser   32054 frappe-user    7u  IPv4   197573      0t0  TCP localhost:13000->localhost:44670 (ESTABLISHED)
redis-ser   32054 frappe-user    8u  IPv4   197575      0t0  TCP localhost:13000->localhost:44686 (ESTABLISHED)
redis-ser   32054 frappe-user    9u  IPv4   197586      0t0  TCP localhost:13000->localhost:44700 (ESTABLISHED)
redis-ser   32054 frappe-user   10u  IPv4   197590      0t0  TCP localhost:13000->localhost:44716 (ESTABLISHED)
redis-ser   32054 frappe-user   11u  IPv4 15095695      0t0  TCP localhost:13000->localhost:43352 (ESTABLISHED)
redis-ser   32054 frappe-user   12u  IPv4   200049      0t0  TCP localhost:13000->localhost:60924 (ESTABLISHED)
redis-ser   32054 frappe-user   13u  IPv4   200052      0t0  TCP localhost:13000->localhost:60940 (ESTABLISHED)
redis-ser   32054 frappe-user   14u  IPv4   199561      0t0  TCP localhost:13000->localhost:60956 (ESTABLISHED)
redis-ser   32054 frappe-user   15u  IPv4 15810021      0t0  TCP localhost:13000->localhost:43550 (ESTABLISHED)
redis-ser   32054 frappe-user   16u  IPv4 15814004      0t0  TCP localhost:13000->localhost:55960 (ESTABLISHED)
redis-ser   32054 frappe-user   17u  IPv4 16248351      0t0  TCP localhost:13000->localhost:34014 (ESTABLISHED)
redis-ser   32054 frappe-user   18u  IPv4 15096763      0t0  TCP localhost:13000->localhost:45800 (ESTABLISHED)
redis-ser   32054 frappe-user   19u  IPv4 15098274      0t0  TCP localhost:13000->localhost:45816 (ESTABLISHED)
redis-ser   32054 frappe-user   20u  IPv4 15728964      0t0  TCP localhost:13000->localhost:32828 (ESTABLISHED)
redis-ser   32054 frappe-user   21u  IPv4 15735257      0t0  TCP localhost:13000->localhost:56048 (ESTABLISHED)
redis-ser   32054 frappe-user   22u  IPv4 15735263      0t0  TCP localhost:13000->localhost:56054 (ESTABLISHED)
redis-ser   32054 frappe-user   23u  IPv4 16248353      0t0  TCP localhost:13000->localhost:34020 (ESTABLISHED)
redis-ser   32054 frappe-user   24u  IPv4 16248356      0t0  TCP localhost:13000->localhost:34034 (ESTABLISHED)

These are the ones already related to redis. However, there were other python stale connections with status
CLOSE_WAIT

For example,

python  1439878 frappe-user    3u  IPv4 16249297      0t0  TCP localhost:34014->localhost:13000 (CLOSE_WAIT)

Similarly killed them.

Rinse and repeated the process for all three and the bench had automatically restarted the services correctly. Which was expected from the logs shared above

frappe-bench-redis:frappe-bench-redis-cache               RUNNING   pid 1687630, uptime 4:55:06
frappe-bench-redis:frappe-bench-redis-queue               RUNNING   pid 1687631, uptime 4:55:06
frappe-bench-web:frappe-bench-frappe-web                  RUNNING   pid 1702630, uptime 0:21:25
frappe-bench-web:frappe-bench-node-socketio               RUNNING   pid 1702631, uptime 0:21:25
frappe-bench-workers:frappe-bench-frappe-long-worker-0    RUNNING   pid 1702646, uptime 0:21:23
frappe-bench-workers:frappe-bench-frappe-schedule         RUNNING   pid 1702644, uptime 0:21:23
frappe-bench-workers:frappe-bench-frappe-short-worker-0   RUNNING   pid 1702645, uptime 0:21:23

It took some more tinkering than this, but this is a distilled version one may use to resolve issue at their end.

Also frappe team and other contributors have built a impeccable architecture, easy to navigate it. Well done guys.

Not to mention, bench restart also works flawlessly now.

$ supervisorctl restart frappe-bench-web:
frappe-bench-web:frappe-bench-node-socketio: stopped
frappe-bench-web:frappe-bench-frappe-web: stopped
frappe-bench-web:frappe-bench-frappe-web: started
frappe-bench-web:frappe-bench-node-socketio: started
$ supervisorctl restart frappe-bench-workers:
frappe-bench-workers:frappe-bench-frappe-schedule: stopped
frappe-bench-workers:frappe-bench-frappe-short-worker-0: stopped
frappe-bench-workers:frappe-bench-frappe-long-worker-0: stopped
frappe-bench-workers:frappe-bench-frappe-schedule: started
frappe-bench-workers:frappe-bench-frappe-short-worker-0: started
frappe-bench-workers:frappe-bench-frappe-long-worker-0: started
INFO: A newer version of bench is available: 5.22.3 → 5.22.6

Still I have error with socket on bench restart…even I kill alle the processes binding on 13000 port