Redis cache server not running - debug help needed

ericmoon · September 2, 2016, 3:30pm

I did a bench update this morning and hit a problem with the redis-cache server. During the update (and on subsequent ‘bench restart’ commands), I got the following:

frappe-bench-frappe-schedule: stopped
frappe-bench-frappe-default-worker-0: stopped
frappe-bench-frappe-long-worker-0: stopped
frappe-bench-frappe-short-worker-0: stopped
frappe-bench-frappe-web: stopped
frappe-bench-frappe-schedule: started
frappe-bench-frappe-default-worker-0: ERROR (abnormal termination)
frappe-bench-frappe-long-worker-0: ERROR (abnormal termination)
frappe-bench-frappe-short-worker-0: ERROR (abnormal termination)
frappe-bench-frappe-web: started
frappe-bench-node-socketio: ERROR (abnormal termination)

So, of course, when I loaded the site afterwards, I got the “Redis cache server not running. Please contact Administrator / Tech support” pop-up window.

I’ve been reading through logs and older posts to try and debug, but have not found the magic bullet yet.

I’ve tried ‘bench retry-upgrade’, ‘bench start’, ‘bench setupio’… all with no luck.

Any pointers on where to start the debug?

Thanks,

alec_ruizramon1 · September 2, 2016, 3:42pm

What is the output of sudo supervisorctl status?

ericmoon · September 2, 2016, 3:59pm

frappe-bench-redis:frappe-bench-redis-cache RUNNING    pid 14781, uptime 1:11:01
frappe-bench-redis:frappe-bench-redis-queue RUNNING    pid 14780, uptime 1:11:01
frappe-bench-redis:frappe-bench-redis-socketio RUNNING    pid 14782, uptime 1:11:01
frappe-bench-web:frappe-bench-frappe-web RUNNING    pid 10852, uptime 0:44:53
frappe-bench-web:frappe-bench-node-socketio FATAL      Exited too quickly (process log may have details)
frappe-bench-workers:frappe-bench-frappe-default-worker-0 RUNNING    pid 26876, uptime 0:00:01
frappe-bench-workers:frappe-bench-frappe-long-worker-0 STARTING
frappe-bench-workers:frappe-bench-frappe-schedule RUNNING    pid 26615, uptime 0:00:16
frappe-bench-workers:frappe-bench-frappe-short-worker-0 STARTING
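Status output like the above can be narrowed down to just the entries that are not healthy. A minimal sketch, assuming bash and awk are available; not_running is a hypothetical helper name, not a bench or supervisor command:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `supervisorctl status` output on stdin and print
# only the programs whose state (second column) is not RUNNING,
# e.g. FATAL, STARTING, BACKOFF or STOPPED.
not_running() {
  # $1 is the program name, $2 is the supervisor state.
  awk 'NF >= 2 && $2 != "RUNNING" { print $1, $2 }'
}

# Example:
#   sudo supervisorctl status | not_running
```

On the status posted above, this would list node-socketio as FATAL and the long and short workers as STARTING.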

alec_ruizramon1 · September 2, 2016, 4:03pm

Hm, it’s interesting that the workers are still starting. It also looks like redis is working.

At any rate, you can try setting up production again:
sudo bench setup production
sudo supervisorctl reread
sudo supervisorctl restart all

ericmoon · September 2, 2016, 4:15pm

sudo bench setup production is asking for a user…?

> sudo bench setup production
Usage: bench setup production [OPTIONS] USER

Error: Missing argument "user".

alec_ruizramon1 · September 2, 2016, 4:24pm

It should be the user you installed frappe with; usually it’s frappe.

So the command will be sudo bench setup production frappe

ericmoon · September 2, 2016, 4:27pm

Thanks, but no luck:

> sudo bench setup production frappe
supervisor.conf already exists and this will overwrite it. Do you want to continue? [y/N]: y
nginx.conf already exists and this will overwrite it. Do you want to continue? [y/N]: y
No config updates to processes
nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
nginx: configuration file /etc/nginx/nginx.conf test is successful



> sudo supervisorctl reread
No config updates to processes


> sudo supervisorctl restart all
frappe-bench-frappe-schedule: stopped
frappe-bench-frappe-default-worker-0: stopped
frappe-bench-frappe-long-worker-0: stopped
frappe-bench-frappe-short-worker-0: stopped
frappe-bench-frappe-web: stopped
frappe-bench-redis-queue: stopped
frappe-bench-redis-cache: stopped
frappe-bench-redis-socketio: stopped
frappe-bench-frappe-schedule: started
frappe-bench-frappe-default-worker-0: ERROR (abnormal termination)
frappe-bench-frappe-long-worker-0: started
frappe-bench-frappe-short-worker-0: started
frappe-bench-frappe-web: started
frappe-bench-node-socketio: ERROR (abnormal termination)
frappe-bench-redis-queue: started
frappe-bench-redis-cache: started
frappe-bench-redis-socketio: started

alec_ruizramon1 · September 2, 2016, 4:32pm

Interesting. What does sites/common_site_config.json look like?

Also, the output for the current socketio and worker processes may help:
ps aux | grep socketio
ps aux | grep default

ericmoon · September 2, 2016, 4:35pm

> cat sites/common_site_config.json
{
 "auto_update": false,
 "background_workers": 1,
 "dns_multitenant": true,
 "frappe_user": "frappe",
 "gunicorn_workers": 4,
 "rebase_on_pull": false,
 "redis_cache": "redis://localhost:13000",
 "redis_queue": "redis://localhost:11000",
 "redis_socketio": "redis://localhost:12000",
 "restart_supervisor_on_update": true,
 "serve_default_site": true,
 "shallow_clone": true,
 "socketio_port": 9000,
 "update_bench_on_update": true,
 "webserver_port": 8000
}

and

'ps' aux |grep default
frappe    2232  0.0  0.5  77560 22440 ?        S    09:34   0:00 /usr/bin/python /usr/local/bin/bench worker --queue default

Nothing is running for socketio.

alec_ruizramon1 · September 2, 2016, 4:40pm

Ok, cool.

It looks like a default worker queue is running, so I don’t know why there was an abnormal termination. Maybe it didn’t shut down properly? You can kill the process and restart supervisor again.

In terms of socketio, is it installed, or just not running? Check with: npm list socket.io

ericmoon · September 2, 2016, 4:55pm

Socketio is installed:

> npm list socket.io
/home/frappe/frappe-bench
└── socket.io@1.4.8

And I can’t seem to kill the process… this is strange:

> echo "1" && 'ps' aux |grep default && sleep 2 && echo "2" && 'ps' aux |grep default
1
frappe   23768 59.0  0.6 106096 27244 ?        R    09:54   0:00 /home/frappe/frappe-bench/env/bin/python -m frappe.utils.bench_helper frappe worker --queue default
frappe   23792  0.0  0.0  12672  1020 pts/1    S+   09:54   0:00 grep default
2
frappe   23805 70.0  0.6 106484 27780 ?        R    09:54   0:00 /home/frappe/frappe-bench/env/bin/python -m frappe.utils.bench_helper frappe worker --queue default
frappe   23828  0.0  0.0  12672  1020 pts/1    S+   09:54   0:00 grep default

The PID keeps changing on me… ???

alec_ruizramon1 · September 2, 2016, 5:02pm

Not sure how to proceed on the PID changing.

Another thing to check: stop everything (i.e. sudo supervisorctl stop all) and see whether that worker is still running. It shouldn’t be, but I still don’t know why supervisor lists it as an abnormal termination while it is still running.

Another thing I remembered - what version of node are you using? node --version

ericmoon · September 2, 2016, 5:04pm

node --version
v0.10.40

ericmoon · September 2, 2016, 5:05pm

And after stopping everything in supervisor, no worker process is running anymore.

alec_ruizramon1 · September 2, 2016, 5:09pm

That could do it! Frappe/ERPNext 7 require node 5 or 6, so upgrade node and restart the supervisor processes:

sudo apt-get install nodejs
sudo supervisorctl restart all

Edit: if you don’t have the repository, do this:

sudo apt-key adv --keyserver keyserver.ubuntu.com --recv 68576280
sudo apt-add-repository "deb https://deb.nodesource.com/node_5.x $(lsb_release -sc) main"
sudo apt-get update
sudo apt-get install nodejs
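The node version check can also be scripted so it fails loudly when the installed node is too old. A minimal sketch in bash; node_major and node_version_ok are hypothetical helper names, and the minimum of 5 comes from the Frappe/ERPNext 7 requirement stated above:

```shell
#!/usr/bin/env bash
# Hypothetical helpers: compare a `node --version` string against a minimum
# major version.

# Extract the major version: "v6.5.0" -> "6", "v0.10.40" -> "0".
node_major() {
  local v="${1#v}"    # drop the leading "v"
  echo "${v%%.*}"     # keep everything before the first dot
}

# Succeeds (exit status 0) when the major version is >= the required minimum.
node_version_ok() {
  local required="$1" version="$2"
  [ "$(node_major "$version")" -ge "$required" ]
}

# Example:
#   node_version_ok 5 "$(node --version)" || echo "node is older than v5; upgrade it before restarting supervisor"
```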

ericmoon · September 2, 2016, 5:14pm

Gah! Of course… I installed Rocket.Chat on the same server, and it looks like that knocked node back to an ancient version… Let me try fixing the version and I’ll report back.

ericmoon · September 2, 2016, 5:37pm

Good news: socketio is now running:

> ps |grep socketio
frappe    7537  0.0  0.0  12672  1024 pts/1    S+   10:35   0:00  |           \_ grep socketio
frappe    6420  1.2  0.9 1217544 40576 ?       Sl   10:34   0:00  \_ /usr/local/bin/node /home/frappe/frappe-bench/apps/frappe/socketio.js

The bad news is that I still get the “Redis cache server not running. Please contact Administrator / Tech support” pop-up.

> sudo supervisorctl status
frappe-bench-redis:frappe-bench-redis-cache RUNNING    pid 6391, uptime 0:02:21
frappe-bench-redis:frappe-bench-redis-queue RUNNING    pid 6380, uptime 0:02:21
frappe-bench-redis:frappe-bench-redis-socketio RUNNING    pid 6395, uptime 0:02:21
frappe-bench-web:frappe-bench-frappe-web RUNNING    pid 6350, uptime 0:02:22
frappe-bench-web:frappe-bench-node-socketio RUNNING    pid 6420, uptime 0:02:20
frappe-bench-workers:frappe-bench-frappe-default-worker-0 RUNNING    pid 8828, uptime 0:00:01
frappe-bench-workers:frappe-bench-frappe-long-worker-0 STARTING
frappe-bench-workers:frappe-bench-frappe-schedule RUNNING    pid 8465, uptime 0:00:22
frappe-bench-workers:frappe-bench-frappe-short-worker-0 STARTING

Thoughts?

(node is now version 6.5)

alec_ruizramon1 · September 2, 2016, 5:43pm

Well, we got one problem solved!

Is redis running on the default port (6379)?

ps aux | grep redis yields this for me:

frappe@erp-production:~/frappe-bench$ ps aux | grep redis
redis      930  0.0  0.2  39844  4340 ?        Ssl  Jul22  26:17 /usr/bin/redis-server 127.0.0.1:6379       
frappe    9914  0.0  0.3  39848  8144 ?        Sl   Aug24   8:21 /usr/bin/redis-server 127.0.0.1:11000                                  
frappe    9916  0.0  0.5  76712 10484 ?        Sl   Aug24   6:57 /usr/bin/redis-server 127.0.0.1:13000                                  
frappe    9932  0.0  0.2  39848  5160 ?        Sl   Aug24   4:07 /usr/bin/redis-server 127.0.0.1:12000                                     
frappe   14330  0.0  0.0  11744   936 pts/1    S+   13:42   0:00 grep --color=auto redis
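Since the pop-up is about the cache server specifically, it can help to ping the exact ports bench configured instead of only checking the process list. A minimal sketch in bash, assuming redis-cli is installed and that sites/common_site_config.json keeps one "redis_*": "redis://host:port" entry per line, as in the file posted earlier; redis_port_from_config and check_redis_port are hypothetical helper names:

```shell
#!/usr/bin/env bash
# Hypothetical helpers for checking the Redis instances bench configures.
# Assumes one "key": "redis://host:port" pair per line in the config file.

# Print the port for a redis_* key, e.g. redis_cache -> 13000.
# Prints nothing if the key is missing or not a redis:// URL.
redis_port_from_config() {
  local config_file="$1" key="$2"
  sed -n "s|.*\"${key}\": *\"redis://[^:]*:\([0-9]*\)\".*|\1|p" "$config_file"
}

# Ping a local Redis port with redis-cli; a healthy server answers PONG.
check_redis_port() {
  local port="$1"
  if redis-cli -p "$port" ping 2>/dev/null | grep -q '^PONG$'; then
    echo "port ${port}: PONG"
  else
    echo "port ${port}: no response"
    return 1
  fi
}

# Example (run from the frappe-bench directory):
#   for key in redis_cache redis_queue redis_socketio; do
#     check_redis_port "$(redis_port_from_config sites/common_site_config.json "$key")"
#   done
```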

ericmoon · September 2, 2016, 5:47pm

It wasn’t:

> ps aux | grep redis
frappe   16734  0.0  0.0  12672  1020 pts/1    S+   10:44   0:00  |           \_ grep redis
frappe    6380  0.0  0.0  41008  2044 ?        Sl   10:34   0:00  \_ /usr/bin/redis-server 127.0.0.1:11000
frappe    6391  0.0  0.0  41008  2048 ?        Sl   10:34   0:00  \_ /usr/bin/redis-server 127.0.0.1:13000
frappe    6395  0.0  0.0  41008  2116 ?        Sl   10:34   0:00  \_ /usr/bin/redis-server 127.0.0.1:12000

So I restarted it:

> sudo systemctl restart redis-server

And now it is:

> ps aux | grep redis
frappe   17136  0.0  0.0  12672  1020 pts/1    S+   10:44   0:00  |           \_ grep redis
frappe    6380  0.0  0.0  41008  2044 ?        Sl   10:34   0:00  \_ /usr/bin/redis-server 127.0.0.1:11000
frappe    6391  0.0  0.0  41008  2048 ?        Sl   10:34   0:00  \_ /usr/bin/redis-server 127.0.0.1:13000
frappe    6395  0.0  0.0  41008  2116 ?        Sl   10:34   0:00  \_ /usr/bin/redis-server 127.0.0.1:12000
redis    17056  0.0  0.0  41004  1672 ?        Ssl  10:44   0:00 /usr/bin/redis-server 127.0.0.1:6379

I reloaded the ERPNext site and still get the Redis cache error pop-up.

alec_ruizramon1 · September 2, 2016, 5:58pm

You may have to run the supervisorctl reread from before again, and then restart all.

Apart from that, I’m out of ideas right now.