Sorry! We will be back soon! after bench update, no previously posted solutions have worked

Andrew_Cook · June 2, 2017, 5:26am

My ERPNext server was running just fine all day, then suddenly very slow at the end of the day, then ERPNext stopped working altogether. I’m running in production mode on an Ubuntu 14.04 server. I ran some quick diagnostics (netstat, status checks on nginx & supervisor) but everything looked like it was working properly. I got on the forums here and found some previous posts with similar issues, so I first tried running a bench update, and I had issues with that telling me that I had unsaved changes to my version of ERPNext, so I wound up doing bench update --reset instead and that seemed to go through without any errors. After the update completed, I was still unable to connect to ERPNext, and now I only get a page saying:

Sorry!
We will be back soon.

Don’t panic. It’s not you, it’s us.
Most likely, our engineers are updating the code, and it should take a minute for the new code to load into memory.

Try refreshing after a minute or two.

After researching on here, I tried some of the solutions offered in other posts, but nothing has worked for me. I checked to make sure supervisor was running, and it is. I checked nginx, that is also running. I tried restarting both, but that didn’t work either. I then found a post that suggested to reconfigure by running a bench setup production command. When I did that, I was presented with messages saying that I was going to overwrite the existing nginx and supervisor conf files, and I proceeded. Now I don’t see my site folder anymore in the frappe-bench folder, and I still can’t access anything. I really need help to get back up and running in the next few hours if at all possible. Can anyone out there help me out with this?

saurabh6790 · June 2, 2017, 5:34am

Its seems your site stuck in maintenance mode.

Please check by running

bench --site site-name migrate
sudo supervisorctl restart all
sudo service nginx restart

Andrew_Cook · June 2, 2017, 5:46am

When I run this it returns:
Migrating site-name
site-name does not exist

Andrew_Cook · June 2, 2017, 5:48am

Oops, just realized I needed to type in my actual site name. Sorry!

Andrew_Cook · June 2, 2017, 5:56am

The first command seemed to work without issue, here’s what it returned:

bench --site xxxxxxx.xxxxxxx.com migrate
Migrating xxxxxxx.xxxxxxx.com
Updating DocTypes for frappe : [========================================]
Updating DocTypes for erpnext : [========================================]
Syncing help database…

The second command returned the following:

sudo supervisorctl restart all
frappe-bench-frappe-schedule: stopped
frappe-bench-frappe-default-worker-0: stopped
frappe-bench-frappe-long-worker-0: stopped
frappe-bench-frappe-short-worker-0: stopped
frappe-bench-frappe-web: stopped
frappe-bench-node-socketio: stopped
frappe-bench-redis-queue: stopped
frappe-bench-redis-cache: stopped
frappe-bench-redis-socketio: stopped
frappe-bench-frappe-schedule: started
frappe-bench-frappe-default-worker-0: ERROR (abnormal termination)
frappe-bench-frappe-long-worker-0: started
frappe-bench-frappe-short-worker-0: started
frappe-bench-frappe-web: started
frappe-bench-node-socketio: started
frappe-bench-redis-queue: started
frappe-bench-redis-cache: started
frappe-bench-redis-socketio: started

and the third command returned the following:

service nginx restart
nginx stop/waiting
nginx stop/pre-start, process 7152

Now I don’t see the “Sorry! We will be back soon!” message, but I still can not connect. Now it just says that the page can not be reached. Can you think of what may cause that?

Thank you very much for your help!

Andrew_Cook · June 2, 2017, 7:18am

A little more info… Since I am still stuck without access to ERPNext, I looked around a bit more and decided to try running nginx -t which told me there was a duplicate upstream issue. I found the issue and resolved it by deleting the conflicting files and then reconfiguring nginx. Now the tests come back successful, however when I reloaded nginx, I still did not have access to ERPNext. I decided to once again try reconfiguring by running “bench setup production” and this is what was returned:

bench setup production fubar
supervisor.conf already exists and this will overwrite it. Do you want to continue? [y/N]: y
nginx.conf already exists and this will overwrite it. Do you want to continue? [y/N]: y
INFO:bench.utils:sudo /usr/bin/supervisorctl reread
frappe-bench-redis: changed
frappe-bench-web: changed
frappe-bench-workers: changed
INFO:bench.utils:sudo /usr/bin/supervisorctl update
frappe-bench-redis: stopped
frappe-bench-redis: updated process group
frappe-bench-web: stopped
error: <class ‘xmlrpclib.Fault’>, <Fault 91: ‘STILL_RUNNING’>: file: /usr/lib/python2.7/xmlrpclib.py line: 794
INFO:bench.utils:sudo /usr/bin/supervisorctl reload
Restarted supervisord
nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
nginx: configuration file /etc/nginx/nginx.conf test is successful
INFO:bench.utils:sudo service nginx reload

It is still not working though, it just tells me that there is a problem loading the page.

If anyone can help me get back up and running I would greatly appreciate it

Andrew_Cook · June 2, 2017, 7:22am

Oh, and now the “Sorry! We will be back soon…” message is back when I try to access the server via its LAN IP address

Julian_Robbins · June 2, 2017, 8:05am

Hello Andrew

Have you checked the erpnext log itself in /logs ?

Andrew_Cook · June 2, 2017, 10:25am

Do you know which log I should take a look at? Here are the contents of the /logs folder:

backup.log frappe.log.6 web.error.log.3
bench.log frappe.log.7 web.error.log.4
frappe.log frappe.log.8 web.log
frappe.log.1 frappe.log.9 workerbeat.error.log
frappe.log.10 node-socketio.error.log workerbeat.log
frappe.log.11 node-socketio.log worker.error.log
frappe.log.12 redis-async-broker.error.log worker.error.log.1
frappe.log.13 redis-async-broker.log worker.error.log.10
frappe.log.14 redis-cache.error.log worker.error.log.2
frappe.log.15 redis-cache.log worker.error.log.3
frappe.log.16 redis-queue.error.log worker.error.log.4
frappe.log.17 redis-queue.log worker.error.log.5
frappe.log.18 redis-socketio.error.log worker.error.log.6
frappe.log.19 redis-socketio.log worker.error.log.7
frappe.log.2 schedule.error.log worker.error.log.8
frappe.log.20 schedule.log worker.error.log.9
frappe.log.3 web.error.log worker.log
frappe.log.4 web.error.log.1
frappe.log.5 web.error.log.2

Andrew_Cook · June 2, 2017, 1:47pm

Here is what Saurabh told me to do, and the results:

Hi Andrew,

Please check by executing,

bench --site site-name migrate
bench --site site-name set-maintenance-mode off
bench --site site-name scheduler resume
bench setup supervisor
sudo supervisorctl restart all
bench setup nginx
sudo server nginx restart

If won’t work then, please post the output of frappe-bench/logs/web.error.log.

I ran all of the commands successfully, but it still does not work. I checked the web error log, and it is extremely long and repetitive, but here is the copy from around the time that the issues started coming up:

[2017-06-01 08:47:55 +0000] [32153] [INFO] Worker exiting (pid: 32153)
[2017-06-01 08:47:55 +0000] [9028] [INFO] Booting worker with pid: 9028
/home/fubar/frappe-bench/env/local/lib/python2.7/site-packages/requests/packages/urllib3/util/ssl_.py:100: InsecurePlatformWarning: A true SSLContext object is not available. This prevents urllib3 from configuring SSL appropriately and may cause certain SSL connections to fail. For more information, see https://urllib3.readthedocs.org/en/latest/security.html#insecureplatformwarning.
InsecurePlatformWarning
/home/fubar/frappe-bench/env/local/lib/python2.7/site-packages/requests/packages/urllib3/util/ssl_.py:100: InsecurePlatformWarning: A true SSLContext object is not available. This prevents urllib3 from configuring SSL appropriately and may cause certain SSL connections to fail. For more information, see https://urllib3.readthedocs.org/en/latest/security.html#insecureplatformwarning.
InsecurePlatformWarning
[2017-06-01 13:33:32 +0000] [1311] [INFO] Worker exiting (pid: 1311)
[2017-06-01 13:33:33 +0000] [11336] [INFO] Booting worker with pid: 11336
[2017-06-01 13:48:09 +0000] [16236] [INFO] Worker exiting (pid: 16236)
[2017-06-01 13:48:09 +0000] [11421] [INFO] Booting worker with pid: 11421
/home/fubar/frappe-bench/env/local/lib/python2.7/site-packages/requests/packages/urllib3/util/ssl_.py:100: InsecurePlatformWarning: A true SSLContext object is not available. This prevents urllib3 from configuring SSL appropriately and may cause certain SSL connections to fail. For more information, see https://urllib3.readthedocs.org/en/latest/security.html#insecureplatformwarning.
InsecurePlatformWarning
/home/fubar/frappe-bench/env/local/lib/python2.7/site-packages/requests/packages/urllib3/util/ssl_.py:100: InsecurePlatformWarning: A true SSLContext object is not available. This prevents urllib3 from configuring SSL appropriately and may cause certain SSL connections to fail. For more information, see https://urllib3.readthedocs.org/en/latest/security.html#insecureplatformwarning.
InsecurePlatformWarning
/home/fubar/frappe-bench/env/local/lib/python2.7/site-packages/requests/packages/urllib3/util/ssl_.py:100: InsecurePlatformWarning: A true SSLContext object is not available. This prevents urllib3 from configuring SSL appropriately and may cause certain SSL connections to fail. For more information, see https://urllib3.readthedocs.org/en/latest/security.html#insecureplatformwarning.
InsecurePlatformWarning
[2017-06-01 18:45:49 +0000] [2971] [INFO] Handling signal: term
[2017-06-01 18:45:49 +0000] [9028] [INFO] Worker exiting (pid: 9028)
[2017-06-01 18:45:49 +0000] [389] [INFO] Worker exiting (pid: 389)
[2017-06-01 18:45:49 +0000] [3886] [INFO] Worker exiting (pid: 3886)
[2017-06-01 18:45:49 +0000] [4860] [INFO] Worker exiting (pid: 4860)
[2017-06-01 18:45:49 +0000] [390] [INFO] Worker exiting (pid: 390)
[2017-06-01 18:45:49 +0000] [11336] [INFO] Worker exiting (pid: 11336)
[2017-06-01 18:45:49 +0000] [11421] [INFO] Worker exiting (pid: 11421)
[2017-06-01 18:45:49 +0000] [25633] [INFO] Worker exiting (pid: 25633)
[2017-06-01 18:45:49 +0000] [2971] [INFO] Shutting down: Master
[2017-06-01 18:45:57 +0000] [17126] [INFO] Starting gunicorn 19.3.0
[2017-06-01 18:45:57 +0000] [17126] [INFO] Listening at: http://127.0.0.1:8000 (17126)
[2017-06-01 18:45:57 +0000] [17126] [INFO] Using worker: sync
[2017-06-01 18:45:57 +0000] [17134] [INFO] Booting worker with pid: 17134
[2017-06-01 18:45:57 +0000] [17140] [INFO] Booting worker with pid: 17140
[2017-06-01 18:45:57 +0000] [17141] [INFO] Booting worker with pid: 17141
[2017-06-01 18:45:57 +0000] [17142] [INFO] Booting worker with pid: 17142
[2017-06-01 18:45:57 +0000] [17143] [INFO] Booting worker with pid: 17143
[2017-06-01 18:45:57 +0000] [17144] [INFO] Booting worker with pid: 17144
[2017-06-01 18:45:57 +0000] [17145] [INFO] Booting worker with pid: 17145
[2017-06-01 18:45:57 +0000] [17146] [INFO] Booting worker with pid: 17146
[2017-06-01 19:37:57 +0000] [17143] [INFO] Worker exiting (pid: 17143)
[2017-06-01 19:37:57 +0000] [17441] [INFO] Booting worker with pid: 17441
[2017-06-01 19:37:57 +0000] [17144] [INFO] Worker exiting (pid: 17144)
[2017-06-01 19:37:57 +0000] [17442] [INFO] Booting worker with pid: 17442
[2017-06-01 19:37:57 +0000] [17134] [INFO] Worker exiting (pid: 17134)
[2017-06-01 19:37:57 +0000] [17443] [INFO] Booting worker with pid: 17443
[2017-06-01 19:38:07 +0000] [17442] [INFO] Worker exiting (pid: 17442)
[2017-06-01 19:38:07 +0000] [17449] [INFO] Booting worker with pid: 17449
[2017-06-01 19:38:31 +0000] [17126] [INFO] Handling signal: term
[2017-06-01 19:38:31 +0000] [17441] [INFO] Worker exiting (pid: 17441)
[2017-06-01 19:38:31 +0000] [17443] [INFO] Worker exiting (pid: 17443)
[2017-06-01 19:38:31 +0000] [17146] [INFO] Worker exiting (pid: 17146)
[2017-06-01 19:38:31 +0000] [17145] [INFO] Worker exiting (pid: 17145)
[2017-06-01 19:38:31 +0000] [17142] [INFO] Worker exiting (pid: 17142)
[2017-06-01 19:38:31 +0000] [17140] [INFO] Worker exiting (pid: 17140)
[2017-06-01 19:38:31 +0000] [17141] [INFO] Worker exiting (pid: 17141)
[2017-06-01 19:38:31 +0000] [17449] [INFO] Worker exiting (pid: 17449)
[2017-06-01 19:38:31 +0000] [17126] [INFO] Shutting down: Master
[2017-06-01 19:55:07 +0000] [2885] [INFO] Starting gunicorn 19.3.0
[2017-06-01 19:55:07 +0000] [2885] [INFO] Listening at: http://127.0.0.1:8000 (2885)
[2017-06-01 19:55:07 +0000] [2885] [INFO] Using worker: sync
[2017-06-01 19:55:07 +0000] [3547] [INFO] Booting worker with pid: 3547
[2017-06-01 19:55:07 +0000] [3548] [INFO] Booting worker with pid: 3548
[2017-06-01 19:55:07 +0000] [3550] [INFO] Booting worker with pid: 3550
[2017-06-01 19:55:07 +0000] [3551] [INFO] Booting worker with pid: 3551
[2017-06-01 19:55:07 +0000] [3554] [INFO] Booting worker with pid: 3554
[2017-06-01 19:55:07 +0000] [3555] [INFO] Booting worker with pid: 3555
[2017-06-01 19:55:07 +0000] [3556] [INFO] Booting worker with pid: 3556
[2017-06-01 19:55:07 +0000] [3559] [INFO] Booting worker with pid: 3559

Andrew_Cook · June 2, 2017, 1:49pm

I am still hoping to find a fix and am willing to pay someone for their time if they can help get this resolved right away. My business needs this software to operate and we’re over 2 hours into our day with no access yet. Any help at all is very much appreciated!!

Julian_Robbins · June 2, 2017, 2:13pm

I appreciate this won’t help you know but I always find it helpful to host on a VPS where automatic snapshots are made of the whole server that you can roll back to if you ever need to in the case of a bad update

Julian_Robbins · June 2, 2017, 2:20pm

Does running supervisorctl restart result in all the processes being restarted cleanly ?

The log you show doesn’t show any issues.

It sounds like ERPNEXT is working fine but it may be the proxying with Nginx that is wrong somewhere.

May be worth checking the nginx logs

Ie tail - f /var/log/nginx

I had something happen like this a little while back.

Julian_Robbins · June 2, 2017, 2:22pm

Where are you based? I’m normally at work but off work today. I’m in the UK. To help I would need to ssh in to look at logs. I’m not an ERPNEXT expert but been running servers for a business for 10 years or more.

Andrew_Cook · June 2, 2017, 2:24pm

I am in the US. I very much appreciate the offer and can give you remote access if you are available.

Andrew · June 2, 2017, 2:31pm

OP here under my other account. I reached the max number of posts on my other account since I had just opened that one, then i remembered I already had this one set up from a while ago.

Oh, and supervisor stops and starts cleanly. I will check the nginx logs now and post the results.

Will definitely have to come up with a better backup solution moving forward, it seems like I have downtime almost every time I update lately for one reason or another.

Julian_Robbins · June 2, 2017, 2:33pm

Hi Andrew

I’m out with my daughter this afternoon but I have my Chromebook handy if I can help.

It may be best if we take this offline . I cant promise I can fix it for you but I will try and help you

Julian_Robbins · June 2, 2017, 3:18pm

How’s it going getting anywhere now ?

Andrew · June 2, 2017, 3:22pm

Still working on it… Just purged nginx and reinstalled, reconfigured. Only thing in the error log was a message from last night about a duplicate upstream issue which I had fixed, so I’m not sure where to go from here

Julian_Robbins · June 2, 2017, 3:30pm

I can’t think of much else at the moment. I’m out raspberry picking so I can’t help login I’m afraid now.

Just have a close look thru your nginx config file it might be something there