My ERPNext server ran fine all day, then suddenly slowed to a crawl toward the end of the day, and then ERPNext stopped working altogether. I’m running in production mode on an Ubuntu 14.04 server. I ran some quick diagnostics (netstat, status checks on nginx and supervisor), but everything looked like it was working properly. I searched the forums here and found previous posts with similar issues, so I first tried a bench update. That failed, telling me I had unsaved changes in my copy of ERPNext, so I ran bench update --reset instead, which seemed to complete without errors. After the update finished I still could not connect to ERPNext, and now I only get a page saying:
Sorry!
We will be back soon.
Don’t panic. It’s not you, it’s us.
Most likely, our engineers are updating the code, and it should take a minute for the new code to load into memory.
Try refreshing after a minute or two.
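For reference, the quick diagnostics I mentioned were along these lines (flags reconstructed from memory, so treat them as approximate):
sudo netstat -tlnp | grep -E ':80|:443|:8000'
sudo service nginx status
sudo service supervisor status
sudo supervisorctl status
# and the update I ended up running after bench update complained about unsaved changes
bench update --reset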
After more research here, I tried the solutions offered in other posts, but nothing has worked for me. I checked that supervisor is running, and it is; nginx is also running. I tried restarting both, but that didn’t help either. I then found a post suggesting I reconfigure by running bench setup production. When I ran it, it warned that it would overwrite the existing nginx and supervisor conf files, and I proceeded. Now I don’t see my site folder in the frappe-bench folder anymore, and I still can’t access anything. I really need to be back up and running within the next few hours if at all possible. Can anyone out there help me with this?
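And the restart and reconfigure steps went roughly like this (the bench user on this box is fubar, as shown in the output further down):
sudo service supervisor restart
sudo service nginx restart
cd /home/fubar/frappe-bench
# this is the step that warned about overwriting nginx.conf and supervisor.conf
bench setup production fubar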
The first command seemed to work without issue; here’s what it returned:
bench --site xxxxxxx.xxxxxxx.com migrate
Migrating xxxxxxx.xxxxxxx.com
Updating DocTypes for frappe : [========================================]
Updating DocTypes for erpnext : [========================================]
Syncing help database…
The second command returned the following:
sudo supervisorctl restart all
frappe-bench-frappe-schedule: stopped
frappe-bench-frappe-default-worker-0: stopped
frappe-bench-frappe-long-worker-0: stopped
frappe-bench-frappe-short-worker-0: stopped
frappe-bench-frappe-web: stopped
frappe-bench-node-socketio: stopped
frappe-bench-redis-queue: stopped
frappe-bench-redis-cache: stopped
frappe-bench-redis-socketio: stopped
frappe-bench-frappe-schedule: started
frappe-bench-frappe-default-worker-0: ERROR (abnormal termination)
frappe-bench-frappe-long-worker-0: started
frappe-bench-frappe-short-worker-0: started
frappe-bench-frappe-web: started
frappe-bench-node-socketio: started
frappe-bench-redis-queue: started
frappe-bench-redis-cache: started
frappe-bench-redis-socketio: started
The third command returned the following:
service nginx restart
nginx stop/waiting
nginx stop/pre-start, process 7152
Now I don’t see the “Sorry! We will be back soon!” message, but I still cannot connect; the browser just says the page cannot be reached. Can you think of what might cause that?
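One thing that stands out to me in the supervisorctl output above is the default worker failing with “ERROR (abnormal termination)”. In case it helps anyone narrow this down, I believe its recent output can be pulled with something like this (process name copied from the restart output; the worker log path assumes the standard bench layout):
sudo supervisorctl tail frappe-bench-frappe-default-worker-0
sudo supervisorctl tail frappe-bench-frappe-default-worker-0 stderr
tail -n 50 /home/fubar/frappe-bench/logs/worker.error.log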
A little more info… Since I was still locked out of ERPNext, I dug around a bit more and ran nginx -t, which reported a duplicate upstream. I found and fixed the conflict by deleting the offending files and reconfiguring nginx. The config test now passes, but after reloading nginx I still could not access ERPNext. So I tried reconfiguring once more with “bench setup production”, and this is what was returned:
bench setup production fubar
supervisor.conf already exists and this will overwrite it. Do you want to continue? [y/N]: y
nginx.conf already exists and this will overwrite it. Do you want to continue? [y/N]: y
INFO:bench.utils:sudo /usr/bin/supervisorctl reread
frappe-bench-redis: changed
frappe-bench-web: changed
frappe-bench-workers: changed
INFO:bench.utils:sudo /usr/bin/supervisorctl update
frappe-bench-redis: stopped
frappe-bench-redis: updated process group
frappe-bench-web: stopped
error: <class 'xmlrpclib.Fault'>, <Fault 91: 'STILL_RUNNING'>: file: /usr/lib/python2.7/xmlrpclib.py line: 794
INFO:bench.utils:sudo /usr/bin/supervisorctl reload
Restarted supervisord
nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
nginx: configuration file /etc/nginx/nginx.conf test is successful
INFO:bench.utils:sudo service nginx reload
It is still not working, though; the browser just tells me there is a problem loading the page.
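For anyone following along, the duplicate upstream hunt went roughly like this (the exact wording of the error and the file I deleted are from memory, so treat this as approximate):
sudo nginx -t
# it complained about a duplicate upstream block defined in two conf files
grep -rn 'upstream' /etc/nginx/
# removed the stale copy of the bench conf, then regenerated and reloaded
bench setup nginx
sudo service nginx reload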
If anyone can help me get back up and running, I would greatly appreciate it.
Here is what Saurabh told me to do, and the results:
Hi Andrew,
Please check by executing,
bench --site site-name migrate
bench --site site-name set-maintenance-mode off
bench --site site-name scheduler resume
bench setup supervisor
sudo supervisorctl restart all
bench setup nginx
sudo service nginx restart
If that doesn’t work, please post the output of frappe-bench/logs/web.error.log.
I ran all of the commands successfully, but it still does not work. I checked the web error log; it is extremely long and repetitive, but here is the portion from around the time the issues started:
[2017-06-01 08:47:55 +0000] [32153] [INFO] Worker exiting (pid: 32153)
[2017-06-01 08:47:55 +0000] [9028] [INFO] Booting worker with pid: 9028
/home/fubar/frappe-bench/env/local/lib/python2.7/site-packages/requests/packages/urllib3/util/ssl_.py:100: InsecurePlatformWarning: A true SSLContext object is not available. This prevents urllib3 from configuring SSL appropriately and may cause certain SSL connections to fail. For more information, see https://urllib3.readthedocs.org/en/latest/security.html#insecureplatformwarning.
InsecurePlatformWarning
/home/fubar/frappe-bench/env/local/lib/python2.7/site-packages/requests/packages/urllib3/util/ssl_.py:100: InsecurePlatformWarning: A true SSLContext object is not available. This prevents urllib3 from configuring SSL appropriately and may cause certain SSL connections to fail. For more information, see https://urllib3.readthedocs.org/en/latest/security.html#insecureplatformwarning.
InsecurePlatformWarning
[2017-06-01 13:33:32 +0000] [1311] [INFO] Worker exiting (pid: 1311)
[2017-06-01 13:33:33 +0000] [11336] [INFO] Booting worker with pid: 11336
[2017-06-01 13:48:09 +0000] [16236] [INFO] Worker exiting (pid: 16236)
[2017-06-01 13:48:09 +0000] [11421] [INFO] Booting worker with pid: 11421
/home/fubar/frappe-bench/env/local/lib/python2.7/site-packages/requests/packages/urllib3/util/ssl_.py:100: InsecurePlatformWarning: A true SSLContext object is not available. This prevents urllib3 from configuring SSL appropriately and may cause certain SSL connections to fail. For more information, see https://urllib3.readthedocs.org/en/latest/security.html#insecureplatformwarning.
InsecurePlatformWarning
/home/fubar/frappe-bench/env/local/lib/python2.7/site-packages/requests/packages/urllib3/util/ssl_.py:100: InsecurePlatformWarning: A true SSLContext object is not available. This prevents urllib3 from configuring SSL appropriately and may cause certain SSL connections to fail. For more information, see https://urllib3.readthedocs.org/en/latest/security.html#insecureplatformwarning.
InsecurePlatformWarning
/home/fubar/frappe-bench/env/local/lib/python2.7/site-packages/requests/packages/urllib3/util/ssl_.py:100: InsecurePlatformWarning: A true SSLContext object is not available. This prevents urllib3 from configuring SSL appropriately and may cause certain SSL connections to fail. For more information, see https://urllib3.readthedocs.org/en/latest/security.html#insecureplatformwarning.
InsecurePlatformWarning
[2017-06-01 18:45:49 +0000] [2971] [INFO] Handling signal: term
[2017-06-01 18:45:49 +0000] [9028] [INFO] Worker exiting (pid: 9028)
[2017-06-01 18:45:49 +0000] [389] [INFO] Worker exiting (pid: 389)
[2017-06-01 18:45:49 +0000] [3886] [INFO] Worker exiting (pid: 3886)
[2017-06-01 18:45:49 +0000] [4860] [INFO] Worker exiting (pid: 4860)
[2017-06-01 18:45:49 +0000] [390] [INFO] Worker exiting (pid: 390)
[2017-06-01 18:45:49 +0000] [11336] [INFO] Worker exiting (pid: 11336)
[2017-06-01 18:45:49 +0000] [11421] [INFO] Worker exiting (pid: 11421)
[2017-06-01 18:45:49 +0000] [25633] [INFO] Worker exiting (pid: 25633)
[2017-06-01 18:45:49 +0000] [2971] [INFO] Shutting down: Master
[2017-06-01 18:45:57 +0000] [17126] [INFO] Starting gunicorn 19.3.0
[2017-06-01 18:45:57 +0000] [17126] [INFO] Listening at: http://127.0.0.1:8000 (17126)
[2017-06-01 18:45:57 +0000] [17126] [INFO] Using worker: sync
[2017-06-01 18:45:57 +0000] [17134] [INFO] Booting worker with pid: 17134
[2017-06-01 18:45:57 +0000] [17140] [INFO] Booting worker with pid: 17140
[2017-06-01 18:45:57 +0000] [17141] [INFO] Booting worker with pid: 17141
[2017-06-01 18:45:57 +0000] [17142] [INFO] Booting worker with pid: 17142
[2017-06-01 18:45:57 +0000] [17143] [INFO] Booting worker with pid: 17143
[2017-06-01 18:45:57 +0000] [17144] [INFO] Booting worker with pid: 17144
[2017-06-01 18:45:57 +0000] [17145] [INFO] Booting worker with pid: 17145
[2017-06-01 18:45:57 +0000] [17146] [INFO] Booting worker with pid: 17146
[2017-06-01 19:37:57 +0000] [17143] [INFO] Worker exiting (pid: 17143)
[2017-06-01 19:37:57 +0000] [17441] [INFO] Booting worker with pid: 17441
[2017-06-01 19:37:57 +0000] [17144] [INFO] Worker exiting (pid: 17144)
[2017-06-01 19:37:57 +0000] [17442] [INFO] Booting worker with pid: 17442
[2017-06-01 19:37:57 +0000] [17134] [INFO] Worker exiting (pid: 17134)
[2017-06-01 19:37:57 +0000] [17443] [INFO] Booting worker with pid: 17443
[2017-06-01 19:38:07 +0000] [17442] [INFO] Worker exiting (pid: 17442)
[2017-06-01 19:38:07 +0000] [17449] [INFO] Booting worker with pid: 17449
[2017-06-01 19:38:31 +0000] [17126] [INFO] Handling signal: term
[2017-06-01 19:38:31 +0000] [17441] [INFO] Worker exiting (pid: 17441)
[2017-06-01 19:38:31 +0000] [17443] [INFO] Worker exiting (pid: 17443)
[2017-06-01 19:38:31 +0000] [17146] [INFO] Worker exiting (pid: 17146)
[2017-06-01 19:38:31 +0000] [17145] [INFO] Worker exiting (pid: 17145)
[2017-06-01 19:38:31 +0000] [17142] [INFO] Worker exiting (pid: 17142)
[2017-06-01 19:38:31 +0000] [17140] [INFO] Worker exiting (pid: 17140)
[2017-06-01 19:38:31 +0000] [17141] [INFO] Worker exiting (pid: 17141)
[2017-06-01 19:38:31 +0000] [17449] [INFO] Worker exiting (pid: 17449)
[2017-06-01 19:38:31 +0000] [17126] [INFO] Shutting down: Master
[2017-06-01 19:55:07 +0000] [2885] [INFO] Starting gunicorn 19.3.0
[2017-06-01 19:55:07 +0000] [2885] [INFO] Listening at: http://127.0.0.1:8000 (2885)
[2017-06-01 19:55:07 +0000] [2885] [INFO] Using worker: sync
[2017-06-01 19:55:07 +0000] [3547] [INFO] Booting worker with pid: 3547
[2017-06-01 19:55:07 +0000] [3548] [INFO] Booting worker with pid: 3548
[2017-06-01 19:55:07 +0000] [3550] [INFO] Booting worker with pid: 3550
[2017-06-01 19:55:07 +0000] [3551] [INFO] Booting worker with pid: 3551
[2017-06-01 19:55:07 +0000] [3554] [INFO] Booting worker with pid: 3554
[2017-06-01 19:55:07 +0000] [3555] [INFO] Booting worker with pid: 3555
[2017-06-01 19:55:07 +0000] [3556] [INFO] Booting worker with pid: 3556
[2017-06-01 19:55:07 +0000] [3559] [INFO] Booting worker with pid: 3559
I am still hoping to find a fix and am willing to pay someone for their time if they can help get this resolved right away. My business needs this software to operate and we’re over 2 hours into our day with no access yet. Any help at all is very much appreciated!!
I appreciate this won’t help you now, but I always find it helpful to host on a VPS that takes automatic snapshots of the whole server, so you can roll back if you ever need to after a bad update.
Where are you based? I’m normally at work, but I’m off today. I’m in the UK. To help, I would need to SSH in and look at the logs. I’m not an ERPNext expert, but I’ve been running servers for a business for 10 years or more.
OP here under my other account. I hit the maximum number of posts on the account I had just opened, then I remembered I already had this one set up from a while ago.
Oh, and supervisor stops and starts cleanly. I will check the nginx logs now and post the results.
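For reference, these are the logs I plan to go through (the standard nginx locations, plus the bench one mentioned earlier):
sudo tail -n 50 /var/log/nginx/error.log
sudo tail -n 50 /var/log/nginx/access.log
tail -n 100 /home/fubar/frappe-bench/logs/web.error.log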
Will definitely have to come up with a better backup solution moving forward; it seems like I have downtime almost every time I update lately, for one reason or another.
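What I have in mind, on top of VPS snapshots, is probably a nightly database-and-files backup from cron; something like this untested sketch (site name as redacted above, and assuming bench is on cron’s PATH or referenced by its full path):
# crontab entry for the frappe user: nightly backup at 2am
0 2 * * * cd /home/fubar/frappe-bench && bench --site xxxxxxx.xxxxxxx.com backup --with-files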
Still working on it… I just purged nginx, reinstalled, and reconfigured. The only thing in the error log was a message from last night about the duplicate upstream issue, which I had already fixed, so I’m not sure where to go from here.
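Roughly what the purge and reinstall looked like, from memory:
sudo apt-get purge nginx nginx-common
sudo apt-get install nginx
cd /home/fubar/frappe-bench
bench setup nginx
sudo nginx -t
sudo service nginx restart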