"Sorry! We will be back soon!" after bench update; no previously posted solutions have worked

Hello Andrew

Have you checked the ERPNext log itself in /logs?

Do you know which log I should take a look at? Here are the contents of the /logs folder:

backup.log frappe.log.6 web.error.log.3
bench.log frappe.log.7 web.error.log.4
frappe.log frappe.log.8 web.log
frappe.log.1 frappe.log.9 workerbeat.error.log
frappe.log.10 node-socketio.error.log workerbeat.log
frappe.log.11 node-socketio.log worker.error.log
frappe.log.12 redis-async-broker.error.log worker.error.log.1
frappe.log.13 redis-async-broker.log worker.error.log.10
frappe.log.14 redis-cache.error.log worker.error.log.2
frappe.log.15 redis-cache.log worker.error.log.3
frappe.log.16 redis-queue.error.log worker.error.log.4
frappe.log.17 redis-queue.log worker.error.log.5
frappe.log.18 redis-socketio.error.log worker.error.log.6
frappe.log.19 redis-socketio.log worker.error.log.7
frappe.log.2 schedule.error.log worker.error.log.8
frappe.log.20 schedule.log worker.error.log.9
frappe.log.3 web.error.log worker.log
frappe.log.4 web.error.log.1
frappe.log.5 web.error.log.2

Here is what Saurabh told me to do, and the results:

Hi Andrew,

Please check by executing:

  1. bench --site site-name migrate
  2. bench --site site-name set-maintenance-mode off
  3. bench --site site-name scheduler resume
  4. bench setup supervisor
  5. sudo supervisorctl restart all
  6. bench setup nginx
  7. sudo service nginx restart

If that doesn't work, then please post the output of frappe-bench/logs/web.error.log.

I ran all of the commands successfully, but it still does not work. I checked the web error log; it is extremely long and repetitive, but here is an excerpt from around the time the issues started coming up:

[2017-06-01 08:47:55 +0000] [32153] [INFO] Worker exiting (pid: 32153)
[2017-06-01 08:47:55 +0000] [9028] [INFO] Booting worker with pid: 9028
/home/fubar/frappe-bench/env/local/lib/python2.7/site-packages/requests/packages/urllib3/util/ssl_.py:100: InsecurePlatformWarning: A true SSLContext object is not available. This prevents urllib3 from configuring SSL appropriately and may cause certain SSL connections to fail. For more information, see https://urllib3.readthedocs.org/en/latest/security.html#insecureplatformwarning.
InsecurePlatformWarning
/home/fubar/frappe-bench/env/local/lib/python2.7/site-packages/requests/packages/urllib3/util/ssl_.py:100: InsecurePlatformWarning: A true SSLContext object is not available. This prevents urllib3 from configuring SSL appropriately and may cause certain SSL connections to fail. For more information, see https://urllib3.readthedocs.org/en/latest/security.html#insecureplatformwarning.
InsecurePlatformWarning
[2017-06-01 13:33:32 +0000] [1311] [INFO] Worker exiting (pid: 1311)
[2017-06-01 13:33:33 +0000] [11336] [INFO] Booting worker with pid: 11336
[2017-06-01 13:48:09 +0000] [16236] [INFO] Worker exiting (pid: 16236)
[2017-06-01 13:48:09 +0000] [11421] [INFO] Booting worker with pid: 11421
/home/fubar/frappe-bench/env/local/lib/python2.7/site-packages/requests/packages/urllib3/util/ssl_.py:100: InsecurePlatformWarning: A true SSLContext object is not available. This prevents urllib3 from configuring SSL appropriately and may cause certain SSL connections to fail. For more information, see https://urllib3.readthedocs.org/en/latest/security.html#insecureplatformwarning.
InsecurePlatformWarning
/home/fubar/frappe-bench/env/local/lib/python2.7/site-packages/requests/packages/urllib3/util/ssl_.py:100: InsecurePlatformWarning: A true SSLContext object is not available. This prevents urllib3 from configuring SSL appropriately and may cause certain SSL connections to fail. For more information, see https://urllib3.readthedocs.org/en/latest/security.html#insecureplatformwarning.
InsecurePlatformWarning
/home/fubar/frappe-bench/env/local/lib/python2.7/site-packages/requests/packages/urllib3/util/ssl_.py:100: InsecurePlatformWarning: A true SSLContext object is not available. This prevents urllib3 from configuring SSL appropriately and may cause certain SSL connections to fail. For more information, see https://urllib3.readthedocs.org/en/latest/security.html#insecureplatformwarning.
InsecurePlatformWarning
[2017-06-01 18:45:49 +0000] [2971] [INFO] Handling signal: term
[2017-06-01 18:45:49 +0000] [9028] [INFO] Worker exiting (pid: 9028)
[2017-06-01 18:45:49 +0000] [389] [INFO] Worker exiting (pid: 389)
[2017-06-01 18:45:49 +0000] [3886] [INFO] Worker exiting (pid: 3886)
[2017-06-01 18:45:49 +0000] [4860] [INFO] Worker exiting (pid: 4860)
[2017-06-01 18:45:49 +0000] [390] [INFO] Worker exiting (pid: 390)
[2017-06-01 18:45:49 +0000] [11336] [INFO] Worker exiting (pid: 11336)
[2017-06-01 18:45:49 +0000] [11421] [INFO] Worker exiting (pid: 11421)
[2017-06-01 18:45:49 +0000] [25633] [INFO] Worker exiting (pid: 25633)
[2017-06-01 18:45:49 +0000] [2971] [INFO] Shutting down: Master
[2017-06-01 18:45:57 +0000] [17126] [INFO] Starting gunicorn 19.3.0
[2017-06-01 18:45:57 +0000] [17126] [INFO] Listening at: http://127.0.0.1:8000 (17126)
[2017-06-01 18:45:57 +0000] [17126] [INFO] Using worker: sync
[2017-06-01 18:45:57 +0000] [17134] [INFO] Booting worker with pid: 17134
[2017-06-01 18:45:57 +0000] [17140] [INFO] Booting worker with pid: 17140
[2017-06-01 18:45:57 +0000] [17141] [INFO] Booting worker with pid: 17141
[2017-06-01 18:45:57 +0000] [17142] [INFO] Booting worker with pid: 17142
[2017-06-01 18:45:57 +0000] [17143] [INFO] Booting worker with pid: 17143
[2017-06-01 18:45:57 +0000] [17144] [INFO] Booting worker with pid: 17144
[2017-06-01 18:45:57 +0000] [17145] [INFO] Booting worker with pid: 17145
[2017-06-01 18:45:57 +0000] [17146] [INFO] Booting worker with pid: 17146
[2017-06-01 19:37:57 +0000] [17143] [INFO] Worker exiting (pid: 17143)
[2017-06-01 19:37:57 +0000] [17441] [INFO] Booting worker with pid: 17441
[2017-06-01 19:37:57 +0000] [17144] [INFO] Worker exiting (pid: 17144)
[2017-06-01 19:37:57 +0000] [17442] [INFO] Booting worker with pid: 17442
[2017-06-01 19:37:57 +0000] [17134] [INFO] Worker exiting (pid: 17134)
[2017-06-01 19:37:57 +0000] [17443] [INFO] Booting worker with pid: 17443
[2017-06-01 19:38:07 +0000] [17442] [INFO] Worker exiting (pid: 17442)
[2017-06-01 19:38:07 +0000] [17449] [INFO] Booting worker with pid: 17449
[2017-06-01 19:38:31 +0000] [17126] [INFO] Handling signal: term
[2017-06-01 19:38:31 +0000] [17441] [INFO] Worker exiting (pid: 17441)
[2017-06-01 19:38:31 +0000] [17443] [INFO] Worker exiting (pid: 17443)
[2017-06-01 19:38:31 +0000] [17146] [INFO] Worker exiting (pid: 17146)
[2017-06-01 19:38:31 +0000] [17145] [INFO] Worker exiting (pid: 17145)
[2017-06-01 19:38:31 +0000] [17142] [INFO] Worker exiting (pid: 17142)
[2017-06-01 19:38:31 +0000] [17140] [INFO] Worker exiting (pid: 17140)
[2017-06-01 19:38:31 +0000] [17141] [INFO] Worker exiting (pid: 17141)
[2017-06-01 19:38:31 +0000] [17449] [INFO] Worker exiting (pid: 17449)
[2017-06-01 19:38:31 +0000] [17126] [INFO] Shutting down: Master
[2017-06-01 19:55:07 +0000] [2885] [INFO] Starting gunicorn 19.3.0
[2017-06-01 19:55:07 +0000] [2885] [INFO] Listening at: http://127.0.0.1:8000 (2885)
[2017-06-01 19:55:07 +0000] [2885] [INFO] Using worker: sync
[2017-06-01 19:55:07 +0000] [3547] [INFO] Booting worker with pid: 3547
[2017-06-01 19:55:07 +0000] [3548] [INFO] Booting worker with pid: 3548
[2017-06-01 19:55:07 +0000] [3550] [INFO] Booting worker with pid: 3550
[2017-06-01 19:55:07 +0000] [3551] [INFO] Booting worker with pid: 3551
[2017-06-01 19:55:07 +0000] [3554] [INFO] Booting worker with pid: 3554
[2017-06-01 19:55:07 +0000] [3555] [INFO] Booting worker with pid: 3555
[2017-06-01 19:55:07 +0000] [3556] [INFO] Booting worker with pid: 3556
[2017-06-01 19:55:07 +0000] [3559] [INFO] Booting worker with pid: 3559


I am still hoping to find a fix and am willing to pay someone for their time if they can help get this resolved right away. My business needs this software to operate and we’re over 2 hours into our day with no access yet. Any help at all is very much appreciated!!

I appreciate this won't help you now, but I always find it helpful to host on a VPS where automatic snapshots are taken of the whole server, which you can roll back to if you ever need to after a bad update.

Does running supervisorctl restart result in all the processes being restarted cleanly?

The log you posted doesn't show any issues.

It sounds like ERPNext is working fine, but it may be the proxying with nginx that is wrong somewhere.

It may be worth checking the nginx logs,

i.e. tail -f the logs under /var/log/nginx
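
For example, to look at recent errors and then watch live while you reload the site (paths assume the default Ubuntu/Debian nginx layout):

sudo tail -n 50 /var/log/nginx/error.log # the most recent proxy errors
sudo tail -f /var/log/nginx/error.log /var/log/nginx/access.log # watch both live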

I had something happen like this a little while back.

Where are you based? I'm normally at work, but I'm off today. I'm in the UK. To help, I would need to SSH in to look at the logs. I'm not an ERPNext expert, but I've been running servers for a business for 10 years or more.


I am in the US. I very much appreciate the offer and can give you remote access if you are available.

OP here under my other account. I reached the max number of posts on my other account since I had just opened that one; then I remembered I already had this one set up from a while ago.

Oh, and supervisor stops and starts cleanly. I will check the nginx logs now and post the results.

I will definitely have to come up with a better backup solution moving forward; it seems like I have downtime almost every time I update lately, for one reason or another.

Hi Andrew

I’m out with my daughter this afternoon but I have my Chromebook handy if I can help.

It may be best if we take this offline. I can't promise I can fix it for you, but I will try and help you.

How's it going? Getting anywhere now?

Still working on it… Just purged nginx, then reinstalled and reconfigured it. The only thing in the error log was a message from last night about a duplicate upstream issue, which I had already fixed, so I'm not sure where to go from here.

I can't think of much else at the moment. I'm out raspberry picking, so I'm afraid I can't log in to help right now.

Just have a close look through your nginx config file; it might be something there.

No problem, I'll keep working at it. I am going to reach out directly to some folks who have helped me with customizations in the past as well. I appreciate all the help you were able to provide; you're a very kind person! Thanks very much, and have fun picking raspberries! :)

Recently I also faced this behavior. The problem was that it had somehow switched the default site to site1.local, but I didn't have such a site in my sites folder. I realized that by looking at my log files; in one of them (I don't remember exactly which) there was an error saying site site1.local not found. Even doing 'bench use <name of my site>' and 'bench setup nginx', and restarting nginx and bench, didn't help. currentsite.txt was showing the name of my site, but the error still appeared. After a few restarts of the system it started to work normally. Maybe people who know can say where bench stores the value of the default site, so you can look there for a mismatch.
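
For anyone checking for the same mismatch: as far as I know, under a standard bench layout the default site can live in two places, sites/currentsite.txt and the default_site key in sites/common_site_config.json. A quick way to compare them (yoursite.local is a placeholder):

cd frappe-bench
cat sites/currentsite.txt # default site used when serving by port
grep default_site sites/common_site_config.json # default site, if set here instead
bench use yoursite.local # rewrites currentsite.txt
bench setup nginx && sudo service nginx reload # regenerate and reload the proxy config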

“Don’t panic. It’s not you, it’s us. Most likely, our engineers are updating the code, and it should take a minute for the new code to load into memory.”

I hope you didn’t lose too much sleep!

“running just fine all day, then suddenly very slow at the end of the day, then ERPNext stopped working altogether”

For a forensic audit, a copy of these up to the time of the crash would help:
frappe-bench/logs
/var/log/syslog
/var/log/nginx/error.log
/var/log/mysql.log
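
If it's easier to hand them over in one go, something like this bundles them into a single archive (paths as listed above, assuming the bench lives in your home directory; adjust the mysql log name to whatever your server actually writes):

sudo tar czf erpnext-crash-logs.tar.gz ~/frappe-bench/logs /var/log/syslog /var/log/nginx/error.log /var/log/mysql.log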

“downtime almost every time I update lately”

Rather than updating a 'stable' production instance directly (assuming that's the case), it is better to first test the update on an offline staging instance with a copy of the prod data. Basic smoke tests - e.g. that the instance starts up, with login and read access between the client, server, and MySQL db - give some peace of mind before you commit the update to prod. A sketch of this follows below.
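
A minimal sketch of that staging check, assuming a second bench set up for staging and a fresh database backup copied over from production (staging.local and the backup path are placeholders):

cd ~/staging-bench
bench new-site staging.local
bench --site staging.local restore /path/to/prod-database.sql.gz # load the prod data
bench update # run the same update you plan to apply to prod
bench --site staging.local migrate # the step that most often breaks
curl -sI http://127.0.0.1:8000 | head -n 1 # expect an HTTP 200 status line from gunicorn

If that passes and you can log in through a browser, the odds of the prod update going sideways drop considerably.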

Running a herd of VM-based instances makes it easy to juggle the server code + data as distinct disk-image 'snapshot versions'. When you are stuck with a sick instance, being able to bring up a healthy one offers a config-setting and log sanity check, to compare against and confirm how things looked when all were behaving themselves. Another option is to use GitHub to store the instance, but that would require more maintenance, IMO.


@clarkej

Trust you are doing very well.

Please, I am having similar issues with my sites.

I am getting the error below when I try bench restart or update; any idea how to fix this?

:~/frappe-bench$ bench restart
INFO:bench.utils:sudo supervisorctl restart frappe:
error: <class 'xmlrpclib.Fault'>, <Fault 10: 'BAD_NAME: frappe'>: file: /usr/lib/python2.7/xmlrpclib.py line: 794
Traceback (most recent call last):
  File "/usr/local/bin/bench", line 11, in <module>
    load_entry_point('bench', 'console_scripts', 'bench')()
  File "/home/ubuntu/bench-repo/bench/cli.py", line 40, in cli
    bench_command()
  File "/usr/local/lib/python2.7/dist-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python2.7/dist-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python2.7/dist-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python2.7/dist-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/home/ubuntu/bench-repo/bench/commands/utils.py", line 19, in restart
    restart_supervisor_processes(bench_path='.', web_workers=web)
  File "/home/ubuntu/bench-repo/bench/utils.py", line 387, in restart_supervisor_processes
    exec_cmd('sudo supervisorctl restart {group}'.format(group=group), cwd=bench_path)
  File "/home/ubuntu/bench-repo/bench/utils.py", line 140, in exec_cmd
    raise CommandFailedError(cmd)
bench.utils.CommandFailedError: sudo supervisorctl restart frappe:
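
For what it's worth, 'BAD_NAME: frappe' usually means supervisor has no process group registered under the name bench is asking it to restart. Not a guaranteed fix for this exact case, but a reasonable first check is to regenerate the supervisor config and reload it:

bench setup supervisor # regenerate frappe-bench's supervisor config
sudo supervisorctl reread # pick up the config changes
sudo supervisorctl update # add/remove groups to match the new config
sudo supervisorctl status # the frappe group should now be listed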


For future reference: when I had a similar problem, the solution was to configure SELinux to allow local httpd network access.

setsebool -P httpd_can_network_connect 1

Original solution reference: django - (13: Permission denied) while connecting to upstream:[nginx] - Stack Overflow
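
To see where you stand before and after (assuming an SELinux-enabled distro such as CentOS/RHEL, with the audit tools installed):

getsebool httpd_can_network_connect # check the current value; the -P above persists it across reboots
sudo ausearch -m avc -ts recent # AVC denials from nginx should stop appearing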


I was getting the Sorry! We will… issue on Ubuntu after turning on dns_multitenant. I am running it on an AWS EC2 Ubuntu 18.04 instance. In a nutshell, these are the commands I ran from my SSH terminal:

sudo apt update
export LC_ALL=C.UTF-8
sudo apt install python3-minimal build-essential python3-setuptools
sudo wget https://raw.githubusercontent.com/frappe/bench/develop/install.py
sudo python3 install.py --production

sudo bench config dns_multitenant on
sudo bench setup production

sudo bench new-site erp.xyz.com
sudo bench setup nginx
sudo service nginx reload
sudo bench --site erp.xyz.com install-app erpnext
sudo bench --site erp.xyz.com enable-scheduler
curl http://erp.xyz.com:80 # the "back soon" that's never soon!

# tried these random suggestions
sudo ./env/bin/pip install werkzeug==0.16.1 # based on an old community suggestion; not sure it made any sense to do
sudo bench drop-site site1.local # just so I did not have that hanging there
sudo bench use erp.xyz.com
sudo bench migrate
sudo bench setup nginx
sudo service nginx reload
sudo bench setup add-domain erp.xyz.com # not sure if this was necessary; maybe it is
sudo rm -rf /etc/nginx/sites-available/default # again, this had some configuration that seemed to be conflicting
sudo ls /etc/nginx/conf.d # ensured the frappe conf symlink is there
service nginx status # all was good
sudo supervisorctl status # all was good
sudo supervisorctl restart all # jlt

There was nothing specific I could derive from the nginx error logs, except for entries like:

connect() failed (111: Connection refused) while connecting to upstream, client: 1XX.XXX.XXX.XXX, server:

The nginx configuration looked fine. When I was going through the various deployment steps, I realised I had not created a frappe user with sudo rights. Instead I ran…

sudo bench setup production ubuntu, and that got it working. The best approach would be to create a sudo user as part of the deployment steps.
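
For anyone else hitting that connect() failed (111: Connection refused) upstream error: it generally means nginx itself is fine but nothing is answering on the backend port. A quick sanity check, assuming the default bench gunicorn port of 8000 (as in the logs above):

sudo ss -tlnp | grep 8000 # is anything listening on the gunicorn port?
sudo supervisorctl status # the frappe web workers should all be RUNNING
curl -sI http://127.0.0.1:8000 | head -n 1 # expect an HTTP status line back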


For me, it got resolved after running bench migrate (even though the migrate itself failed).