"Sorry! We will be back soon!" after bench update; no previously posted solutions have worked

No problem, I’ll keep working at it. I am going to reach out directly to some folks who have helped me with customizations in the past as well. I appreciate all the help you were able to provide, you’re a very kind person! Thanks very much and have fun picking raspberries! :slight_smile:

I recently ran into the same behavior. The problem was that the default site had somehow been switched to site1.local, but I had no such site in my sites folder. I realized that by looking at my log files: one of them (I don’t remember exactly which) contained an error saying site site1.local not found. Even running `bench use <name of my site>`, `bench setup nginx`, and restarting nginx and bench didn’t help. currentsite.txt was showing the name of my site, but the error still appeared. After a few restarts of the system it started working normally again. Maybe someone knowledgeable can say where bench stores the default site value, so you can check there for a mismatch.
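For anyone hunting the same mismatch: as far as I know, bench records the default site in two places, shown below. This is a sketch only; the site name erp.example.com is a placeholder, so substitute your own and verify the paths against your install.

```shell
# Run from inside the frappe-bench directory.
# Two places where a stale "site1.local" default can hide:
cat sites/currentsite.txt                          # written by `bench use <site>`
grep default_site sites/common_site_config.json    # used by some setups

# Point the bench at the correct site and regenerate the nginx config:
bench use erp.example.com        # placeholder site name
bench setup nginx
sudo service nginx reload
```

If the two locations disagree, that mismatch is a likely culprit.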

“Don’t panic. It’s not you, it’s us. Most likely, our engineers are updating the code, and it should take a minute for the new code to load into memory.”

I hope you didn’t lose too much sleep!

“running just fine all day, then suddenly very slow at the end of the day, then ERPNext stopped working altogether”

For a forensic audit, a copy of these up to the time of the crash would help:
frappe-bench/logs
/var/log/syslog
/var/log/nginx/error.log
/var/log/mysql.log
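One way to gather those files into a single bundle for posting. The paths below are the typical Ubuntu defaults from the list above; the MySQL log location in particular varies by distro, so adjust as needed.

```shell
# Collect the tail of each log up to the crash time into one archive.
mkdir -p ~/crash-logs
cp -r ~/frappe-bench/logs ~/crash-logs/bench-logs
sudo tail -n 500 /var/log/syslog          > ~/crash-logs/syslog.txt
sudo tail -n 500 /var/log/nginx/error.log > ~/crash-logs/nginx-error.txt
sudo tail -n 500 /var/log/mysql.log       > ~/crash-logs/mysql.txt   # path varies by distro
tar czf crash-logs.tar.gz -C ~ crash-logs
```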

“downtime almost every time I update lately”

Rather than update a ‘stable’ production instance (assuming that’s the case), it is better to first test the update on an offline staging instance with a copy of the prod data. Basic smoke tests - e.g. the instance starts up, with login and read access between the client, server and mysql db - give some peace of mind before you commit the update to prod.
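A minimal sketch of that staging check, assuming the site name erp.example.com (a placeholder) and a staging bench that already exists:

```shell
# On production: snapshot the database and files.
bench --site erp.example.com backup --with-files

# Copy the newest backup to the staging machine, then restore it there
# (<timestamp> stands for the actual backup file name):
bench --site erp.example.com restore \
    sites/erp.example.com/private/backups/<timestamp>-database.sql.gz

# Run the update on staging first; repeat on prod only once this succeeds
# and the basic login/read smoke tests pass.
bench update
```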

Running a herd of VM-based instances makes it easy to juggle the server code + data as distinct disk-image ‘snapshot versions’. When you are stuck with a sick instance, being able to bring up a healthy one lets you sanity-check config settings and logs, comparing against a point when all were behaving themselves. Another option is to use GitHub to store the instance, but that would require more maintenance imo.


@clarkej

Trust you are doing very well.

Please, I am having similar issues with my sites.

I get the error below when I try bench restart or update. Any idea how to fix this?

~/frappe-bench$ bench restart
INFO:bench.utils:sudo supervisorctl restart frappe:
error: <class 'xmlrpclib.Fault'>, <Fault 10: 'BAD_NAME: frappe'>: file: /usr/lib/python2.7/xmlrpclib.py line: 794
Traceback (most recent call last):
  File "/usr/local/bin/bench", line 11, in <module>
    load_entry_point('bench', 'console_scripts', 'bench')()
  File "/home/ubuntu/bench-repo/bench/cli.py", line 40, in cli
    bench_command()
  File "/usr/local/lib/python2.7/dist-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python2.7/dist-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python2.7/dist-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python2.7/dist-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/home/ubuntu/bench-repo/bench/commands/utils.py", line 19, in restart
    restart_supervisor_processes(bench_path='.', web_workers=web)
  File "/home/ubuntu/bench-repo/bench/utils.py", line 387, in restart_supervisor_processes
    exec_cmd('sudo supervisorctl restart {group}'.format(group=group), cwd=bench_path)
  File "/home/ubuntu/bench-repo/bench/utils.py", line 140, in exec_cmd
    raise CommandFailedError(cmd)
bench.utils.CommandFailedError: sudo supervisorctl restart frappe:
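For this traceback: `BAD_NAME: frappe` generally means supervisor has no process group named `frappe`, i.e. the bench’s supervisor config was never generated or registered. A sketch of the usual recovery, assuming a standard bench layout at `~/frappe-bench` (note that newer benches name the group `frappe-bench-*` rather than `frappe`):

```shell
# Regenerate the supervisor config for this bench and register it:
cd ~/frappe-bench
bench setup supervisor
sudo ln -sf ~/frappe-bench/config/supervisor.conf /etc/supervisor/conf.d/frappe-bench.conf
sudo supervisorctl reread
sudo supervisorctl update

# The bench process groups should now be listed:
sudo supervisorctl status
bench restart
```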


For future reference: when I had a similar problem, the solution was to configure SELinux to allow local httpd network access.

setsebool -P httpd_can_network_connect 1

Original solution reference: django - (13: Permission denied) while connecting to upstream:[nginx] - Stack Overflow
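To check the boolean before and after changing it; these commands assume an SELinux-enabled distro such as CentOS/RHEL:

```shell
# Show whether nginx (running as httpd_t) may open outbound network
# connections, e.g. to the gunicorn/socketio upstreams:
getsebool httpd_can_network_connect

# Allow it persistently (-P survives reboots), then confirm:
sudo setsebool -P httpd_can_network_connect 1
getsebool httpd_can_network_connect
```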


I was getting the Sorry! We will be back soon issue on Ubuntu after turning on dns_multitenant. I am running it on an AWS EC2 Ubuntu 18.04 instance. In a nutshell, these are the commands I ran from my SSH terminal.

sudo apt update
export LC_ALL=C.UTF-8
sudo apt install python3-minimal build-essential python3-setuptools
sudo wget https://raw.githubusercontent.com/frappe/bench/develop/install.py
sudo python3 install.py --production

sudo bench config dns_multitenant on
sudo bench setup production

sudo bench new-site erp.xyz.com
sudo bench setup nginx
sudo service nginx reload
sudo bench --site erp.xyz.com install-app erpnext
sudo bench --site erp.xyz.com enable-scheduler
curl http://erp.xyz.in:80   # the back soon that's never soon!

# tried these random suggestions
sudo ./env/bin/pip install werkzeug==0.16.1   # based on an old community suggestion; not sure it made any sense at all to have done
sudo bench drop-site site1.local              # just so it wasn't left hanging there
sudo bench use erp.xyz.com
sudo bench migrate
sudo bench setup nginx
sudo service nginx reload
sudo bench setup add-domain erp.xyz.com       # not sure if this was necessary, maybe it is
sudo rm -rf /etc/nginx/sites-available/default   # again, this had some configuration that seemed to be conflicting
sudo ls /etc/nginx/conf.d                     # ensured the frappe conf symlink is there
service nginx status                          # all was good
sudo supervisorctl status                     # all was good
sudo supervisorctl restart all                # jlt

Nothing specific could be derived from the nginx error logs, except for entries like:

connect() failed (111: Connection refused) while connecting to upstream, client: 1XX.XXX.XXX.XXX, server:

The nginx configuration looked fine. While going through the various deployment steps I realised I had not created a frappe user with sudo rights. Instead I ran…

sudo bench setup production ubuntu and got it working. The best approach would be to create a sudo user as part of the deployment steps.
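A sketch of creating that dedicated sudo user up front on Debian/Ubuntu; the username frappe is just a convention here, not required:

```shell
# Create a dedicated user and grant it sudo rights:
sudo adduser --disabled-password --gecos "" frappe
sudo usermod -aG sudo frappe

# Then run the production setup for that user instead of ubuntu:
sudo bench setup production frappe
```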


For me, it got resolved after running bench migrate (even though the migrate itself failed).