HTTP Requests to port 80 are sometimes not working

manasan · December 19, 2020, 2:57pm

Background
A few weeks back I was hosting my fresh ERPNext instance with a not-so-famous VPS provider. Everything worked fine for some days, and then I started getting random issues. I assumed this was caused by the server I was using, so I migrated to AWS EC2, again with a fresh installation on the same specs as before. Once more, everything worked fine for a few days, and then the EXACT same issue appeared.

What is the issue
I randomly cannot connect to port 80. Chrome intermittently shows ERR_CONNECTION_TIMED_OUT, and when I tried to curl the IP, I got curl: (7) Failed to connect to <ip_address> port 80: Timed out. When I used a VPN, the site worked until the same issue appeared again after some time. If I then turned off the VPN, the site worked again from my ISP's IP for a brief moment.

Oddly, SSH and ping to the server always work fine; only the site is affected.
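Because the failures are intermittent, a single curl proves little. Below is a minimal sketch, assuming Python 3.9+ on a client machine, that repeatedly attempts a TCP connection to several ports and logs which ones time out. Comparing port 80 against SSH (22) and an alternate port such as 8888 shows whether only port 80 is being blocked. The function names, timeout and interval are hypothetical choices, not part of ERPNext or bench.

```python
import socket
import time
from datetime import datetime


def probe_port(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection performs the full TCP handshake, like curl's connect step
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Timeouts (socket.timeout subclasses OSError) and refusals both land here
        return False


def watch(host: str, ports: list[int], interval: float = 10.0, rounds: int = 30) -> None:
    """Probe each port `rounds` times and print one timestamped up/down line per round."""
    for _ in range(rounds):
        stamp = datetime.now().strftime("%H:%M:%S")
        results = {port: probe_port(host, port) for port in ports}
        line = "  ".join(f"{p}={'up' if ok else 'DOWN'}" for p, ok in results.items())
        print(f"{stamp}  {line}")
        time.sleep(interval)
```

For example, watch("<server-ip>", [22, 80, 8888]) prints one line per round. Rows where 80 is DOWN while 22 and 8888 are up would suggest that something is filtering that one port rather than nginx being down.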

What I have tried so far

Disabled the firewall
Double-checked Amazon's EC2 inbound rules and security groups
Ran bench update, bench clear-cache and bench migrate
Stopped and restarted nginx and the processes managed by supervisorctl
Rebooted the server several times
Suspecting my ISP or browser cache, I tried a different ISP on a different computer, and sure enough the same random connection losses appeared
Tried changing the IP address of my EC2 instance
Followed this tutorial on common nginx issues, and every check passed (nginx shows active status, and “nginx -t” reports no errors)
Turned on nginx debug mode and looked at /var/log/nginx/error.log; although I couldn't fully interpret it, no new entries appeared in error.log while I was trying to curl the server
Changed the port from 80 to 8888 in /etc/nginx/conf.d/frappe-bench.conf, and the site was ALWAYS reachable on port 8888, without any random connection losses
Added nginx's default welcome site to nginx.conf on port 7777, and surprisingly I could reach that welcome page even while port 80 was unreachable
Cloned the exact EC2 instance into the same region, and oddly enough I haven't seen any issues on the clone yet

How I installed ERPNext
I installed ERPNext version-13-beta following this tutorial, which uses the Easy Install script, and the installation completed without any issues. It runs on Ubuntu 18.04 64-bit with 1 vCPU, 2 GB RAM (1 GB swap) and 30 GB storage, and I didn't customize any files in the apps or the nginx configuration. All changes were made through the front end, including some Custom Scripts. In fact, I reached the same state as on the old VPS by following the same guide on AWS EC2 and then restoring a database backup.

What to try next
I've been struggling with this issue for some days and am now stumped as to what to try next. If you have any suggestions or clues, or if you think I should ask this somewhere else, do let me know. If you need to see any logs or configs, ask me here or in a DM. I'll keep this instance alive until I find the root cause of this issue or a plausible explanation for it.

Logs and Configs
nginx.conf
systemctl status nginx
netstat -plant
htop
top
Wireshark Debug

Paul_Frydlewicz · December 19, 2020, 3:08pm

Do you have the same problem with port 443 (HTTPS)?
Do you redirect port 80 to 443?
It's strongly recommended to use HTTPS instead of HTTP.

Just my two cents, maybe it helps.

Paul_Frydlewicz · December 19, 2020, 3:15pm

I just saw you added the nginx.conf. It looks a bit weird to me: the http block includes SSL certificates, and I'm not sure that makes sense. But I'm not a pro at nginx configuration.

However, I know that the bench CLI can generate nginx configs for all your instances. Maybe have a look at the “bench setup nginx” command.

Works like a charm for single and multi-tenant systems.

manasan · December 19, 2020, 3:15pm

I am not using port 443(https) as it is not needed for my use case right now.

And I didn't change anything in nginx.conf; it was generated by the bench CLI itself.

mrjurin · December 19, 2020, 4:22pm

Check the fail2ban log. I had an issue on my ERPNext instance too and finally solved it by configuring fail2ban. By the way, which version are you using? If it's the v13 beta, disabling fail2ban is the faster solution. I have version 12 running with fail2ban with no issues. I also discovered that some of the cron jobs in v13 don't work because the module they rely on no longer exists. So check all the possibilities.

manasan · December 22, 2020, 1:28pm

Thanks @mrjurin, fail2ban was the issue. I've removed the jail for nginx_proxy as a temporary solution. I'm not sure why fail2ban was triggered when I was only doing fairly normal things; maybe this is an issue with v13 or with the default fail2ban config, as you suggest.
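For context on why normal use can trip a ban: fail2ban watches log files, and when one IP produces maxretry lines matching a jail's filter within findtime seconds, it bans that IP for bantime seconds, usually by adding firewall rules. Below is a simplified model of that logic, assuming the events have already been matched by the filter. The option names mirror fail2ban's, but the default values are illustrative and this is not fail2ban's actual implementation.

```python
from collections import defaultdict, deque


def simulate_bans(events, maxretry=5, findtime=600, bantime=600):
    """Simplified model of a fail2ban jail (illustration only).

    events: (timestamp_seconds, ip) pairs for log lines matched by the jail's
    filter, sorted by timestamp. Returns a list of (ip, ban_start, ban_end).
    """
    recent = defaultdict(deque)  # ip -> timestamps of matches inside the findtime window
    banned_until = {}            # ip -> time at which the current ban expires
    bans = []
    for ts, ip in events:
        if banned_until.get(ip, float("-inf")) > ts:
            # Assumption: with a firewall-based action, a banned IP's packets are
            # dropped before reaching the web server, so nothing new gets logged
            continue
        window = recent[ip]
        window.append(ts)
        while window and window[0] <= ts - findtime:
            window.popleft()  # forget matches older than findtime
        if len(window) >= maxretry:
            banned_until[ip] = ts + bantime
            bans.append((ip, ts, ts + bantime))
            window.clear()
    return bans
```

The model suggests one possible mechanism, not a diagnosis of the nginx_proxy jail: if a filter matches routine requests, a few page loads within findtime are enough to reach maxretry. It would also fit the observation that nginx's error.log stayed silent while port 80 was unreachable.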

Muzzy · December 22, 2020, 3:24pm

This is a classic case of Fail2Ban. If you have a static IP at the remote location, you can whitelist it so that fail2ban does not block it.
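Whitelisting in fail2ban is done with the ignoreip option, a space-separated list of addresses and CIDR ranges (for example, a hypothetical ignoreip = 127.0.0.1/8 ::1 203.0.113.5 in a jail.local). Below is a minimal sketch of how such a list matches an IP, assuming entries are plain addresses or CIDR ranges. fail2ban also accepts hostnames there, which this helper (a hypothetical name, not a fail2ban API) does not resolve.

```python
import ipaddress


def is_ignored(ip: str, ignoreip: str) -> bool:
    """Return True if ip falls inside any entry of a space-separated ignoreip-style list.

    Only plain addresses and CIDR ranges are handled; hostnames are not resolved.
    """
    addr = ipaddress.ip_address(ip)
    for entry in ignoreip.split():
        # strict=False lets "127.0.0.1/8" be read as the network 127.0.0.0/8
        network = ipaddress.ip_network(entry, strict=False)
        # An IPv4 address is never "in" an IPv6 network (and vice versa); this yields False
        if addr in network:
            return True
    return False
```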

Check your cron jobs. Something may be stuck and repeatedly retrying.
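If the repeated traffic goes through nginx, it should show up as the same client and path many times in the access log. Below is a minimal sketch that counts (IP, method, path) triples, assuming the log uses nginx's default "combined" format. The actual log path depends on your nginx configuration, and the function name is hypothetical.

```python
import re
from collections import Counter

# nginx's "combined" format begins: $remote_addr - $remote_user [$time_local] "$request" ...
LINE_RE = re.compile(r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+)[^"]*"')


def top_repeaters(lines, n=5):
    """Return the n most frequent (ip, method, path) triples with their counts."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if m:  # lines in other formats (or malformed requests) are skipped
            counts[(m["ip"], m["method"], m["path"])] += 1
    return counts.most_common(n)
```

For example, passing an open file handle for the access log to top_repeaters lists the noisiest clients first. A single IP hitting the same path far more often than the rest is a candidate for the stuck job, and also for whatever keeps tripping the jail.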