Background
A few weeks back I was hosting a fresh ERPNext instance with a little-known VPS provider. Everything worked fine for a few days, and then I started getting random issues. I assumed the server was at fault, so I migrated to AWS EC2, again with a fresh installation and the same specs as before. Once again everything worked fine for a few days, and then the EXACT same issue appeared.
What is the issue
I am unable to connect to port 80 at random times. I get random ERR_CONNECTION_TIMED_OUT errors in Chrome, and when I curl the IP I get curl: (7) Failed to connect to <ip_address> port 80: Timed out. When I switched to a VPN, the site worked fine, but the same issue appeared again after some time. When I then turned the VPN off, the site worked again over my ISP’s IP for a brief moment.
Weirdly, SSH and ping to the server always work fine. Only the site gives me random issues.
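A small probe could help pin down when the timeouts happen and confirm that port 22 stays reachable while port 80 does not. The following is only a minimal Python sketch and not something from my setup: the names `probe_port` and `probe_line` and the outcome labels are assumptions made for illustration. It separates a timeout (no answer at all) from a refused connection (the host answered, but nothing is listening).

```python
import socket
from datetime import datetime, timezone


def probe_port(host, port, timeout=5.0):
    """Attempt one TCP connection and classify the outcome.

    Returns "open", "timeout", "refused", or "error:<ExceptionName>".
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.timeout:
        # No reply within the timeout, as in ERR_CONNECTION_TIMED_OUT.
        return "timeout"
    except ConnectionRefusedError:
        # The host replied, but no service accepted the connection.
        return "refused"
    except OSError as exc:
        # Anything else (unreachable network, DNS failure, ...).
        return f"error:{type(exc).__name__}"


def probe_line(host, ports, timeout=5.0):
    """Probe several ports once and return a single timestamped log line."""
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    results = " ".join(f"{p}={probe_port(host, p, timeout)}" for p in ports)
    return f"{stamp} {results}"
```

Calling `probe_line("<ip_address>", [22, 80])` once a minute and appending the result to a file would give a timeline of the outages that can be lined up against the server logs.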
What have I tried so far
- Disabled the firewall
- Double-checked Amazon’s EC2 inbound rules and security groups
- Ran bench update, bench clear-cache, bench migrate
- Stopped and restarted nginx and supervisor (supervisorctl)
- Rebooted the server several times
- Suspecting that my ISP or browser cache might be the cause, I tried a different ISP on a different computer, and sure enough the same random connection loss appeared again
- Tried changing the IP address of my EC2 instance
- Followed this tutorial on common nginx issues, and every check passed (nginx shows an active status, “nginx -t” reports no errors)
- Turned on Nginx debug mode and looked at /var/log/nginx/error.log. I wasn’t able to understand most of it, but nothing new was written to error.log while I was trying to curl the server
- When I changed the port from 80 to 8888 in /etc/nginx/conf.d/frappe-bench.conf, the site was ALWAYS accessible on port 8888, without any random connection losses
- Added the default Nginx welcome site to nginx.conf on port 7777, and surprisingly I could reach that welcome page even while port 80 was not accessible
- Cloned the exact EC2 instance into the same availability region, and weirdly enough I have not seen any issues on the cloned instance so far
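To put numbers on the difference between ports 80, 8888 and 7777, probe outcomes collected over time could be tallied per port. This is a hedged sketch only: it assumes the samples are `(port, outcome)` pairs with outcome strings such as "open" or "timeout", recorded by whatever probe is used, and the function names `tally_by_port` and `failure_rate` are hypothetical.

```python
from collections import Counter, defaultdict


def tally_by_port(samples):
    """Count outcomes per port from an iterable of (port, outcome) pairs."""
    tallies = defaultdict(Counter)
    for port, outcome in samples:
        tallies[port][outcome] += 1
    return dict(tallies)


def failure_rate(tally):
    """Fraction of probes for one port that did not end in "open"."""
    total = sum(tally.values())
    if total == 0:
        return 0.0  # no samples yet, so nothing to report
    return 1.0 - tally["open"] / total
```

A failure rate that is clearly above zero for port 80 and zero for the other ports on the same host, over the same period, would back up the observation that only port 80 is affected.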
How did I Install ERPNext
I installed ERPNext version-13-beta following this tutorial, which uses the Easy Install script, and the installation completed without any issues. It was installed on Ubuntu 18.04 64-bit with 1 vCPU, 2 GB RAM (1 GB swap) and 30 GB storage. I didn’t customize any files in the apps or the NGINX configuration. All changes were made through the front-end, including some Custom Scripts. In fact, I was able to bring the AWS EC2 instance to the same state as the old VPS server by following the same guide and then restoring a database backup.
What to try next
I’ve been struggling with this issue for some days and now I am stumped as to what to try next. If you have any suggestions or clues, or if you think I should ask this somewhere else, do let me know. If you need to see any logs or configs, ask me here or in a DM. I’ll keep this instance alive until I find the root cause of this issue or a plausible explanation for it.
Logs and Configs
nginx.conf
systemctl status nginx
netstat -plant
htop
top
Wireshark Debug