My CPU load is constantly at 100%

Hi,
I have noticed that my CPU load is constantly at 100%, and the system has also become a little slow. Any help or advice is welcome.

Note: I have a functional background, so please give detailed step-by-step instructions. Thanks a lot.

htop output:

Instance details:
Production
frappe@vps632278:~/frappe-bench$ bench version
erpnext 11.1.21
foundation 0.0.1
frappe 11.1.22

Rgds
Nofal

The error trace shows that a cron job process is the culprit. What is running in it? I think you can check with this command (as the frappe user):

crontab -l

Hi,
Thanks for the update.

This is what I get when running crontab -l:


How do I read it, and is there any useful information in it that helps?

rgds
Nofal

None of those look like they belong there. I would comment them all out: open your crontab for editing with the command below and add # at the start of each line.

crontab -e
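If you prefer not to edit by hand, here is a minimal sketch that does the same thing non-interactively, assuming a Linux server with the standard crontab command. It saves a backup of the current entries first so nothing is lost. The helper name comment_out_cron and the --apply switch are my own, not part of any tool.

```shell
#!/usr/bin/env bash
# Sketch: comment out every active line in the current user's crontab.
# comment_out_cron and --apply are hypothetical names used only here.

# Read crontab text on stdin; prefix "# " to every line that is not
# blank and not already a comment. Other lines pass through unchanged.
comment_out_cron() {
  sed -e '/^[[:space:]]*$/b' -e '/^[[:space:]]*#/b' -e 's/^/# /'
}

# Only touch the real crontab when run as: bash this_script.sh --apply
if [ "${1:-}" = "--apply" ]; then
  backup="$HOME/crontab-backup-$(date +%Y%m%d-%H%M%S).txt"
  crontab -l > "$backup" || exit 1          # keep a copy of the current entries
  comment_out_cron < "$backup" | crontab -  # install the commented-out version
  echo "Backup saved to $backup"
  crontab -l                                # show what is installed now
fi
```

To undo it, run crontab followed by the path of the backup file; that reinstalls the saved entries.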

I have added # to all the lines. What is the next step?

I killed the cron process and then rebooted.

After the reboot, fail2ban shows up and consumes 100% of the CPU; after 10 minutes it disappears and the server comes back to normal.

Sorry, I cannot explain the reasons; I don't have the necessary skills for the time being.

Rgds
Nofal

It may not be a cron task started by the default frappe user; it may belong to another user or to root. Try logging in as root and look again at the cron tasks it may have running.
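A minimal sketch for checking every user's cron jobs at once, assuming a Debian/Ubuntu-style cron where root can read other users' crontabs with crontab -u. It also prints the system-wide files /etc/crontab and /etc/cron.d/*, which hold jobs that crontab -l does not show. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print every user's crontab plus the system-wide cron files.
# Run as root (e.g. sudo bash list_crons.sh). Function names are hypothetical.

# Drop comment lines and blank lines from stdin or from the given files.
strip_comments() {
  grep -v -e '^[[:space:]]*#' -e '^[[:space:]]*$' "$@"
}

list_user_crontabs() {
  # The first field of /etc/passwd is the user name.
  cut -d: -f1 /etc/passwd | while read -r user; do
    entries="$(crontab -u "$user" -l 2>/dev/null)" || continue  # user has no crontab
    echo "=== crontab for $user ==="
    printf '%s\n' "$entries" | strip_comments
  done
}

list_system_cron() {
  for f in /etc/crontab /etc/cron.d/*; do
    [ -f "$f" ] || continue
    echo "=== $f ==="
    strip_comments "$f"
  done
}

if [ "$(id -u)" -eq 0 ]; then
  list_user_crontabs
  list_system_cron
else
  echo "Run this as root (sudo) to see every user's cron jobs." >&2
fi
```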

Normally when I see this, it turns out to be 'fail2ban' being run by some user and I have to track that down, but yours seems different.

You can also use the 'top' command to see all tasks taking up CPU time.
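For pasting into a post, a text snapshot is often easier to read than a screenshot. Here is a minimal sketch, assuming the procps version of ps that ships with Ubuntu/Debian (top -b -n 1 | head -n 20 gives a similar one-shot view). The PPID column shows which parent process started each one, which helps when a process keeps coming back after being killed. The function name top_cpu is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: one-shot list of the processes using the most CPU.
# Assumes procps "ps" (standard on Ubuntu/Debian); top_cpu is a hypothetical name.

top_cpu() {
  local n="${1:-10}"   # how many processes to show (default 10)
  # ppid = parent process, pcpu = %CPU, pmem = %MEM,
  # etime = elapsed time since start, args = full command line.
  ps -eo pid,ppid,user,pcpu,pmem,etime,args --sort=-pcpu | head -n "$((n + 1))"
}

if command -v ps >/dev/null 2>&1; then
  top_cpu 10
fi
```

Note that ps reports %CPU averaged over the lifetime of each process, while top shows the current value, so the two can differ.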

BKM

In my case, the scheduler was the culprit.

This cron process keeps coming back. Even when I log in as the root user it's there, and the CPU load is back at 100%.

Shall I keep killing this process manually?

Can you show the output of

crontab -l

again? I don't understand why you have these items in the cron.

As the frappe user, here is the crontab -l output:


I don't think those are correct; they are not part of any standard install that I have seen. I would remove them with:

crontab -r
sudo systemctl restart cron
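After restarting cron, you can confirm which jobs it is still launching. On Ubuntu/Debian, cron normally logs each job it starts to /var/log/syslog as a line like "CRON[1234]: (frappe) CMD (...)"; on systemd systems journalctl -u cron shows the same entries. A minimal sketch, assuming that default syslog location and format (the function name cron_commands is hypothetical):

```shell
#!/usr/bin/env bash
# Sketch: count the commands cron has started, per user, from syslog.
# Assumes Debian/Ubuntu cron logging to /var/log/syslog; cron_commands is hypothetical.

cron_commands() {
  local log="${1:-/var/log/syslog}"
  # Keep only cron's "CMD" lines, strip the timestamp/host prefix,
  # then count identical "(user) CMD (command)" entries, most frequent first.
  grep -E 'CRON\[[0-9]+\]: \([^)]+\) CMD ' "$log" \
    | sed -E 's/.*CRON\[[0-9]+\]: //' \
    | sort | uniq -c | sort -rn
}

if [ -r /var/log/syslog ]; then
  cron_commands /var/log/syslog | head -n 20
fi
```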

You might want to check your memory utilization; it's possible the server is swapping. The kswapd daemon can use a lot of CPU if your server swaps heavily.
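A minimal sketch for checking this from the command line, assuming Linux, where /proc/meminfo reports values in kB. free -m shows the same numbers at a glance, and vmstat 1 5 shows whether the server is actively swapping (non-zero si/so columns). The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: report available RAM and swap in use, read from /proc/meminfo (values in kB).
# Function names are hypothetical; the optional argument is another meminfo-style file.

mem_available_mb() {
  awk '/^MemAvailable:/ { printf "%d\n", $2 / 1024 }' "${1:-/proc/meminfo}"
}

swap_used_mb() {
  # Swap in use = SwapTotal - SwapFree.
  awk '/^SwapTotal:/ { t = $2 } /^SwapFree:/ { f = $2 }
       END { printf "%d\n", (t - f) / 1024 }' "${1:-/proc/meminfo}"
}

if [ -r /proc/meminfo ]; then
  echo "Available RAM: $(mem_available_mb) MB"
  echo "Swap in use:   $(swap_used_mb) MB"
fi
```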

I have commented out the existing lines in the crontab.

Now fail2ban comes into the picture.

Your memory actually looks okay, and there's no swapping either, so forget what I said last time.

If you are using nginx, take a look at the access log. I have a feeling that if fail2ban is going nuts, you are probably being attacked…

Have you checked that you don't have a "bad rule" in fail2ban?
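A minimal sketch for seeing what fail2ban is busy with. sudo fail2ban-client status lists the active jails; the script below counts Ban and Found events per jail in fail2ban's log, assuming the default log location /var/log/fail2ban.log and the usual "[jail] Ban <ip>" line format. A jail with a very large number of events is the first place to look. The function name count_per_jail is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: count fail2ban events per jail from its log.
# Assumes the default /var/log/fail2ban.log format; count_per_jail is hypothetical.

# $1 = event word ("Ban" or "Found"), $2 = log file (optional).
count_per_jail() {
  local event="$1" log="${2:-/var/log/fail2ban.log}"
  # Matches e.g. "[sshd] Ban 203.0.113.5" and keeps only the jail name.
  # "Restore Ban" lines (re-bans after a restart) are not counted.
  grep -oE "\[[^]]+\] ${event} " "$log" \
    | sed -E "s/^\[([^]]+)\] ${event} $/\1/" \
    | sort | uniq -c | sort -rn
}

if [ -r /var/log/fail2ban.log ]; then
  echo "Bans per jail:";    count_per_jail Ban
  echo "Matches per jail:"; count_per_jail Found
fi
```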

I don't understand a lot of these terms since I am not a system administrator, so step-by-step instructions would make it much easier for me to follow what you are saying. Thanks a lot for your help.

How do I check, and where? I will send you a screenshot…

thanks

The logs should be in the /var/log/nginx directory.


Run this from inside that directory:

tail -f access.log

If you see rapid hits all the time while there are no active users, then you've got an issue, especially if most of the hits result in failed responses.
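tail -f shows the traffic live; to get numbers instead of eyeballing it, here is a minimal sketch, assuming nginx's default "combined" log format and the default log path (adjust both if your setup differs). It counts requests per client IP, requests per second, and failed (4xx/5xx) responses. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: summarize an nginx access log in the default "combined" format.
# Function names are hypothetical; reading the log may need sudo.

LOG_DEFAULT=/var/log/nginx/access.log

# Requests per client IP, busiest first ($2 = how many to show, default 10).
hits_per_ip() {
  awk '{print $1}' "${1:-$LOG_DEFAULT}" | sort | uniq -c | sort -rn | head -n "${2:-10}"
}

# Requests per second, busiest first. Field 4 looks like "[01/Jun/2019:12:00:00".
busiest_seconds() {
  awk '{print substr($4, 2)}' "${1:-$LOG_DEFAULT}" | sort | uniq -c | sort -rn | head -n "${2:-10}"
}

# How many responses had a 4xx/5xx status. Splitting each line on double
# quotes puts " <status> <bytes> " into field 3.
failed_responses() {
  awk -F'"' '{ split($3, a, " "); total++; if (a[1] + 0 >= 400) failed++ }
             END { printf "%d of %d requests failed\n", failed, total }' "${1:-$LOG_DEFAULT}"
}

if [ -r "$LOG_DEFAULT" ]; then
  echo "Top client IPs:";  hits_per_ip
  echo "Busiest seconds:"; busiest_seconds
  failed_responses
fi
```

If one IP shows thousands of requests, or the busiest seconds show dozens of hits while nobody is using the system, that matches the rapid-hits pattern worth worrying about.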

I ran the above command.

My public IP is 102.156.96.20.

What information can I read from the above screenshot? Thanks.

If you tail it and you don't get a whole bunch of hits per second, then it's probably nothing. It looks like it's mostly you anyway.

Although… you've got xmlrpc.php too. Are you hosting WordPress on this server as well? If so, you are sharing the load with WordPress, and it's possible that WordPress is the one wasting resources.
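To check whether WordPress is actually on this server and how much traffic goes to it, here is a minimal sketch, assuming the default nginx access log path and that a WordPress install contains its usual wp-config.php file. xmlrpc.php and wp-login.php are WordPress endpoints that bots commonly hammer. The function names and the --scan switch are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: look for WordPress installs and count requests to its login/XML-RPC endpoints.
# Function names and --scan are hypothetical; run with sudo to search all directories.

# Print the path of every wp-config.php under the given directory (default /),
# skipping /proc.
find_wordpress() {
  find "${1:-/}" -path /proc -prune -o -name wp-config.php -print 2>/dev/null
}

# Requests to xmlrpc.php or wp-login.php per client IP, busiest first.
wp_endpoint_hits() {
  grep -E '"(GET|POST) /(xmlrpc|wp-login)\.php' "${1:-/var/log/nginx/access.log}" \
    | awk '{print $1}' | sort | uniq -c | sort -rn
}

# Scanning the whole disk can take a while, so only do it when asked.
if [ "${1:-}" = "--scan" ]; then
  echo "WordPress installs found:";    find_wordpress /
  echo "Hits on WordPress endpoints:"; wp_endpoint_hits
fi
```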

I'm not sure what else you have installed. I would suggest installing ERPNext on a dedicated instance to isolate the performance problem. Even better if you can put the application and the database on separate instances.