My CPU load is constantly at 100%

Hi,
I have noticed that my CPU load is constantly at 100%, and the system has become a little slow. Any help or advice is welcome.

Note: I have a functional background, so please give detailed, step-by-step instructions. Thanks a lot.

htop output:

Instance details:
Production,
frappe@vps632278:~/frappe-bench$ bench version
erpnext 11.1.21
foundation 0.0.1
frappe 11.1.22

Rgds
Nofal

The trace shows that a crontab/cron job process is the culprit. What is it running? I think you can check with the following, as user frappe:

crontab -l

Hi,
Thanks for the update.

This is what I get when running crontab -l:

[screenshot: crontab -l output]

How can I read it, and is there any useful information in it that helps?

rgds
Nofal

None of those look like they belong there. I would comment them all out.
To do that, run the command below to open the crontab in an editor, then add # at the start of each line:

crontab -e
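
For anyone who would rather not edit by hand, the same change can be sketched as a small shell helper. This is only an illustration: `comment_out_lines` is a made-up name, and the backup path is an assumption. It comments out every line that is not already a comment or blank, and the usage lines are left commented so that nothing changes until you run them deliberately.

```shell
# Hypothetical helper: prefix '#' to every active crontab line read from stdin.
comment_out_lines() {
    # Skip lines that are blank or already comments; comment out the rest.
    sed -E '/^[[:space:]]*(#|$)/!s/^/#/'
}

# Usage, as the frappe user (left commented so nothing changes by accident):
#   crontab -l > "$HOME/crontab.backup"            # keep a copy first
#   crontab -l | comment_out_lines | crontab -     # install the commented version
```

Keeping the backup means the original entries can be inspected later, which helps when working out where they came from.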

Done, I have added # at the start of every line.
What is the next step?

I killed the cron process and then rebooted.

Fail2ban then showed up consuming 100% of the CPU; after about 10 minutes it disappeared and the server went back to normal.

Sorry, I cannot explain the reasons; I do not have the proper skills for the time being.

Rgds
Nofal

It may not be a cron task started by the default frappe user. It may belong to another user or to root. Try logging in as root and have another look at the cron tasks it may have running.
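
A rough sketch of how that check might look for every account at once, assuming a Debian/Ubuntu-style server and a root shell (for example after `sudo -i`). `list_all_crontabs` is a hypothetical helper name, not a standard command.

```shell
# Hypothetical helper: print the crontab of every account that has one.
# Must be run as root, because only root may read other users' crontabs.
list_all_crontabs() {
    # Account names are the first colon-separated field of the passwd file.
    cut -d: -f1 "${1:-/etc/passwd}" | while read -r user; do
        # crontab -l fails when the account has no crontab; skip those quietly.
        if entries=$(crontab -u "$user" -l 2>/dev/null); then
            printf '== %s ==\n%s\n' "$user" "$entries"
        fi
    done
}

# Usage, from a root shell:
#   list_all_crontabs
#   cat /etc/crontab; ls -l /etc/cron.d     # system-wide jobs live here too
```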

Normally when I see this, it turns out to be ‘fail2ban’ being run by some user and I have to track that down, but yours seems different.

You can also use the ‘top’ command to see all tasks taking up CPU time.
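
If a one-off snapshot is easier to paste into a reply than the interactive ‘top’ screen, something like the following may help. It is a sketch that assumes the procps version of `ps` found on most Linux distributions; `top_cpu` is a hypothetical helper name.

```shell
# Hypothetical helper: list the processes using the most CPU, busiest first.
top_cpu() {
    # Print one header line plus the N busiest processes (default 10).
    ps -eo pid,user,pcpu,pmem,args --sort=-pcpu | head -n "$(( ${1:-10} + 1 ))"
}

# Usage:
#   top_cpu 5
```

The USER column shows which account owns each process, which is the clue needed to find the right crontab.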

BKM

In my case, the scheduler was the culprit.

The cron process started again. Even when I log in as root it is there, and the CPU load is back at 100%.

Shall I keep killing this process manually?

Can you show the output of

crontab -l

again? I don’t understand why you have these items in your crontab.

Output of crontab -l as the frappe user:

[screenshot: crontab -l output]

I don’t think those are correct. They are not part of any standard install that I have seen. I would remove them with:

crontab -r
sudo systemctl restart cron

You might want to check your memory utilization; it’s possible the server is swapping. Sometimes the kswapd daemon can use a lot of CPU if your server swaps heavily.
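
One way to check this without reading a screenshot is sketched below. It assumes a Linux server, where `/proc/meminfo` reports `SwapTotal` and `SwapFree` in kB; `swap_used_kb` is a hypothetical helper name.

```shell
# Hypothetical helper: print the amount of swap in use, in kB.
swap_used_kb() {
    # Read SwapTotal and SwapFree from a meminfo-style file and subtract.
    awk '/^SwapTotal:/ {total = $2}
         /^SwapFree:/  {free = $2}
         END           {print total - free}' "${1:-/proc/meminfo}"
}

# Usage:
#   swap_used_kb       # 0 means nothing is swapped out
#   free -h            # human-readable memory and swap summary
#   vmstat 1 5         # non-zero si/so columns mean active swapping
```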

I have commented out the existing lines in the crontab.

Now fail2ban comes into the picture.

Your memory actually looks okay, and there is no swapping either, so forget what I said last time.

If you are using nginx, try looking at the access log. I have a feeling that if fail2ban is going nuts, you are probably being attacked…

Have you checked that you don’t have a “bad rule” in fail2ban?
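
As a starting point for that check, here is a sketch that counts how often each address has been banned. It assumes the log is at /var/log/fail2ban.log (the usual Debian/Ubuntu location) and that ban lines end in `[jail] Ban <ip>`; both are assumptions to verify on your server, and `ban_counts` is a hypothetical helper name.

```shell
# Hypothetical helper: count bans per jail and address in a fail2ban log.
ban_counts() {
    # Assumes ban lines end with "[jail] Ban <ip>"; most frequent first.
    awk '$(NF-1) == "Ban" {print $(NF-2), $NF}' "$1" | sort | uniq -c | sort -rn
}

# Usage:
#   sudo fail2ban-client status             # list the active jails
#   sudo fail2ban-client status sshd        # details for one jail, e.g. sshd
#   ban_counts /var/log/fail2ban.log        # may need sudo to read the log
```

A very long list of bans points towards an attack; a short list combined with high CPU points more towards a problem in a rule.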

I do not understand a lot of these terms since I am not a system administrator, so if you give me steps it will be easier for me to follow what you are saying. Thanks a lot for your help.

How do I check, and where? I will give you a screenshot…

thanks

Should be under /var/log/nginx

[screenshot]

Run tail -f access.log
If you get rapid hits all the time while there are no active users, then you have an issue, especially if most of the hits result in failed responses.
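
Watching the log scroll by can be hard to judge, so here is a sketch that summarises it instead. It assumes nginx is writing its default "combined" log format, where the first field is the client IP and the ninth is the HTTP status code; `requests_per_ip` and `status_counts` are hypothetical helper names.

```shell
# Hypothetical helpers for an nginx access log in the default "combined" format.
requests_per_ip() {
    # Field 1 is the client IP; show the ten busiest clients.
    awk '{print $1}' "$1" | sort | uniq -c | sort -rn | head -n 10
}

status_counts() {
    # Field 9 is the HTTP status code; many 4xx/5xx codes mean failed requests.
    awk '{print $9}' "$1" | sort | uniq -c | sort -rn
}

# Usage:
#   requests_per_ip /var/log/nginx/access.log
#   status_counts /var/log/nginx/access.log
```

If one unfamiliar address accounts for most of the requests, or most status codes are 4xx, that fits the attack theory.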

I ran the above command.

my public IP is 102.156.96.20

What information can I read from the above screenshot? Thanks.

If you tail it and you don’t get a whole bunch of hits per second, then it’s probably nothing. It looks like it’s mostly you anyway.

Although… you’ve got xmlrpc.php in there too. Are you hosting WordPress on this server as well? In that case ERPNext is sharing the load with WordPress, and it’s possible that WordPress is the one wasting resources.
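
To see how much of the traffic is aimed at xmlrpc.php, a sketch like this could be used. It again assumes the default nginx "combined" log format, where the seventh field is the requested path; `xmlrpc_hits` is a hypothetical helper name.

```shell
# Hypothetical helper: count requests whose path mentions xmlrpc.php.
xmlrpc_hits() {
    # Field 7 is the request path in the "combined" format; print 0 if none.
    awk '$7 ~ /xmlrpc\.php/ {n++} END {print n + 0}' "$1"
}

# Usage:
#   xmlrpc_hits /var/log/nginx/access.log
```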

I’m not sure what else you have installed. I would suggest installing ERPNext on a dedicated instance to isolate the performance problem. Better yet, put the application and the database on separate instances.