Internal Server Error on VPS in random intervals

I’m running ERPnext on Digital Ocean Ubuntu 14.04 x64 droplet and everything works very smoothly and as expected. But on random intervals the server responds with “Internal Server Error” in the browser until I reboot the machine. I did’t find a pattern, yet, and I’m not sure which log files I should check to trace back the root of the cause. Any help is highly appreciated.

Bst, PS

2 Likes

You can check logs in the logs dir of the bench. Web.error.log is what you’re looking for. You can restart the frappe process by running bench update --build (it’ll trigger supervisor restart without pulling/patching)

Web.error.log is just an empty file. But there are plenty of entries like this in worker.error.log:

[2014-10-11 19:04:25,543: ERROR/MainProcess] Task frappe.tasks.enqueue_events_for_site[d7aba8cc-4572-4dde-b5c9-349d9d1f2cc9] raised unexpected: LockTimeoutError(u'./erpnext.mydomain./locks/scheduler.lock',)
Traceback (most recent call last):
  File "/home/frappe/frappe-bench/env/local/lib/python2.7/site-packages/celery/app/trace.py", line 240, in trace_task
    R = retval = fun(*args, **kwargs)
  File "/home/frappe/frappe-bench/env/local/lib/python2.7/site-packages/celery/app/trace.py", line 437, in __protected_call__
    return self.run(*args, **kwargs)
  File "/home/frappe/frappe-bench/apps/frappe/frappe/tasks.py", line 108, in enqueue_events_for_site
    enqueue_events(site)
  File "/home/frappe/frappe-bench/apps/frappe/frappe/utils/scheduler.py", line 25, in enqueue_events
    lock = create_lock('scheduler')
  File "/home/frappe/frappe-bench/apps/frappe/frappe/utils/file_lock.py", line 16, in create_lock
    if not check_lock(lock_path):
  File "/home/frappe/frappe-bench/apps/frappe/frappe/utils/file_lock.py", line 25, in check_lock
    raise LockTimeoutError(path)
LockTimeoutError: ./erpnext.mydomain./locks/scheduler.lock

The server crashed again, showing Internal Server Error in the browser. bench update --build didn’t fix it but a server reboot did. I fount the following entries at the end of syslog but I am not sure if they are related.

Oct 12 18:29:38 erp ntpd[2072]: Listening on routing socket on fd #22 for interface updates
Oct 12 18:29:43 erp kernel: [   35.826440] random: nonblocking pool is initialized
Oct 12 18:54:42 erp kernel: [ 1534.568023] perf samples too long (5455 > 5000), lowering kernel.perf_event_max_sample_rate to 25000
Oct 12 19:17:01 erp CRON[2103]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)

In kern.log I found the following:

Oct 12 18:29:30 erp kernel: [   22.058819] perf samples too long (2657 > 2500), lowering kernel.perf_event_max_sample_rate to 50000
Oct 12 18:29:43 erp kernel: [   35.826440] random: nonblocking pool is initialized
Oct 12 18:54:42 erp kernel: [ 1534.568023] perf samples too long (5455 > 5000), lowering kernel.perf_event_max_sample_rate to 25000
Oct 12 19:29:24 erp kernel: [ 3616.806495] perf samples too long (10096 > 10000), lowering kernel.perf_event_max_sample_rate to 12500

Are there any other places I should have a look at?

Or any hints how to resolve this issue?

Yes remove the lock files for the scheduler to continue

Yes remove the lock files for the scheduler to continue

Can you disable the backups via cron and see if it solves your problem? During the backup, it might show “Internal Server Error”

How can I remove the files for the scheduler? How can I disable the backup? Also the Internal Server Error seems to stay even after the backup has finished?

remove *.lock in sites/{sitename}/locks

comment the backup cronjob. Login as frappe and run crontab -e

Should not but there’s not much data to work on, can you try disabling backups and see if it stops?

remove *.lock in sites/{sitename}/locks

Did it, but still not sure what this is supposed to do?

comment the backup cronjob. Login as frappe and run crontab -e
Also the Internal Server Error seems to stay even after the backup has finished?

Did it, let’s see if the server keeps running, now.

So after commenting out the cronjob the server’s still running after ~24h. I also tried to run the cron commands from the CLI several times without any problems. I would like to keep the automated backups running, so what to do?

0 */6 * * * cd /home/frappe/frappe-bench/sites &&  /home/frappe/frappe-bench/env/bin/frappe --backup all >> /home/frappe/frappe-bench/logs/backup.log 2>&1

Oh, I see the Internal Server error again.
Even without cronjob. What can it be?

Can you check what is in the mysql processlist when this happens?

Login into mysql, mysql -u root -p and run the query

show processlist;

For further diagnosis, see what happens when you stop celery workers

supervisorctl stop frappe:frappe-worker
supervisorctl stop frappe:frappe-workerbeat

OK. After stopping the celery workers, the server seems to keep running. What can I do to find out why they crashed the server? I deleted the worker.error.log and workerbeat.error.log files and restarted them with supervisorctl stop frappe:frappe-worker and supervisorctl stop frappe:frappe-workerbeat and after a minute the worker.error.log shows the following entries (The server’s still running):

[2014-10-17 17:11:05,557: WARNING/MainProcess] celery@erp ready.
[2014-10-17 17:11:17,985: ERROR/MainProcess] Task frappe.tasks.enqueue_events_for_site[8a178308-ecb6-445c-a6b4-864df9cf937d] raised unexpected: LockTimeoutError(u'./office.maxwel.de/locks/scheduler.lock',)
Traceback (most recent call last):
  File "/home/frappe/frappe-bench/env/local/lib/python2.7/site-packages/celery/app/trace.py", line 240, in trace_task
    R = retval = fun(*args, **kwargs)
  File "/home/frappe/frappe-bench/env/local/lib/python2.7/site-packages/celery/app/trace.py", line 437, in __protected_call__
    return self.run(*args, **kwargs)
  File "/home/frappe/frappe-bench/apps/frappe/frappe/tasks.py", line 108, in enqueue_events_for_site
    enqueue_events(site)
  File "/home/frappe/frappe-bench/apps/frappe/frappe/utils/scheduler.py", line 25, in enqueue_events
    lock = create_lock('scheduler')
  File "/home/frappe/frappe-bench/apps/frappe/frappe/utils/file_lock.py", line 16, in create_lock
    if not check_lock(lock_path):
  File "/home/frappe/frappe-bench/apps/frappe/frappe/utils/file_lock.py", line 25, in check_lock
    raise LockTimeoutError(path)
LockTimeoutError: ./erpnext.dev/locks/scheduler.lock
[2014-10-17 17:11:17,995: ERROR/MainProcess] Task frappe.tasks.enqueue_events_for_site[a008bba4-860c-4174-8046-9b7d8a84e38f] raised unexpected: LockTimeoutError(u'./erpnext.maxwel.de/locks/scheduler.lock',)
Traceback (most recent call last):
  File "/home/frappe/frappe-bench/env/local/lib/python2.7/site-packages/celery/app/trace.py", line 240, in trace_task
    R = retval = fun(*args, **kwargs)
  File "/home/frappe/frappe-bench/env/local/lib/python2.7/site-packages/celery/app/trace.py", line 437, in __protected_call__
    return self.run(*args, **kwargs)
  File "/home/frappe/frappe-bench/apps/frappe/frappe/tasks.py", line 108, in enqueue_events_for_site
    enqueue_events(site)
  File "/home/frappe/frappe-bench/apps/frappe/frappe/utils/scheduler.py", line 25, in enqueue_events
    lock = create_lock('scheduler')
  File "/home/frappe/frappe-bench/apps/frappe/frappe/utils/file_lock.py", line 16, in create_lock
    if not check_lock(lock_path):
  File "/home/frappe/frappe-bench/apps/frappe/frappe/utils/file_lock.py", line 25, in check_lock
    raise LockTimeoutError(path)
LockTimeoutError: ./erpnext.dev/locks/scheduler.lock

workerbeat.error.log looks ok:

Stale pidfile exists. Removing it.
[2014-10-17 17:11:17,719: INFO/MainProcess] beat: Starting...
[2014-10-17 17:11:17,789: INFO/MainProcess] Scheduler: Sending due task scheduler (frappe.tasks.enqueue_scheduler_events)
[2014-10-17 17:16:17,841: INFO/MainProcess] Scheduler: Sending due task scheduler (frappe.tasks.enqueue_scheduler_events)

Yes, logs are your best bet. Also, if you get the internal server error again, then run



sudo supervisorctl stop frappe


bench frappe --serve


```</p>


<p dir="ltr">and check. In the best case you should see the exception now as frappe --serve runs a development server.</p>

I’m still hustling to trace back the Internal Server Error, which occurred again right now (shortly after restarting the frappe-worker and frappe worker beat). I want to have a reliable deployment on Digital Ocean before I publish my installation instructions for ERPnext on Digital Ocean.

I tried your suggestions (with some modifications):

$ sudo supervisorctl stop frappe:*
$ su frappe
$ cd ~/frappe-bench
$ bench frappe --serve

Running on http://0.0.0.0:8000/
Restarting with reloader

Let’s see what the ssh output says over time…

Btw, this is the last entry of my worker.error.log. Can you explain what it’s about?

[2014-10-18 10:01:51,321: ERROR/MainProcess] Task frappe.tasks.enqueue_events_for_site[1fa9d7c0-3f48-456c-b528-ec6bea0c0f47] raised unexpected: LockTimeoutError(u'./erpnext.dev/locks/scheduler.lock',)
Traceback (most recent call last):
  File "/home/frappe/frappe-bench/env/local/lib/python2.7/site-packages/celery/app/trace.py", line 240, in trace_task
    R = retval = fun(*args, **kwargs)
  File "/home/frappe/frappe-bench/env/local/lib/python2.7/site-packages/celery/app/trace.py", line 437, in __protected_call__
    return self.run(*args, **kwargs)
  File "/home/frappe/frappe-bench/apps/frappe/frappe/tasks.py", line 108, in enqueue_events_for_site
    enqueue_events(site)
  File "/home/frappe/frappe-bench/apps/frappe/frappe/utils/scheduler.py", line 25, in enqueue_events
    lock = create_lock('scheduler')
  File "/home/frappe/frappe-bench/apps/frappe/frappe/utils/file_lock.py", line 16, in create_lock
    if not check_lock(lock_path):
  File "/home/frappe/frappe-bench/apps/frappe/frappe/utils/file_lock.py", line 25, in check_lock
    raise LockTimeoutError(path)
LockTimeoutError: ./erpnext.dev/locks/scheduler.lock

About the scheduler.lock, it seems that the scheduler caught an exception and did not exit cleanly. You should remove the lock file

Have you checked mysql error logs?

I think you edited your reply and removed the traceback that said “cannot connect to mysql”

I have a hunch that your mysql is crashing. Please check the logs.

No mysql.err is empty. It must be frappe-worker. When I disable it I don’t experience the error.

must be logging to some other file because it should not be empty.

eg, mine is

141017 12:51:54 [Note] InnoDB: Initializing buffer pool, size = 128.0M
141017 12:51:54 [Note] InnoDB: Completed initialization of buffer pool
141017 12:51:54 [Note] InnoDB: Highest supported file format is Barracuda.
141017 12:51:55 [Note] InnoDB: 128 rollback segment(s) are active.
141017 12:51:55 [Note] InnoDB: Waiting for purge to start
141017 12:51:55 [Note] InnoDB:  Percona XtraDB (http://www.percona.com) 5.6.19-67.0 started; log sequence number 29765891
141017 12:51:55 [Note] Server socket created on IP: '::'.
141017 12:51:55 [Note] Event Scheduler: Loaded 0 events