Deadlock issue on Frappe v14

Hello,

I have encountered issues with Deadlocks and Lock Wait Timeout Exceeded in ERPNext.

We have multiple systems operating in the production environment, such as POS, Sales, and more. We are building an intermediate system to synchronize data to ERPNext, with the goal of eventually transitioning the entire system to utilize ERPNext. However, our system is complex and consists of multiple components, making it necessary to transition one part at a time. Thus, we have a service in place to synchronize data from our existing system to ERPNext, which runs at 3 am every day.

We are currently facing problems related to Deadlocks and Lock Timeout errors. Our synchronization service retrieves data from the running system and creates documents in ERPNext. Specifically, we are synchronizing data related to Stock Entry. Sales data from our POS, Sales, and other systems are synchronized to ERPNext. Everything works fine until this synchronization service is executed in the production environment. Each day, we have around 2-3 thousand orders being processed. When the synchronization service processes around 200-300 documents, Deadlock or Lock Timeout errors start occurring. We have tried restarting the bench, but the same errors persist upon reattempting the process.

Currently, we are using the Rest API to create documents. We have attempted to use the api/resource with the frappe.client.insert_many command to create multiple documents, and we’ve also tried the api/resource/Stock%20Entry to create individual documents, but the same errors occur.

What can be done to improve and resolve this situation? Thank you for taking the time to assist me. It would be greatly appreciated if we could address and rectify this issue.

This is our Frappe ERPNext versions:

#### Installed Apps

**ERPNext:** v14.36.0 (version-14)
**Frappe Framework:** v14.46.0 (version-14)
**Frappe Helper:** v0.3.3 (master)
**Frappe HR:** v14.10.2 (version-14)
**Restaurant:** v1.3.7 (master)

if possible, try to customize form, Stock Entry, uncheck Track Changes.

also please override the below method

refer to this app https://gitee.com/yuzelin/zelin_am/blob/master/zelin_am/overrides.py

class CustomStockEntry(StockEntry):
def validate_qty(self):
pass

here the similar case https://gitee.com/yuzelin/erpnext-chinese-docs/issues/I6SCJQ, unfortunately in Chinese, but the code and setting changes in English, maybe you can use online translation tools.

1 Like

Thank you so much @szufisher

I see that you are skip validate qty. Can you please explain this solution? How can It works?

Thank you again.

the validate qty function does nothing useful, so in the app, just simply override the function with a pass statement.

1 Like

Thank you so much. Let’s me try it

On version 13, I have experienced a -ton- of problems with SQL Locks, Deadlocks, Timeouts. I’ve probably spent a few hundred hours working on these topics.

This is extremely difficult work. I will try to offer some advice, but it’s hard to explain without writing a lot more.

Problem #1 - Missing/Bad Indexes

ERPNext is not fully optimized for performance. Sometimes, SQL indexes are missing that should exist. Other times, the Python code is doing inefficient or incorrect logic.

When Locks/Deadlocks/Timeouts happen, try to capture the related SQL queries.

  1. Configure your MariaDB server to use its Slow Query Log.
  2. Try to see what’s happening in realtime, using SQL statements like SHOW FULL PROCESSLIST;

If you can find the queries causing the problems, you can take action. Such as optimizing indexes, or improving the Python code. The SQL command 'EXPLAIN' can help you determine if a query needs index optimization.

Problem #2 - Concurrency can produce Deadlocks

Many ERPNext tasks involve a lot of SQL tables. For example, Submitting a 'Delivery Note or ā€˜Sales Invoice’ can interact with SQL tables (DocTypes) like:

  • Item
  • Customer
  • Sales Order and Sales Order Item
  • Delivery Note and Delivery Note Item
  • Sales Invoice and Sales Invoice Item
  • Sales Taxes and Charges
  • GL Entry
  • Stock Ledger Entry (this table is especially challenging)

If you are Submitting these documents sequentially? Normally that is okay. But what happens if you try to post multiple documents concurrently? (for example, by using background Workers)

What can sometimes happen is a Deadlock. Two or more processes/threads trying to CRUD the same SQL records, at the same time. Neither can complete until the other releases locks. So they deadlock.

Solving this is extremely challenging. It took me a long while to gain control over this:

  1. In many cases I tried to improve the performance by editing Python, SQL, or improving indexes. The faster the process, the less time SQL objects are locked.
  2. If my task did not need a full rollback? Then I would frappe.db.commit() where possible, to remove the locks earlier.
  3. I introduced my own ā€œSafety Lockingā€ mechanism:
    • First, I changed the MariaDB transaction isolation level to ā€œREAD COMMITTEDā€
    • Next, at the beginning of each risky Python function, I manually lock all the records I need.

By manually locking at the beginning, I force all other concurrent transactions to ā€œwaitā€ for locks to release. For example, Invoice #2 may have to wait a few seconds until Invoice #1 releases some locks.

Yes, this reduces concurrency (not ideal). But helps prevent deadlocks (really great). Ever since I introduced this concept into my code, Deadlocks are very, very rare in my environments.

Problem #3 - MariaDB doesn’t handle Locks gracefully

When MariaDB encounters long-running queries, or locks, the database tables can easily get ā€œstuckā€ in 2 different states:

  • ā€œCreating sort indexā€
  • ā€œSending dataā€

See screenshot below (from SQL command 'SHOW FULL PROCESSLIST')

Sometimes, MariaDB will remain ā€œstuckā€ in these states literally forever. The queries will never time out. Even if you use SQL to ā€œKILLā€ them, they will not disappear. Your database tables freeze. You cannot interact with them anymore. You must manually end the MariaDB process at the OS level.

It’s awful when this happens. :face_exhaling: I’ve spend a ton of time reading about these 2 states. So far, I’ve found no way of teaching MariaDB to gracefully end them. Nothing I’ve tried makes MariaDB unfreeze.

After dealing with this for 2+ years, my only plan is to get rid of MariaDB. And start using PostgreSQL instead. I’ve worked with other database software for over a decade (e.g. MS SQL). I’ve never had as many problems then, as I do today with MariaDB.

Problem #4 - ERPNext Design Decisions

Some of the ERPNext designs and architecture are adding to the problem. Things like:

  • The way ā€˜Stock Ledger Entry’ and ā€˜GL Entry’ were designed.
  • The number of hits that SQL table tabSeries experiencing to issue new Naming Series values.
  • Child and Parent table architecture.
  • A few bugs in Frappe Framework around Document deletions.
  • Certain stock and ledger code doing a lot of Looping.
  • SQL queries executing without the User’s consent/intent (many Reports)

Finally, sometimes MariaDB also just chooses the wrong index, such as when users perform List Page filters.

There are no quick fixes to the above. To fully solve these problems, you’d have to fork your ERPNext, and begin maintaining your own version with your own custom solutions.

Summary

Hopefully some of this information was helpful. If nothing else, know this: You Are Not Alone. These problems are happening for other people too. The larger your database? The more transactions you try to process each day? The more-likely these problems will appear.

7 Likes

Thank you very much, @brian_pond.

Seems like the issue with MariaDB is quite significant here. I have tried integrating my site with PostgresDB, but there might be some apps that haven’t been updated for use with Postgres. Sometimes errors still occur with queries using Postgres.

Perhaps we need to wait for major changes from Frappe and ERPNext, as the interdependencies here are quite substantial. I’ve only used ERPNext and HRMS with Frappe. I really hope ERPNext will update soon to fix these issues. I see many others facing similar problems, and I’m trying to find the simplest and quickest way to resolve it. We’re trying to transition from Odoo to ERPNext because ERPNext’s business process flow aligns well with our team’s needs.

Are you currently using ERPNext for your system, @brian_pond ? Have you had to make many modifications to the Python code of Frappe or ERPNext?

The best advice I can give is this:

  • Do your best to try to solve these problems yourself.
  • Make friends with other developers, and work together to solve them.
  • Hire some help.

I wouldn’t suggest waiting and hoping something changes. You may wait a long time, and ultimately end up disappointed. I’ve been working with Frappe framework and ERPNext for over 6 years. Yes, in that time some things have changed. But others things have never changed. What we want, versus what the official maintainers want, are often very different things.

Hope isn’t a reliable strategy. If you really want something done, you’ll need to work on it.

Technically, I am using ERPNext for my own, personal accounting. But that isn’t where most of my work happens. My day job is a consultant & developer who helps other companies with their ERPNext.

I have made many modifications to the Python code of both Frappe and ERPNext. I’ve also written my own Apps, and created some Apps for my clients. The past few years, probably 85%+ of my job is software development.

If you’re looking for more info/inspiration…

7 Likes

Thank you very much, @brian_pond ,

There’s been a lot of helpful information for me. I will search for ways to resolve the issue I’m facing and reach out to someone for assistance if possible. It would be great if you know anyone who has encountered the same problem and needs like mine and has managed to fix this issue. Please let me know. I will get in touch with them to learn more.

Thank you very much once again.

Haha, me too, I also developed some customized apps which mostly used by local Chinese users, these APPs are available here https://gitee.com/yuzelin.

3 Likes