Indexing Stock Ledger Entry to avoid Deadlock


We are facing frequent deadlocks originating from the Stock Ledger Entry table. Below is a sample of the deadlock log. From the log we can see that close to 2 lakh (200,000) rows are locked while updating Stock Ledger Entry, which is resulting in many lock timeouts and deadlocks.

We find that, except for the posting_time and is_cancelled fields, the rest of the fields referenced by the query are indexed by default.

Is there a better way to index the Stock Ledger Entry table to avoid these deadlocks?

18768 lock struct(s), heap size 1990776, 218638 row lock(s), undo log entries 84
MySQL thread id 10146, OS thread handle 140555535853312, query id 437338 localhost _1bd3e0294da19198 Creating sort index

select *, timestamp(posting_date, posting_time) as "timestamp" from `tabStock Ledger Entry`
where item_code = 'XXXX'
and coalesce(is_cancelled, 'No')='No'
and timestamp(posting_date, posting_time) > timestamp('2021-12-01', '17:49:32.999046') and warehouse = 'XXX' and name!='ef251ab47d'
order by timestamp(posting_date, posting_time) asc, creation asc
for update
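One option worth testing (a sketch only — the index name is made up and I haven't benchmarked this): a composite index that covers the two equality filters first, then the date columns. Note that `timestamp(posting_date, posting_time)` in the WHERE and ORDER BY is a function over two columns, so MariaDB cannot use a plain B-tree index for that predicate as written; the query would need to filter and sort on posting_date/posting_time as separate columns for the index to fully apply.

```sql
-- Sketch: composite index matching the query's filters.
-- Equality columns (item_code, warehouse) come first so the range
-- condition on posting_date stays on a contiguous slice of the index.
ALTER TABLE `tabStock Ledger Entry`
  ADD INDEX idx_item_wh_posting (item_code, warehouse, posting_date, posting_time);
```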


Like I said in other posts too, first upgrade to V13, if you haven’t already!

If you try to optimize performance on v12 you’re likely fixing the problems we have already fixed in v13! :smile:

@ankush We are hesitant, as we have made significant customizations in v12 and are not sure of the effort involved in moving to v13. Is there any possibility of backporting the fix?

Can you give us any pointers on what exactly is fixed with reference to the locking issue?


Nope. It’s not “the fix”. There were several major refactors and performance fixes, first in v13 and then progressively a few more in minor versions. Some of them are breaking changes that will never get backported.


For the record, still a problem in V13 a year later. Several clients are getting intermittent “Creating sort index” locks. Sometimes on tables like Stock Ledger Entry. Other times on things like Tag Link.

When the locks happen, they are devastating.

  • ERP times out due to SQL lock, wherever that table (DocType) is referenced.
  • I’ve tried killing the SQL processes that are creating the sort index, finding them with `SHOW FULL PROCESSLIST`.
    • However, they never seem to actually die. And the locks remain.
  • I’ve also tried shutting down the MariaDB database. But it won’t shut down. It just hangs while trying to.
  • The only thing that works is finding the OS process ID for MariaDB, and using a sudo kill <pid>. Which is an awful thing to do, but it’s the only way, besides rebooting the entire VM.
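For reference, the inspect-and-kill commands involved are roughly these (a sketch; thread id 10146 is just the one from the deadlock log earlier in this thread, and on MariaDB 10.x the InnoDB lock views live in information_schema):

```sql
-- Find long-running queries and their thread ids
SHOW FULL PROCESSLIST;

-- See which InnoDB transactions hold locks and which are waiting
SELECT * FROM information_schema.INNODB_TRX;
SELECT * FROM information_schema.INNODB_LOCK_WAITS;

-- Kill just the running statement, or the whole connection
KILL QUERY 10146;  -- thread id from the processlist
KILL 10146;
```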

Once MariaDB restarts, everything works fine…until someday, when it doesn’t.

Root Causes

I think there are several:

  1. Bad Python: Some of the functions are querying the database inefficiently. Fetching too many columns, sorting on columns without indices, using WHERE clauses without indices, etc.

  2. Missing Indices: The SQL tables involved don’t have the indices necessary to support the queries being sent.

  3. Bad designs for General Ledger and Stock Ledger: Requiring the individual transactions to also maintain aggregations like “rolling” balances, quantities, and values? A huge mistake imo, more surprising because it’s been this way for a decade. Submitting 1 transaction can have a ripple effect, and impact thousands of other transactions. Transactions that otherwise wouldn’t need to be touched. There are better designs that bypass this problem.

  4. MariaDB tech debt from MySQL: I think MariaDB inherited a lot of MySQL mistakes and headaches when it comes to index performance, and they haven’t been able to fix them in the time since they broke away.

  5. Bad assumptions about 'cron' schedules: it’s not a great idea to run intensive background tasks (like recalculating ledger and inventory values) by default in the middle of the business day.
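On #5, Frappe does let you move heavy jobs off business hours with a cron entry in a custom app’s `hooks.py`. A minimal sketch (the app and task path below are placeholders, not real ERPNext functions):

```python
# hooks.py of a custom app (sketch; "myapp.tasks.nightly_repost" is a
# hypothetical task path, not an actual ERPNext function)
scheduler_events = {
    "cron": {
        # Run the intensive recalculation at 02:00 server time
        # instead of letting it land in the middle of the business day.
        "0 2 * * *": [
            "myapp.tasks.nightly_repost",
        ],
    },
}
```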

For today, I’m going to try to fix #1, #2, and #5. Because if I don’t, then one of these days, I’m going to get a call at 3AM because the ERP went down.

I’d like to believe that switching to PostgreSQL would help. But I’m not convinced.

Ultimately I think ERPNext needs a new #3. But I don’t have time to replace it right now. And honestly, if I’ve reached that point where I’m genuinely considering reinventing the entire ERPNext inventory, AR/AP, and GL models? Probably time to think even bigger.