Indexing Stock Ledger Entry to avoid Deadlock


We are facing frequent deadlocks originating from the Stock Ledger Entry table. Below is a sample of the deadlock log. From the log we can see that close to 2 lakh (200,000) rows are locked while updating Stock Ledger Entry, which is resulting in many lock timeouts and deadlocks.

We find that, except for the posting_time and is_cancelled fields, the rest of the fields referenced by the query are indexed by default.

Is there a better way to index the Stock Ledger Entry table to avoid these deadlocks?

18768 lock struct(s), heap size 1990776, 218638 row lock(s), undo log entries 84
MySQL thread id 10146, OS thread handle 140555535853312, query id 437338 localhost _1bd3e0294da19198 Creating sort index

select *, timestamp(posting_date, posting_time) as "timestamp" from `tabStock Ledger Entry`
where item_code = 'XXXX'
and coalesce(is_cancelled, 'No')='No'
and timestamp(posting_date, posting_time) > timestamp('2021-12-01', '17:49:32.999046') and warehouse = 'XXX' and name!='ef251ab47d'
order by timestamp(posting_date, posting_time) asc, creation asc
for update
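One option worth testing (a sketch only — the index name is made up and I haven't benchmarked this): a composite index that covers the two equality filters first, then the date columns. Note that `timestamp(posting_date, posting_time)` in the WHERE and ORDER BY is a function over two columns, so MariaDB cannot use a plain B-tree index for that predicate as written; the query would need to filter and sort on posting_date/posting_time as separate columns for the index to fully apply.

```sql
-- Sketch: composite index matching the query's filters.
-- Equality columns (item_code, warehouse) come first so the range
-- condition on posting_date stays on a contiguous slice of the index.
ALTER TABLE `tabStock Ledger Entry`
  ADD INDEX idx_item_wh_posting (item_code, warehouse, posting_date, posting_time);
```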


Like I said in other posts too, first upgrade to V13, if you haven’t already!

If you try to optimize performance on v12 you’re likely fixing the problems we have already fixed in v13! :smile:

@ankush We are hesitant, as we have made significant customizations in v12 and are not sure of the effort involved in moving to v13. Is there any possibility of backporting the fix?

Can you give us any pointers on what exactly is fixed with reference to the locking issue?


Nope. It’s not “the fix”. There were several major refactors and performance fixes, first in v13 and then progressively a few more in minor versions. Some of them are breaking changes that will never get backported.


For the record, still a problem in V13 a year later. Several clients are getting intermittent “Creating sort index” locks. Sometimes on tables like Stock Ledger Entry. Other times on things like Tag Link.

When the locks happen, they are devastating.

  • ERP times out due to SQL lock, wherever that table (DocType) is referenced.
  • I’ve tried killing the SQL processes that are creating the sort index, finding them with `SHOW FULL PROCESSLIST`.
    • However, they never seem to actually die. And the locks remain.
  • I’ve also tried shutting down the MariaDB database. But it won’t shut down. It just hangs while trying to.
  • The only thing that works is finding the OS process ID for MariaDB, and using a sudo kill <pid>. Which is an awful thing to do, but it’s the only way, besides rebooting the entire VM.
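For reference, the inspect-and-kill commands involved are roughly these (a sketch; thread id 10146 is just the one from the deadlock log earlier in this thread, and on MariaDB 10.x the InnoDB lock views live in information_schema):

```sql
-- Find long-running queries and their thread ids
SHOW FULL PROCESSLIST;

-- See which InnoDB transactions hold locks and which are waiting
SELECT * FROM information_schema.INNODB_TRX;
SELECT * FROM information_schema.INNODB_LOCK_WAITS;

-- Kill just the running statement, or the whole connection
KILL QUERY 10146;  -- thread id from the processlist
KILL 10146;
```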

Once MariaDB restarts, everything works fine…until someday, when it doesn’t.

Root Causes

I think there are several:

  1. Bad Python: Some of the functions are querying the database inefficiently. Fetching too many columns, sorting on columns without indices, using WHERE clauses without indices, etc.

  2. Missing Indices: The SQL tables involved don’t have the indices necessary to support the queries being sent.

  3. Bad designs for General Ledger and Stock Ledger: Requiring the individual transactions to also maintain aggregations like “rolling” balances, quantities, and values? A huge mistake imo, more surprising because it’s been this way for a decade. Submitting 1 transaction can have a ripple effect, and impact thousands of other transactions. Transactions that otherwise wouldn’t need to be touched. There are better designs that bypass this problem.

  4. MariaDB tech debt from MySQL: I think MariaDB inherited a lot of MySQL mistakes and headaches when it comes to index performance, and they haven’t been able to fix them in the time since they broke away.

  5. Bad assumptions about 'cron' schedules: it’s not a great idea to run intensive background tasks (like recalculating ledger and inventory values) by default in the middle of the business day.
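On #5, Frappe does let you move heavy jobs off business hours with a cron entry in a custom app’s `hooks.py`. A minimal sketch (the app and task path below are placeholders, not real ERPNext functions):

```python
# hooks.py of a custom app (sketch; "myapp.tasks.nightly_repost" is a
# hypothetical task path, not an actual ERPNext function)
scheduler_events = {
    "cron": {
        # Run the intensive recalculation at 02:00 server time
        # instead of letting it land in the middle of the business day.
        "0 2 * * *": [
            "myapp.tasks.nightly_repost",
        ],
    },
}
```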

For today, I’m going to try to fix #1, #2, and #5. Because if I don’t, then one of these days, I’m going to get a call at 3AM because the ERP went down.

I’d like to believe that switching to PostgreSQL would help. But I’m not convinced.

Ultimately I think ERPNext needs a new #3. But I don’t have time to replace it right now. And honestly, if I’ve reached that point where I’m genuinely considering reinventing the entire ERPNext inventory, AR/AP, and GL models? Probably time to think even bigger.