ERPNext tabVersion is returning 34 millions of record. This is recorded in mysql slow query log. Once this is observed the server goes to 502 and come back at specified interval. How to overcome this issue
pretty sure it will get OOM before serializing 34M records, and no single document will have that many versions (ideally versions are only fetched for
(doctype, docname) on form view.
The Version table
'tabVersion' can definitely grow large over time, and cause performance issues.
My advice is to decide how much Version history you need (30 days, 60 days, etc.)
Then run a maintenance task every day, that deletes older history.
For example, create a Daily task that executes this SQL statement:
DELETE FROM `tabVersion` WHERE creation < DATE_ADD(now(), INTERVAL -30 DAY);
In the example above, any
'tabVersion' rows older than 30 days are deleted from MariaDB.
This prevents the table from growing forever. But you can still review recent CRUD activity in Version.
This isn’t a good idea TBH. By deleting
tabVersion all changes that are logged for transactions are also deleted. This is a compliance requirement in some countries (to keep change logs), so only do it if you’re really sure about this.
Usually you won’t find it useful for immediate changes but they become valuable for “forensics” when you need to dig up something that happened a year ago.
This table has doctype-docname index, so apart from storage it doesn’t have huge performance hits and queries on this table are almost exclusive for a single document, so it should never be fetching millions of records in the first place… unless someone wrote something stupid like
frappe.get_all("Version") without any filtering for specific document
Check what is the query that is querying this table. And add index for those parts, columns which is in the “where” statement. That should speed things up.
A far better option is horizontal table partitioning.
Here’s a partition by date example, from : https://dev.mysql.com/doc/mysql-partitioning-excerpt/5.7/en/partitioning-types.html
CREATE TABLE members ( firstname VARCHAR(25) NOT NULL, lastname VARCHAR(25) NOT NULL, username VARCHAR(16) NOT NULL, email VARCHAR(35), joined DATE NOT NULL ) PARTITION BY RANGE( YEAR(joined) ) ( PARTITION p0 VALUES LESS THAN (1960), PARTITION p1 VALUES LESS THAN (1970), PARTITION p2 VALUES LESS THAN (1980), PARTITION p3 VALUES LESS THAN (1990), PARTITION p4 VALUES LESS THAN MAXVALUE );
Another example (database - spreading mysql data across multiple disks - Stack Overflow) where you can direct data to different volumes:
ALTER TABLE mytable REORGANIZE PARTITION p03 INTO ( PARTITION p03 VALUES Less Than (30000) DATA DIRECTORY = "/mnt/disk3" INDEX DIRECTORY = "/mnt/disk3", PARTITION p04 VALUES Less Than MAXVALUE DATA DIRECTORY = "/mnt/disk4" INDEX DIRECTORY = "/mnt/disk4" );
Do research the pitfalls of this however, it doesn’t come for free.
If you -need- the Version data because it actually matters? Or because a regulation says you must maintain it for N years?
Then I agree wholeheartedly. Explore different strategies for archiving, partitioning tables, etc.
Otherwise, if your company doesn’t actually care that in May 2017, John Smith changed the quantity of Purchase Order line 2 from 7 units to 8 units? Then delete it. Not all data is important or useful. Don’t carry around excess baggage.
Unfortunately, very few ERP clients I’ve encountered actually draw up and maintain a Data Retention Policy, documenting what is important to keep, and what is not. The database grows continuously, until someday, it reaches a moment of crisis. Which could have been prevented.
I also came across this issue. Anyone was able to resolve this? Below is what is logged in slow query log. No filtering at all for the query.
Query_time: 12.075216 Lock_time: 0.000031 Rows_sent: 380151 Rows_examined: 380151 Rows_affected: 0 Bytes_sent: 218530922 use uhub_db; SET timestamp=1669356138; SELECT /*!40001 SQL_NO_CACHE */ `name`, `creation`, `modified`, `modified_by`, `owner`, `docstatus`, `parent`, `parentfield`, `parenttype`, `idx`, `ref_doctype`, `docname`, `data`, `_user_tags`, `_comments`, `_assign`, `_liked_by` FROM `tabVersion`;
I was wondering this could be a bug in Frappe. Anybody has a clue how to find the code segment it makes this query?
@tglk that’s not a user query. That’s just backup tool backing up version table
You can ignore all queries containing
/*!40001 SQL_NO_CACHE */ those are from mysqldump.
Thank you so much! I was wondering how to debug this