When setting up a system, importing data is important. Do we need to keep the log if we retain the files used for the import?
Examining my database of ~1G 230MB (1/2 million rows) is consumed by tabData Import Log. With regular backups occurring, this leads to db backups 25% larger than necessary.
Has anyone truncated this tabData Import Log? I’m also curious about tabVersion as well. This table is similar in size. For this one, I probably would export the data to Excel for easy reference when needed.
I would keep a backup that includes this table prior to truncating, but I don’t see the data as being needed in the future.
I’m also interested in any other DB maintenance for optimizing performance and minimizing DB growth.
Regarding tabVersion, I’ve noticed that when any document is being edited then an entry in DocType Version is created every 10 seconds. For what reason I do not know? However, I presume it is used to determine whether a document requires refreshing when multiple users access / edit the same document at any given time.
To investigate any new records to the table over the past 5 minutes, I’ve used the following DB query. Change the number of minutes to your requirements.
bench --site ${sSite} mariadb --execute=“SELECT * FROM tabVersion WHERE modified > (CURRENT_TIMESTAMP() - INTERVAL 5 MINUTE) ORDER BY modified DESC;”
I have on a few occasions deleted very old content from tabVersion without any observable impact on the system behaviour, other than reducing the DB size.
Thank you for teaching me how to query via bench without opening mariaDB console.
In my case, the large size for tabVersions is directly related to importing the original data. Customers get modified several times via the import process (linking contacts, addresses, etc).
Here’s what the consumption is before using ERPNext in production:
Here’s what it looks like after using it for ~15 months in production.
I will create a script to delete records older than 1yr in tabVersions as part of maintenance.
It’s perfectly safe to delete or archive these tables. I’ve been doing it for over 3 years with a daily script:
tabVersion
tabDeleted Document
tabScheduled Job Log
tabEmail Queue
tabAccess Log
__global_search
I’ve also successfully archived tables like 'tabDelivery Note' I usually wait until the records are over 1 year old. By then, it’s very unlikely anyone cares about them. Or opens another document that is linked to them.
The framework actually uses the 'modified' field on each DocType to determine this. Before completing any update, it verifies the 'modified' value hasn’t been altered by another operation.
As for 'tabVersion', you can control what is written there, per DocType: