Is it safe to truncate tabData Import Log

When setting up a system, importing data is important. Do we need to keep the log if we retain the files used for the import?

Examining my database of ~1G 230MB (1/2 million rows) is consumed by tabData Import Log. With regular backups occurring, this leads to db backups 25% larger than necessary.

Has anyone truncated this tabData Import Log? I’m also curious about tabVersion as well. This table is similar in size. For this one, I probably would export the data to Excel for easy reference when needed.

I would keep a backup that includes this table prior to truncating, but I don’t see the data as being needed in the future.

I’m also interested in any other DB maintenance for optimizing performance and minimizing DB growth.

Good question; I’m genuinely not sure but curious!

You could always run backups from the bench cli that exclude those doctypes, but that may or may not fit with your goals and workflow.

I’m not aware of any links or dependencies on those docs, so I suspect it would be fine.

Regarding tabVersion, I’ve noticed that when any document is being edited then an entry in DocType Version is created every 10 seconds. For what reason I do not know? However, I presume it is used to determine whether a document requires refreshing when multiple users access / edit the same document at any given time.

To investigate any new records to the table over the past 5 minutes, I’ve used the following DB query. Change the number of minutes to your requirements.

bench --site ${sSite} mariadb --execute=“SELECT * FROM tabVersion WHERE modified > (CURRENT_TIMESTAMP() - INTERVAL 5 MINUTE) ORDER BY modified DESC;”

I have on a few occasions deleted very old content from tabVersion without any observable impact on the system behaviour, other than reducing the DB size.

Thank you for teaching me how to query via bench without opening mariaDB console.

In my case, the large size for tabVersions is directly related to importing the original data. Customers get modified several times via the import process (linking contacts, addresses, etc).

Here’s what the consumption is before using ERPNext in production:
Screenshot 2025-03-02 at 11.11.00 AM

Here’s what it looks like after using it for ~15 months in production.
Screenshot 2025-03-02 at 11.13.43 AM

I will create a script to delete records older than 1yr in tabVersions as part of maintenance.

1 Like

It’s perfectly safe to delete or archive these tables. I’ve been doing it for over 3 years with a daily script:

  • tabVersion
  • tabDeleted Document
  • tabScheduled Job Log
  • tabEmail Queue
  • tabAccess Log
  • __global_search

I’ve also successfully archived tables like 'tabDelivery Note' I usually wait until the records are over 1 year old. By then, it’s very unlikely anyone cares about them. Or opens another document that is linked to them.

The framework actually uses the 'modified' field on each DocType to determine this. Before completing any update, it verifies the 'modified' value hasn’t been altered by another operation.

As for 'tabVersion', you can control what is written there, per DocType:

2 Likes