Has anyone successfully managed to Archive old Data from ERPNext?

Archiving historic data on an ongoing basis? Easy.

  • Identify historic records in Database A, based on transaction date.
  • Write those records to Database B.
  • Delete the transactional records in Database A (being safe, and double-checking they exist in Database B right before you delete)
  • Repeat as needed.

This could be done with pure SQL, scheduled in a cron job. Or pure Python. Or even an ERPNext scheduled job. Tons of possibilities. If you mess up? No problem. You just restore records from Database B back into Database A.

^ This is the difficult requirement. If you didn’t want a 2nd ERPNext? Then everything I wrote above is easily accomplished.

So, how would this work?

To have a 2nd, fully-operational ERPNext: Database B needs a complete set of records from Database A. Not only the historic transactions you archived. But everything else too. Customers, Suppliers, Items, Users, Permissions, Singles, Links. All the tables.

To achieve that? You need an ongoing synchronization event from A to B. However…you cannot use something easy, like Replication. Because archiving records means eventually executing a DELETE statement in Database A. The replication ensures that same DELETE statement would repeat in Database B. And you just lost the records you’re supposed to be archiving. :man_facepalming:

Option 1: Doing what you asked for:

  1. Create a standalone Database C. Empty. Has nothing to do with ERPNext.
  2. Create an ongoing Replication from Database A to Database B. MariaDB Replication - MariaDB Knowledge Base
  3. You now have 2 identical ERPNext running in parallel. Great.
  4. A scheduler starts the Archive Task. All records designated as “archive-able” in Database A are copied into Database C. It saves them forever.
  5. Next, the scheduler deletes the records in Database A. Performance benefit achieved!
  6. Because of SQL replication, Database B is going to automatically delete those records too…
  7. …but luckily we saved them in standalone Database C!
  8. The final step of the Archive Task is to reinsert the records from Database C, back into B. Leaving database B with everything that A once had.

Option 2: How I’d solve this:

Personally, I’d probably go in a different direction entirely. I’d build a Data Warehouse and do BI reporting.

  • If users want real-time reporting, or recent history, they run traditional ERPNext reports.
  • For anything historic and archived, they’d run Data Warehouse reports.

Building a DW is not trivial. And you’d need some kind of Frontend for the reports. But it’s safe, scales, and avoids the headaches of a 2nd ERPNext, replication pitfalls, DB schema changes when you bench upgrade, timing, etc.

1 Like