Update on post above
Below is a post I made in our internal issue tracker. I have tried approach 1, but it was too slow. I next tried approach 4 but am stuck trying to figure out how to profile the upload. As you can tell, we are desperate, since we have a deadline to meet.
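As a starting point for profiling the upload, the standard library's `cProfile`/`pstats` can wrap whatever entry point triggers the import and report where the time goes. A minimal sketch; `run_import` here is a hypothetical stand-in for the actual import call (e.g. a call into the data import tool's code), not a real Frappe function:

```python
import cProfile
import io
import pstats


def run_import():
    # Hypothetical stand-in for the real import entry point.
    # Replace the body with the actual call that performs the upload.
    total = 0
    for i in range(100_000):
        total += i
    return total


def profile_import(top_n=10):
    """Profile run_import and return a report of the slowest calls,
    sorted by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    run_import()
    profiler.disable()

    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(top_n)
    return stream.getvalue()


if __name__ == "__main__":
    print(profile_import())
```

The cumulative-time column in the report should show whether the time is spent in validation, database writes, or elsewhere, which would tell us which of the options below is worth pursuing.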
It takes too long to import the data files, especially the accounts, using the ERPNext data import tool. We should look for ways of optimizing the upload. Research suggests several options:
1. Reduce the number of records uploaded in each file, ideally to about 1,000 records per file. The downside is that accounts and journals would then be spread across far too many files (~247 files for each doctype).
2. Increase the number of commits per transaction; see https://github.com/shivjm/frappe/commit/d3196d9969ecd3c5be64b1d02c68c0e9924e4b98. We would then have to raise the data import tool's upload limit (currently 5,000) to allow more records to be uploaded in one go. The downside is that a lot of data is read into memory before each commit, so we would need a machine with sufficient RAM/resources.
3. Optimize MySQL write performance; see "Testing ERPnext 700,000 rows of data" (#19 by rmehta).
4. Optimize the data import tool; see https://github.com/frappe/frappe/tree/develop/frappe/core/page/data_import_tool.
5. Write our own custom scripts that upload the data directly, bypassing the data import tool; see "Importing Transactions".
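For approach 3, the usual first knobs are InnoDB's buffer pool, redo log size, and flush policy. An illustrative `my.cnf` fragment, not a recommendation; the values are placeholders to tune against the machine's RAM, and `innodb_flush_log_at_trx_commit = 2` deliberately trades crash durability for write throughput, so it should only be used during the bulk load:

```ini
[mysqld]
# Buffer pool large enough to hold the working set (tune to available RAM).
innodb_buffer_pool_size = 2G
# A larger redo log reduces checkpoint flushing during heavy inserts.
innodb_log_file_size = 512M
# Flush the redo log roughly once per second instead of at every commit.
# Faster bulk loading, but up to ~1s of transactions can be lost on a crash.
innodb_flush_log_at_trx_commit = 2
```

Any setting changed for the migration should be reverted once the import is done.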
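For approach 1, splitting the export files mechanically is straightforward. A minimal sketch using only the standard library (file naming and the 1,000-row default are assumptions; the chunk size should match whatever limit the import tool accepts):

```python
import csv
from pathlib import Path


def split_csv(source, out_dir, rows_per_file=1000):
    """Split a CSV file into chunks of at most rows_per_file data rows,
    repeating the header row at the top of every chunk."""
    source = Path(source)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    with source.open(newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        rows = list(reader)

    chunks = []
    for i in range(0, len(rows), rows_per_file):
        part = out_dir / f"{source.stem}_part{i // rows_per_file + 1}.csv"
        with part.open("w", newline="") as out:
            writer = csv.writer(out)
            writer.writerow(header)
            writer.writerows(rows[i:i + rows_per_file])
        chunks.append(part)
    return chunks
```

Splitting a ~247,000-row accounts file this way still leaves the problem of feeding ~247 files through the tool, which is why this approach alone was too slow for us.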