I have a huge number of accounts to upload (~247,000), all under 99 parent accounts. I
started by uploading the parent accounts, and that completed relatively
quickly. However, the upload of the child accounts is taking a long
time. To speed up the imports I split the accounts into 247 files so
that I am uploading 1000 records per file. I was only able to upload 7
files in 3 hours, which is not practical: at that rate the full set
would take over 100 hours. How can I speed up this process?
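For anyone wanting to reproduce the split, here is a minimal sketch of how one large CSV can be cut into 1000-record files. The function name `split_csv`, the output file naming, and the assumption that the header occupies the first `header_rows` rows are illustrative only; the actual import template may have more header rows, in which case `header_rows` would need adjusting.

```python
import csv
from pathlib import Path


def split_csv(src, out_dir, chunk_size=1000, header_rows=1):
    """Split src into files of at most chunk_size data rows each.

    The first header_rows rows are copied to the top of every output file.
    Returns the list of paths written, in order.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    with open(src, newline="", encoding="utf-8") as f:
        rows = list(csv.reader(f))
    header, data = rows[:header_rows], rows[header_rows:]

    paths = []
    for start in range(0, len(data), chunk_size):
        # Hypothetical naming scheme: accounts_001.csv, accounts_002.csv, ...
        path = out_dir / f"accounts_{start // chunk_size + 1:03d}.csv"
        with open(path, "w", newline="", encoding="utf-8") as out:
            writer = csv.writer(out)
            writer.writerows(header)
            writer.writerows(data[start:start + chunk_size])
        paths.append(path)
    return paths
```

This reads the whole file into memory, which should be fine for a few hundred thousand short rows but would need a streaming variant for much larger files.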
FYI, I tried to profile the upload to see where most of the time is being spent by running cProfile on the frappe.core.page.data_import_tool.data_import_tool.import_doc() method, without success. The method never seems to be called.
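For context, this is roughly the kind of cProfile wrapper I mean, as a minimal sketch. `profile_call` and `slow_insert` are hypothetical names: `slow_insert` is only a stand-in for whichever function the upload actually goes through, which I have not been able to identify. It uses only the standard `cProfile` and `pstats` modules.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=20, **kwargs):
    """Run func under cProfile and return (result, report).

    The report lists the `top` entries sorted by cumulative time.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)

    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    stats.sort_stats("cumulative").print_stats(top)
    return result, buf.getvalue()


def slow_insert(n):
    # Hypothetical stand-in for the real import function.
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    _, report = profile_call(slow_insert, 100_000)
    print(report)
```

A wrapper like this only reports anything if the wrapped function is really on the upload's call path, which is exactly where I am stuck.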
Below is a post I made in our internal issue tracker. I have tried approach number 1, which was too slow. I then tried approach number 4, but I am stuck trying to figure out how to profile the upload. As you can tell, we’re desperate since we have a deadline to meet.
It takes too long to import the data files, especially the accounts, using the ERPNext data import tool. We should look for ways of optimizing the upload. Research reveals several ways of doing this:
1. Reduce the number of records being uploaded from each file, ideally to 1000 records. The downside is that we will have too many files to upload for accounts and journals: ~247 files for each doctype.
2. Increase the number of records per commit - see https://github.com/shivjm/frappe/commit/d3196d9969ecd3c5be64b1d02c68c0e9924e4b98. We will then have to increase the data import tool upload limit (5000) to allow more records to be uploaded in one go. The downside is that a lot of data will be held in memory before each commit, so we will need a machine with sufficient RAM/resources.