I have a requirement to sync an external employee DB with Frappe daily (by creating employees and users for those employees). The thing is, with 1k employees, the sync hits the 25-minute timeout of long jobs. With such a low count, I don’t think it should take so much time to create these doctypes.
Any ideas on what is taking so much time and how I could reduce the time it takes to sync?
Difficult to say without profiling the code and doing some analysis. My guess is SQL wait times.
One solution might be running multiple background tasks, instead of one.
For example, divide the 1000 employees into sets of 100. Then either process each 100 sequentially, or in parallel. Either way the background task for each 100 should fit inside the 25 minute timeout.
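A minimal sketch of the batching idea, in plain Python. The names `chunked` and `enqueue_in_batches` are made up for illustration, and `enqueue` is whatever callable queues a background job in your setup (in Frappe that would presumably be a thin wrapper around `frappe.enqueue` pointing at your own sync method, which is an assumption here).

```python
from typing import Callable, Iterator, Sequence


def chunked(items: Sequence, size: int) -> Iterator[Sequence]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def enqueue_in_batches(
    employee_ids: Sequence[str],
    enqueue: Callable[..., object],
    batch_size: int = 100,
) -> int:
    """Hand each batch of IDs to `enqueue` and return the number of jobs queued.

    `enqueue` is a hypothetical callable that accepts `employee_ids=...`
    and schedules one background job for that batch.
    """
    jobs = 0
    for batch in chunked(employee_ids, batch_size):
        enqueue(employee_ids=list(batch))
        jobs += 1
    return jobs
```

With 1000 IDs and the default batch size this queues 10 jobs of 100 employees each, so each job only has to fit its own share of the work inside the timeout.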
Maybe you could first evaluate which db records need to be synced.
E.g. get the list of 1k employee-IDs (together with the changing data if needed) in one go from both sides, put each of them in a Set (python set), calculate the IDs/records which need to be synced (should be a one-liner or so) and then sync only those.
If this method can fit your use case, it might help to speed up things.
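Roughly, the set comparison could look like the sketch below. It assumes both sides can be loaded into a dict keyed by employee ID, with the value being a tuple of the fields that matter; `plan_sync` and the sample field values are made up for illustration.

```python
def plan_sync(external: dict, local: dict):
    """Compare two {employee_id: data} mappings and return three sets of IDs:
    those to create, those to update, and those that exist only locally."""
    external_ids = set(external)
    local_ids = set(local)

    to_create = external_ids - local_ids   # in the external DB only
    local_only = local_ids - external_ids  # in the local DB only
    # Present on both sides, but the data differs.
    to_update = {i for i in external_ids & local_ids if external[i] != local[i]}
    return to_create, to_update, local_only


external = {
    "E1": ("Ann", "ann@example.com"),
    "E2": ("Bob", "bob@example.com"),
    "E3": ("Cy", "cy@example.com"),
}
local = {
    "E1": ("Ann", "ann@example.com"),
    "E2": ("Bob", "old@example.com"),
    "E4": ("Di", "di@example.com"),
}

to_create, to_update, local_only = plan_sync(external, local)
```

Only the IDs in `to_create` and `to_update` then need to go through the slow save path; unchanged records such as `E1` are skipped entirely.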
I’ve noticed that the wait time is mostly when saving Users, which takes between 1 and 2 seconds per user. For 1k employees that adds up to 1,000 to 2,000 seconds, which is enough to reach the 1,500-second timeout. I will try measuring the times for the User controller methods to see where time is being lost and post it here.
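One simple way to take such measurements is a small timing helper built on the standard library. This is only a sketch: `timed`, `totals`, and `report` are made-up names, and the labels are whatever you choose to wrap (for example each User controller method call).

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Accumulated wall-clock seconds per label, across all calls.
totals = defaultdict(float)


@contextmanager
def timed(label: str):
    """Add the elapsed time of the wrapped block to totals[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        totals[label] += time.perf_counter() - start


def report():
    """Return (label, seconds) pairs, slowest first."""
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


# Usage: wrap each suspect step inside the sync loop.
with timed("user.save"):
    time.sleep(0.01)  # stand-in for the real save call
```

Calling `report()` at the end of the sync shows which labelled step accumulated the most time over all 1k employees.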
The approach of splitting the employees into chunks doesn’t work for my case, since I have to define a hierarchy between them: all the employees need to be processed sequentially so that they all exist before a second pass defines the hierarchy. This is usually fast as long as I don’t have a user associated with the employee.
I actually have implemented two things that work in that direction.
I only persist the employee if there has been a change and I also overrode the update_user method to only update the user if a change has actually occurred in the related fields.
This will work 90% of the time, but not for the first sync, for example. Also, my fear is that some company-wide change someday will affect a field in all the employees and thus require the update to run for the complete set.
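For reference, the "only update if something changed" check can be sketched as a plain field comparison. This is not the actual override; `changed_fields` and the field names are made up for illustration, and documents are represented as dicts.

```python
def changed_fields(current: dict, incoming: dict, fields) -> dict:
    """Return {field: new_value} for the watched fields whose incoming
    value differs from the current one. Fields missing from `incoming`
    are ignored."""
    return {
        f: incoming[f]
        for f in fields
        if f in incoming and current.get(f) != incoming[f]
    }


# Hypothetical user-related fields to watch.
USER_FIELDS = ("first_name", "last_name", "email")

current = {"first_name": "Ann", "last_name": "Lee", "email": "ann@example.com"}
incoming = {"first_name": "Ann", "last_name": "Lee", "email": "ann.lee@example.com"}

diff = changed_fields(current, incoming, USER_FIELDS)
if diff:
    # Only here would the slow user update actually run.
    current.update(diff)
```

An empty result means the user save can be skipped for that employee, which is where most of the time goes.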