I have often encountered the situation where I needed to call get_doc only to call a document's instance methods. Often these instance methods didn't interact with any of the child tables, and these docs were loaded in hooks, hence weren't part of the response.
Also, child tables can often be very large. Lazy loading of child tables could be a helpful addition to get_doc. Lazy loading should also be a per-call decision rather than a configuration, which can be implemented with an additional argument called "lazy_load_children" passed to get_doc like
frappe.get_doc(doctype, docname, lazy_load_children=True)
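For illustration, lazy loading of a child table boils down to deferred, memoised fetching. This is a plain-Python sketch of that idea, not Frappe API; the LazyChildren name and the loader callback are made up here:

```python
class LazyChildren:
    """Hypothetical sketch of lazy child-table loading: the DB fetch is
    deferred until the rows are first accessed, then memoised.
    `loader` stands in for the actual child-table query."""

    def __init__(self, loader):
        self._loader = loader
        self._rows = None

    def get(self):
        # fetch only on first access, then reuse the cached result
        if self._rows is None:
            self._rows = self._loader()
        return self._rows
```

A doc loaded with lazy_load_children=True could hold such an object in place of the child-table list, so docs whose instance methods never touch their children never pay the query cost.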
Based on SP, I think we can optimize ERPNext in a better way. Time may be less than 1 sec.
Nice analysis @Sachin_Mane
Good work @Sachin_Mane @rmehta
I also worry about the potential for increased RAM requirements on the host platform. Memory is the single largest contributing factor to server costs. So for a system with 4 or 5 gunicorn workers and 12 live users logged in and processing sales, there could be an increased memory requirement of close to 500 MB? Does that sound right?
Memory usually comes in 1 GB increments and costs an additional $6 to $10 per month on the server fees invoice.
Have I read this correctly? Is that a reasonable expectation of additional memory overhead?
BKM
If you don’t allocate more memory to redis cache, then that will just mean more cache misses and lead to performance similar to current implementations.
You don’t have to increase the redis cache if you don’t want to - everything should still work the same as before (I believe - please correct me if I’m wrong).
The analogy is with the innodb_buffer_pool MariaDB variable you’ve most likely modified in other implementations. You can have the value small, but more memory means better performance. It’ll be the same for this. Finding the recommended values will be the key - and everyone’s value proposition will be different. But until this is tested and someone reports the results, we won’t really know recommendations.
Lazy loading is a good idea. However, it may complicate the caching approach. With the document cache enabled, lazy loading won’t be needed.
Memory is cheap nowadays. You can choose your server configuration based on your current memory utilisation.
It is a tradeoff and, as @felix mentioned, it is up to you to decide how much memory should be allocated.
get_cached_doc and get_cached_value won’t be implementing lazy loading param.
We use get_cached_doc for master data which is read multiple times, while lazy loading will be for documents which are read only once and where reading their child tables is not necessary. That way, we can still use instance methods (without having to manually create instances of document classes) without loading child table values (which can number in the hundreds in some cases).
Makes sense to me.
A design question: currently when get_cached_doc and get_cached_value are called with a bad key they raise a DoesNotExistError rather than returning None. Is there a reason it works this way? I would think you’d want the two methods to have symmetrical APIs.
Good point. get_cached_value should return None, while only get_cached_doc should raise a DoesNotExistError.
In [2]: x = frappe.get_value("Item", "nope", "item_code")
In [3]: print(x)
None
In [4]: x = frappe.get_cached_value("Item", "nope", "item_code")
---------------------------------------------------------------------------
DoesNotExistError
get_cached_value and get_value do not match.
In [5]: frappe.get_doc("Item", "nope")
---------------------------------------------------------------------------
DoesNotExistError
In [6]: frappe.get_cached_doc("Item", "nope")
---------------------------------------------------------------------------
DoesNotExistError
get_doc performs consistently, though I still think returning None is probably better than DoesNotExistError.
Well, isn’t the convention clear? None is the general nil or no-value case, whereas DoesNotExistError is thrown when a doctype or an aspect of it fails to load_from_db.
Maybe this is too specific a use case, but here’s how I’m regularly checking for an existing document before creating a new one. To me, the first option is more pythonic and more efficient.
x = frappe.get_doc("Item", "nope")
if x:
    return x
else:
    x = frappe.new_doc("Item")
    # set item attributes
As opposed to:
try:
    x = frappe.get_doc("Item", "nope")
except:  # or except DoesNotExistError:
    x = None
finally:
    if not x:
        x = frappe.new_doc("Item")
        # set item attributes
    return x
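As another option, the fetch-or-create pattern can avoid exceptions entirely by checking existence first. A sketch with stand-in callables; the get_or_new helper is hypothetical, and the exists callback is assumed to behave like frappe.db.exists, returning a truthy value when the document is found and None otherwise:

```python
def get_or_new(exists, get_doc, new_doc, doctype, name):
    """Hypothetical fetch-or-create helper. `exists`, `get_doc` and
    `new_doc` are stand-ins for the corresponding framework calls."""
    if exists(doctype, name):
        return get_doc(doctype, name)
    # not found: create a fresh in-memory doc instead
    return new_doc(doctype)
```

The existence check costs one lightweight query, but keeps the control flow flat and free of try/except/finally.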
Yes, to me the 1st is not ambiguous and preferred.
The 2nd use case I question, since new_doc executes regardless, i.e. even when the Item already exists?
Remember that finally is a ‘clean-up’ action…
See 8. Errors and Exceptions — Python 2.7.18 documentation:
“Since the finally clause is always executed before leaving the try statement, whether an exception has occurred or not”
Yeah, my mistake. I have edited my post so that it’s correct. You still have to check for x. If you just put the check in the except portion it would only run on the does-not-exist error, but you still want it to cover other validation errors too, so I think finally is the better spot for it, to ensure it runs outside the try/except pattern.
get_cached_value is raising DoesNotExistError because internally it calls get_cached_doc().
I think this should be handled in get_cached_value to make the API the same as get_value.
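A minimal sketch of that handling, with stand-in stubs rather than the real frappe objects (both DoesNotExistError and the get_cached_doc callback are stubbed here):

```python
class DoesNotExistError(Exception):
    """Stand-in for frappe.DoesNotExistError (sketch only)."""


def cached_value_or_none(get_cached_doc, doctype, name, fieldname):
    """Hypothetical wrapper showing how get_cached_value could mirror
    get_value: convert DoesNotExistError from the underlying
    get_cached_doc call into a plain None."""
    try:
        doc = get_cached_doc(doctype, name)
    except DoesNotExistError:
        # bad key: behave like get_value and return None
        return None
    return doc.get(fieldname)
```

With this shape, callers can use the same truthiness checks against both the cached and uncached variants.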
@Sachin_Mane
Currently, if a sales invoice has 51 items, then when saving the invoice there will be 51 DB calls (inserts) on the tabSales Invoice Item child table, because in the parent doc’s insert method one db_insert is called for each child table record. The extracted code is as below:
# children
for d in self.get_all_children():
    d.db_insert()
The db_insert method only handles one record, but the INSERT INTO clause actually supports inserting multiple records in one call, like below:
insert into table (columns) values (record1), (record2), ...
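To make the multi-row VALUES construction concrete, here is a small self-contained sketch. build_multi_insert is a hypothetical helper, not part of Frappe; it builds one parameterized statement plus the flattened value list that a DB API driver would expect:

```python
def build_multi_insert(doctype, rows):
    """Build a single multi-row INSERT statement and its flattened
    parameter list. `rows` is a list of dicts sharing the same keys."""
    columns = list(rows[0].keys())
    # one "(%s, %s, ...)" group per row
    placeholders = ", ".join(
        "({})".format(", ".join(["%s"] * len(columns))) for _ in rows
    )
    sql = "insert into `tab{}` ({}) values {}".format(
        doctype,
        ", ".join("`{}`".format(c) for c in columns),
        placeholders,
    )
    # flatten row values in column order to match the placeholders
    values = [row[c] for row in rows for c in columns]
    return sql, values
```

The resulting statement inserts all rows in one round trip instead of one call per row.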
Based on the above analysis, I propose to copy and adapt the db_insert method to support handling multiple records, like below (model/base_document.py):
def db_insert_multi(self, docs):
    """INSERT multiple documents (with valid columns) into the database."""
    def insert_multi():
        if not doc_dict_for_insert:
            return
        columns = list(doc_dict_for_insert[0].keys())
        sql = "insert into `tab{doctype}` ({columns}) values ".format(
            doctype=doctype,
            columns=", ".join("`" + c + "`" for c in columns))
        rec_placeholder = "({0})".format(", ".join(["%s"] * len(columns)))
        placeholders = ", ".join([rec_placeholder] * len(doc_dict_for_insert))
        values = []
        for d in doc_dict_for_insert:
            values.extend(list(d.values()))
        frappe.db.sql(sql + placeholders, values)
        for doc in doc_obj_for_insert:
            doc.set("__islocal", False)

    if not docs:
        return
    doctype = docs[0].doctype
    doc_dict_for_insert = []
    doc_obj_for_insert = []
    count = 0
    for doc in docs:
        count += 1
        if not doc.name:
            # name will be set by document class in most cases
            set_new_name(doc)
        if not doc.creation:
            doc.creation = doc.modified = now()
            doc.created_by = doc.modified_by = frappe.session.user
        d = self.get_valid_dict(doc=doc)
        if doc.doctype != doctype or count > 500:
            # flush the current batch when the doctype changes
            # or the batch exceeds 500 records
            insert_multi()
            doc_dict_for_insert = []
            doc_obj_for_insert = []
            count = 0
            doctype = doc.doctype
        doc_dict_for_insert.append(d)
        doc_obj_for_insert.append(doc)
    insert_multi()
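The 500-record flush above is plain batching; as a standalone sketch of that piece (chunked is an illustrative helper, not Frappe API):

```python
def chunked(items, size=500):
    """Yield successive batches of at most `size` items, mirroring the
    500-row flush used in db_insert_multi above."""
    for i in range(0, len(items), size):
        yield items[i:i + size]
```

Batching keeps each statement within the DB server's packet limits while still cutting the round trips from N to roughly N/500.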
The related method also needs to be adapted:
def get_valid_dict(self, sanitize=True, convert_dates_to_str=False, doc=None):
    doc = doc or self
    d = frappe._dict()
    for fieldname in doc.meta.get_valid_columns():
        d[fieldname] = doc.get(fieldname)

        # if no need for sanitization and value is None, continue
        if not sanitize and d[fieldname] is None:
            continue

        df = doc.meta.get_field(fieldname)
        if df:
            if df.fieldtype == "Check":
                if d[fieldname] == None:
                    d[fieldname] = 0
                elif not isinstance(d[fieldname], int) or d[fieldname] > 1:
                    d[fieldname] = 1 if cint(d[fieldname]) else 0
            elif df.fieldtype == "Int" and not isinstance(d[fieldname], int):
                d[fieldname] = cint(d[fieldname])
            elif df.fieldtype in ("Currency", "Float", "Percent") and not isinstance(d[fieldname], float):
                d[fieldname] = flt(d[fieldname])
            elif df.fieldtype in ("Datetime", "Date", "Time") and d[fieldname] == "":
                d[fieldname] = None
            elif df.get("unique") and cstr(d[fieldname]).strip() == "":
                # unique empty field should be set to None
                d[fieldname] = None

            if isinstance(d[fieldname], list) and df.fieldtype != 'Table':
                frappe.throw(_('Value for {0} cannot be a list').format(_(df.label)))

        if convert_dates_to_str and isinstance(d[fieldname], (datetime.datetime, datetime.time, datetime.timedelta)):
            d[fieldname] = str(d[fieldname])

    return d
Finally, the call that inserts child table records in model/document.py needs to be changed as follows:
def insert(self, ignore_permissions=None, ignore_links=None, ignore_if_duplicate=False, ignore_mandatory=None):
    # children
    # for d in self.get_all_children():
    #     d.db_insert()
    self.db_insert_multi(self.get_all_children())
This change will reduce the number of DB calls significantly for docs with a lot of child items, hence improving performance.
Any ideas?
Let’s start a new thread for this. This thread has gone too far. Closing.