I have often encountered the situation where I needed to call get_doc only to call a document's instance methods. Often these instance methods didn't interact with any of the child tables, and these docs were loaded in hooks, hence weren't part of the response.
Also, child tables can often be very large. Lazy loading of child tables could be a helpful addition to get_doc. Lazy loading should also be a per-call decision rather than a configuration, which can be implemented with an additional argument called "lazy_load_children" passed to get_doc like
frappe.get_doc(doctype, docname, lazy_load_children=True)
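For illustration, lazy loading of a child table boils down to deferred, memoised fetching. This is a plain-Python sketch of that idea, not Frappe API; the LazyChildren name and the loader callback are made up here:

```python
class LazyChildren:
    """Hypothetical sketch of lazy child-table loading: the DB fetch is
    deferred until the rows are first accessed, then memoised.
    `loader` stands in for the actual child-table query."""

    def __init__(self, loader):
        self._loader = loader
        self._rows = None

    def get(self):
        # fetch only on first access, then reuse the cached result
        if self._rows is None:
            self._rows = self._loader()
        return self._rows
```

A doc loaded with lazy_load_children=True could hold such an object in place of the child-table list, so docs whose instance methods never touch their children never pay the query cost.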
Based on SP, I think we can optimize ERPNext in a better way. Time may be less than 1 sec.
Nice analysis @Sachin_Mane
Good work @Sachin_Mane @rmehta
I also worry about the potential for increased RAM requirements on the host platform. Memory is the single largest contributing factor to server costs. So for a system with 4 or 5 gunicorn workers and 12 live users logged in and processing sales, there could be an increased memory requirement of close to 500 MB? Does that sound right?
Memory usually comes in 1 GB increments and costs an additional $6 to $10 per month on the server fees invoice.
Have I read this correctly? Is that a reasonable expectation of additional memory overhead?
BKM
If you don’t allocate more memory to redis cache, then that will just mean more cache misses and lead to performance similar to current implementations.
You don’t have to increase the redis cache if you don’t want to - everything should still work the same as before (I believe - please correct me if I’m wrong).
The analogy is with the innodb_buffer_pool MariaDB variable you’ve most likely modified in other implementations. You can have the value small, but more memory means better performance. It’ll be the same for this. Finding the recommended values will be the key - and everyone’s value proposition will be different. But until this is tested and someone reports the results, we won’t really know recommendations.
Lazy loading is a good idea. However, it may complicate the caching approach. With the document cache enabled, lazy loading won’t be needed.
Memory is cheap nowadays. You can choose your server configuration based on your current memory utilisation.
It is a tradeoff and, as @felix mentioned, it is up to you to decide how much memory should be allocated.
get_cached_doc and get_cached_value won’t be implementing lazy loading param.
We use get_cached_doc for master data which is read multiple times, while lazy loading will be for documents which are read only once and where reading their child tables is not necessary. That way, we can still use instance methods (without having to manually create instances of document classes) without loading child table values (which can number in the hundreds in some cases).
Makes sense to me.
A design question: currently when get_cached_doc and get_cached_value are called with a bad key they raise a DoesNotExistError rather than returning None. Is there a reason it works this way? I would think you’d want the two methods to have symmetrical APIs.
Good point. get_cached_value should return None, while only get_cached_doc should raise a DoesNotExistError.
In [2]: x = frappe.get_value("Item", "nope", "item_code")
In [3]: print(x)
None
In [4]: x = frappe.get_cached_value("Item", "nope", "item_code")
---------------------------------------------------------------------------
DoesNotExistError
get_cached_value and get_value do not match.
In [5]: frappe.get_doc("Item", "nope")
---------------------------------------------------------------------------
DoesNotExistError
In [6]: frappe.get_cached_doc("Item", "nope")
---------------------------------------------------------------------------
DoesNotExistError
get_doc performs consistently, though I still think returning None is probably better than DoesNotExistError.
Well, isn’t the convention clear? None is the general nil or no-value case, whereas DoesNotExistError is thrown when a doctype or an aspect of it fails to load_from_db.
Maybe this is too specific a use case, but here’s how I’m regularly checking for an existing document before creating a new one. To me, the first option is more pythonic and more efficient.
x = frappe.get_doc("Item", "nope")
if x:
    return x
else:
    x = frappe.new_doc("Item")
    # set item attributes
As opposed to:
try:
    x = frappe.get_doc("Item", "nope")
except:  # or except DoesNotExistError:
    x = None
finally:
    if not x:
        x = frappe.new_doc("Item")
        # set item attributes
    return x
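As another option, the fetch-or-create pattern can avoid exceptions entirely by checking existence first. A sketch with stand-in callables; the get_or_new helper is hypothetical, and the exists callback is assumed to behave like frappe.db.exists, returning a truthy value when the document is found and None otherwise:

```python
def get_or_new(exists, get_doc, new_doc, doctype, name):
    """Hypothetical fetch-or-create helper. `exists`, `get_doc` and
    `new_doc` are stand-ins for the corresponding framework calls."""
    if exists(doctype, name):
        return get_doc(doctype, name)
    # not found: create a fresh in-memory doc instead
    return new_doc(doctype)
```

The existence check costs one lightweight query, but keeps the control flow flat and free of try/except/finally.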
Yes, to me the 1st is not ambiguous and preferred.
The 2nd use case I question, since new_doc executes regardless, i.e. even when the Item already exists?
Remember that finally is a ‘clean-up’ action…
See 8. Errors and Exceptions — Python 2.7.18 documentation:
“Since the finally clause is always executed before leaving the try statement, whether an exception has occurred or not”
Yeah, my mistake. I have edited my post so that it’s correct. You still have to check for x. If you just put the check in the except portion it would only run on the does-not-exist error, but you still want it to cover other validation errors too, so I think finally is the better spot for it, to ensure it runs outside the try/except pattern.
get_cached_value is raising DoesNotExistError because internally it calls get_cached_doc().
I think this should be handled in get_cached_value to make the API the same as get_value.
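A minimal sketch of that handling, with stand-in stubs rather than the real frappe objects (both DoesNotExistError and the get_cached_doc callback are stubbed here):

```python
class DoesNotExistError(Exception):
    """Stand-in for frappe.DoesNotExistError (sketch only)."""


def cached_value_or_none(get_cached_doc, doctype, name, fieldname):
    """Hypothetical wrapper showing how get_cached_value could mirror
    get_value: convert DoesNotExistError from the underlying
    get_cached_doc call into a plain None."""
    try:
        doc = get_cached_doc(doctype, name)
    except DoesNotExistError:
        # bad key: behave like get_value and return None
        return None
    return doc.get(fieldname)
```

With this shape, callers can use the same truthiness checks against both the cached and uncached variants.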
@Sachin_Mane
Currently, if a sales invoice has 51 items, then when saving the invoice there will be 51 DB calls (inserts) on the tabSales Invoice Item child table, because in the parent doc’s insert method one db_insert is called for each child table record. The extracted code is as below:
# children
for d in self.get_all_children():
    d.db_insert()
The db_insert method only handles one record, but the INSERT INTO clause actually supports inserting multiple records in one call, like below:
insert into table (columns) values (record1), (record2), ...
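To make the multi-row VALUES construction concrete, here is a small self-contained sketch. build_multi_insert is a hypothetical helper, not part of Frappe; it builds one parameterized statement plus the flattened value list that a DB API driver would expect:

```python
def build_multi_insert(doctype, rows):
    """Build a single multi-row INSERT statement and its flattened
    parameter list. `rows` is a list of dicts sharing the same keys."""
    columns = list(rows[0].keys())
    # one "(%s, %s, ...)" group per row
    placeholders = ", ".join(
        "({})".format(", ".join(["%s"] * len(columns))) for _ in rows
    )
    sql = "insert into `tab{}` ({}) values {}".format(
        doctype,
        ", ".join("`{}`".format(c) for c in columns),
        placeholders,
    )
    # flatten row values in column order to match the placeholders
    values = [row[c] for row in rows for c in columns]
    return sql, values
```

The resulting statement inserts all rows in one round trip instead of one call per row.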
Based on the above analysis, I propose to copy and adapt the db_insert method to support handling multiple records, like below (model/base_document.py):
def db_insert_multi(self, docs):
    """INSERT multiple documents (with valid columns) into the database."""
    def insert_multi():
        if not doc_dict_for_insert:
            return
        columns = list(doc_dict_for_insert[0].keys())
        sql = "insert into `tab{doctype}` ({columns}) values ".format(
            doctype=doctype,
            columns=", ".join("`" + c + "`" for c in columns))
        rec_placeholder = "({0})".format(", ".join(["%s"] * len(columns)))
        placeholders = ", ".join([rec_placeholder] * len(doc_dict_for_insert))
        values = []
        for d in doc_dict_for_insert:
            values.extend(list(d.values()))
        frappe.db.sql(sql + placeholders, values)
        for doc in doc_obj_for_insert:
            doc.set("__islocal", False)

    if not docs:
        return
    doctype = docs[0].doctype
    doc_dict_for_insert = []
    doc_obj_for_insert = []
    count = 0
    for doc in docs:
        count += 1
        if not doc.name:
            # name will be set by document class in most cases
            set_new_name(doc)
        if not doc.creation:
            doc.creation = doc.modified = now()
            doc.created_by = doc.modified_by = frappe.session.user
        d = self.get_valid_dict(doc=doc)
        if doc.doctype != doctype or count > 500:
            # flush the current batch when the doctype changes
            # or the batch exceeds 500 records
            insert_multi()
            doc_dict_for_insert = []
            doc_obj_for_insert = []
            count = 0
            doctype = doc.doctype
        doc_dict_for_insert.append(d)
        doc_obj_for_insert.append(doc)
    insert_multi()
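The 500-record flush above is plain batching; as a standalone sketch of that piece (chunked is an illustrative helper, not Frappe API):

```python
def chunked(items, size=500):
    """Yield successive batches of at most `size` items, mirroring the
    500-row flush used in db_insert_multi above."""
    for i in range(0, len(items), size):
        yield items[i:i + size]
```

Batching keeps each statement within the DB server's packet limits while still cutting the round trips from N to roughly N/500.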
The related method also needs to be adapted:
def get_valid_dict(self, sanitize=True, convert_dates_to_str=False, doc=None):
    doc = doc or self
    d = frappe._dict()
    for fieldname in doc.meta.get_valid_columns():
        d[fieldname] = doc.get(fieldname)

        # if no need for sanitization and value is None, continue
        if not sanitize and d[fieldname] is None:
            continue

        df = doc.meta.get_field(fieldname)
        if df:
            if df.fieldtype == "Check":
                if d[fieldname] == None:
                    d[fieldname] = 0
                elif not isinstance(d[fieldname], int) or d[fieldname] > 1:
                    d[fieldname] = 1 if cint(d[fieldname]) else 0
            elif df.fieldtype == "Int" and not isinstance(d[fieldname], int):
                d[fieldname] = cint(d[fieldname])
            elif df.fieldtype in ("Currency", "Float", "Percent") and not isinstance(d[fieldname], float):
                d[fieldname] = flt(d[fieldname])
            elif df.fieldtype in ("Datetime", "Date", "Time") and d[fieldname] == "":
                d[fieldname] = None
            elif df.get("unique") and cstr(d[fieldname]).strip() == "":
                # unique empty field should be set to None
                d[fieldname] = None

            if isinstance(d[fieldname], list) and df.fieldtype != 'Table':
                frappe.throw(_('Value for {0} cannot be a list').format(_(df.label)))

        if convert_dates_to_str and isinstance(d[fieldname], (datetime.datetime, datetime.time, datetime.timedelta)):
            d[fieldname] = str(d[fieldname])

    return d
Finally, the call that inserts child table records in model/document.py needs to be changed as follows:
def insert(self, ignore_permissions=None, ignore_links=None, ignore_if_duplicate=False, ignore_mandatory=None):
    # children
    # for d in self.get_all_children():
    #     d.db_insert()
    self.db_insert_multi(self.get_all_children())
This change will reduce the number of DB calls significantly for docs with a lot of child items, hence improving performance.
Any ideas?
Let’s start a new thread for this. This thread has gone too far. Closing.