Help me understand this SQL statement logic

ganas · October 18, 2017, 5:50am

I have been going through ERPNext code so I can have better understanding. However, sometimes I have a problem understanding the logic. For example:
for data in customers:
address = frappe.db.sql(""" select name, address_line1, address_line2, city, state, email_id, phone, fax, pincode from 'tabAddress' where is_primary_address =1 and name in (select parent from 'tabDynamic Link' where link_doctype = 'Customer' and link_name = %s and parenttype = 'Address')""", data.name, as_dict=1)

https://github.com/frappe/erpnext/blob/966b3c156ab8d374d39883e2795ed06cff03207f/erpnext/accounts/doctype/sales_invoice/pos.py#L179-L189

so this SQL statement will run for every customer in customers, can’t this be done the other way around for example
1- first select parents in ‘tabDynamic Link’ based on customers names
select parents from 'tabDynamic Link' where link_doctype = 'Customer' and link_name in (customers.name ....) and parenttype = 'Address
2- then return the address based on parents
select name, address_line1, address_line2, city, state, email_id, phone, fax, pincode from 'tabAddress' where is_primary_address =1 and name in parents

so in this case we will only have two sql statements instead of running tens of sql stamens based on the customers list. Am I missing something or my understanding is wrong?

webingold2 · October 18, 2017, 6:49am

Hi Ganas,

Apparently the developer wanted to be 100% sure that the function returns at most one primary address per customer, hence the statement if address: address_data = address[0] which picks up the first primary address found after the SQL statement.

ganas · October 18, 2017, 7:25am

Thanks for clarifying. I didn’t notice that multiple address can be set to is_primary_address at the same time. Isn’t supposed to be one Preferred Billing Address for each customer? and anyway if is_primary_address is allowed for multiple address, shouldn’t be their a priority to those preferred addresses? so in the our SQL statement we add condition for the highest priority, and thus only one address is return for each customer.

webingold2 · October 18, 2017, 7:36am

A developer should not assume that the set of data he is dealing with is perfect, in this case that there is a maximum one primary address per customer. This is a good programming practice to ensure more resiliency in the code. Here the developer selected to pick up the first primary address that is returned, which is random as there is no sort in the SQL statement. So, in case there are two primary addresses, once it can return the first one and the next minute it can return the other one. In order to gain more consistency, the developer could have sorted the primary addresses by “modified” (name of the field containing the modified date, I presume) and selected the primary address with the most recent modification date.

ganas · October 18, 2017, 7:48am

Totally agree with you resiliency wise; however, penalty on database is huge since the number of queries is equal to the number of customers in the list (where also this function were called more than three times in the file).

webingold2 · October 18, 2017, 5:54pm

I agree with you that N queries to pick up N customers and their first primary address is far from optimal. There are much better ways to obtain the same result with just one query which will be slightly more complex though. This is the beauty of open source: you have access to the code, you think you have an improvement, you submit it and if accepted, it benefits the whole community!

(A hint on how to return maximum one row per customer: mysql - SQL select only rows with max value on a column - Stack Overflow)

ganas · October 18, 2017, 9:48pm

I created a github issue https://github.com/frappe/erpnext/issues/11248 to rethink the primary addresses and contacts as I think primary should be selected at the linked doc not at the address or contact doctype since address and contact can be linked to multiple docs.