"Export Data" to Excel stores non-standard xlsx

I exported about 100 records from Item to xlsx format.

I then tried to read that data back into my Python app using

import pandas as pd

SOURCE = "/opt/erpnext_workfiles/Item.xlsx"

@frappe.whitelist()
def loadItems(company):
    wkbk = pd.read_excel(SOURCE)

The result was:

18:35:53 web.1            | Traceback (most recent call last):
18:35:53 web.1            |   File "/home/erpdev/frappe-bench-DELS/apps/frappe/frappe/app.py", line 67, in application
18:35:53 web.1            |     response = frappe.api.handle()
18:35:53 web.1            |   File "/home/erpdev/frappe-bench-DELS/apps/frappe/frappe/api.py", line 58, in handle
18:35:53 web.1            |     return frappe.handler.handle()
18:35:53 web.1            |   File "/home/erpdev/frappe-bench-DELS/apps/frappe/frappe/handler.py", line 30, in handle
18:35:53 web.1            |     data = execute_cmd(cmd)
18:35:53 web.1            |   File "/home/erpdev/frappe-bench-DELS/apps/frappe/frappe/handler.py", line 70, in execute_cmd
18:35:53 web.1            |     return frappe.call(method, **frappe.form_dict)
18:35:53 web.1            |   File "/home/erpdev/frappe-bench-DELS/apps/frappe/frappe/__init__.py", line 1145, in call
18:35:53 web.1            |     return fn(*args, **newargs)
18:35:53 web.1            |   File "/home/erpdev/frappe-bench-DELS/apps/MyApp/MyApp/MyModule/doctype/MyDoc/MyLoader.py", line 186, in loadItems
18:35:53 web.1            |     wkbk = pd.read_excel(SOURCE)
18:35:53 web.1            |   File "/home/erpdev/frappe-bench-DELS/env/lib/python3.8/site-packages/pandas/util/_decorators.py", line 299, in wrapper
18:35:53 web.1            |     return func(*args, **kwargs)
18:35:53 web.1            |   File "/home/erpdev/frappe-bench-DELS/env/lib/python3.8/site-packages/pandas/io/excel/_base.py", line 344, in read_excel
18:35:53 web.1            |     data = io.parse(
18:35:53 web.1            |   File "/home/erpdev/frappe-bench-DELS/env/lib/python3.8/site-packages/pandas/io/excel/_base.py", line 1170, in parse
18:35:53 web.1            |     return self._reader.parse(
18:35:53 web.1            |   File "/home/erpdev/frappe-bench-DELS/env/lib/python3.8/site-packages/pandas/io/excel/_base.py", line 566, in parse
18:35:53 web.1            |     output[asheetname] = parser.read(nrows=nrows)
18:35:53 web.1            |   File "/home/erpdev/frappe-bench-DELS/env/lib/python3.8/site-packages/pandas/io/parsers.py", line 1057, in read
18:35:53 web.1            |     index, columns, col_dict = self._engine.read(nrows)
18:35:53 web.1            |   File "/home/erpdev/frappe-bench-DELS/env/lib/python3.8/site-packages/pandas/io/parsers.py", line 2496, in read
18:35:53 web.1            |     alldata = self._rows_to_cols(content)
18:35:53 web.1            |   File "/home/erpdev/frappe-bench-DELS/env/lib/python3.8/site-packages/pandas/io/parsers.py", line 3189, in _rows_to_cols
18:35:53 web.1            |     self._alert_malformed(msg, row_num + 1)
18:35:53 web.1            |   File "/home/erpdev/frappe-bench-DELS/env/lib/python3.8/site-packages/pandas/io/parsers.py", line 2948, in _alert_malformed
18:35:53 web.1            |     raise ParserError(msg)
18:35:53 web.1            | pandas.errors.ParserError: Expected 2 fields in line 14, saw 136

As a quick sanity check I loaded the file into LibreOffice, made a minor change, saved it and tried loading that. It worked without problem!

I’d like to look at the Data Export code. Where can I find it?

So here’s a putrid sort of workaround:

import openpyxl as pxl

@frappe.whitelist()
def loadItems(company):
    wkbk = pxl.load_workbook(SOURCE)
    wkbk.save(SOURCE)
    wkbk = pd.read_excel(SOURCE)

This should be a good starting point:

Oh Cool! Thanks.

I’ll try to dig into it a bit.

I suspect that frappe.utils.xlsxutils is badly out of date. The comments to this answer suggest as much:

Stack Overflow ==> Pandas read excel : ValueError

The code could save to a temporary file, then immediately load it and save it again with openpyxl then return it to the user. It would be a pragmatic workaround.