Curious if anyone has set up a nice ‘semi-automated’ workflow to populate forms from PDFs received from suppliers.
Not sure how others are handling it, but manually keying in data from a supplier delivery note or invoice is time-consuming and better done by ML-based OCR. By ‘semi’, I mean that the result should be visualised as source data → interpreted field → destination, and queued for manual approval.
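One way to model the review step described above is a queue of per-field candidates that a person approves before anything is written to ERPNext. This is a minimal sketch under stated assumptions: the class names, the 0.8 confidence threshold, and the target field names (bill_no, bill_date) are hypothetical and not an existing ERPNext feature.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class FieldCandidate:
    """One row of the review view: source data -> interpreted field -> destination."""
    source_text: str        # raw snippet as read from the PDF/OCR output
    interpreted_value: str  # value after parsing/normalisation
    target_doctype: str     # destination DocType (hypothetical mapping)
    target_field: str       # destination field name (hypothetical mapping)
    confidence: float       # extractor's confidence, 0.0-1.0


@dataclass
class ReviewItem:
    document_name: str
    candidates: list[FieldCandidate] = field(default_factory=list)
    status: Status = Status.PENDING

    def approve(self) -> dict:
        """Mark as approved and return the values to write to the destination."""
        self.status = Status.APPROVED
        return {c.target_field: c.interpreted_value for c in self.candidates}


def needs_attention(item: ReviewItem, threshold: float = 0.8) -> list[FieldCandidate]:
    """Return the low-confidence fields a reviewer should check first."""
    return [c for c in item.candidates if c.confidence < threshold]
```

Surfacing only the low-confidence fields keeps the manual step short when most values are read correctly.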
If you already have an OCR and metadata-extraction tool, you can post the extracted data to ERPNext via the REST API.
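A minimal sketch of that posting step using only the Python standard library, assuming Frappe's REST endpoint POST /api/resource/<DocType> with token authentication. The site URL, API key/secret, supplier and item values are placeholders, and the Purchase Invoice field names follow the standard DocType but should be checked against your version.

```python
import json
import urllib.parse
import urllib.request

# Placeholders: replace with your site URL and an API key/secret pair.
SITE_URL = "https://erp.example.com"
API_KEY = "your-api-key"
API_SECRET = "your-api-secret"


def build_invoice_request(invoice: dict, doctype: str = "Purchase Invoice") -> urllib.request.Request:
    """Build a POST to /api/resource/<DocType> using Frappe token auth."""
    url = f"{SITE_URL}/api/resource/{urllib.parse.quote(doctype)}"
    return urllib.request.Request(
        url,
        data=json.dumps(invoice).encode("utf-8"),
        headers={
            "Authorization": f"token {API_KEY}:{API_SECRET}",
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
        method="POST",
    )


def post_invoice(invoice: dict) -> dict:
    """Send the request; Frappe wraps the created document in a "data" key."""
    with urllib.request.urlopen(build_invoice_request(invoice), timeout=30) as resp:
        return json.load(resp)["data"]


# Example payload built from extracted fields (values are made up).
invoice = {
    "supplier": "Example Supplier",  # must match an existing Supplier
    "bill_no": "INV-42",             # supplier's invoice number
    "bill_date": "2024-03-01",
    "items": [{"item_code": "WIDGET-01", "qty": 10, "rate": 2.5}],
}
```

Since the payload does not set docstatus, the document is created as a draft, which fits the idea of queuing results for manual approval before submission.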
I would also be very interested in this. There must be something out there; is there really nothing, not even a paid option?
+1, I would really like to see a smart PDF extractor that previews how it would populate any DocType, allows touch-ups, and lets the training set improve over time.
There are so many great tools for this now, so it would be a nice addition to have a native way of doing what a human does when receiving an invoice or delivery note by email.
I can share my actual use case here:
In China, invoices are formal PDF documents, and the government is gradually transitioning to standardized XML formats. The scenario is as follows:
Suppose you receive two PDF invoices. Here’s the general workflow I use for OCR processing in ERPNext:
- Employees receive two invoices in PDF format.
- Employees send the invoices to a designated email address.
- The system reads the email data, including the email content and attached invoices, and stores them.
- The system uses OCR to recognize the invoice content and stores it.
- Finally, the finance team reviews the invoices and generates vouchers in ERPNext.
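- Tools for reading and recognition: Python packages such as email (to parse the messages) and pdfminer (to extract the PDF text); cnocr is another option worth considering for OCR.
- Frappe side: DocTypes to store the data, and a scheduled job that parses incoming email every 5 minutes.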
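A rough sketch of the email-to-fields steps in the workflow above, using the standard library email module for attachments and plain regular expressions for field extraction. It assumes pdfminer.six's extract_text for the PDF text layer; note that pdfminer only reads embedded text, so scanned images would need a real OCR engine such as cnocr. The label patterns, field names, app name myapp and task name are hypothetical.

```python
import email
import io
import re
from email import policy
from email.message import EmailMessage

# Hypothetical label patterns; adapt them to your suppliers' invoice layouts.
FIELD_PATTERNS = {
    "bill_no": re.compile(r"Invoice\s*No\.?\s*[:：]\s*(\S+)"),
    "bill_date": re.compile(r"Date\s*[:：]\s*(\d{4}-\d{2}-\d{2})"),
    "grand_total": re.compile(r"Total\s*[:：]\s*([\d,]+\.\d{2})"),
}


def extract_pdf_attachments(raw_message: bytes) -> list[tuple[str, bytes]]:
    """Return (filename, content) for each PDF attached to a raw email."""
    msg: EmailMessage = email.message_from_bytes(raw_message, policy=policy.default)
    pdfs = []
    for part in msg.iter_attachments():
        if part.get_content_type() == "application/pdf":
            pdfs.append((part.get_filename() or "attachment.pdf", part.get_content()))
    return pdfs


def pdf_to_text(content: bytes) -> str:
    """Extract the PDF's text layer (assumes pdfminer.six is installed)."""
    from pdfminer.high_level import extract_text  # imported lazily
    return extract_text(io.BytesIO(content))


def parse_invoice_fields(text: str) -> dict[str, str]:
    """Take the first match for each known label; missing fields are omitted."""
    fields = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        if match:
            fields[name] = match.group(1)
    return fields


# To run a polling job every 5 minutes, register it in the app's hooks.py
# (app and function names are hypothetical):
# scheduler_events = {"cron": {"*/5 * * * *": ["myapp.tasks.poll_invoices"]}}
```

The parsed fields can then be stored for the finance review described above rather than written straight to an invoice.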