Invoice documents are ubiquitous and critical in a wide range of business workflows. Approximately 550 billion electronic invoices are sent around the world every year, and an astonishing 90% are still processed manually. Within companies, this means that for each invoice an employee has to retrieve, verify and report entity information, such as the invoice number, date and total amount, in order to execute the financial transaction.
A large number of research efforts attempt to transform unstructured data from documents, such as company invoices, into machine-readable formats, and to make this process scalable across a broad variety of document layouts. Yet, the prevailing literature points out that no clean solution, implementation or pipeline is at hand to enable companies to participate in this digital revolution.
The automated systems that do exist today are based on brittle, error-prone heuristics and do not deliver convincing results without major constraints and drawbacks. In collaboration with Clappform, this research focuses on how deep neural networks, in conjunction with OCR, can be used to build a system that converts digital invoices in PDF documents into indexed text data.
Once this information is extracted digitally, far more value can be derived from these documents, contributing to more efficient information processing. Companies can then utilise and analyse the entailed information more efficiently than ever before.
Technical explanation:
This research introduces a novel approach to the problem based on object detection algorithms. The devised system is described below as simply and clearly as possible. First, the document is converted to an image and prepared as input for the object detection model. The objective of this model is to detect specified entities of information in the invoice, i.e. to predict the planes (bounding-box regions) where those entities are located. The output of the model is a set of planes with their corresponding classes. Next, the planes are fed into the OCR module to obtain the textual information they contain. In the last step, the text returned by the OCR is unified with its corresponding class.
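To make the data flow concrete, here is a minimal Python sketch of the pipeline described above. The detector and OCR functions are hypothetical stand-ins (a real system would plug in a trained object detection network and an OCR engine such as Tesseract at the marked points); the box coordinates, class names and recognised strings are illustrative only.

```python
# Sketch of the extraction pipeline: detect planes, OCR each plane,
# then unify the recognised text with the plane's entity class.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Detection:
    """A predicted plane: a bounding box plus its entity class."""
    box: tuple   # (x0, y0, x1, y1) in image pixel coordinates
    label: str   # e.g. "invoice_number", "total_amount"

def extract_entities(image,
                     detect: Callable[[object], list],
                     ocr: Callable[[object, tuple], str]) -> dict:
    """Run detection, OCR each predicted plane, and pair text with class."""
    detections = detect(image)          # step 2: locate planes + classes
    return {d.label: ocr(image, d.box)  # steps 3-4: read text, unify
            for d in detections}

# --- hypothetical stand-ins, for illustration only --------------------
def fake_detect(image):
    # A real detector would return boxes predicted by a neural network.
    return [Detection((10, 10, 120, 30), "invoice_number"),
            Detection((10, 40, 120, 60), "total_amount")]

def fake_ocr(image, box):
    # A real OCR module would crop the plane and recognise its text.
    return {(10, 10, 120, 30): "INV-2021-001",
            (10, 40, 120, 60): "EUR 1,250.00"}[box]

result = extract_entities(image=None, detect=fake_detect, ocr=fake_ocr)
# result: {"invoice_number": "INV-2021-001", "total_amount": "EUR 1,250.00"}
```

Structuring the pipeline around swappable `detect` and `ocr` callables keeps the two components independent, so the object detection model can be retrained or replaced without touching the OCR step.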
The research focuses mainly on the second component, the object detection model. Multiple experiments are executed to gain insight into the feasibility and applicability of these models within the invoice extraction domain. Even with limited computational, data and training resources, promising results are achieved. Despite these promising results, a production-ready system based on object detection models cannot be constructed yet. Would you like to know more and read the entire research paper? Please fill in the form below and we’ll dispatch it right away!
Whatever challenges you have, we are happy to help!
Do not hesitate to contact us or request a free demo.