Accurately extracting text and image data from documents with unpredictable formats and layouts (PDF, Word, Excel, web pages, emails) that have no underlying technical structure, such as XML or field identifiers, has always been a challenge for conventional technologies, including other RPA (Robotic Process Automation) platforms. As a result, people must read each document and re-enter the data, which increases processing cost, time and errors.
Instaknow's patented Artificial Intelligence processes millions of complex documents to eliminate manual processing for Fortune 500 clients in Banking, Supply Chain, Healthcare, Utilities, Pharmaceuticals, Law, Insurance and Government. All required data is accurately extracted and converted to XML for conventional processing.
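To make the "converted to XML" step concrete, here is a minimal sketch of what handing extracted data to a conventional system could look like. It is not Instaknow's implementation or output format: the function name `fields_to_xml`, the element names (`document`, `field`, `table`, `row`, `cell`) and the sample values are all assumptions chosen for illustration.

```python
import xml.etree.ElementTree as ET


def fields_to_xml(doc_type, fields, rows=None):
    """Serialize already-extracted data to an XML string.

    Hypothetical helper: the element and attribute names are
    illustrative assumptions, not a documented Instaknow schema.
    """
    root = ET.Element("document", {"type": doc_type})

    # Single-value fields, e.g. a label and the value found next to it.
    for name, value in fields.items():
        ET.SubElement(root, "field", {"name": name}).text = str(value)

    # Repeating rows, e.g. a table whose row count varies per document.
    if rows:
        table = ET.SubElement(root, "table")
        for row in rows:
            row_el = ET.SubElement(table, "row")
            for column, value in row.items():
                ET.SubElement(row_el, "cell", {"column": column}).text = str(value)

    return ET.tostring(root, encoding="unicode")


# Made-up sample data standing in for the result of an extraction step.
xml_text = fields_to_xml(
    "tax_return",
    {"organization": "Example Org", "tax_year": 2020},
    rows=[
        {"name": "A. Example", "title": "President"},
        {"name": "B. Example", "title": "Treasurer"},
    ],
)
print(xml_text)
```

Because the rows are emitted as a repeating element, a downstream system can consume documents with any number of table rows without a schema change.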
Using human-eyeball-like scanning of each document’s layout, Instaknow correctly decides which text represents which header or label in that document, WITHOUT needing an underlying structure such as XML or field identifiers, and without Machine Learning examples. Data can be laid out DIFFERENTLY in different documents. Instaknow can even accurately read checkboxes and radio buttons. If a human eyeball can find and isolate the data of interest, Instaknow can do it too, regardless of variations. Documents do NOT need to be in specific technical formats. They can be text documents or image/scan documents, with one or multiple pages. Sections within documents can appear in any order, and columns in tables can also have an unpredictable sequence!
For example, in the following scanned tax returns, the top return has space for three Officers while the bottom return can list up to four. The column widths are also very different. These documents came in as scanned images and have no underlying XML, technical IDs or predictable string sequences that would allow conventional data processing such as RPA (Robotic Process Automation). Only a person can detect the actual data layout and content, and must then manually re-enter it in another computer system or file for further processing. But manual processing of thousands of documents is expensive, slow and error-prone!