Generative AI Labeling Prediction with UiPath Document Understanding

What Is Document Understanding, And Why Is Generative AI Important?

Document Understanding is a UiPath tool that separates documents based on their contents (classification) and extracts the fields needed to process them (extraction). There are several approaches to data extraction; this article focuses on the Machine Learning Extractor.

The arrival of Generative AI services, epitomized by models like GPT, has brought a paradigm shift in how we think about mundane tasks: many of them can now be completed more easily and efficiently. UiPath, a leader in the automation industry, has harnessed this capability through the Generative Predict functionality within Document Understanding labeling sessions.

Labeling Documents: A Crucial Step And Its Importance

Labeling documents means identifying and annotating the required fields across a large set of documents on which the Machine Learning Extractor will subsequently be trained. When creating and training a Document Understanding model, one of the most critical steps is ensuring that this labeling is accurate and complete. It is also one of the most time-consuming steps, if not the most time-consuming, in establishing a Document Understanding-based workflow.

Manual labeling is particularly susceptible to human error because of the sheer volume of documents and fields that require annotation, as well as the variations among documents within a given subtype. A model trained on poorly labeled, error-ridden documents will inevitably perform poorly. Other factors also contribute to a high-performing model, such as setting realistic expectations and working with clear documents, but labeling typically remains the most time-consuming and arguably the most crucial step in the Document Understanding process.

The Previous Method Of Prediction For Documents

The older method of prediction, or pre-labeling, is a feature available within Document Understanding labeling sessions. Instead of painstakingly labeling every field one by one for every document, users could first run a prediction on the document using either an out-of-the-box model or a model previously trained on a custom set of documents. In effect, the prediction mimics the output that model would produce when extracting the document in a production Document Understanding process.

This approach has limitations and drawbacks, however. The reason for training in the first place is to produce a model that performs better than the one currently in use, so the predicted values will almost always need adjustments; otherwise, the current labeling session would not be required at all.

A further limitation of using only a specialized field extraction model for predictions is that it can only extract the fields that model was trained on. For example, if the model had been trained to extract a last name but never a first name, predictions could only be attempted for the last name field. Whoever is labeling the documents must then label the new field manually, every time. While the older prediction method saved some time during labeling, much time was still dedicated to manual labeling.
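The field-coverage limitation can be illustrated with a minimal Python sketch. This is not UiPath's API: the helper `split_fields` and the field names are hypothetical, chosen only to mirror the first name / last name example. It separates the fields a specialized model could pre-label from the ones a person would still have to label by hand.

```python
def split_fields(required_fields, trained_fields):
    """Split required fields into those a specialized model can pre-label
    and those that still need manual labeling (hypothetical helper)."""
    trained = set(trained_fields)
    predictable = [f for f in required_fields if f in trained]
    manual = [f for f in required_fields if f not in trained]
    return predictable, manual


# Hypothetical example: the model was trained on "last_name" only.
predictable, manual = split_fields(
    required_fields=["first_name", "last_name"],
    trained_fields=["last_name"],
)
print(predictable)  # ['last_name']
print(manual)       # ['first_name']
```

Every field that lands in `manual` has to be annotated by hand on every document in the labeling session, which is where the remaining effort comes from.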

Generative AI Transforms Labeling Experience

Generative AI services have revolutionized the document labeling process. Employing generative AI to assist with predictions during the labeling phase of a Document Understanding process offers several advantages that would otherwise be very challenging, if not impossible, to achieve.

Generative AI use cases, labeling in particular, provide various benefits depending on the objective. Firstly, a generative model can predict the required values without relying on either a pre-trained, out-of-the-box model or a customized model. Consider a scenario where a new document type requires data extraction and no pre-existing model is available.

In such cases, users would traditionally need to manually label each field across potentially thousands of documents. While this approach may be manageable for one or two fields, scaling it up to encompass 5, 10, or even 20 distinct fields within a document becomes impractical and significantly time-consuming.

With the Generative Predict function, the document is instead fed into a Generative AI model, which is far more flexible and can extract fields it has not previously been trained on. Consider the new field requested on an older document type that previously had to be labeled manually, or the brand new document type with 20 fields that would take minutes per document to label, compounded by the thousands of labeled documents needed to produce an accurate model. In cases like these, the function reduces the time needed from multiple minutes to mere seconds. It also reduces the errors that come from manual labeling: instead of labeling every field, a labeling session becomes mostly a validation of what was pre-labeled, with occasional adjustments to incorrectly predicted values.
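To get a feel for how per-document savings add up across a training set, the arithmetic can be sketched in a few lines of Python. All numbers below are assumptions picked purely for illustration (1,000 documents, 3 minutes to label one by hand, 30 seconds to validate a pre-labeled one); they are not measurements from UiPath or from any real project.

```python
def labeling_hours(num_documents, seconds_per_document):
    """Total labeling effort in hours for a batch of documents."""
    return num_documents * seconds_per_document / 3600


# Hypothetical inputs, chosen only to illustrate the calculation.
NUM_DOCUMENTS = 1000      # assumed size of the training set
MANUAL_SECONDS = 3 * 60   # assumed: 3 minutes to label one document by hand
VALIDATE_SECONDS = 30     # assumed: 30 seconds to validate pre-labeled values

manual = labeling_hours(NUM_DOCUMENTS, MANUAL_SECONDS)
assisted = labeling_hours(NUM_DOCUMENTS, VALIDATE_SECONDS)
print(f"manual: {manual:.1f} h, assisted: {assisted:.1f} h")
# prints "manual: 50.0 h, assisted: 8.3 h"
```

Substituting your own document counts and timings gives a rough estimate of the effort at stake for a given project.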

Conclusion

The Generative Predict functionality is a strong Generative AI use case. It can be applied to entirely new document types, or it can combine the best of two Machine Learning models: a specialized extraction model and a Generative AI model. This new prediction method, available in preview, can reduce the number of errors during a labeling session, reduce the time and cost of labeling documents, and allow for a much faster implementation of a Document Understanding process with a high-performing model.