With GenAI, we are evolving our extraction technology. The goal is to bring models closer to what ultimately matters: the final, structured output.
What’s changing
Until now, classic extraction has followed a multi-step approach. Documents are broken down into OCR boxes, these boxes are classified, and the results are then combined into final fields through additional logic.
This approach worked, but it had several limitations:
- High dependency on box structures: Variations in layout or annotation can affect quality
- Time-consuming annotation: Training data must be created very consistently at the box level
- Complex post-processing: Accurate results often depend on additional rules and configuration
- Limited reuse of feedback: User corrections are difficult to leverage directly as training data
- Metric limitations: Box classification quality does not always reflect final output quality
The key point:
The system primarily optimizes intermediate steps and not the result that the user actually needs.
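To make the multi-step approach concrete, here is a minimal sketch of the three stages (OCR boxes, box classification, combination into fields). All names and rules in it (`OcrBox`, `classify_box`, `combine_into_fields`) are hypothetical stand-ins for illustration, not the actual implementation; a real classifier would be a trained model rather than hand-written checks.

```python
from dataclasses import dataclass


@dataclass
class OcrBox:
    text: str
    x: int
    y: int


def classify_box(box: OcrBox) -> str:
    """Toy stand-in for a trained box classifier (hypothetical rules)."""
    if box.text.startswith("INV-"):
        return "invoice_number"
    if box.text.replace(".", "", 1).isdigit():
        return "amount"
    if box.text in {"EUR", "USD"}:
        return "currency"
    return "other"


def combine_into_fields(boxes: list[OcrBox]) -> dict[str, str]:
    """Post-processing step: merge classified boxes into final fields."""
    by_label: dict[str, list[str]] = {}
    # Read boxes top to bottom, left to right.
    for box in sorted(boxes, key=lambda b: (b.y, b.x)):
        label = classify_box(box)
        if label != "other":
            by_label.setdefault(label, []).append(box.text)

    fields: dict[str, str] = {}
    if "invoice_number" in by_label:
        fields["invoice_number"] = by_label["invoice_number"][0]
    if "amount" in by_label:
        # Extra rule: take the last amount and attach the currency, if any.
        currency = by_label.get("currency", [""])[0]
        fields["total"] = f"{by_label['amount'][-1]} {currency}".strip()
    return fields


boxes = [
    OcrBox("Invoice", 10, 10),
    OcrBox("INV-1001", 80, 10),
    OcrBox("Total", 10, 200),
    OcrBox("119.00", 80, 200),
    OcrBox("EUR", 130, 200),
]
final_fields = combine_into_fields(boxes)
print(final_fields)
```

Note how the final fields only exist after the combination rules run: the classifier itself never sees, and is never scored on, the value the user needs.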
What GenAI extraction does differently
With GenAI Extraction, this focus fundamentally shifts.
The model is trained directly to generate the final fields, that is, the exact values that are ultimately used in the system. Intermediate steps like box classification become less relevant.
In practice, this means:
- The entire document is considered as context
- Field definitions and descriptions (prompts) guide the extraction
- The model generates final results directly
Instead of labeling individual text fragments, the model interprets the content and derives the appropriate values.
The key difference
The difference between classic extraction and GenAI extraction lies in the optimization goal:
- Classic extraction: Optimizes the classification of individual elements (e.g., OCR boxes)
- GenAI extraction: Optimizes the quality of the final output directly
This not only simplifies the technical approach but also aligns extraction more closely with real-world use cases.
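The gap between the two optimization goals can be illustrated with a small, made-up example. The sketch below compares a box-level accuracy with a field-level exact-match score; the metric functions and the sample data are assumptions chosen for illustration, not the metrics used by the product.

```python
def box_accuracy(predicted: list[str], expected: list[str]) -> float:
    """Share of OCR boxes that received the correct label."""
    correct = sum(p == e for p, e in zip(predicted, expected))
    return correct / len(expected)


def field_exact_match(predicted: dict[str, str], expected: dict[str, str]) -> float:
    """Share of final fields whose value matches exactly."""
    correct = sum(predicted.get(name) == value for name, value in expected.items())
    return correct / len(expected)


# Ten boxes; the total is split across two boxes ("1" and "190.00").
expected_labels = ["other"] * 6 + ["invoice_number", "amount", "amount", "currency"]
# One of the two amount boxes is mislabeled as "other".
predicted_labels = ["other"] * 6 + ["invoice_number", "amount", "other", "currency"]

expected_fields = {"invoice_number": "INV-1001", "total": "1 190.00 EUR"}
# Because one amount box is missing, the combined total is wrong.
predicted_fields = {"invoice_number": "INV-1001", "total": "1 EUR"}

box_score = box_accuracy(predicted_labels, expected_labels)
field_score = field_exact_match(predicted_fields, expected_fields)
print(f"box accuracy: {box_score:.2f}, field exact match: {field_score:.2f}")
```

In this constructed case a single mislabeled box leaves box accuracy at 0.90 while only half of the final fields are correct, which is the kind of mismatch that output-level optimization is meant to avoid.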
Why this matters
GenAI extraction supports a new way of working with extraction models.
Instead of rigid training cycles, it introduces an iterative process:
- Define fields: clearly specify field names and prompts
- See results immediately: zero-shot extraction returns values without any prior training
- Improve step by step: iteratively refine prompts and results based on feedback
This makes development faster, more transparent and more scalable, especially in API-driven and automated workflows. It also shifts the focus from intermediate steps to the final output, allowing metrics and improvements to better reflect real production quality. At the same time, user feedback and corrections can be more directly incorporated into the learning process, enabling continuous optimization.
There are three key advantages that make GenAI-based extraction a core part of our new approach:
1. Zero-shot – Instant extraction without training
Zero-shot fundamentally changes how extraction workflows are started.
It enables field extraction without any prior training. The model relies only on the field name, its description (prompt), and the document content to generate the most likely value.
Advantages:
- No traditional training process is required to get first results
- Results are available immediately, without upfront annotation or waiting time
The key advantage is the fast feedback loop. Fields can be defined, tested, and iteratively refined before any training data is created, allowing early validation of field definitions and prompt quality.
Zero-shot is based on generative reasoning: the model interprets content rather than simply classifying it, making prompt quality critical.
It works especially well for standardized fields and common document types. For complex rules or strict requirements, it can be complemented with few-shot or full training.
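As a rough sketch of what zero-shot works with, the three inputs named above (field name, description, document content) could be assembled into a single request and the reply reduced to the defined fields. Everything here is an assumption for illustration: `FieldDefinition`, `build_zero_shot_prompt`, `parse_reply`, and the prompt wording are hypothetical, no model is called, and the reply string is hand-written rather than real model output.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FieldDefinition:
    name: str
    description: str  # the prompt that guides extraction for this field


def build_zero_shot_prompt(fields: list[FieldDefinition], document_text: str) -> str:
    """Combine field definitions and the full document into one request."""
    lines = [
        "Extract the following fields from the document.",
        "Return a JSON object with exactly these keys; use null if a value is missing.",
        "",
        "Fields:",
    ]
    for f in fields:
        lines.append(f"- {f.name}: {f.description}")
    lines += ["", "Document:", document_text]
    return "\n".join(lines)


def parse_reply(reply: str, fields: list[FieldDefinition]) -> dict[str, Optional[str]]:
    """Keep only the defined fields; anything else in the reply is dropped."""
    data = json.loads(reply)
    return {f.name: data.get(f.name) for f in fields}


fields = [
    FieldDefinition("invoice_number", "The unique invoice identifier, usually near the top."),
    FieldDefinition("total", "The gross amount including currency."),
]
prompt = build_zero_shot_prompt(fields, "Invoice INV-1001\nTotal 119.00 EUR")

# Hand-written example reply, standing in for a model response.
example_reply = '{"invoice_number": "INV-1001", "total": "119.00 EUR", "note": "ignored"}'
values = parse_reply(example_reply, fields)
print(values)
```

Because the descriptions are the only field-specific guidance in the request, refining them is the main lever for improving zero-shot results before any training data exists.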
2. Training models with existing data (e.g. DocuWare documents)
Instead of uploading documents and annotating them manually, you can now also upload documents together with the field values needed to train the models.
This can be done in one of two ways:
- By uploading each document together with its matching JSON file (see screenshot)
- By creating a model from DocuWare: you select the documents you want to use for the extraction model, and the index values stored in the DocuWare file cabinet are then used to annotate the documents and train the model
This allows you to create and train your models faster while aiming to maintain comparable extraction quality.
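For the first option, a local pre-check that every document has a matching JSON file might look like the sketch below. The file naming (same name, `.json` extension) and the JSON layout (a flat object of field names and values) are assumptions made for this example; the format actually expected on upload is not specified here.

```python
import json
import tempfile
from pathlib import Path


def load_training_pairs(folder: Path) -> list[tuple[Path, dict]]:
    """Pair each PDF with a same-named JSON file; skip PDFs without one."""
    pairs = []
    for doc in sorted(folder.glob("*.pdf")):
        sidecar = doc.with_suffix(".json")  # assumed naming convention
        if sidecar.exists():
            values = json.loads(sidecar.read_text(encoding="utf-8"))
            pairs.append((doc, values))
    return pairs


# Demo with placeholder files in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "invoice_001.pdf").write_bytes(b"%PDF-1.4 placeholder")
    (folder / "invoice_001.json").write_text(
        json.dumps({"invoice_number": "INV-1001", "total": "119.00 EUR"}),
        encoding="utf-8",
    )
    # This document has no JSON file and is therefore skipped.
    (folder / "invoice_002.pdf").write_bytes(b"%PDF-1.4 placeholder")

    summary = [(doc.name, values) for doc, values in load_training_pairs(folder)]

print(summary)
```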
3. Train with validation
An additional advantage of GenAI Extraction is that customers will be able to use feedback from the Validation UI to further improve existing models.
A corresponding feedback-loop functionality is currently being developed and will be available soon.
When users correct extracted values during validation, this feedback is captured and can be used to improve the model in future training iterations.
In addition, validation introduces a controlled step within the automated IDP workflow where results are reviewed and confirmed before being passed on to downstream processes. During this step, the process is paused until user confirmation is provided, ensuring that only validated data continues through the pipeline.
This feedback mechanism enables continuous improvement of the system. Over time, and with regular retraining, validation feedback can help the model become more accurate and reliable, creating an adaptive system that evolves through usage and delivers increasingly trustworthy results.
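The two roles of validation described above (pausing the workflow until confirmation, and capturing corrections as feedback) can be sketched as follows. This is a minimal illustration under stated assumptions: `ValidationTask` and `release_downstream` are hypothetical names, and the feedback-loop functionality itself is still in development, so its real interface may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ValidationTask:
    document_id: str
    extracted: dict[str, str]
    confirmed: Optional[dict[str, str]] = None  # None means: waiting for the user

    @property
    def is_pending(self) -> bool:
        return self.confirmed is None

    def confirm(self, corrections: Optional[dict[str, str]] = None) -> None:
        """User confirms the result, optionally overriding some values."""
        self.confirmed = {**self.extracted, **(corrections or {})}

    def corrected_fields(self) -> dict[str, str]:
        """Values the user changed; candidates for future training data."""
        if self.confirmed is None:
            return {}
        return {
            name: value
            for name, value in self.confirmed.items()
            if self.extracted.get(name) != value
        }


def release_downstream(task: ValidationTask) -> dict[str, str]:
    """Only validated data may continue through the pipeline."""
    if task.confirmed is None:
        raise RuntimeError(f"{task.document_id}: validation still pending")
    return task.confirmed


task = ValidationTask("doc-1", {"invoice_number": "INV-1001", "total": "119.00 EUR"})
pending_before = task.is_pending          # workflow is paused at this point
task.confirm({"total": "1190.00 EUR"})    # user corrects one value
feedback = task.corrected_fields()
released = release_downstream(task)
print(feedback, released)
```

Keeping the extracted and confirmed values side by side is what makes the correction reusable: the difference between the two is exactly the signal a later retraining run can learn from.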
From static extraction to learning systems
GenAI extraction represents the shift from a traditional, pipeline-based extraction approach to an outcome-driven system designed for continuous improvement. The focus is no longer on optimizing individual processing steps, but on the quality of the final structured output.
Training is therefore no longer an isolated process, but part of a continuous learning cycle driven by usage and validation. The system continuously evolves and adapts to real-world requirements and data.
Overall, this approach makes extraction simpler, more transparent, and closer to actual business use cases with the goal of turning models into productive, adaptive systems that continuously improve over time.