Every day, companies process large volumes of documents such as invoices, contracts, and delivery notes. The real value, however, only emerges when this content is automatically recognized, classified, and structured for downstream processes.
Recent enhancements to the DocuWare IDP plugin make this step significantly simpler. Users can now train AI models for classification and extraction directly within DocuWare, using documents already stored in the system and without any external training environment.
The goal is to make creating and using AI models as simple as possible and to integrate them fully into the existing DocuWare environment.
Creating AI models without switching systems
A key objective of this enhancement is to enable users to:
- train AI models directly within DocuWare
- use existing documents from file cabinets as a training basis
- create models without external tools or data preparation
- deploy trained models to production within the system immediately
This establishes a seamless end-to-end approach in which the entire lifecycle of an IDP model is handled within DocuWare.
The DocuWare IDP plugin as a central platform
The IDP plugin serves as the central control point for all Intelligent Document Processing functions. From this single interface, users can:
- start training for classification, extraction, and splitting
- select file cabinets and index fields as a training basis
- configure and reuse models
- manage the entire lifecycle of IDP workflows
Bundling all of these steps into one interface greatly simplifies what used to be a fragmented training process.
Simplified training concept: From configuration to data
A key paradigm shift is the move away from manual configuration toward data-driven learning.
Instead of complex model setup, the new approach is based on:
- existing document repositories in DocuWare
- structured index data
- automated generation of training data
As a result, model quality depends increasingly on the available data rather than on manually defined rules.
Splitting: Seamless separation for improved classification and extraction
Splitting extends the DocuWare IDP plugin with the automatic separation of mixed or multi-page files into individual, logically distinct documents.
The splitter is selected as an agent within the plugin and processes documents directly via DocuWare IDP before they are passed on to classification and extraction.
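The order described here (split first, then classify and extract each resulting document) can be illustrated with a minimal sketch. It assumes the splitter's output can be reduced to one "starts a new document" flag per page; how DocuWare IDP actually detects boundaries is not described, so the flags are passed in by hand. `Segment`, `split_by_boundaries`, `classify`, `extract`, and `process` are hypothetical names, and the classification and extraction functions are trivial stand-ins, not the plugin's agents.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Segment:
    """One logically distinct document carved out of a mixed input file."""
    pages: List[str]
    doc_type: Optional[str] = None
    fields: Dict[str, str] = field(default_factory=dict)


def split_by_boundaries(pages: List[str], starts_new_doc: List[bool]) -> List[Segment]:
    """Group consecutive pages into segments.

    starts_new_doc[i] is True when page i begins a new document. In a real
    splitter this flag would be predicted by a model; here it is passed in.
    """
    if len(pages) != len(starts_new_doc):
        raise ValueError("expected exactly one boundary flag per page")
    segments: List[Segment] = []
    for page, is_start in zip(pages, starts_new_doc):
        # The first page always opens a segment, even if its flag is False.
        if is_start or not segments:
            segments.append(Segment(pages=[page]))
        else:
            segments[-1].pages.append(page)
    return segments


def classify(segment: Segment) -> str:
    # Hypothetical stand-in for the classification agent (a trivial keyword rule).
    return "Invoice" if "invoice" in segment.pages[0].lower() else "Delivery Slip"


def extract(segment: Segment) -> Dict[str, str]:
    # Hypothetical stand-in for the extraction agent.
    return {"PageCount": str(len(segment.pages))}


def process(pages: List[str], starts_new_doc: List[bool]) -> List[Segment]:
    # Split first, then classify and extract each segment separately.
    segments = split_by_boundaries(pages, starts_new_doc)
    for segment in segments:
        segment.doc_type = classify(segment)
        segment.fields = extract(segment)
    return segments


if __name__ == "__main__":
    scan = ["Invoice 4711 page 1", "Invoice 4711 page 2", "Delivery slip 88"]
    for seg in process(scan, [True, False, True]):
        print(seg.doc_type, len(seg.pages), seg.fields)
    # Invoice 2 {'PageCount': '2'}
    # Delivery Slip 1 {'PageCount': '1'}
```

Because classification and extraction only ever see one segment at a time, a three-page scan containing two documents yields two clean inputs instead of one mixed one. This is the quality gain the section attributes to splitting.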
From external tooling to IDP integration
Before this integration, splitting was not available in the DocuWare IDP plugin and could only be implemented with external tools such as storageRobot or Make.com.
Enhancements in document splitting for IDP workflows
- Fewer manual preprocessing steps: annotation and external document separation are no longer needed
- Improved output quality in classification and extraction thanks to cleanly separated individual documents
- Native integration of splitting into the DocuWare IDP plugin, eliminating external tools and the need to switch between systems
With document separation embedded directly in the workflow, the entire pipeline, from ingestion through classification to extraction, can now run fully automatically.
Classification: More precise models through structured data
Document classification is a key prerequisite for any downstream automation. Only when a document type is correctly identified can processes be reliably controlled.
Training via file cabinets and index fields
Previously, training was based primarily on selecting multiple file cabinets, with each cabinet name serving as a class label. In practice, this approach is limited, because a single file cabinet often contains heterogeneous documents.
Enhancement: Use of index fields
With the new extension, an index field can also be used to define classes, which allows much finer granularity.
Example:
A file cabinet contains 1,000 documents whose index field “DocType” holds one of these values:
- Invoice
- Credit Note
- Delivery Slip
Instead of treating the entire file cabinet as a single class, each of these values can now be used directly as its own class. This yields more homogeneous training data and better classification performance.
Flexible training logic
Users can choose between two approaches:
- multiple file cabinets as the basis for classes
- a single file cabinet plus an index field for more fine-grained classification
This keeps the system flexible and adaptable to different customer structures.
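The difference between the two training modes comes down to where each document's class label comes from. The following minimal sketch assumes a simplified document model (`StoredDocument` with a cabinet name and a dictionary of index values). `StoredDocument` and `build_class_labels` are hypothetical names, not part of any DocuWare API. It also assumes that documents with an empty index value are skipped, which the article does not specify.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class StoredDocument:
    # Hypothetical, simplified view of an archived document.
    cabinet: str
    index: Dict[str, str] = field(default_factory=dict)


def build_class_labels(
    docs: List[StoredDocument], mode: str, index_field: Optional[str] = None
) -> List[Tuple[StoredDocument, str]]:
    """Pair each document with a class label for classification training.

    mode="cabinets":    the file cabinet name is the label.
    mode="index_field": the value stored in `index_field` is the label;
                        documents with an empty value are skipped (assumption).
    """
    if mode == "cabinets":
        return [(doc, doc.cabinet) for doc in docs]
    if mode == "index_field":
        if not index_field:
            raise ValueError("index_field is required when mode='index_field'")
        return [(doc, doc.index[index_field]) for doc in docs if doc.index.get(index_field)]
    raise ValueError(f"unknown mode: {mode!r}")


if __name__ == "__main__":
    # A single cabinet with mixed content, as in the "DocType" example.
    types = ["Invoice", "Invoice", "Credit Note", "Delivery Slip", "Invoice"]
    docs = [StoredDocument("Accounting", {"DocType": t}) for t in types]
    print(Counter(label for _, label in build_class_labels(docs, "cabinets")))
    # Counter({'Accounting': 5})
    print(Counter(label for _, label in build_class_labels(docs, "index_field", "DocType")))
    # Counter({'Invoice': 3, 'Credit Note': 1, 'Delivery Slip': 1})
```

In cabinet mode, the mixed cabinet collapses into one class. In index-field mode, the same documents split into three homogeneous classes, which is the effect the “DocType” example describes.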
Extraction: End-to-end training without annotation
The most significant simplification concerns document extraction.
Previous process
In the traditional approach, after setting up an extraction process, users had to:
- manually annotate documents
- mark fields
- prepare training data
This step was time-consuming and often a major barrier to adoption.
New approach: Direct training
With GenAI extraction, this step is completely eliminated.
Instead:
- users start training directly in the IDP plugin
- file cabinets and index fields are selected
- the system automatically creates and trains the model
- annotation is no longer required
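One way to picture training without annotation is to treat the index values already stored with each document as the expected extraction results. The article does not say how DocuWare IDP uses this data internally, so this is an assumption. The minimal sketch below builds such training examples from (document text, index values) pairs, keeping only the selected fields that actually have a value. `TrainingExample` and `build_extraction_examples` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass
class TrainingExample:
    text: str                # document content, e.g. OCR text
    targets: Dict[str, str]  # expected field values, taken from stored index data


def build_extraction_examples(
    documents: Iterable[Tuple[str, Dict[str, str]]], fields: List[str]
) -> List[TrainingExample]:
    """Turn (text, index_values) pairs into training examples.

    Only the selected fields with a non-empty value become targets;
    documents with no usable values are skipped (assumption).
    """
    examples: List[TrainingExample] = []
    for text, index_values in documents:
        targets = {f: index_values[f] for f in fields if index_values.get(f)}
        if targets:
            examples.append(TrainingExample(text=text, targets=targets))
    return examples


if __name__ == "__main__":
    docs = [
        ("Invoice No. 4711, Total 119.00 EUR", {"InvoiceNumber": "4711", "Total": "119.00"}),
        ("Scanned letter without index data", {}),
    ]
    examples = build_extraction_examples(docs, ["InvoiceNumber", "Total"])
    print(len(examples), examples[0].targets)
    # 1 {'InvoiceNumber': '4711', 'Total': '119.00'}
```

The point of the sketch is that no one marks fields on the page. The "labels" are values users have already entered when filing the documents.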
Integration with DocuWare IDP
The training logic is based on integration with the DocuWare IDP platform. Through a gateway interface:
- training data is transferred from DocuWare
- models are created in DocuWare IDP
- results are made available again in DocuWare
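The three-step round trip through the gateway can be outlined as follows. The real gateway interface is not documented here, so `TrainingGateway`, its three methods, and `InMemoryGateway` are hypothetical. The stub performs no training and makes no network calls. It only shows the order of operations described above.

```python
from typing import Dict, List, Protocol


class TrainingGateway(Protocol):
    # Hypothetical interface; the real gateway API is not described in this article.
    def upload_training_data(self, examples: List[dict]) -> str: ...
    def create_model(self, dataset_id: str) -> str: ...
    def fetch_model(self, model_id: str) -> dict: ...


class InMemoryGateway:
    """Local stub that mimics the three steps without any network calls."""

    def __init__(self) -> None:
        self._datasets: Dict[str, List[dict]] = {}
        self._models: Dict[str, dict] = {}

    def upload_training_data(self, examples: List[dict]) -> str:
        dataset_id = f"ds-{len(self._datasets) + 1}"
        self._datasets[dataset_id] = list(examples)
        return dataset_id

    def create_model(self, dataset_id: str) -> str:
        model_id = f"model-{len(self._models) + 1}"
        # A real service would train here; the stub only records metadata.
        self._models[model_id] = {
            "id": model_id,
            "trained_on": len(self._datasets[dataset_id]),
            "status": "ready",
        }
        return model_id

    def fetch_model(self, model_id: str) -> dict:
        return self._models[model_id]


def train_via_gateway(gateway: TrainingGateway, examples: List[dict]) -> dict:
    dataset_id = gateway.upload_training_data(examples)  # 1. transfer training data from DocuWare
    model_id = gateway.create_model(dataset_id)          # 2. create the model in DocuWare IDP
    return gateway.fetch_model(model_id)                 # 3. return the result to DocuWare


if __name__ == "__main__":
    examples = [{"text": "Invoice 4711", "label": "Invoice"}]
    print(train_via_gateway(InMemoryGateway(), examples))
    # {'id': 'model-1', 'trained_on': 1, 'status': 'ready'}
```

Typing the caller against a `Protocol` keeps the orchestration independent of the transport, so the same three-step flow would work with a real client in place of the stub.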
This establishes a seamless connection between document management and AI training.
Benefits for companies
The enhancements address a key bottleneck in IDP adoption to date: the complexity of model training.
Concrete advantages:
- significantly faster creation of new AI models
- no external training environments required
- use of existing documents as a training basis
- reduced configuration effort
- faster deployment of IDP processes into production
- higher adoption due to simplified user experience
The underlying assumption is that the limited use of IDP so far is partly due to the complexity of model creation.
Outlook
With the introduction of end-to-end training for classification and extraction, the DocuWare IDP plugin is evolving toward a fully integrated AI training platform within document management.
Future enhancements will further strengthen this approach and enable the automatic use and continuous improvement of models in live operations.