New developments for classification and extraction in the IDP plugin

Companies process large volumes of documents every day, such as invoices, contracts, or delivery notes. However, the real value is only created when this content can be automatically recognized, classified, and structured for downstream processes. 

With the new developments in the DocuWare IDP plugin, this step is now significantly simpler. Users can train AI models for classification and extraction directly within DocuWare, based on existing documents in the system and without the need for external training environments. 

The goal of this development is to make the creation and use of AI models as simple as possible and to fully integrate them into the existing DocuWare environment. 

Creating AI models without switching systems 

A key objective of this enhancement is to enable users to: 

train AI models directly within DocuWare  
use existing documents from file cabinets as a training basis  
create models without external tools or data preparation  
immediately deploy trained models productively within the system

This establishes a seamless end-to-end approach in which the entire lifecycle of an IDP model is handled within DocuWare.  

The DocuWare IDP plugin as a central platform 

The IDP plugin serves as the central control unit for all Intelligent Document Processing functions. Within this interface, users can: 

start training for classification, extraction, and splitting  
select file cabinets and index fields as a training basis  
configure and reuse models  
manage the entire lifecycle of IDP workflows

All steps are bundled into a unified interface, drastically simplifying the previously fragmented training process. 

Simplified training concept: From configuration to data 

A key paradigm shift is the move away from manual configuration toward data-driven learning. 

Instead of complex model setup, the new approach is based on: 

existing document repositories in DocuWare  
structured index data  
automated generation of training data

As a result, model quality is increasingly determined by the available data rather than manually defined rules. 

Splitting: Seamless separation for improved classification and extraction 

Splitting extends the DocuWare IDP plugin by enabling the automatic separation of mixed or multi-page documents into individual, logically distinct documents. 

The splitter is selected as an agent within the plugin and processes documents directly via DocuWare IDP before they are passed on to classification and extraction. 
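The processing order described here (split first, then classify and extract each resulting document) can be pictured with a minimal sketch. Everything in it is an assumption for illustration: the function names, the page representation as plain strings, and the toy rules are hypothetical stand-ins and not part of the DocuWare IDP plugin.

```python
from typing import Callable, Dict, List

Pages = List[str]  # assumption: each page is represented by its text


def process(pages: Pages,
            split: Callable[[Pages], List[Pages]],
            classify: Callable[[Pages], str],
            extract: Callable[[Pages], Dict[str, str]]) -> List[dict]:
    """Split a mixed batch first, then classify and extract each document."""
    results = []
    for doc in split(pages):
        results.append({
            "pages": doc,
            "type": classify(doc),
            "fields": extract(doc),
        })
    return results


# Toy stand-ins (hypothetical): a real splitter would be a trained model.
def toy_split(pages: Pages) -> List[Pages]:
    docs, current = [], []
    for page in pages:
        # A page starting with a known title opens a new document.
        if page.startswith(("INVOICE", "DELIVERY")) and current:
            docs.append(current)
            current = []
        current.append(page)
    if current:
        docs.append(current)
    return docs


def toy_classify(doc: Pages) -> str:
    return "Invoice" if doc[0].startswith("INVOICE") else "Delivery Slip"


def toy_extract(doc: Pages) -> Dict[str, str]:
    return {"page_count": str(len(doc))}


batch = ["INVOICE 4711", "page 2", "DELIVERY 0815"]
results = process(batch, toy_split, toy_classify, toy_extract)
```

Because classification and extraction only ever see one logical document at a time, a mixed batch no longer blurs their input.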

From external tooling to IDP integration 

Before the integration, splitting was not available in the DocuWare IDP plugin and could only be implemented using external tools such as storageRobot or Make.com. 

Enhancements in document splitting for IDP workflows 

Fewer manual preprocessing steps: annotation and external document separation have been eliminated  
Improved output quality in classification and extraction thanks to cleanly separated individual documents  
Native integration of splitting into the DocuWare IDP plugin, eliminating external tools and switches between systems

With document separation embedded directly in the workflow, the processing pipeline can run fully automated from ingestion through classification and extraction. 

Classification: More precise models through structured data 

Document classification is a key prerequisite for any downstream automation. Only when a document type is correctly identified can processes be reliably controlled. 

Training via file cabinets and index fields 

Previously, training was primarily based on selecting multiple file cabinets, where the cabinet name was used as the class label. However, this approach has limitations in practice, as file cabinets often contain heterogeneous documents. 

Enhancement: Use of index fields 

With the new extension, an index field can additionally be used to define classes. This enables significantly higher granularity. 

Example: 

A file cabinet contains 1,000 documents, and the index field “DocType” of each document holds one of three values: 

Invoice  
Credit Note  
Delivery Slip

Instead of treating the entire file cabinet as a single class, these values can now be used directly as separate classes. This results in more homogeneous training data and improved classification performance. 
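As a rough illustration of this idea, the following sketch groups documents by the value of one index field, so that each distinct value becomes its own class. The record layout is an assumption made for the example; only the field name "DocType" and its three values come from the text above.

```python
from collections import defaultdict

# Hypothetical in-memory stand-in for documents from a single file cabinet.
documents = [
    {"id": 1, "DocType": "Invoice"},
    {"id": 2, "DocType": "Credit Note"},
    {"id": 3, "DocType": "Invoice"},
    {"id": 4, "DocType": "Delivery Slip"},
    {"id": 5, "DocType": ""},  # empty index value: cannot serve as a label
]


def classes_from_index_field(docs, field):
    """Group document ids by the value of one index field."""
    classes = defaultdict(list)
    for doc in docs:
        value = doc.get(field)
        if value:  # skip documents where the field is missing or empty
            classes[value].append(doc["id"])
    return dict(classes)


classes = classes_from_index_field(documents, "DocType")
```

In this sketch, documents without a value in the chosen field are simply left out of the training set, which is one plausible way to keep the classes homogeneous.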

Flexible training logic 

Users can choose between two approaches: 

multiple file cabinets as the basis for classes  
a single file cabinet plus an index field for more fine-grained classification

This keeps the system flexible and adaptable to different customer structures. 
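The two approaches differ only in where the class label comes from. A minimal sketch, assuming a hypothetical `Doc` record with a cabinet name and a dictionary of index values (neither is an actual DocuWare data structure):

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Doc:
    cabinet: str           # name of the file cabinet the document is stored in
    index: Dict[str, str]  # index field name -> value


def class_label(doc: Doc, index_field: Optional[str] = None) -> Optional[str]:
    """Return the class label of a training document.

    Without index_field, the cabinet name is the label (multiple cabinets).
    With index_field, the value of that field is the label (single cabinet).
    """
    if index_field is None:
        return doc.cabinet
    return doc.index.get(index_field)


by_cabinet = class_label(Doc("Invoices", {}))
by_field = class_label(Doc("Accounting", {"DocType": "Credit Note"}), "DocType")
```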

Extraction: End-to-end training without annotation 

The most significant simplification concerns document extraction. 

Previous process 

In the traditional approach, after setting up an extraction process, users had to: 

manually annotate documents  
mark fields  
prepare training data

This step was time-consuming and often a major barrier to adoption. 

New approach: Direct training 

With GenAI extraction, this step is completely eliminated. 

Instead: 

users start training directly in the IDP plugin  
file cabinets and index fields are selected  
the system automatically creates a training model  
annotation is no longer required
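One way to picture why annotation becomes unnecessary: the index values already stored with each document can serve as the target values for extraction. The sketch below pairs document text with selected index fields. It is an assumption about the general principle, not a description of how DocuWare IDP builds its training data internally, and all names and sample values in it are hypothetical.

```python
def build_extraction_samples(docs, fields):
    """Pair each document's text with the index values to be extracted."""
    samples = []
    for doc in docs:
        # Keep only the selected fields that actually have a value.
        targets = {f: doc["index"][f] for f in fields if doc["index"].get(f)}
        if targets:
            samples.append({"text": doc["text"], "targets": targets})
    return samples


# Hypothetical documents with index data that users filled in earlier.
stored_docs = [
    {"text": "Invoice no. 4711, total 120.00 EUR",
     "index": {"InvoiceNumber": "4711", "Amount": "120.00"}},
    {"text": "Invoice no. 4712",
     "index": {"InvoiceNumber": "4712", "Amount": ""}},
    {"text": "Scanned page without index data", "index": {}},
]

samples = build_extraction_samples(stored_docs, ["InvoiceNumber", "Amount"])
```

The quality of such samples depends directly on how carefully the index fields were maintained, which matches the shift from configuration to data described earlier.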

Integration with DocuWare IDP 

The training logic is based on integration with the DocuWare IDP platform. Through a gateway interface: 

training data is transferred from DocuWare  
models are created in DocuWare IDP 
results are made available again in DocuWare

This establishes a seamless connection between document management and AI training. 
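The three steps of this round trip can be sketched with an in-memory stand-in. The class `FakeGateway` and its methods are purely hypothetical and do not reflect the real gateway interface; the sketch only shows the order of the exchange.

```python
class FakeGateway:
    """In-memory stand-in for illustration; not the real gateway interface."""

    def __init__(self):
        self._samples = []
        self._models = {}

    def upload_training_data(self, samples):
        self._samples = list(samples)
        return len(self._samples)

    def create_model(self, name):
        self._models[name] = {
            "name": name,
            "trained_on": len(self._samples),
            "status": "ready",
        }
        return name

    def fetch_model(self, name):
        return self._models[name]


def train_via_gateway(gateway, name, samples):
    gateway.upload_training_data(samples)  # 1. training data is transferred
    model_id = gateway.create_model(name)  # 2. the model is created on the IDP side
    return gateway.fetch_model(model_id)   # 3. the result is made available again


model = train_via_gateway(FakeGateway(), "invoices", ["doc-1", "doc-2", "doc-3"])
```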

Benefits for companies 

The enhancements address a key bottleneck in previous IDP usage: the complexity of model training. 

Concrete advantages: 

significantly faster creation of new AI models  
no external training environments required  
use of existing documents as a training basis  
reduced configuration effort  
faster deployment of IDP processes into production  
higher adoption due to simplified user experience

The underlying assumption is that the limited use of IDP so far has been partly due to the complexity of model creation. 

Outlook 

With the introduction of end-to-end training for classification and extraction, the DocuWare IDP plugin is evolving toward a fully integrated AI training platform within document management. 

Future enhancements will further strengthen this approach and enable the automatic use and continuous improvement of models in live operations. 

Written by

Julia Schließmeier
