
New developments for classification and extraction in the IDP plugin

Companies process large volumes of documents every day, such as invoices, contracts, or delivery notes. However, the real value is only created when this content can be automatically recognized, classified, and structured for downstream processes.

With the new developments in the DocuWare IDP plugin, this step is now significantly simpler. Users can train AI models for classification and extraction directly within DocuWare—based on existing documents in the system, without the need for external training environments.

The goal of this development is to make the creation and use of AI models as simple as possible and to fully integrate them into the existing DocuWare environment.

Creating AI models without switching systems

A key objective of this enhancement is to enable users to:

  • train AI models directly within DocuWare  
  • use existing documents from file cabinets as a training basis  
  • create models without external tools or data preparation  
  • deploy trained models to production immediately, within the same system  

This establishes a seamless end-to-end approach in which the entire lifecycle of an IDP model is handled within DocuWare.

The DocuWare IDP plugin as a central platform 

The IDP plugin serves as the central control unit for all Intelligent Document Processing functions. Within this interface, users can:

  • start training for classification, extraction, and splitting  
  • select file cabinets and index fields as a training basis  
  • configure and reuse models  
  • manage the entire lifecycle of IDP workflows  

All steps are bundled into a unified interface, drastically simplifying the previously fragmented training process.

Simplified training concept: From configuration to data 

A key paradigm shift is the move away from manual configuration toward data-driven learning.

Instead of complex model setup, the new approach is based on:

  • existing document repositories in DocuWare  
  • structured index data  
  • automated generation of training data  

As a result, model quality is increasingly determined by the available data rather than manually defined rules.

Splitting: Seamless separation for improved classification and extraction 

Splitting extends the DocuWare IDP plugin by enabling the automatic separation of mixed or multi-page documents into individual, logically distinct documents.

The splitter is selected as an agent within the plugin and processes documents directly via DocuWare IDP before they are passed on to classification and extraction.

From external tooling to IDP integration 

Before the integration, splitting was not available in the DocuWare IDP plugin and could only be implemented using external tools such as storageRobot or Make.com.

Enhancements in document splitting for IDP workflows 

  • Fewer manual preprocessing steps — annotation and external document separation have been eliminated  
  • Improved output quality in classification and extraction thanks to cleanly separated individual documents  
  • Native integration of splitting into the DocuWare IDP plugin, eliminating external tools and the need to switch between systems

Splitting is now integrated into the DocuWare IDP plugin, seamlessly embedding document separation into the workflow. This eliminates the need for external tools and enables a fully automated processing pipeline from ingestion through classification and extraction.
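
The order of the stages (split first, then classify, then extract) can be sketched in Python. This is a minimal illustration under stated assumptions, not how DocuWare IDP is implemented: the function names `split_pages` and `process_batch` are hypothetical, pages are plain strings, and the boundary rule, classifier, and extractor are supplied by the caller as stand-ins for the trained models.

```python
from typing import Any, Callable, Dict, List

Document = List[str]  # assumption: a document is simply a list of page texts


def split_pages(pages: List[str], starts_new_document: Callable[[str], bool]) -> List[Document]:
    """Group consecutive pages into documents.

    A page for which starts_new_document() returns True opens a new document;
    every other page is appended to the current one.
    """
    documents: List[Document] = []
    for page in pages:
        if not documents or starts_new_document(page):
            documents.append([page])
        else:
            documents[-1].append(page)
    return documents


def process_batch(
    pages: List[str],
    starts_new_document: Callable[[str], bool],
    classify: Callable[[Document], str],
    extract: Callable[[Document, str], Dict[str, Any]],
) -> List[Dict[str, Any]]:
    """Run split -> classify -> extract on a mixed stack of pages."""
    results = []
    for document in split_pages(pages, starts_new_document):
        doc_type = classify(document)           # classification sees one clean document
        fields = extract(document, doc_type)    # extraction can depend on the type
        results.append({"type": doc_type, "pages": len(document), "fields": fields})
    return results
```

Because classification and extraction only ever receive one separated document at a time, a poor split affects both later stages, which is why the article treats clean separation as a quality factor for everything downstream.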

Classification: More precise models through structured data 

Document classification is a key prerequisite for any downstream automation. Only when a document type is correctly identified can processes be reliably controlled.

Training via file cabinets and index fields 

Previously, training was primarily based on selecting multiple file cabinets, where the cabinet name was used as the class label. However, this approach has limitations in practice, as file cabinets often contain heterogeneous documents.

Enhancement: Use of index fields 

With the new extension, an index field can additionally be used to define classes. This enables significantly higher granularity.

Example: 

A file cabinet contains 1,000 documents with the index field DocType:

  • Invoice  
  • Credit Note  
  • Delivery Slip  

Instead of treating the entire file cabinet as a single class, these values can now be used directly as separate classes. This results in more homogeneous training data and improved classification performance.

Flexible training logic 

Users can choose between two approaches:

  • multiple file cabinets as the basis for classes  
  • a single file cabinet plus an index field for more fine-grained classification  

This keeps the system flexible and adaptable to different customer structures.
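
The two labeling options can be sketched in a few lines of Python. This is an illustration only: the dictionary layout (`cabinet`, `index`) and the function names `class_label` and `label_distribution` are assumptions made for this example and are not part of any DocuWare API.

```python
from collections import Counter
from typing import Any, Dict, Iterable, Optional


def class_label(document: Dict[str, Any], index_field: Optional[str] = None) -> Optional[str]:
    """Return the class label for one document.

    Without an index field, the file cabinet name is the label.
    With an index field, the value of that field is the label.
    """
    if index_field is None:
        return document["cabinet"]
    return document["index"].get(index_field) or None


def label_distribution(
    documents: Iterable[Dict[str, Any]], index_field: Optional[str] = None
) -> Counter:
    """Count documents per class, skipping documents without a usable label."""
    labels = (class_label(doc, index_field) for doc in documents)
    return Counter(label for label in labels if label)


# Hypothetical sample data: one cabinet holding three document types.
documents = [
    {"cabinet": "Accounting", "index": {"DocType": "Invoice"}},
    {"cabinet": "Accounting", "index": {"DocType": "Invoice"}},
    {"cabinet": "Accounting", "index": {"DocType": "Credit Note"}},
    {"cabinet": "Accounting", "index": {"DocType": "Delivery Slip"}},
]

print(label_distribution(documents))             # labels taken from the cabinet name
print(label_distribution(documents, "DocType"))  # labels taken from the index field
```

With the cabinet name as the label, all four sample documents fall into a single class. With `DocType` as the label, the same documents separate into three classes, which is the finer granularity the index field option is meant to provide.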

Extraction: End-to-end training without annotation 

The most significant simplification concerns document extraction.

Previous process 

In the traditional approach, after setting up an extraction process, users had to:

  • manually annotate documents  
  • mark fields  
  • prepare training data  

This step was time-consuming and often a major barrier to adoption.

New approach: Direct training 

With GenAI extraction, this step is completely eliminated.

Instead:

  • users start training directly in the IDP plugin  
  • file cabinets and index fields are selected  
  • the system automatically creates a training model  
  • annotation is no longer required  
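
The article does not describe how the system builds its training model internally. As a rough illustration of the general idea, the sketch below pairs each stored document with the index values it already carries, so those values serve as extraction targets without anyone marking fields by hand. The function name `build_training_examples`, the dictionary layout, and the sample field names are assumptions made for this example.

```python
from typing import Any, Dict, Iterable, List


def build_training_examples(
    documents: Iterable[Dict[str, Any]], target_fields: List[str]
) -> List[Dict[str, Any]]:
    """Turn stored documents plus their index data into training examples.

    Each example pairs the document text with the index values to be extracted.
    Documents missing any of the selected fields are skipped, since an empty
    target would teach the model nothing.
    """
    examples = []
    for doc in documents:
        targets = {field: doc["index"].get(field) for field in target_fields}
        if any(value in (None, "") for value in targets.values()):
            continue
        examples.append({"text": doc["text"], "targets": targets})
    return examples


# Hypothetical sample data: two archived invoices, one with an incomplete index entry.
archive = [
    {"text": "Invoice 1001 ... Total 250.00", "index": {"InvoiceNumber": "1001", "Total": "250.00"}},
    {"text": "Invoice 1002 ... Total 80.00", "index": {"InvoiceNumber": "1002", "Total": ""}},
]

examples = build_training_examples(archive, ["InvoiceNumber", "Total"])
print(len(examples), "usable training example(s)")
```

The sketch also shows why data quality matters in this approach: documents with incomplete index entries drop out of the training set, so well-maintained index data directly increases the amount of usable training material.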

Integration with DocuWare IDP 

The training logic is based on integration with the DocuWare IDP platform. Through a gateway interface:

  • training data is transferred from DocuWare  
  • models are created in DocuWare IDP
  • results are made available again in DocuWare  

This establishes a seamless connection between document management and AI training.

Benefits for companies 

The enhancements address a key bottleneck in previous IDP usage: the complexity of model training.

Concrete advantages: 

  • significantly faster creation of new AI models  
  • no external training environments required  
  • use of existing documents as a training basis  
  • reduced configuration effort  
  • faster deployment of IDP processes into production  
  • higher adoption due to simplified user experience  

The underlying assumption is that IDP has seen limited use so far partly because model creation was so complex.

Outlook 

With the introduction of end-to-end training for classification and extraction, the DocuWare IDP plugin is evolving toward a fully integrated AI training platform within document management.

Future enhancements will further strengthen this approach and enable the automatic use and continuous improvement of models in live operations.
