How to scan for best text recognition

There are some easy ways to adjust your scanning devices so that paper documents can be imported seamlessly into DocuWare. This saves you time-consuming re-scanning and speeds up filing and correct indexing in DocuWare.

Many invoices, delivery notes and contracts still arrive on paper and are scanned centrally before they end up in DocuWare as digital documents. This is usually done on a network device, where you can adjust the scan quality in detail to help avoid capture errors. Once the settings are adjusted, the data you capture, for example from invoices, can be read almost error-free. This matters not only for the initial indexing when you archive your documents, but also for downstream processes, such as automatically reconciling invoice items with a delivery note and purchase order. Often it’s just a question of "tightening a few screws" to significantly improve scan outcomes. Here are some handy tips:

Which file format for which purpose?

What should happen to your scanned (i.e. digitized) document? The answer to this is crucial for the choice of file format. 

Read data: PDF or PDF/A

When importing, DocuWare reads data from documents such as invoices and uses this information for indexing and to trigger workflows. The data can be read from text or from barcodes.

File only: PDF, PDF/A, PNG, JPEG, TIFF and other formats

Other documents, like blueprints, are only archived in DocuWare and then displayed in the DocuWare Client or sent as email attachments. In this case, you can choose any of these formats. To make sure these documents display clearly, choose a higher scan quality.

In color or not

The nature of your business dictates the need for capturing color in your documents. For example, "Color" mode can be useful if contracts have been signed with different pen colors following a specific rights system or for plans that are digitized in color. For invoices and delivery notes, the black-and-white setting, which also requires the least amount of storage space, or grayscale is usually sufficient. 

Resolution or what "dpi" means

DocuWare can index your documents best if the scan resolution is set appropriately. Resolution is the dot density of the file, measured in dpi ("dots per inch"). A resolution of at least 300 dpi is recommended. At lower values, optical character recognition (OCR) errors become likely. For example, an "i" or "!" can be read as "I", or a "w" can be recognized as two "v"s.

The recommended dpi value depends on the color mode: 300 or 400 dpi for black-and-white scans, while 150 to 300 dpi is often sufficient for grayscale and color scans.
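These recommendations can be turned into a quick settings check. The following Python sketch is only an illustration: the mode names, the RECOMMENDED_DPI table and the check_scan_dpi function are hypothetical and not part of DocuWare or any scanner software. Only the dpi ranges come from the text.

```python
# Recommended dpi ranges per color mode, taken from the article text.
# The mode names used as keys are hypothetical labels.
RECOMMENDED_DPI = {
    "black_and_white": (300, 400),
    "grayscale": (150, 300),
    "color": (150, 300),
}


def check_scan_dpi(mode: str, dpi: int) -> str:
    """Compare a dpi setting with the recommended range for a color mode."""
    low, high = RECOMMENDED_DPI[mode]
    if dpi < low:
        return "too low"  # OCR errors become more likely
    if dpi > high:
        return "higher than needed"  # larger files with little benefit
    return "ok"


if __name__ == "__main__":
    for mode, dpi in [("black_and_white", 200), ("grayscale", 200), ("color", 600)]:
        print(f"{mode} at {dpi} dpi: {check_scan_dpi(mode, dpi)}")
```

A black-and-white scan at 200 dpi is flagged as too low, while the same 200 dpi passes for grayscale.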

How font size and resolution work together

You should also pay attention to a document’s font size, because it affects the ideal resolution. Rule of thumb: the smaller the font, the higher the resolution you need.
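To see why small fonts need more resolution, it helps to work out how many pixels a character gets. One typographic point is 1/72 inch, so a font's nominal height in pixels is roughly font size × dpi / 72. The helper font_height_px below is a hypothetical name used for illustration. The nominal font size includes some space above and below the letters, so actual glyphs are somewhat smaller than the values it prints.

```python
POINTS_PER_INCH = 72  # one typographic point is 1/72 inch


def font_height_px(font_size_pt: float, dpi: int) -> float:
    """Nominal height in pixels of a font scanned at the given resolution."""
    return font_size_pt * dpi / POINTS_PER_INCH


if __name__ == "__main__":
    for size_pt in (6, 8, 12):
        for dpi in (150, 300, 400):
            height = font_height_px(size_pt, dpi)
            print(f"{size_pt} pt at {dpi} dpi: about {height:.0f} px tall")
```

At 300 dpi a 12 pt font is nominally 50 pixels tall, while a 6 pt font gets only 25 pixels, the same as 12 pt text scanned at 150 dpi.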

Black-and-white mode: The character string "Ilti" is only captured accurately at 300 dpi.
Grayscale mode: Even at 100 dpi, the characters are easy to recognize, but other image information, such as the structure of the paper, is captured as well. This increases the file size and takes up space unnecessarily. At the same dpi, black-and-white mode may therefore be the better choice, because it requires less storage.
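The storage difference between the modes can be estimated with simple arithmetic. This sketch assumes uncompressed images with 1 bit per pixel for black and white, 8 bits for grayscale and 24 bits for color, and a US Letter page (8.5 × 11 inches). These are illustrative assumptions, and raw_scan_bytes is a hypothetical helper. Real scan files are usually compressed and therefore smaller, but the ratio between the modes shows the trend.

```python
# Assumed bit depths per pixel (typical values, not scanner-specific).
BITS_PER_PIXEL = {"black_and_white": 1, "grayscale": 8, "color": 24}


def raw_scan_bytes(width_in: float, height_in: float, dpi: int, mode: str) -> int:
    """Rough uncompressed size in bytes of one scanned page."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * BITS_PER_PIXEL[mode] // 8


if __name__ == "__main__":
    for mode in BITS_PER_PIXEL:
        size_mb = raw_scan_bytes(8.5, 11, 300, mode) / 1_000_000
        print(f"{mode} at 300 dpi: about {size_mb:.1f} MB uncompressed")
```

Under these assumptions, a grayscale page needs eight times the raw storage of a black-and-white page at the same dpi, and a color page needs 24 times as much.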

Best data quality = best processes in DocuWare

Make sure your scanner has the right settings to get the best data quality from your documents. This saves you time and money, because scanning is faster and your processes in DocuWare run like clockwork.

Also read how to avoid manual entry by using document content.