Document types
A document type is the definition of a logical type of document that different business processes must handle.
What is a document type and what can it contain?
Document types include invoices, medical records, IRS Forms W-2, contracts, and others. Besides a name, group, and category, a document type usually contains a collection of fields.
For example, invoices usually contain the following information:
- Vendor name, vendor address, billing name, billing address
- Invoice number, purchase order number, payment terms, due date
- Net amount, tax amount, discount, total amount
- VAT number, VAT rate
- Bank account number, bank name, SWIFT, IBAN
Figure 1. Invoice example
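As a rough illustration, the description above can be sketched as a small data model: a document type with a name, a group, a category, and a collection of fields. This is a minimal sketch, not a product API. The class names `DocumentType` and `Field`, the `data_type` labels, and the group and category values are all hypothetical and chosen only for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Field:
    """One piece of information to capture from a document (hypothetical model)."""
    name: str
    data_type: str = "text"  # assumed labels: "text", "number", "date"


@dataclass
class DocumentType:
    """A logical type of document: a name, group, category, and its fields."""
    name: str
    group: str
    category: str
    fields: List[Field] = field(default_factory=list)

    def field_names(self) -> List[str]:
        """Return the names of all fields, in definition order."""
        return [f.name for f in self.fields]


# An invoice document type using a few of the fields listed above.
invoice = DocumentType(
    name="Invoice",
    group="Finance",       # illustrative value, not from the source
    category="Payables",   # illustrative value, not from the source
    fields=[
        Field("vendor_name"),
        Field("invoice_number"),
        Field("due_date", "date"),
        Field("net_amount", "number"),
        Field("tax_amount", "number"),
        Field("total_amount", "number"),
        Field("iban"),
    ],
)

print(invoice.field_names())
```

Keeping the fields as a list on the document type means that adding a new piece of information, such as a VAT number, only changes the definition and not the code that reads it.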

Document type formats
Document types can be classified based on their format. Some document types have very structured content, while others mainly consist of free text.
Documents are classified into three main formats:
- Structured
- Semi-structured
- Unstructured
Documents are often a combination of these three formats. A file can have a structured heading followed by unstructured, free-form content. It can also consist of unstructured content in which specific information always appears in a very structured or repeating context.
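One way to picture a mixed document is as a sequence of sections, each tagged with one of the three formats. The sketch below assumes this model purely for illustration. `Format`, `Section`, and `formats_in` are hypothetical names, not part of any product API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Set


class Format(Enum):
    """The three main document formats."""
    STRUCTURED = "structured"
    SEMI_STRUCTURED = "semi-structured"
    UNSTRUCTURED = "unstructured"


@dataclass
class Section:
    """A part of a document, tagged with its format (hypothetical model)."""
    title: str
    format: Format


def formats_in(sections: List[Section]) -> Set[Format]:
    """Return the distinct formats found across a document's sections."""
    return {s.format for s in sections}


# A file with a structured heading followed by free-form content.
document = [
    Section("Heading", Format.STRUCTURED),
    Section("Body text", Format.UNSTRUCTURED),
]

is_mixed = len(formats_in(document)) > 1
print(is_mixed)  # True
```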
Structured documents
Structured documents include:
- Surveys
- Questionnaires
- Tax forms
- Passports
- Licenses
- Time sheets
These documents are designed to collect information in a specific format. They typically contain key-value pairs, tables, handwritten text, signatures, and checkboxes. These documents guide the user by providing precise areas for entering each piece of data. Such documents are commonly called forms and are used to collect low-diversity data.
Figure 2. Driver license, an example of a structured document

Semi-structured documents
Semi-structured documents do not follow a strict format the way structured forms do, and are not bound to specified data fields. They have no fixed form, but they follow a common enough format. They contain both fixed and variable parts, such as tables. They may contain paragraphs as well, but the data is mainly found in key-value pairs. Semi-structured documents include:
- Invoices
- Receipts
- Purchase orders
- Healthcare lab reports
- Bank statements
- Utility bills
Figure 3. Invoice, an example of a semi-structured document

Unstructured documents
Unstructured documents are files that do not follow a specific or organized model. They have no fixed format, and the information they contain can appear anywhere in free-form text. Humans can easily understand these documents, but the data is challenging for robots to interpret and process. Unstructured documents can take many forms, including:
- Contracts
- Leases
- Annual reports
- Agreements
- News articles
Figure 4. License agreement, an example of an unstructured document
