Content Classification

The Content Classification route identifies and classifies the documents contained in image files, whether single-page or multi-page. It suits any sector that handles large volumes of documents: for each submitted file, the API returns the types of documents that file contains.

What does this route do?

The Content Classification endpoint receives files containing one or more documents and determines which types of documents are present. For each identified document, the API returns:

Document Type(s) and Subtype(s): The type and subtype of each document found on a page or section of the submitted file.

Score: A numerical indicator of confidence in the classification, which lets users quickly judge how reliable each result is.

Tags: A set of identifiers that characterize the classified document, such as country, language, and subject.

Page: The page where the classified document is located in the original file.

Cropped Classified Images (Optional): If the client requests them, the API also returns binary crops of each classified document's image, with perspective corrected to make later viewing and processing easier.
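As an illustration of how a client might consume these fields, the sketch below parses a response body into typed records and filters them by score. The JSON shape (a top-level `documents` list whose items carry `type`, `subtype`, `score`, `page`, and `tags` keys) is an assumption for illustration, not the documented response schema, and the optional cropped images are left out.

```python
from __future__ import annotations

import json
from dataclasses import dataclass, field


@dataclass
class ClassifiedDocument:
    """One classified document, mirroring the fields listed above.

    The JSON key names used to populate it are hypothetical.
    """
    doc_type: str
    subtype: str | None
    score: float
    page: int
    tags: dict[str, str] = field(default_factory=dict)


def parse_classification(payload: str) -> list[ClassifiedDocument]:
    """Parse a hypothetical response body of the form {"documents": [...]}."""
    data = json.loads(payload)
    return [
        ClassifiedDocument(
            doc_type=item["type"],
            subtype=item.get("subtype"),      # subtype may be absent
            score=float(item["score"]),
            page=int(item["page"]),
            tags=dict(item.get("tags", {})),  # tags may be absent
        )
        for item in data.get("documents", [])
    ]


def confident(docs: list[ClassifiedDocument], min_score: float) -> list[ClassifiedDocument]:
    """Keep only results whose score meets a caller-chosen threshold."""
    return [d for d in docs if d.score >= min_score]
```

The `min_score` threshold is left to the caller, since this section does not recommend a specific value.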

Comparison with the Content Extraction route

The Content Extraction route offers not only document classification but also full extraction of all textual content. If your process only needs to identify or classify documents, without full extraction, the new Content Classification route is the better fit: it covers that preliminary classification step with a more cost-effective and efficient approach.

Why use the Content Classification route?

This new route was designed to offer:

Accuracy and Efficiency: State-of-the-art technology for highly precise document classification.

Flexibility: Support for single-page and multi-page files, adapting to a wide range of business needs.

Convenience: Detailed, reliable results with no manual intervention, streamlining workflows.

Versatility: Optional cropped images of the classified documents, enabling more detailed handling and analysis of the processed files.

By using the Content Classification route, you can:

Reduce Costs: Pre-identify the documents that need to be processed in more detail, avoiding the costs associated with full extraction of unnecessary documents.

Optimize Processes: Use the classification results to decide which documents to send to the Content Extraction route, improving operational efficiency.

Improve Data Quality: Ensure that only relevant and correct documents are sent for extraction, increasing the accuracy and usefulness of the extracted data.

Use Case

Consider a registration process in which the end client must submit two documents: an identification document (ID card or driver’s license) and a proof of residence. If you call the Content Extraction route directly, you get the document type along with the extracted data. However, if the client submits a document you did not request (e.g., a birth certificate instead of an ID), you end up processing a document that your process will not accept.

With the new Content Classification route, you classify the received documents first. If a document is not of the desired type, you reject it and request a new submission without incurring the cost of full extraction. You call the Content Extraction route only once the correct documents have been received. This is just one of many possible use cases for the Content Classification route.
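The requirement in this scenario (one identification document plus one proof of residence) can be expressed as a small check over the classified types. In this minimal sketch, type labels such as `id_card` and `drivers_license`, the `REQUIREMENTS` mapping, and `check_submission` are all hypothetical; substitute the labels the route actually returns.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical type labels: each requirement is met by any one of its listed types.
REQUIREMENTS: dict[str, set[str]] = {
    "identification": {"id_card", "drivers_license"},
    "proof_of_residence": {"proof_of_residence"},
}


@dataclass
class SubmissionCheck:
    missing: list[str]   # requirement names with no matching document
    unwanted: list[str]  # submitted types that satisfy no requirement

    @property
    def complete(self) -> bool:
        """True when every requirement is met by at least one document."""
        return not self.missing


def check_submission(doc_types: list[str]) -> SubmissionCheck:
    """Compare classified document types against the registration requirements."""
    allowed = set().union(*REQUIREMENTS.values())
    present = set(doc_types)
    missing = [name for name, types in REQUIREMENTS.items() if not (types & present)]
    unwanted = [t for t in doc_types if t not in allowed]
    return SubmissionCheck(missing=missing, unwanted=unwanted)
```

Keeping `missing` and `unwanted` separate lets you both reject the wrong document and tell the client which document is still needed.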

Example Workflow:

Document Receipt

  Your end client submits an identification document and a proof of residence.

Initial Classification

  You use the Content Classification endpoint to identify the types of documents received.

Document Validation

  If the documents are classified as the desired types (ID card, driver’s license, proof of residence), they are sent to the Content Extraction route for full content extraction.

  If a document is identified as an unwanted type (e.g., birth certificate), you can reject the document and request a new submission, avoiding unnecessary extraction costs.

Final Processing

  Only the correct documents are processed through the Content Extraction route, optimizing the costs and efficiency of the operation.

This workflow shows how the Content Classification route can make document processing more efficient.
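The validation and final-processing steps can be sketched as a routing function that calls extraction only for wanted types. Here `extract` is a hypothetical stand-in for a call to the Content Extraction route, and representing each classified document as a `(page, type)` pair is an assumption made for brevity; how the real extraction call is scoped (per page or per file) is not specified in this section.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class RoutingResult:
    extracted: dict[int, dict] = field(default_factory=dict)       # page -> extraction output
    rejected: list[tuple[int, str]] = field(default_factory=list)  # (page, type) to resubmit


def route_documents(
    classified: list[tuple[int, str]],
    wanted_types: set[str],
    extract: Callable[[int], dict],
) -> RoutingResult:
    """Send only wanted document types to extraction; collect the rest for rejection.

    `classified` holds (page, document type) pairs from the classification step.
    `extract` is a hypothetical callable standing in for the Content Extraction route.
    """
    result = RoutingResult()
    for page, doc_type in classified:
        if doc_type in wanted_types:
            result.extracted[page] = extract(page)  # extraction cost incurred only here
        else:
            result.rejected.append((page, doc_type))
    return result
```

Passing a stub for `extract` in tests makes it easy to confirm that rejected documents never reach the extraction step.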
