Classify PDF Documents using C#

You can programmatically classify documents into predefined categories of the IAB-2, Documents, and Sentiment taxonomies. Classifying documents makes it easier to find relevant information at the right time. It also helps you manage and sort documents so that you can search for and retrieve the information that matters. In this article, you will learn how to classify PDF documents using C#.

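This article covers the following topics: the C# API for PDF classification, classifying PDF documents with the IAB-2 taxonomy, classifying PDF documents with the Documents taxonomy, classifying PDF documents from a stream, and classifying password-protected PDF files.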

C# API for PDF Classification

In this article, I use the GroupDocs.Classification for .NET API to classify PDF files. It offers advanced document and text classification into predefined categories and supports several taxonomies, including IAB-2, Documents, and Sentiment. It analyzes the text and returns classification information, including the best class and its probability score. You can classify a variety of industry-standard document formats, such as PDF, Word, OpenDocument, RTF, and TXT. The API also offers sentiment analysis with automatic language detection for English, Chinese, Spanish, and German. It can be used to develop applications in any development environment that targets the .NET platform.

You can either download the DLL of the API or install it using NuGet:

Install-Package GroupDocs.Classification

Classify PDF Documents with IAB-2 Taxonomy using C#

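You can classify PDF documents with the IAB-2 taxonomy in a few steps: create an instance of the Classifier class, call its Classify() method with the file name, the directory name, the number of best classes to return, and the IAB-2 taxonomy, and then read the results from the returned ClassificationResponse.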
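Below is a minimal C# sketch of these steps; it is not the original article's sample. It assumes the GroupDocs.Classification package is installed, that Taxonomy and ClassificationResponse live in the GroupDocs.Classification.DTO namespace, and that Classify() accepts the file name, directory name, best-class count, and taxonomy in that order. The BestClassName and BestClassProbability property names are also assumptions, and the Iab2Example class and the C:\Files\sample.pdf input are hypothetical. The enum member is spelled Taxonomy.IAB2 as the article names it, so check its exact casing against the API reference.

```csharp
using System;
using GroupDocs.Classification;
using GroupDocs.Classification.DTO; // assumption: namespace of Taxonomy and ClassificationResponse

public static class Iab2Example
{
    // Classifies a single PDF file with the IAB-2 taxonomy and returns the response.
    public static ClassificationResponse ClassifyWithIab2(string fileName, string directoryName)
    {
        // The Classifier class performs the classification.
        var classifier = new Classifier();

        // bestClassesCount = 3: ask for the three best-matching classes.
        return classifier.Classify(fileName, directoryName, 3, Taxonomy.IAB2);
    }

    public static void Main()
    {
        // Hypothetical input file: C:\Files\sample.pdf
        var response = ClassifyWithIab2("sample.pdf", @"C:\Files\");

        // Print the best class and its probability score (assumed property names).
        Console.WriteLine($"Best class: {response.BestClassName}, probability: {response.BestClassProbability}");
    }
}
```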

The Classifier class is the main class that provides methods to classify documents. Its Classify() method classifies a document identified by its file name and directory name. The bestClassesCount parameter sets how many of the best-matching classes to return, and passing Taxonomy.IAB2 selects the IAB-2 taxonomy for classification.

The ClassificationResponse class provides properties and methods for reading the retrieved classification information, such as the best class and its probability score.
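When you request more than one class, you can walk through every returned candidate rather than only the best one. The sketch below assumes, without confirmation from this article, that ClassificationResponse exposes a BestResults collection whose items have Name and Probability properties, and that Taxonomy and ClassificationResponse live in GroupDocs.Classification.DTO. The ResponseExample class, the DescribeBestClasses helper, and the C:\Files\sample.pdf input are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using GroupDocs.Classification;
using GroupDocs.Classification.DTO; // assumption: namespace of Taxonomy and ClassificationResponse

public static class ResponseExample
{
    // Hypothetical helper: returns one "name: probability" line per returned class.
    public static List<string> DescribeBestClasses(string fileName, string directoryName, int bestClassesCount)
    {
        var classifier = new Classifier();
        ClassificationResponse response =
            classifier.Classify(fileName, directoryName, bestClassesCount, Taxonomy.IAB2);

        var lines = new List<string>();
        // Assumption: BestResults holds the returned classes, each with Name and Probability.
        foreach (var result in response.BestResults)
        {
            lines.Add($"{result.Name}: {result.Probability}");
        }
        return lines;
    }

    public static void Main()
    {
        // Hypothetical input file: C:\Files\sample.pdf, asking for up to three classes.
        foreach (var line in DescribeBestClasses("sample.pdf", @"C:\Files\", 3))
        {
            Console.WriteLine(line);
        }
    }
}
```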

PDF Classification with Documents Taxonomy using C#

Classifying PDF documents with the Documents taxonomy follows the same steps: create an instance of the Classifier class and call its Classify() method, but select the Documents taxonomy instead of IAB-2.
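A minimal sketch of the same call with the Documents taxonomy follows. Only the taxonomy argument differs from the IAB-2 case. It assumes the enum member is named Taxonomy.Documents, that Taxonomy and ClassificationResponse live in GroupDocs.Classification.DTO, and that the BestClassName and BestClassProbability properties exist; the DocumentsTaxonomyExample class and the C:\Files\sample.pdf input are hypothetical.

```csharp
using System;
using GroupDocs.Classification;
using GroupDocs.Classification.DTO; // assumption: namespace of Taxonomy and ClassificationResponse

public static class DocumentsTaxonomyExample
{
    // Classifies a single PDF file with the Documents taxonomy.
    public static ClassificationResponse ClassifyWithDocumentsTaxonomy(string fileName, string directoryName)
    {
        var classifier = new Classifier();

        // Same call as for IAB-2; only the taxonomy argument changes (assumed member name).
        return classifier.Classify(fileName, directoryName, 3, Taxonomy.Documents);
    }

    public static void Main()
    {
        // Hypothetical input file: C:\Files\sample.pdf
        var response = ClassifyWithDocumentsTaxonomy("sample.pdf", @"C:\Files\");
        Console.WriteLine($"Best class: {response.BestClassName}, probability: {response.BestClassProbability}");
    }
}
```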

PDF Document Classification from Stream using C#

You can also classify a PDF document loaded from a file stream instead of a file path: open the file as a stream and pass the stream, rather than the file name and directory name, to the Classifier.
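The sketch below illustrates classification from a stream. It assumes a Classify(Stream, bestClassesCount, taxonomy) overload exists and that the response is fully computed before the stream is closed; neither is confirmed by this article, so verify both against the API reference. The StreamExample class and the C:\Files\sample.pdf path are hypothetical.

```csharp
using System;
using System.IO;
using GroupDocs.Classification;
using GroupDocs.Classification.DTO; // assumption: namespace of Taxonomy and ClassificationResponse

public static class StreamExample
{
    // Opens the PDF as a read-only stream and classifies it with the Documents taxonomy.
    public static ClassificationResponse ClassifyFromStream(string path)
    {
        var classifier = new Classifier();

        // The using block closes the file once Classify() has returned.
        using (Stream stream = File.OpenRead(path))
        {
            // Assumed overload: Classify(Stream stream, int bestClassesCount, Taxonomy taxonomy).
            return classifier.Classify(stream, 3, Taxonomy.Documents);
        }
    }

    public static void Main()
    {
        // Hypothetical input file.
        var response = ClassifyFromStream(@"C:\Files\sample.pdf");
        Console.WriteLine($"Best class: {response.BestClassName}, probability: {response.BestClassProbability}");
    }
}
```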

Classify Password Protected PDF Files using C#

You can also classify password-protected PDF documents by supplying the document's password during classification.
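The following is a minimal sketch for a password-protected file. It assumes Classify() accepts the password as an extra argument after the taxonomy; that signature is an assumption, not something this article confirms. The PasswordExample class, the protected.pdf file, and the password value are hypothetical placeholders.

```csharp
using System;
using GroupDocs.Classification;
using GroupDocs.Classification.DTO; // assumption: namespace of Taxonomy and ClassificationResponse

public static class PasswordExample
{
    // Classifies an encrypted PDF by passing its password to the classifier.
    public static ClassificationResponse ClassifyProtected(string fileName, string directoryName, string password)
    {
        var classifier = new Classifier();

        // Assumed overload: Classify(fileName, directoryName, bestClassesCount, taxonomy, password).
        return classifier.Classify(fileName, directoryName, 3, Taxonomy.Documents, password);
    }

    public static void Main()
    {
        // Hypothetical file and password.
        var response = ClassifyProtected("protected.pdf", @"C:\Files\", "hypothetical-password");
        Console.WriteLine($"Best class: {response.BestClassName}, probability: {response.BestClassProbability}");
    }
}
```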

Get a Free License

You can try the API without evaluation limitations by requesting a free temporary license.

Conclusion

In this article, you have learned how to classify PDF documents using C#. You have also learned how to classify documents with the IAB-2 and Documents taxonomies, how to load documents from a file stream instead of a file path, and how to classify password-protected PDF files. You can learn more about the GroupDocs.Classification for .NET API in its documentation. If you have any questions, feel free to contact us on the forum.
