Extract Table From PDF in C#

Extracting tables from PDF files is a common task in data extraction and analysis. Whether you need the tables for processing or want to export them for further use, automating the process saves time and effort. In this blog post, we will show you how to extract tables from a PDF in C# and export the extracted data to a CSV file.

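This article covers the following sections:

  • Why Extract Tables from PDF?
  • PDF Table Extraction - C# API Configuration
  • Extract Table From PDF in C#
  • Convert PDF Table Data to CSV in C#
  • Get a Free License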

Why Extract Tables from PDF?

PDF files are widely used for sharing information in a portable format. However, extracting structured data like tables can be challenging because PDF is designed for consistent display, not for easy editing or manipulation. By extracting table data programmatically in C#, you make it more accessible: you can analyze, modify, or store it in other formats such as CSV, which is widely used for data exchange between applications.

PDF Table Extraction - C# API Configuration

To extract data from PDF tables, install Conholdate.Total for .NET from NuGet by running the command below in the Package Manager Console:

PM> NuGet\Install-Package Conholdate.Total

Extract Table From PDF in C#

In this section, we’ll walk through extracting tables from a PDF document using C#.

  • Loading the PDF Document: The code starts by loading the PDF file into a Document class object.
  • Table Absorber: The TableAbsorber class is used to detect and absorb tables on each page of the PDF.
  • Extracting Table Data: The code iterates through each table found, extracting the content of each cell in the table, and prints it in a structured format.

This method provides an effective way to access tabular data from PDF documents, enabling you to work with the information programmatically. The code snippet below demonstrates the table extraction in C#:
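A minimal sketch of these steps might look like the following. It assumes the Document and TableAbsorber classes come from the Aspose.Pdf and Aspose.Pdf.Text namespaces bundled with Conholdate.Total, and it uses input.pdf as a placeholder file name. The "|" separator is only an illustrative choice for console output.

```csharp
using System;
using System.Text;
using Aspose.Pdf;       // assumption: Aspose.PDF types shipped with Conholdate.Total
using Aspose.Pdf.Text;  // TableAbsorber, AbsorbedTable, TextFragment, TextSegment

class ExtractTables
{
    static void Main()
    {
        // "input.pdf" is a placeholder path; replace it with your own file.
        using (var pdfDocument = new Document("input.pdf"))
        {
            foreach (Page page in pdfDocument.Pages)
            {
                // Detect and absorb the tables on the current page.
                var absorber = new TableAbsorber();
                absorber.Visit(page);

                foreach (AbsorbedTable table in absorber.TableList)
                {
                    foreach (AbsorbedRow row in table.RowList)
                    {
                        foreach (AbsorbedCell cell in row.CellList)
                        {
                            // A cell may hold several fragments and segments;
                            // join them into a single string per cell.
                            var cellText = new StringBuilder();
                            foreach (TextFragment fragment in cell.TextFragments)
                                foreach (TextSegment segment in fragment.Segments)
                                    cellText.Append(segment.Text);

                            Console.Write(cellText + "|");
                        }
                        // One console line per table row.
                        Console.WriteLine();
                    }
                }
            }
        }
    }
}
```

Creating a new TableAbsorber for each page keeps the tables of one page separate from the next, which makes it easier to report which page a table came from if you need that later.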

Convert PDF Table Data to CSV in C#

Once you have extracted table data from a PDF, you might want to save it in a more accessible format like CSV. CSV (Comma Separated Values) is widely supported by applications like Microsoft Excel, Google Sheets, and many data processing systems. In this section, we will convert the PDF table data to CSV in C#.

The following steps show how to export data from PDF tables to CSV format in C#:

  • Loading the PDF Document: The PDF document is loaded into the Document object as before.
  • ExcelSaveOptions: An instance of ExcelSaveOptions is created with the format set to CSV. This ensures that the table data from the PDF is exported directly as a CSV file.
  • Saving the CSV File: The Save method is called to export the PDF table data into a .csv file.

The code sample below shows how to convert a PDF table to CSV format in C#:
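As a minimal sketch of the steps listed above, the conversion could be written as follows. It assumes ExcelSaveOptions lives in the Aspose.Pdf namespace bundled with Conholdate.Total; input.pdf and output.csv are placeholder file names.

```csharp
using Aspose.Pdf;  // assumption: Aspose.PDF types shipped with Conholdate.Total

class PdfToCsv
{
    static void Main()
    {
        // "input.pdf" is a placeholder path; replace it with your own file.
        using (var pdfDocument = new Document("input.pdf"))
        {
            // Ask for CSV output instead of the default spreadsheet format.
            var saveOptions = new ExcelSaveOptions
            {
                Format = ExcelSaveOptions.ExcelFormat.CSV
            };

            // "output.csv" is a placeholder name for the exported file.
            pdfDocument.Save("output.csv", saveOptions);
        }
    }
}
```

This route hands the whole conversion to the library, so it suits cases where you want the file on disk without touching individual cells. If you need to filter or clean the values first, iterating over the cells with TableAbsorber gives you that control.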

Get a Free License

You can get a free temporary license to test different features of the API without any evaluation limitations.

Summing up

Extracting tables from a PDF and exporting the data to CSV can be simple and efficient in C#. Automating table extraction helps whenever you need to analyze or share tabular data from PDFs in a more flexible format like CSV. By following the steps in this blog post, you can easily retrieve and store table data from your PDF files. This process is especially useful in data analysis, reporting, or any task that requires structured data from otherwise rigid PDF documents. In case of any queries, please get in touch with us at the forum.
