Extract Table From PDF in Java

We recently published an article on how to extract text from a PDF file in Java programmatically. In this blog post, we will learn how to extract a table from a PDF in Java using this Java API for PDF. The library provides powerful features to parse and manipulate PDF documents, and you can automate data extraction by building a PDF table extractor with it. Before proceeding, please make sure Java is installed on your local machine.

This guide covers two points: installing the Java API and extracting a table from a PDF in Java.

PDF Table Extractor - Java API Installation

Installing this library is straightforward. You can either download the API or install it using the following Maven configuration.

<repository>
    <id>AsposeJavaAPI</id>
    <name>Aspose Java API</name>
    <url>https://repository.aspose.com/repo/</url>
</repository>
<dependency>
    <groupId>com.aspose</groupId>
    <artifactId>aspose-pdf</artifactId>
    <!-- Add a <version> element here with the release you want to use -->
    <classifier>jdk17</classifier>
</dependency>

Extract Table From PDF in Java

Once the API is installed, you can start writing code to build a PDF table extractor programmatically in Java.

You may follow the steps mentioned below:

  1. Create an object of the Document class and load the source PDF file.
  2. Create an instance of the TableAbsorber class, which performs the search and provides access to the search results.
  3. Loop through the PDF document pages by calling the getPages method.
  4. Invoke the visit method to extract the tables from the page.
  5. Call the getTableList method, which returns a read-only list of the tables that were found.
  6. Get the rows by calling the getRowList method and iterate through the list of rows.
  7. Iterate through the list of cells by invoking the getCellList method.
  8. Invoke the getTextFragments method to get a collection of TextFragment objects that describe the text contained in the cell.
  9. Call the getSegments method to get the text segments of the current TextFragment.
  10. Print the results.

Copy and paste the following code into your main file to extract data from a table in a PDF using Java:

import com.aspose.pdf.AbsorbedCell;
import com.aspose.pdf.AbsorbedRow;
import com.aspose.pdf.AbsorbedTable;
import com.aspose.pdf.Document;
import com.aspose.pdf.Page;
import com.aspose.pdf.TableAbsorber;
import com.aspose.pdf.TextFragment;
import com.aspose.pdf.TextSegment;

public class ExtractTable {
    public static void main(String[] args) {
        String filePath = "table.pdf";
        // Create an object of the Document class and load the source PDF file
        Document pdfDocument = new Document(filePath);
        // Create an instance of the TableAbsorber class, which performs the search and provides access to the results
        TableAbsorber absorber = new TableAbsorber();
        // Loop through the PDF document pages by calling the getPages method
        for (Page page : pdfDocument.getPages()) {
            // Invoke the visit method to extract the tables from the page
            absorber.visit(page);
            // Call the getTableList method, which returns a read-only list of the tables that were found
            for (AbsorbedTable table : absorber.getTableList()) {
                System.out.println("Table");
                // Get the rows by calling the getRowList method and iterate through them
                for (AbsorbedRow row : table.getRowList()) {
                    // Iterate through the list of cells by invoking the getCellList method
                    for (AbsorbedCell cell : row.getCellList()) {
                        // Get the TextFragment objects that describe the text contained in the cell
                        for (TextFragment fragment : cell.getTextFragments()) {
                            StringBuilder sb = new StringBuilder();
                            // Append the text of every segment of the current TextFragment
                            for (TextSegment seg : fragment.getSegments()) {
                                sb.append(seg.getText());
                            }
                            // Print the fragment text followed by a column separator
                            System.out.print(sb + "|");
                        }
                    }
                    // End the current table row
                    System.out.println();
                }
            }
        }
    }
}
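
The loop above prints text as it is found, which ties extraction to console output. As a minimal sketch of one alternative, assume the cell texts have first been collected into a `List<List<String>>`, with one inner list per table row. The `TableFormatter` class below is a hypothetical helper, not part of the Aspose API, and it uses only the Java standard library. It joins the cells of each row with a delimiter so that the same data could be printed, logged, or written to a file.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper for illustration; it is not part of the Aspose API.
final class TableFormatter {

    private TableFormatter() {
    }

    // Joins the cells of one row with the delimiter.
    // Null cells become empty strings and surrounding whitespace is trimmed.
    static String formatRow(List<String> cells, String delimiter) {
        return cells.stream()
                .map(cell -> cell == null ? "" : cell.trim())
                .collect(Collectors.joining(delimiter));
    }

    // Formats every row and separates the rows with the platform line separator.
    static String formatTable(List<List<String>> rows, String delimiter) {
        return rows.stream()
                .map(row -> formatRow(row, delimiter))
                .collect(Collectors.joining(System.lineSeparator()));
    }
}
```

For example, `TableFormatter.formatRow(List.of("Name", "Qty"), "|")` yields `Name|Qty`. Unlike the loop above, this puts the separator between cells rather than after every text fragment, so a row does not end with a trailing `|`.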

Get a Free License

You can get a free temporary license to try the API without evaluation limitations.

Summing up

This brings us to the end of this blog post. You have learned how to extract a table from a PDF in Java programmatically. This guide will help you if you are looking to build a PDF table extractor for your business application. You may also visit the documentation to learn about the other features.

Moreover, we suggest you follow our Getting Started guide.

Finally, conholdate.com publishes new blog posts regularly, so please stay in touch for the latest updates.

Ask a question

You can post your questions or queries on our forum.

FAQs

Can I extract tables from PDF?

Yes. Please visit this link for the steps and the code snippet that extract a table from a PDF document programmatically in Java.
