In a previous article, we explained how to extract text from a PDF file in Java programmatically. In this blog post, we will learn how to extract tables from a PDF in Java using Aspose.PDF for Java. This library provides powerful features for parsing and manipulating PDF documents, and you can use it to automate data extraction by building your own PDF table extractor. Before proceeding, please make sure Java is installed on your local machine.
This guide covers the following points: installing the Java API, extracting a table from a PDF in Java, and getting a free license.
PDF Table Extractor - Java API Installation
The installation process is straightforward. You can either download the API or install it by adding the following Maven configuration to your pom.xml file.
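<repositories>
    <repository>
        <id>AsposeJavaAPI</id>
        <name>Aspose Java API</name>
        <url>https://repository.aspose.com/repo/</url>
    </repository>
</repositories>
<dependencies>
    <dependency>
        <groupId>com.aspose</groupId>
        <artifactId>aspose-pdf</artifactId>
        <!-- Add a <version> element here with the Aspose.PDF release you want to use -->
        <classifier>jdk17</classifier>
    </dependency>
</dependencies>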
Extract Table From PDF in Java
Once the API is installed, you can start writing code to build a PDF table extractor in Java.
You can follow the steps below:
- Create an object of the Document class and load the source PDF file.
- Create an instance of the TableAbsorber class, which searches for tables and provides access to the search results.
- Loop through the pages of the PDF document returned by the getPages method.
- Call the visit method to extract tables from the page.
- Call the getTableList method, which returns a read-only list of the tables that were found.
- Get the rows by calling the getRowList method and iterate through them.
- Iterate through the cells of each row by calling the getCellList method.
- Call the getTextFragments method to get the collection of TextFragment objects that describe the text contained in the cell.
- Call the getSegments method to get the text segments of the current TextFragment.
- Print the results.
Copy and paste the following code into your main file to extract data from the tables in a PDF using Java:
import com.aspose.pdf.AbsorbedCell;
import com.aspose.pdf.AbsorbedRow;
import com.aspose.pdf.AbsorbedTable;
import com.aspose.pdf.Document;
import com.aspose.pdf.Page;
import com.aspose.pdf.TableAbsorber;
import com.aspose.pdf.TextFragment;
import com.aspose.pdf.TextSegment;

public class ExtractTableFromPdf {
    public static void main(String[] args) {
        String filePath = "table.pdf";
        // Create an object of the Document class and load the source PDF file
        Document pdfDocument = new Document(filePath);
        // Create an instance of the TableAbsorber class, which searches for tables and provides access to the search results
        TableAbsorber absorber = new TableAbsorber();
        // Loop through the PDF document pages returned by the getPages method
        for (Page page : pdfDocument.getPages()) {
            // Call the visit method to extract tables from the page
            absorber.visit(page);
            // Call the getTableList method, which returns a read-only list of the tables that were found
            for (AbsorbedTable table : absorber.getTableList()) {
                System.out.println("Table");
                // Get the rows by calling the getRowList method and iterate through them
                for (AbsorbedRow row : table.getRowList()) {
                    // Iterate through the cells of the row returned by the getCellList method
                    for (AbsorbedCell cell : row.getCellList()) {
                        // Call the getTextFragments method to get the TextFragment objects that describe the text in the cell
                        for (TextFragment fragment : cell.getTextFragments()) {
                            StringBuilder sb = new StringBuilder();
                            // Call the getSegments method to get the text segments of the current TextFragment
                            for (TextSegment seg : fragment.getSegments()) {
                                sb.append(seg.getText());
                            }
                            // Print the fragment text followed by a column separator
                            System.out.print(sb.toString() + "|");
                        }
                    }
                    // End the current row
                    System.out.println();
                }
            }
        }
    }
}
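The example above prints every text fragment followed by a `|`, so a cell whose text is split into several fragments may appear as several columns. If you want to store the rows instead of printing them (for example, to export them), one option is to join all fragments of a cell into a single string, add it to a `List<String>` for the row, and then write out the collected rows. The following JDK-only sketch assumes the rows have already been collected as a `List<List<String>>`; the `RowsToCsv` class and its `escape` and `toCsv` methods are hypothetical helpers, not part of Aspose.PDF. It quotes fields following RFC 4180 rules but ends lines with `\n` for simplicity.

```java
import java.util.List;
import java.util.StringJoiner;

/**
 * Hypothetical helper (not part of Aspose.PDF): converts table rows that
 * have already been extracted into CSV text with RFC 4180-style quoting.
 */
public class RowsToCsv {

    // Quote a field only when it contains a comma, a double quote, or a line break;
    // embedded double quotes are escaped by doubling them.
    public static String escape(String field) {
        boolean needsQuotes = field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r");
        if (!needsQuotes) {
            return field;
        }
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    // Join each row's cells with commas and terminate every row with "\n".
    public static String toCsv(List<List<String>> rows) {
        StringBuilder csv = new StringBuilder();
        for (List<String> row : rows) {
            StringJoiner line = new StringJoiner(",");
            for (String cell : row) {
                line.add(escape(cell));
            }
            csv.append(line).append('\n');
        }
        return csv.toString();
    }

    public static void main(String[] args) {
        // Sample rows standing in for cell texts collected from an AbsorbedTable
        List<List<String>> rows = List.of(
                List.of("Name", "Price"),
                List.of("Widget, large", "9.99"));
        System.out.print(toCsv(rows));
    }
}
```

Running `main` prints `Name,Price` on the first line and `"Widget, large",9.99` on the second; the comma inside the first cell forces that field to be quoted.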
Get a Free License
You can get a free temporary license to try the API without evaluation limitations.
Summing up
This brings us to the end of this blog post. You have learned how to extract tables from a PDF in Java programmatically. This guide will help you if you are looking to build a PDF table extractor for your business application. You can also visit the documentation to learn about the other features of the API.
Moreover, we suggest you follow our Getting Started guide.
Finally, conholdate.com regularly publishes new blog posts, so please stay in touch for the latest updates.
Ask a question
You can let us know about your questions or queries on our forum.
FAQs
Can I extract tables from PDF?
Yes. This guide lists the steps and provides a code snippet that extracts tables from a PDF document programmatically in Java.