Modifying PDF content programmatically can be essential in scenarios where sensitive or outdated information must be replaced before distribution. One of the most practical features in PDF manipulation is the ability to search for a specific phrase or pattern and replace it with alternate content. In this detailed guide, we will demonstrate how you can find and replace text in PDF files using Java. The focus will be on using the powerful Conholdate.Total for Java SDK, which enables developers to perform robust document manipulation tasks including redaction and replacement with just a few lines of code.
We will cover two important scenarios. The first one will explain how to locate an exact word or phrase in a PDF and replace it. The second one will walk through using regular expressions to match and substitute variable patterns such as phone numbers, account numbers, or other custom data formats.
Why Find and Replace Text in PDF Files?
Ensure Privacy and Compliance: Easily remove personal or confidential data before publishing or sharing PDF files.
Update Documents Efficiently: Automatically replace outdated terms or content across large batches of files.
Flexible Text Search Options: Utilize both exact phrase matching and regular expressions for comprehensive redaction.
Preserve Document Integrity: Maintain the original layout and formatting while replacing the content.
Automation Friendly: Integrate into Java workflows for batch processing and document automation tasks.
Find and Replace Text in PDF using Java - SDK Installation
To redact PDF documents, you first have to configure Conholdate.Total for Java SDK in your environment. You may download the JAR file from the New Releases section, or add the SDK to your project as a Maven dependency using the repository and dependency details listed in the SDK's installation instructions.
Find and Replace Text in PDF using Java
When dealing with PDF documents that contain sensitive information like names, contact details, or organization identifiers, there is often a need to redact or replace these words before sharing the file. Conholdate.Total for Java SDK provides a straightforward way to accomplish this. The following example demonstrates how to search for an exact match of a word or phrase and substitute it with a placeholder or alternative text.
// Load the source PDF document
final Redactor redactor = new Redactor("path/document.pdf");
// Replace every exact match of "John Doe" with "[censored]"
redactor.apply(new ExactPhraseRedaction("John Doe", new ReplacementOptions("[censored]")));
// Save the redacted file at a different location with a different name
FileOutputStream stream = new FileOutputStream("path/exactPhrase.pdf");
// Disable rasterization so the output stays a text-based PDF
RasterizationOptions rasterOptions = new RasterizationOptions();
rasterOptions.setEnabled(false);
redactor.save(stream, rasterOptions);
// Release the output stream and the document
stream.close();
redactor.close();
In this snippet, the SDK scans the PDF file for the phrase “John Doe” and replaces every instance with the term “[censored]”. This operation is especially useful for automating the removal of personally identifiable information from documents. Saving through a FileOutputStream that points to a new path leaves the original file untouched, while calling rasterOptions.setEnabled(false) keeps the output as a text-based PDF rather than converting its pages into images.
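Before running a redaction over real documents, it can help to preview what an exact-phrase replacement does to a sample string. The following is a minimal sketch that uses only the Java standard library, not the SDK. The class name ExactPhrasePreview, its method, and the caseSensitive parameter are hypothetical names introduced for illustration, and whether the SDK's exact-phrase matching is case-sensitive by default is not stated in this guide, so the sketch takes it as an explicit argument.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Plain-string illustration of exact-phrase replacement.
// This is NOT the Conholdate API; all names here are hypothetical.
class ExactPhrasePreview {

    // Replaces every literal occurrence of `phrase` in `text` with `replacement`.
    static String replacePhrase(String text, String phrase, String replacement, boolean caseSensitive) {
        int flags = caseSensitive ? 0 : Pattern.CASE_INSENSITIVE;
        // Pattern.quote treats the phrase as literal text, so characters
        // such as '.' or '(' are not interpreted as regex syntax.
        Pattern literal = Pattern.compile(Pattern.quote(phrase), flags);
        // quoteReplacement keeps '$' and '\' in the replacement literal.
        return literal.matcher(text).replaceAll(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        String sample = "Contact John Doe or john doe.";
        // prints "Contact [censored] or [censored]."
        System.out.println(replacePhrase(sample, "John Doe", "[censored]", false));
        // prints "Contact [censored] or john doe."
        System.out.println(replacePhrase(sample, "John Doe", "[censored]", true));
    }
}
```

Running a preview like this on a few representative strings is a cheap way to confirm the phrase and placeholder before applying the redaction to a batch of files.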
Find and Replace Text in PDF with Regular Expressions in Java
Sometimes, text that needs to be replaced in a PDF does not follow a fixed format. For instance, patterns like invoice numbers, postal codes, or identification numbers may vary from document to document. In such situations, regular expressions offer a dynamic solution. The Conholdate.Total for Java SDK allows developers to define regex patterns to detect and replace complex text structures within a PDF.
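// Find text using a regular expression and replace it with other text using Java
final Redactor redactor = new Redactor("path/document.pdf");
// Pattern: two digits, optional whitespace, two digits, any non-digit characters, six digits
redactor.apply(new RegexRedaction("\\d{2}\\s*\\d{2}[^\\d]*\\d{6}", new ReplacementOptions("[censored]")));
// Save the redacted document and release it
redactor.save();
redactor.close();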
The regular expression used in this code matches two digits, optional whitespace, two more digits, any run of non-digit characters, and then six digits. This shape is typical of formatted codes such as bank references or transaction identifiers, for example 12 34-567890. Any text fitting this pattern is automatically located and replaced with a predefined label like “[censored]”. This flexibility lets you protect sensitive information even when you don’t know its exact content ahead of time.
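To see which strings the pattern actually catches, it can be tried against sample text with java.util.regex before it is used in a redaction. This is a minimal sketch on plain strings, assuming the SDK interprets the pattern the same way the standard Java regex engine does, which this guide does not confirm. The class name RegexPreview and its method are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Plain-string preview of the redaction pattern.
// This is NOT the Conholdate API; all names here are hypothetical.
class RegexPreview {

    // Same pattern as in the snippet above.
    static final Pattern CODE = Pattern.compile("\\d{2}\\s*\\d{2}[^\\d]*\\d{6}");

    // Replaces every match of the pattern with a literal replacement string.
    static String redact(String text, String replacement) {
        return CODE.matcher(text).replaceAll(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        // prints "Ref: [censored] paid"
        System.out.println(redact("Ref: 12 34-567890 paid", "[censored]"));
        // No six-digit group follows, so the text is unchanged:
        // prints "Order 1234 shipped"
        System.out.println(redact("Order 1234 shipped", "[censored]"));
    }
}
```

Note that `[^\\d]*` accepts any number of non-digit characters, including spaces and words, so a string like "12 34 see invoice 567890" also matches in full. If that is broader than intended, a stricter separator such as `[-\\s]?` is one option to consider.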
Conclusion
Finding and replacing text in PDF documents using Java has never been easier, thanks to the capabilities offered by Conholdate.Total for Java. Whether you’re replacing a specific word or searching for variable patterns with regular expressions, this SDK ensures that you can manipulate your PDF content with precision and control. The ability to redact sensitive information, automate updates, and preserve document formatting makes this a powerful tool for developers across industries. Integrate it into your Java projects today and streamline your document processing workflows with confidence.
