Java: Get Coordinates of Text or Images in PDF

2024-10-18 05:42:00 Written by  support iceblue

Getting the coordinates of text or images in a PDF lets you pinpoint page elements precisely, which makes content extraction easier. This is especially useful in data analysis, where specific information has to be pulled out of complicated layouts. Knowing the coordinates also lets you place notes, marks, or stamps exactly where they belong, for example to highlight an important section or attach a comment to a specific passage, which improves document interactivity and collaboration.

In this article, you will learn how to get the coordinates of specified text or images in a PDF document using Java and the Spire.PDF for Java library.

Install Spire.PDF for Java

First of all, you need to add the Spire.Pdf.jar file as a dependency in your Java program. The JAR file can be downloaded from this link. If you use Maven, you can import the JAR file into your application by adding the following code to your project's pom.xml file.

<repositories>
    <repository>
        <id>com.e-iceblue</id>
        <name>e-iceblue</name>
        <url>https://repo.e-iceblue.com/nexus/content/groups/public/</url>
    </repository>
</repositories>
<dependencies>
    <dependency>
        <groupId>e-iceblue</groupId>
        <artifactId>spire.pdf</artifactId>
        <version>10.10.0</version>
    </dependency>
</dependencies>

Coordinate System in Spire.PDF

When you use Spire.PDF for Java to work with an existing PDF document, note that the origin of the coordinate system is at the top-left corner of the page. The x-axis extends to the right and the y-axis extends downward, as illustrated below.

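[Figure: the Spire.PDF coordinate system, with the origin at the top-left corner of the page, the x-axis pointing right, and the y-axis pointing down]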
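In the PDF specification's default user space, the origin typically sits at the bottom-left corner of the page and the y-axis points up, and some other PDF tools follow that convention. If you need to hand coordinates obtained from Spire.PDF to such a tool, the y value has to be flipped against the page height. The following is a minimal sketch in plain Java, assuming an unrotated page whose visible area starts at (0, 0); the class and method names are hypothetical. The page height is passed in as a parameter; with Spire.PDF you would probably read it from the page size (for example `page.getSize().getHeight()`, which you should verify against your library version).

```java
import java.awt.geom.Point2D;
import java.awt.geom.Rectangle2D;

public class CoordinateConversion {

    // Converts a point from a top-left origin (y grows downward) to a
    // bottom-left origin (y grows upward).
    // Assumption: unrotated page whose visible area starts at (0, 0).
    static Point2D toBottomLeft(Point2D topLeftPoint, double pageHeight) {
        return new Point2D.Double(topLeftPoint.getX(), pageHeight - topLeftPoint.getY());
    }

    // Converts a rectangle described by its top-left corner into the same
    // rectangle described by its bottom-left corner in a bottom-left system.
    static Rectangle2D toBottomLeft(Rectangle2D topLeftRect, double pageHeight) {
        double newY = pageHeight - topLeftRect.getY() - topLeftRect.getHeight();
        return new Rectangle2D.Double(topLeftRect.getX(), newY,
                topLeftRect.getWidth(), topLeftRect.getHeight());
    }

    public static void main(String[] args) {
        // Approximate height of an A4 portrait page in points (hypothetical input)
        double pageHeight = 842;

        Point2D p = toBottomLeft(new Point2D.Double(72, 100), pageHeight);
        System.out.println(p.getX() + ", " + p.getY());   // prints 72.0, 742.0

        Rectangle2D r = toBottomLeft(new Rectangle2D.Double(50, 100, 200, 150), pageHeight);
        System.out.println(r.getX() + ", " + r.getY());   // prints 50.0, 592.0
    }
}
```

For a rectangle, the height has to be subtracted as well, because the corner that serves as the reference point changes from the top-left to the bottom-left one.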

Get Coordinates of the Specified Text in PDF

To start, use the PdfTextFinder.find() method to search for all occurrences of the specified text on the page; it returns a list of PdfTextFragment objects. Then retrieve the coordinates of an occurrence, such as the first one, using the PdfTextFragment.getPositions() method.

The steps to get coordinates of the specified text in PDF are as follows:

  • Create a PdfDocument object.
  • Load a PDF file using PdfDocument.loadFromFile() method.
  • Get a specific page using PdfDocument.getPages().get() method.
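  • Create a PdfTextFinder object for the page.
  • Optionally, configure the search with a PdfTextFindOptions object (for example, to ignore case) and apply it using PdfTextFinder.setOptions() method.
  • Search for all occurrences of the specified text on the page using PdfTextFinder.find() method, which returns a list of PdfTextFragment objects.
  • Access a specific PdfTextFragment in the list and get its coordinates using PdfTextFragment.getPositions() method.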
import com.spire.pdf.PdfDocument;
import com.spire.pdf.PdfPageBase;
import com.spire.pdf.texts.PdfTextFindOptions;
import com.spire.pdf.texts.PdfTextFinder;
import com.spire.pdf.texts.PdfTextFragment;
import com.spire.pdf.texts.TextFindParameter;

import java.awt.geom.Point2D;
import java.util.EnumSet;
import java.util.List;

public class GetTextCoordinates {

    public static void main(String[] args) {

        // Create a PdfDocument object
        PdfDocument doc = new PdfDocument();

        try {
            // Load a PDF file
            doc.loadFromFile("C:\\Users\\Administrator\\Desktop\\Input.pdf");

            // Get a specific page (the first page here)
            PdfPageBase page = doc.getPages().get(0);

            // Create a PdfTextFinder object for the page
            PdfTextFinder finder = new PdfTextFinder(page);

            // Set the find options (case-insensitive search)
            PdfTextFindOptions options = new PdfTextFindOptions();
            options.setTextFindParameter(EnumSet.of(TextFindParameter.IgnoreCase));
            finder.setOptions(options);

            // Find all instances of the text (typed list, so get() returns a PdfTextFragment)
            List<PdfTextFragment> fragments = finder.find("Personal Data");

            // Stop if the text does not occur on this page
            if (fragments == null || fragments.isEmpty()) {
                System.out.println("The text was not found on this page.");
                return;
            }

            // Get a specific text fragment (the first occurrence)
            PdfTextFragment fragment = fragments.get(0);

            // Get the positions of the text (if the text spans multiple lines, there will be more than one position)
            Point2D[] positions = fragment.getPositions();

            // Get its first position
            double x = positions[0].getX();
            double y = positions[0].getY();

            // Print result
            System.out.println(String.format("The text is located at: (%f, %f).", x, y));
        } finally {
            // Release the resources held by the document
            doc.close();
        }
    }
}
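Because the origin is at the top-left corner, a smaller y value means a position higher up on the page. If you collect the positions of every match rather than only the first one (for example by looping over all fragments returned by find() and over every entry of getPositions()), you can sort them into reading order with an ordinary comparator. The sketch below uses only java.awt.geom.Point2D and hard-coded sample points; the class and method names are hypothetical and not part of Spire.PDF.

```java
import java.awt.geom.Point2D;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class ReadingOrder {

    // Sorts positions top-to-bottom, then left-to-right, which matches reading
    // order in a top-left coordinate system (smaller y = higher on the page).
    // Assumption: matches on the same line report exactly the same y value.
    static List<Point2D> sortInReadingOrder(List<Point2D> positions) {
        List<Point2D> sorted = new ArrayList<>(positions);
        sorted.sort(Comparator.comparingDouble(Point2D::getY)
                .thenComparingDouble(Point2D::getX));
        return sorted;
    }

    public static void main(String[] args) {
        // Hypothetical positions, e.g. collected from several PdfTextFragment objects
        List<Point2D> positions = List.of(
                new Point2D.Double(10, 300),
                new Point2D.Double(200, 50),
                new Point2D.Double(100, 50));

        for (Point2D p : sortInReadingOrder(positions)) {
            System.out.println("(" + p.getX() + ", " + p.getY() + ")");
        }
        // prints (100.0, 50.0), then (200.0, 50.0), then (10.0, 300.0)
    }
}
```

Comparing y values exactly keeps the comparator consistent; if your matches on one line report slightly different y values, you would need to group them into lines first, for example by rounding y to a tolerance you choose.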


Get Coordinates of the Specified Image in PDF

To begin, use the PdfImageHelper.getImagesInfo() method to retrieve information about all images on the specified page; it returns an array of PdfImageInfo objects. Next, obtain the X and Y coordinates of a specific image using the PdfImageInfo.getBounds().getX() and PdfImageInfo.getBounds().getY() methods.

The steps to get coordinates of the specified image in PDF are as follows:

  • Create a PdfDocument object.
  • Load a PDF file using PdfDocument.loadFromFile() method.
  • Get a specific page using PdfDocument.getPages().get() method.
  • Create a PdfImageHelper object, then retrieve information about all images on the page using PdfImageHelper.getImagesInfo() method, which returns an array of PdfImageInfo objects.
  • Get the X and Y coordinates of a specific image using PdfImageInfo.getBounds().getX() and PdfImageInfo.getBounds().getY() methods.
import com.spire.pdf.PdfDocument;
import com.spire.pdf.PdfPageBase;
import com.spire.pdf.utilities.PdfImageHelper;
import com.spire.pdf.utilities.PdfImageInfo;

public class GetImageCoordinates {

    public static void main(String[] args) {

        // Create a PdfDocument object
        PdfDocument doc = new PdfDocument();

        try {
            // Load a PDF file
            doc.loadFromFile("C:\\Users\\Administrator\\Desktop\\Input2.pdf");

            // Get a specific page (the first page here)
            PdfPageBase page = doc.getPages().get(0);

            // Create a PdfImageHelper object
            PdfImageHelper helper = new PdfImageHelper();

            // Get image information from the page
            PdfImageInfo[] imageInfo = helper.getImagesInfo(page);

            // Stop if the page contains no images
            if (imageInfo == null || imageInfo.length == 0) {
                System.out.println("No images were found on this page.");
                return;
            }

            // Get X, Y coordinates (top-left corner) of the first image
            double x = imageInfo[0].getBounds().getX();
            double y = imageInfo[0].getBounds().getY();

            // Print result
            System.out.println(String.format("The image is located at: (%f, %f).", x, y));
        } finally {
            // Release the resources held by the document
            doc.close();
        }
    }
}
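getX() and getY() give only the top-left corner of the image. Assuming getBounds() returns a java.awt.geom.Rectangle2D (which the getX()/getY() calls above suggest, but check the API reference for your version), the same object also carries the width and height, so you can derive the other corners and the center, for example to decide where to put a stamp or note next to the image. The sketch below uses a hand-made Rectangle2D in place of real image bounds; the class name, the pointBelow() helper, and the 10-point gap are hypothetical.

```java
import java.awt.geom.Point2D;
import java.awt.geom.Rectangle2D;

public class ImageBoundsGeometry {

    // Returns a point just below the bottom-left corner of the bounds, offset
    // by the given gap. In a top-left coordinate system, "below" means a
    // larger y value, so the gap is added to the bottom edge (maxY).
    static Point2D pointBelow(Rectangle2D bounds, double gap) {
        return new Point2D.Double(bounds.getMinX(), bounds.getMaxY() + gap);
    }

    public static void main(String[] args) {
        // Hypothetical bounds standing in for imageInfo[0].getBounds()
        Rectangle2D bounds = new Rectangle2D.Double(50, 100, 200, 150);

        System.out.println("Top-left:     (" + bounds.getMinX() + ", " + bounds.getMinY() + ")");
        System.out.println("Bottom-right: (" + bounds.getMaxX() + ", " + bounds.getMaxY() + ")");
        System.out.println("Center:       (" + bounds.getCenterX() + ", " + bounds.getCenterY() + ")");

        Point2D below = pointBelow(bounds, 10);
        System.out.println("Below:        (" + below.getX() + ", " + below.getY() + ")");
        // prints:
        // Top-left:     (50.0, 100.0)
        // Bottom-right: (250.0, 250.0)
        // Center:       (150.0, 175.0)
        // Below:        (50.0, 260.0)
    }
}
```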


Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents or get rid of the function limitations, please request a 30-day trial license.

Last modified on Friday, 18 October 2024 05:38