Java: Find and Extract Hyperlinks in Word Documents

Hyperlinks in Word documents can lead readers to a webpage, an external file, an email address, or a specific place in the document being read. They are common in Word documents because of this convenience. This article shows how to use Spire.Doc for Java to find and extract hyperlinks in Word documents, including both their display text and their link addresses.

Install Spire.Doc for Java

First, add the Spire.Doc.jar file as a dependency in your Java program. The JAR file can be downloaded from this link. If you use Maven, you can import the JAR file into your application by adding the following code to your project's pom.xml file.

<repositories>
    <repository>
        <id>com.e-iceblue</id>
        <name>e-iceblue</name>
        <url>https://repo.e-iceblue.com/nexus/content/groups/public/</url>
    </repository>
</repositories>
<dependencies>
    <dependency>
        <groupId>e-iceblue</groupId>
        <artifactId>spire.doc</artifactId>
        <version>12.11.0</version>
    </dependency>
</dependencies>
    

Find and Extract a Specified Hyperlink in a Word Document

The detailed steps are as follows:

  • Create a Document instance and load a Word document from disk using the Document.loadFromFile() method.
  • Create an object of ArrayList<Field>.
  • Iterate through the items in the sections and add every hyperlink field to the list.
  • Get the first hyperlink from the list using ArrayList.get(0), then get its text using the Field.getFieldText() method and its link using the Field.getValue() method.
  • Save the text and the link of the first hyperlink to a TXT file using the custom method writeStringToText().
import com.spire.doc.*;
import com.spire.doc.documents.*;
import com.spire.doc.fields.Field;

import java.io.*;
import java.util.ArrayList;

public class findHyperlinks {
    public static void main(String[] args) throws IOException {
        //Create a Document instance and load a Word document from file
        String input = "D:/testp/test.docx";
        Document doc = new Document();
        doc.loadFromFile(input);

        //Create a typed list to hold the hyperlink fields
        ArrayList<Field> hyperlinks = new ArrayList<Field>();

        //Iterate through the items in the sections to find all hyperlinks
        for (Section section : (Iterable<Section>) doc.getSections()) {
            for (DocumentObject object : (Iterable<DocumentObject>) section.getBody().getChildObjects()) {
                if (object.getDocumentObjectType().equals(DocumentObjectType.Paragraph)) {
                    Paragraph paragraph = (Paragraph) object;
                    for (DocumentObject cObject : (Iterable<DocumentObject>) paragraph.getChildObjects()) {
                        if (cObject.getDocumentObjectType().equals(DocumentObjectType.Field)) {
                            Field field = (Field) cObject;
                            if (field.getType().equals(FieldType.Field_Hyperlink)) {
                                hyperlinks.add(field);
                            }
                        }
                    }
                }
            }
        }

        //Stop early if the document contains no hyperlinks, since get(0) would throw
        if (hyperlinks.isEmpty()) {
            System.out.println("No hyperlinks found in " + input);
            return;
        }

        //Get the text and the address of the first hyperlink
        String hyperlinksText = hyperlinks.get(0).getFieldText();
        String hyperlinkAddress = hyperlinks.get(0).getValue();

        //Save the text and the link of the first hyperlink to a TXT file
        String output = "D:/javaOutput/HyperlinkTextAndLink.txt";
        writeStringToText("Text:\r\n" + hyperlinksText+ "\r\n" + "Link:\r\n" + hyperlinkAddress, output);
    }

    //Create a method to write the text and link of hyperlinks to a TXT file
    public static void writeStringToText(String content, String textFileName) throws IOException {
        File file = new File(textFileName);
        if (file.exists())
        {
            file.delete();
        }
        FileWriter fWriter = new FileWriter(textFileName, true);
        try {
            fWriter.write(content);
        } catch (IOException ex) {
            ex.printStackTrace();
        } finally {
            try {
                fWriter.flush();
                fWriter.close();
            } catch (IOException ex) {
                ex.printStackTrace();
            }
        }
    }
}
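
The writeStringToText() helper above deletes any existing file and then opens a FileWriter in append mode. A shorter alternative is sketched below using java.nio.file.Files from the standard library. This is only an illustration, not part of Spire.Doc: the class name TextFileWriter is hypothetical, and the sketch assumes UTF-8 output is acceptable. Unlike the original helper, it lets an IOException propagate to the caller instead of printing the stack trace.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper class; the name is an assumption made for this sketch
class TextFileWriter {

    // Creates the file, or overwrites it if it already exists, with the content encoded as UTF-8
    public static void writeStringToText(String content, String textFileName) throws IOException {
        Path target = Paths.get(textFileName);
        // With no open options given, Files.write creates the file or truncates an existing one
        Files.write(target, content.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // Write to a temporary file so the demo does not depend on a fixed folder
        Path demo = Files.createTempFile("HyperlinkTextAndLink", ".txt");
        writeStringToText("Text:\r\nExample\r\nLink:\r\nhttps://example.com", demo.toString());
        System.out.println(new String(Files.readAllBytes(demo), StandardCharsets.UTF_8));
    }
}
```

Because the file is truncated on every call, no explicit delete step is needed. As with the original helper, the output folder must already exist.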

Find and Extract All the Hyperlinks in a Word Document

The detailed steps are as follows:

  • Create a Document instance and load a Word document from disk using the Document.loadFromFile() method.
  • Create an object of ArrayList<Field>.
  • Iterate through the items in the sections to find all hyperlinks.
  • For each hyperlink found, get its text using the Field.getFieldText() method and its link using the Field.getValue() method.
  • Save the texts and the links of the hyperlinks to a TXT file using the custom method writeStringToText().
import com.spire.doc.*;
import com.spire.doc.documents.*;
import com.spire.doc.fields.Field;

import java.io.*;
import java.util.ArrayList;

public class findHyperlinks {
    public static void main(String[] args) throws IOException {
        //Create a Document instance and load a Word document from file
        String input = "D:/testp/test.docx";
        Document doc = new Document();
        doc.loadFromFile(input);

        //Create a typed list to hold the hyperlink fields, plus strings to collect texts and links
        ArrayList<Field> hyperlinks = new ArrayList<Field>();
        String hyperlinkText = "";
        String hyperlinkAddress = "";

        //Iterate through the items in the sections to find all hyperlinks
        for (Section section : (Iterable<Section>) doc.getSections()) {
            for (DocumentObject object : (Iterable<DocumentObject>) section.getBody().getChildObjects()) {
                if (object.getDocumentObjectType().equals(DocumentObjectType.Paragraph)) {
                    Paragraph paragraph = (Paragraph) object;
                    for (DocumentObject cObject : (Iterable<DocumentObject>) paragraph.getChildObjects()) {
                        if (cObject.getDocumentObjectType().equals(DocumentObjectType.Field)) {
                            Field field = (Field) cObject;
                            if (field.getType().equals(FieldType.Field_Hyperlink)) {
                                hyperlinks.add(field);

                                //Get the texts and links of all hyperlinks
                                hyperlinkText += field.getFieldText() + "\r\n";
                                hyperlinkAddress += field.getValue() + "\r\n";
                            }
                        }
                    }
                }
            }
        }

        //Save the texts and the links of the hyperlinks to a TXT file
        String output = "D:/javaOutput/HyperlinksTextsAndLinks.txt";
        writeStringToText("Text:\r\n" + hyperlinkText + "\r\n" + "Link:\r\n" + hyperlinkAddress + "\r\n", output);
    }

    //Create a method to write the text and link of hyperlinks to a TXT file
    public static void writeStringToText(String content, String textFileName) throws IOException {
        File file = new File(textFileName);
        if (file.exists())
        {
            file.delete();
        }
        FileWriter fWriter = new FileWriter(textFileName, true);
        try {
            fWriter.write(content);
        } catch (IOException ex) {
            ex.printStackTrace();
        } finally {
            try {
                fWriter.flush();
                fWriter.close();
            } catch (IOException ex) {
                ex.printStackTrace();
            }
        }
    }
}
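
The program above writes all hyperlink texts in one block and all addresses in another, so the reader has to match them up by position. If a paired layout is preferred, the output could be built as in the following sketch. It uses plain strings so that it runs without Spire.Doc; the HyperlinkReport and Link names are hypothetical, and in the real loop the two values would come from field.getFieldText() and field.getValue().

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical report builder; not part of Spire.Doc
class HyperlinkReport {

    // Holds the display text and the target address of one hyperlink
    public static final class Link {
        final String text;
        final String address;

        public Link(String text, String address) {
            this.text = text;
            this.address = address;
        }
    }

    // Builds one numbered "text -> address" line per hyperlink
    public static String format(List<Link> links) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < links.size(); i++) {
            Link link = links.get(i);
            sb.append(i + 1).append(". ")
              .append(link.text).append(" -> ").append(link.address)
              .append("\r\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Link> links = new ArrayList<>();
        // Sample values standing in for field.getFieldText() and field.getValue()
        links.add(new Link("Example site", "https://example.com"));
        links.add(new Link("Contact", "mailto:someone@example.com"));
        System.out.print(format(links));
    }
}
```

Collecting the lines in a StringBuilder also avoids repeated string concatenation inside the loop, which can matter for documents with many hyperlinks.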

Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to lift the feature limitations, please request a 30-day trial license.