Java: Extract or Update Textboxes in Word

Text boxes in Microsoft Word are flexible elements that improve the layout and design of documents. They enable users to place text separately from the main text flow, facilitating the creation of visually attractive documents. At times, you might need to extract text from these text boxes for reuse, or update the content within them to maintain clarity and relevance. This article demonstrates how to extract or update textboxes in a Word document using Java with Spire.Doc for Java.

Install Spire.Doc for Java

First, add the Spire.Doc.jar file as a dependency in your Java program. You can download the JAR file from the official Spire.Doc for Java download page. If you use Maven, you can instead import it by adding the following configuration to your project's pom.xml file.

<repositories>
    <repository>
        <id>com.e-iceblue</id>
        <name>e-iceblue</name>
        <url>https://repo.e-iceblue.com/nexus/content/groups/public/</url>
    </repository>
</repositories>
<dependencies>
    <dependency>
        <groupId>e-iceblue</groupId>
        <artifactId>spire.doc</artifactId>
        <version>12.9.4</version>
    </dependency>
</dependencies>
    

Extract Text from a Textbox in Word in Java

With Spire.Doc for Java, you can access a specific text box in a document using the Document.getTextBoxes().get() method. You can then iterate through the child objects of the text box to check if each one is a paragraph or a table. For paragraphs, retrieve the text using the Paragraph.getText() method. For tables, loop through the cells to extract text from each cell.

Here are the steps to extract text from a text box in a Word document:

  • Create a Document object.
  • Load a Word file using the Document.loadFromFile() method.
  • Access a specific text box using the Document.getTextBoxes().get() method.
  • Iterate through the child objects of the text box.
  • Check if a child object is a paragraph. If so, use the Paragraph.getText() method to get the text.
  • Check if a child object is a table. If so, call the custom extractTextFromTable() method (defined in the example) to retrieve the text from each cell.
import com.spire.doc.*;
import com.spire.doc.documents.DocumentObjectType;
import com.spire.doc.documents.Paragraph;
import com.spire.doc.fields.TextBox;

import java.io.FileWriter;
import java.io.IOException;

public class ExtractTextFromTextbox {

    public static void main(String[] args) throws IOException {

        // Create a Document object
        Document document = new Document();

        // Load a Word file
        document.loadFromFile("C:\\Users\\Administrator\\Desktop\\Input.docx");

        // Get a specific textbox
        TextBox textBox = document.getTextBoxes().get(0);

        // Create a FileWriter to write extracted text to a txt file;
        // try-with-resources closes it even if an exception is thrown
        try (FileWriter fileWriter = new FileWriter("Extracted.txt")) {

            // Iterate through child objects of the textbox
            for (Object object : textBox.getChildObjects()) {

                DocumentObject child = (DocumentObject) object;

                // Determine if the child object is a paragraph
                if (child.getDocumentObjectType() == DocumentObjectType.Paragraph) {

                    // Write paragraph text to the txt file
                    fileWriter.write(((Paragraph) child).getText() + "\n");
                }

                // Determine if the child object is a table
                else if (child.getDocumentObjectType() == DocumentObjectType.Table) {

                    // Extract text from table to the txt file
                    extractTextFromTable((Table) child, fileWriter);
                }
            }
        }

        // Dispose resources
        document.dispose();
    }

    // Extract text from a table
    static void extractTextFromTable(Table table, FileWriter fileWriter) throws IOException {
        for (int i = 0; i < table.getRows().getCount(); i++) {
            TableRow row = table.getRows().get(i);
            for (int j = 0; j < row.getCells().getCount(); j++) {
                TableCell cell = row.getCells().get(j);
                for (Object paragraph: cell.getParagraphs()) {
                    fileWriter.write(((Paragraph) paragraph).getText() + "\n");
                }
            }
        }
    }
}
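
The example above writes through FileWriter, which on Java versions before 18 encodes with the platform default charset, so non-ASCII text from a text box may come out garbled on some systems. As a minimal sketch that uses only the JDK, the helper below writes already-extracted strings as UTF-8. The names ExtractedTextWriter and writeLines are hypothetical and are not part of Spire.Doc.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper (not part of Spire.Doc): writes extracted text as UTF-8
public class ExtractedTextWriter {

    // Write each string followed by "\n"; try-with-resources closes the
    // writer even if one of the writes throws
    public static void writeLines(Path target, List<String> lines) throws IOException {
        try (BufferedWriter writer = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            for (String line : lines) {
                writer.write(line);
                writer.write("\n");
            }
        }
    }
}
```

One way to use it would be to collect the paragraph and table-cell text into a List<String> during the loop, then call ExtractedTextWriter.writeLines(Paths.get("Extracted.txt"), lines) once at the end.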


Update a Textbox in Word in Java

To modify a text box, first remove its existing content using the TextBox.getChildObjects().clear() method. Then, create a new paragraph and assign the desired text to it.

Here are the steps to update a text box in a Word document:

  • Create a Document object.
  • Load a Word file using the Document.loadFromFile() method.
  • Get a specific text box using the Document.getTextBoxes().get() method.
  • Remove the existing content of the text box using the TextBox.getChildObjects().clear() method.
  • Add a paragraph to the text box using the TextBox.getBody().addParagraph() method.
  • Add text to the paragraph using the Paragraph.appendText() method.
  • Save the document to a different Word file.
import com.spire.doc.Document;
import com.spire.doc.FileFormat;
import com.spire.doc.documents.Paragraph;
import com.spire.doc.fields.TextBox;
import com.spire.doc.fields.TextRange;

public class UpdateTextbox {

    public static void main(String[] args) {

        // Create a Document object
        Document document = new Document();

        // Load a Word file
        document.loadFromFile("C:\\Users\\Administrator\\Desktop\\Input.docx");

        // Get a specific textbox
        TextBox textBox = document.getTextBoxes().get(0);

        // Remove child objects of the textbox
        textBox.getChildObjects().clear();

        // Add a new paragraph to the textbox
        Paragraph paragraph = textBox.getBody().addParagraph();

        // Set line spacing
        paragraph.getFormat().setLineSpacing(15f);

        // Add text to the paragraph
        TextRange textRange = paragraph.appendText("The text in this textbox has been updated.");

        // Set font size
        textRange.getCharacterFormat().setFontSize(15f);

        // Save the document to a different Word file
        document.saveToFile("UpdateTextbox.docx", FileFormat.Docx_2019);

        // Dispose resources
        document.dispose();
    }
}
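
The last step saves to a different file so that the original document is left untouched. As a small sketch of that habit, independent of any library, the helper below derives an output path by inserting a suffix before the file extension. The names OutputNames and withSuffix are hypothetical and are not a Spire.Doc API, and the sketch assumes the input path has a file name (it is not a bare root).

```java
import java.nio.file.Path;

// Hypothetical helper (not part of Spire.Doc): derives a sibling output path
public class OutputNames {

    // Insert the suffix before the last extension of the file name and keep
    // the same parent directory; names without an extension get it appended
    public static Path withSuffix(Path input, String suffix) {
        String name = input.getFileName().toString();
        int dot = name.lastIndexOf('.');
        String renamed = (dot <= 0)
                ? name + suffix
                : name.substring(0, dot) + suffix + name.substring(dot);
        return input.resolveSibling(renamed);
    }
}
```

The resulting path, converted with toString(), could then be passed to document.saveToFile() in place of a hard-coded file name.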


Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to lift the functional limitations, you can request a 30-day trial license.