Text boxes in Microsoft Word are flexible elements that improve the layout and design of documents. They let users place text separately from the main text flow, which makes it easier to create visually attractive documents. At times, you might need to extract text from these text boxes for reuse, or update their content to keep it clear and relevant. This article demonstrates how to extract text from, or update, text boxes in a Word document using Java with Spire.Doc for Java.
Install Spire.Doc for Java
First, add the Spire.Doc.jar file as a dependency in your Java program. The JAR file can be downloaded from the product's download page. If you use Maven, you can import the JAR file into your application by adding the following code to your project's pom.xml file.
<repositories>
    <repository>
        <id>com.e-iceblue</id>
        <name>e-iceblue</name>
        <url>https://repo.e-iceblue.com/nexus/content/groups/public/</url>
    </repository>
</repositories>
<dependencies>
    <dependency>
        <groupId>e-iceblue</groupId>
        <artifactId>spire.doc</artifactId>
        <version>12.11.0</version>
    </dependency>
</dependencies>
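Once the dependency is declared, a quick way to confirm that it resolved is to try loading one of the library's classes by name. The following is a minimal sketch that uses only the JDK. The class name SpireClasspathCheck and the helper isOnClasspath are illustrative names, not part of Spire.Doc; com.spire.doc.Document is the class imported by the examples in this article.

```java
public class SpireClasspathCheck {

    // Returns true when the named class can be loaded from the current classpath.
    // The class is looked up without being initialized.
    public static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, SpireClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Class used by the examples in this article
        String spireClass = "com.spire.doc.Document";

        if (isOnClasspath(spireClass)) {
            System.out.println(spireClass + " found on the classpath.");
        } else {
            System.out.println(spireClass + " not found; check the dependency in pom.xml.");
        }
    }
}
```

Running this from the same project that declares the dependency should report whether the class was found; it does not verify the library version.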
Extract Text from a Textbox in Word in Java
With Spire.Doc for Java, you can access a specific text box in a document using the Document.getTextBoxes().get() method. You can then iterate through the child objects of the text box to check if each one is a paragraph or a table. For paragraphs, retrieve the text using the Paragraph.getText() method. For tables, loop through the cells to extract text from each cell.
Here are the steps to extract text from a text box in a Word document:
- Create a Document object.
- Load a Word file using the Document.loadFromFile() method.
- Access a specific text box using the Document.getTextBoxes().get() method.
- Iterate through the child objects of the text box.
- Check if a child object is a paragraph. If so, use the Paragraph.getText() method to get the text.
- Check if a child object is a table. If so, call extractTextFromTable(), a custom helper method defined in the example code, to retrieve the text from the table.
import com.spire.doc.*;
import com.spire.doc.documents.DocumentObjectType;
import com.spire.doc.documents.Paragraph;
import com.spire.doc.fields.TextBox;

import java.io.FileWriter;
import java.io.IOException;

public class ExtractTextFromTextbox {

    public static void main(String[] args) throws IOException {

        // Create a Document object
        Document document = new Document();

        // Load a Word file
        document.loadFromFile("C:\\Users\\Administrator\\Desktop\\Input.docx");

        // Get a specific textbox (index 0 is the first textbox in the document)
        TextBox textBox = document.getTextBoxes().get(0);

        // Create a FileWriter to write extracted text to a txt file;
        // try-with-resources closes the stream even if an exception is thrown
        try (FileWriter fileWriter = new FileWriter("Extracted.txt")) {

            // Iterate through child objects of the textbox
            for (Object object : textBox.getChildObjects()) {

                DocumentObject child = (DocumentObject) object;

                if (child.getDocumentObjectType() == DocumentObjectType.Paragraph) {
                    // The child object is a paragraph: write its text to the txt file
                    fileWriter.write(((Paragraph) child).getText() + "\n");
                } else if (child.getDocumentObjectType() == DocumentObjectType.Table) {
                    // The child object is a table: write the text of its cells to the txt file
                    extractTextFromTable((Table) child, fileWriter);
                }
            }
        }

        // Dispose resources
        document.dispose();
    }

    // Extract text from a table, cell by cell
    static void extractTextFromTable(Table table, FileWriter fileWriter) throws IOException {
        for (int i = 0; i < table.getRows().getCount(); i++) {
            TableRow row = table.getRows().get(i);
            for (int j = 0; j < row.getCells().getCount(); j++) {
                TableCell cell = row.getCells().get(j);
                for (Object paragraph : cell.getParagraphs()) {
                    fileWriter.write(((Paragraph) paragraph).getText() + "\n");
                }
            }
        }
    }
}
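The example writes its output with FileWriter, which on Java versions before 18 encodes text with the platform default charset. If the text boxes may contain non-ASCII characters, one option is to write the output as UTF-8 explicitly. The following is a minimal JDK-only sketch of the output step in isolation. It assumes the extracted text has already been collected into a list of strings; the class ExtractedTextWriter, the method writeLines, and the sample strings are hypothetical and not part of Spire.Doc.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

public class ExtractedTextWriter {

    // Writes each extracted line to the target file as UTF-8, one entry per line.
    // try-with-resources closes the writer even if a write fails.
    public static void writeLines(Path target, List<String> lines) throws IOException {
        try (BufferedWriter writer = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            for (String line : lines) {
                writer.write(line);
                writer.newLine();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Placeholder strings standing in for text collected from a text box
        List<String> extracted = Arrays.asList("Text box paragraph", "Table cell text");

        Path output = Files.createTempFile("Extracted", ".txt");
        writeLines(output, extracted);

        System.out.println("Wrote " + extracted.size() + " lines to " + output);
    }
}
```

In the extraction example, the strings returned by Paragraph.getText() could be added to such a list instead of being written directly, which also keeps the file handling separate from the document traversal.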
Update a Textbox in Word in Java
To modify a text box, first remove its existing content using the TextBox.getChildObjects().clear() method. Then, create a new paragraph and assign the desired text to it.
Here are the steps to update a text box in a Word document:
- Create a Document object.
- Load a Word file using the Document.loadFromFile() method.
- Get a specific text box using the Document.getTextBoxes().get() method.
- Remove the existing content of the text box using the TextBox.getChildObjects().clear() method.
- Add a paragraph to the text box using the TextBox.getBody().addParagraph() method.
- Add text to the paragraph using the Paragraph.appendText() method.
- Save the document to a different Word file.
import com.spire.doc.Document;
import com.spire.doc.FileFormat;
import com.spire.doc.documents.Paragraph;
import com.spire.doc.fields.TextBox;
import com.spire.doc.fields.TextRange;

public class UpdateTextbox {

    public static void main(String[] args) {

        // Create a Document object
        Document document = new Document();

        // Load a Word file
        document.loadFromFile("C:\\Users\\Administrator\\Desktop\\Input.docx");

        // Get a specific textbox
        TextBox textBox = document.getTextBoxes().get(0);

        // Remove child objects of the textbox
        textBox.getChildObjects().clear();

        // Add a new paragraph to the textbox
        Paragraph paragraph = textBox.getBody().addParagraph();

        // Set line spacing
        paragraph.getFormat().setLineSpacing(15f);

        // Add text to the paragraph
        TextRange textRange = paragraph.appendText("The text in this textbox has been updated.");

        // Set font size
        textRange.getCharacterFormat().setFontSize(15f);

        // Save the document to a different Word file
        document.saveToFile("UpdateTextbox.docx", FileFormat.Docx_2019);

        // Dispose resources
        document.dispose();
    }
}
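If the replacement text spans several lines, one option is to add one paragraph per line rather than a single paragraph. The following is a minimal JDK-only sketch of the splitting step. The class TextboxParagraphs and the method splitIntoParagraphs are hypothetical names; the Spire.Doc calls appear only in a comment and are the ones listed in the steps above.

```java
import java.util.ArrayList;
import java.util.List;

public class TextboxParagraphs {

    // Splits replacement text on any line break (\n, \r\n or \r) so that
    // each resulting entry can become its own paragraph in the text box.
    public static List<String> splitIntoParagraphs(String text) {
        List<String> paragraphs = new ArrayList<>();
        for (String line : text.split("\\R", -1)) {
            paragraphs.add(line);
        }
        return paragraphs;
    }

    public static void main(String[] args) {
        List<String> paragraphs = splitIntoParagraphs("First line\r\nSecond line\nThird line");

        // With Spire.Doc on the classpath, each entry would be added after clearing
        // the text box, using the calls from the steps above:
        //     textBox.getBody().addParagraph().appendText(entry);
        for (String entry : paragraphs) {
            System.out.println(entry);
        }
    }
}
```

Empty lines in the input produce empty entries, which would translate to empty paragraphs; filter them out first if that spacing is not wanted.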
Apply for a Temporary License
If you'd like to remove the evaluation message from the generated documents, or to lift the functional limitations, you can request a 30-day trial license.