Textboxes in a Word document serve as versatile containers for text, enabling users to enhance layout and design. They allow for the separation of content from the main body, making documents more visually appealing and organized. Extracting or updating textboxes can be essential for improving document efficiency, ensuring information is current, and facilitating data analysis.
In this article, you will learn how to extract or update textboxes in a Word document using Python and Spire.Doc for Python.
Install Spire.Doc for Python
This scenario requires Spire.Doc for Python and plum-dispatch v1.7.4. Both can be installed on Windows with the following pip command.
pip install Spire.Doc
If you are unsure how to install it, please refer to this tutorial: How to Install Spire.Doc for Python on Windows
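After running the pip command, it can help to confirm that both packages are visible to the interpreter you plan to use. The following is a minimal sketch that relies only on the standard library; the helper name `installed_version` is hypothetical, and the distribution names are assumed to match the ones used in this article.

```python
from importlib import metadata


def installed_version(distribution):
    """Return the installed version of a pip distribution, or None if absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Distribution names are assumed from the pip command and requirement above.
    for name in ("Spire.Doc", "plum-dispatch"):
        version = installed_version(name)
        print(f"{name}: {version if version else 'not installed'}")
```

If either line reports "not installed", the pip command was probably run against a different Python environment than the one executing the script.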
Extract Text from a Textbox in Word
Using Spire.Doc for Python, you can access a specific textbox in a document through the Document.TextBoxes[index] property. After retrieving the textbox, iterate through its child objects and check whether each one is a paragraph or a table. For a paragraph, read its text with the Paragraph.Text property. For a table, loop through its rows and cells and read the text of each paragraph in every cell.
The steps to extract text from a text box in a Word document are as follows:
- Create a Document object.
- Load a Word file using Document.LoadFromFile() method.
- Access a specific text box using Document.TextBoxes[index] property.
- Iterate through the child objects within the text box.
- Determine if a child object is a paragraph. If it is, retrieve the text from the paragraph using Paragraph.Text property.
- Check if a child object is a table. If so, iterate through the cells in the table to extract text from each cell.
```python
from spire.doc import *
from spire.doc.common import *

# Create a Document object
document = Document()

# Load a Word file
document.LoadFromFile("C:\\Users\\Administrator\\Desktop\\input.docx")

# Get a specific textbox (here, the first one in the document)
textBox = document.TextBoxes[0]

# Open the output file as UTF-8 so non-ASCII text is written correctly
with open('ExtractedText.txt', 'w', encoding='utf-8') as sw:

    # Iterate through the child objects in the textbox
    for i in range(textBox.ChildObjects.Count):

        # Get a specific child object
        child = textBox.ChildObjects.get_Item(i)

        # If the child object is a paragraph, write its text to the txt file
        if child.DocumentObjectType == DocumentObjectType.Paragraph:
            sw.write(child.Text + "\n")

        # If the child object is a table, walk its rows and cells
        elif child.DocumentObjectType == DocumentObjectType.Table:
            table = child
            for r in range(table.Rows.Count):
                row = table.Rows[r]
                for c in range(row.Cells.Count):
                    cell = row.Cells[c]
                    for p in range(cell.Paragraphs.Count):
                        paragraph = cell.Paragraphs.get_Item(p)

                        # Write the paragraph text of this cell to the txt file
                        sw.write(paragraph.Text + "\n")

# Dispose resources
document.Dispose()
```
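The example above writes every cell paragraph on its own line, which loses the row and column layout of the table. If that layout matters, the table loop could be factored into a helper along the following lines. This is only a sketch: `table_to_rows` and `rows_to_tsv` are hypothetical names, and the code uses nothing beyond the members already shown above (`Rows`, `Cells`, `Paragraphs`, `Count`, `get_Item()` and `Text`).

```python
def table_to_rows(table):
    """Return a table's text as a list of rows, each a list of cell strings."""
    rows = []
    for r in range(table.Rows.Count):
        row = table.Rows[r]
        cells = []
        for c in range(row.Cells.Count):
            cell = row.Cells[c]
            # A cell can hold several paragraphs; join them with a space.
            parts = [cell.Paragraphs.get_Item(p).Text
                     for p in range(cell.Paragraphs.Count)]
            cells.append(" ".join(parts))
        rows.append(cells)
    return rows


def rows_to_tsv(rows):
    """Format rows as tab-separated lines, one line per table row."""
    return "\n".join("\t".join(cells) for cells in rows)
```

Inside the extraction loop, the nested row and cell loops could then be replaced by a single call such as `sw.write(rows_to_tsv(table_to_rows(table)) + "\n")`, which keeps each table row on one line of the output file.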
Update Text in a Textbox in Word
To update a textbox in a Word document, start by clearing its existing content with the TextBox.ChildObjects.Clear() method. This action removes all child objects, including any paragraphs or tables currently contained within the textbox. After clearing the content, you can add a new paragraph to the text box. Once the paragraph is created, set its text to the desired value.
The steps to update a textbox in a Word document are as follows:
- Create a Document object.
- Load a Word file using Document.LoadFromFile() method.
- Get a specific textbox using Document.TextBoxes[index] property.
- Remove existing content of the textbox using TextBox.ChildObjects.Clear() method.
- Add a paragraph to the textbox using TextBox.Body.AddParagraph() method.
- Add text to the paragraph using Paragraph.AppendText() method.
- Save the document to a different Word file.
```python
from spire.doc import *
from spire.doc.common import *

# Create a Document object
document = Document()

# Load a Word file
document.LoadFromFile("C:\\Users\\Administrator\\Desktop\\Input.docx")

# Get a specific textbox (here, the first one in the document)
textBox = document.TextBoxes[0]

# Remove child objects of the textbox
textBox.ChildObjects.Clear()

# Add a new paragraph to the textbox
paragraph = textBox.Body.AddParagraph()

# Set line spacing
paragraph.Format.LineSpacing = 15.0

# Add text to the paragraph
textRange = paragraph.AppendText("The text in this textbox has been updated.")

# Set font size
textRange.CharacterFormat.FontSize = 15.0

# Save the document to a different Word file
document.SaveToFile("UpdateTextbox.docx", FileFormat.Docx2019)

# Dispose resources
document.Dispose()
```
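When several textboxes need the same treatment, the clear-then-append steps could be wrapped in a small function. The sketch below assumes only the members used in the example above (`ChildObjects.Clear()`, `Body.AddParagraph()`, `AppendText()` and `CharacterFormat.FontSize`); the function name `replace_textbox_text` and its `font_size` parameter are hypothetical.

```python
def replace_textbox_text(text_box, new_text, font_size=None):
    """Replace everything inside a textbox with a single paragraph of text."""
    # Remove the existing paragraphs and tables from the textbox.
    text_box.ChildObjects.Clear()

    # Add one new paragraph and put the replacement text in it.
    paragraph = text_box.Body.AddParagraph()
    text_range = paragraph.AppendText(new_text)

    # Only touch the font size when the caller asks for one.
    if font_size is not None:
        text_range.CharacterFormat.FontSize = float(font_size)
    return text_range
```

A call such as `replace_textbox_text(document.TextBoxes[0], "The text in this textbox has been updated.", font_size=15)` would then stand in for the clear, add and append steps, with the document still saved and disposed as before.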
Apply for a Temporary License
If you'd like to remove the evaluation message from the generated documents or lift the functional limitations, please request a 30-day trial license.