Python: Remove Blank Lines from Word Documents

2023-09-21 00:56:26 Written by  support iceblue

Blank lines often creep into a Word document while it is being written and edited. They interrupt the flow of the content, clutter the layout, and make the document look less polished. Removing them improves readability and keeps the document well structured. This article demonstrates how to delete blank lines from Word documents in Python using Spire.Doc for Python.

Install Spire.Doc for Python

This scenario requires Spire.Doc for Python and plum-dispatch v1.7.4. Both can be installed on Windows with the following pip command.

pip install Spire.Doc

If you are unsure how to install it, please refer to this tutorial: How to Install Spire.Doc for Python on Windows
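To confirm that the installation worked, the installed versions can be read back from Python. The following is a minimal sketch using only the standard library (Python 3.8+); `installed_version` is a hypothetical helper name, and the distribution names `Spire.Doc` and `plum-dispatch` are assumed to match the names used by pip above.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a pip distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Distribution names are assumptions taken from the install instructions above
for name in ("Spire.Doc", "plum-dispatch"):
    version = installed_version(name)
    print(f"{name}: {version if version else 'not installed'}")
```

If either line reports "not installed", rerunning the pip command in the same Python environment is the first thing to try.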

Remove Blank Lines from Word Documents

Blank lines in a Word document are stored as empty paragraphs, which are child objects of a section's body. Removing blank lines therefore comes down to iterating through the sections and deleting the empty paragraphs found in them. The detailed steps are as follows:

  • Create an object of the Document class.
  • Load a Word document using the Document.LoadFromFile() method.
  • Iterate through each section and through each child object of the section's body.
  • For each child object, check whether it is of paragraph type (DocumentObjectType.Paragraph) and an instance of the Paragraph class. If it is and the paragraph has no text, delete it using the Section.Body.ChildObjects.Remove() method.
  • Save the document using the Document.SaveToFile() method.
from spire.doc import *
from spire.doc.common import *

# Create an object of the Document class
doc = Document()

# Load a Word document
doc.LoadFromFile("Sample.docx")

# Iterate through each section in the document
for i in range(doc.Sections.Count):
    section = doc.Sections.get_Item(i)
    j = 0
    # Iterate through each child object in the section
    while j < section.Body.ChildObjects.Count:
        # Check if the child object is of type Paragraph
        if section.Body.ChildObjects[j].DocumentObjectType == DocumentObjectType.Paragraph:
            objItem = section.Body.ChildObjects[j]
            # Check if the child object is an instance of the Paragraph class
            if isinstance(objItem, Paragraph):
                paraObj = Paragraph(objItem)
                # Check if the paragraph text is empty
                if len(paraObj.Text) == 0:
                    # If the paragraph text is empty, remove the object from the section's child objects list
                    section.Body.ChildObjects.Remove(objItem)
                    j -= 1
        j += 1

# Save the document and release its resources
doc.SaveToFile("output/RemoveBlankLines.docx")
doc.Close()
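
Two details of the loop above are easier to see in isolation. First, the index is stepped back after a removal because deleting an item shifts the following items one position to the left. Second, `len(paraObj.Text) == 0` matches only zero-length text, so a paragraph that holds just spaces or tabs would be kept. The following is a minimal sketch in plain Python, with strings standing in for paragraphs and no Spire.Doc calls; `is_blank` and `remove_blank_in_place` are hypothetical helper names.

```python
from typing import List


def is_blank(text: str) -> bool:
    """Return True if the text is empty or contains only whitespace."""
    return text.strip() == ""


def remove_blank_in_place(paragraphs: List[str]) -> int:
    """Delete blank entries from the list in place and return how many were removed."""
    removed = 0
    j = 0
    while j < len(paragraphs):
        if is_blank(paragraphs[j]):
            # Deleting shifts the later items left, so j must not advance here
            del paragraphs[j]
            removed += 1
        else:
            j += 1
    return removed


# Strings stand in for the paragraph texts of one section
paragraphs = ["Title", "", "   ", "First paragraph.", "\t", "", "Second paragraph."]
count = remove_blank_in_place(paragraphs)
print(count, paragraphs)
```

Applied to the Spire.Doc loop, the same idea would mean testing `paraObj.Text.strip()` rather than the raw length. A paragraph that contains only an image may also report empty text, so it is worth checking on a copy of the document that nothing other than blank lines disappears.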

Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents or lift the function limitations, please request a 30-day trial license.

Last modified on Thursday, 25 April 2024 02:09