Spire.Doc for Python 12.2.1 supports obtaining page content through fixed layout

2024-02-26 06:38:09

We are happy to announce the release of Spire.Doc for Python 12.2.1. This version supports obtaining page content through fixed layout. More details are listed below.

Here is a list of changes made in this release

Category ID Description
New feature - Supports obtaining page content through fixed layout.
def WriteAllText(fpath:str,content:str):
    with open(fpath,'w',encoding="utf-8") as fp:
        fp.write(content)

# Specify the file path
inputFile = "./Data/Sample.docx"
outputFile = "output.txt"

# Create a new instance of Document
doc = Document()

# Load the document from the specified file
doc.LoadFromFile(inputFile, FileFormat.Docx)

# Create a FixedLayoutDocument object using the loaded document
layoutDoc = FixedLayoutDocument(doc)
result = ''

# Get the first line on the first page
line = layoutDoc.Pages[0].Columns[0].Lines[0]
result += "Line: "
result += line.Text
result += "\n"
# Retrieve the original paragraph associated with the line
para = line.Paragraph
result += "Paragraph text: "
result += para.Text
result += "\n"
# Retrieve all the text that appears on the first page in plain text format (including headers and footers).
pageText = layoutDoc.Pages[0].Text
result += pageText
result += "\n"
# Loop through each page in the document and print how many lines appear on each page.
pages = layoutDoc.Pages
for i in range(pages.Count):
	page = pages[i]
	lines = page.GetChildEntities(LayoutElementType.Line, True)
	result += "Page "
	result += str(page.PageIndex)
	result += " has "
	result += str(lines.Count)
	result += " lines."
	result += "\n"

# Perform a reverse lookup of layout entities for the first paragraph
result += "\n"
result += "The lines of the first paragraph:"
result += "\n"
tempChild = doc.FirstChild
section = Section(tempChild)
para = section.Body.Paragraphs[0]
paragraphLines = layoutDoc.GetLayoutEntitiesOfNode(para)

for i in range(paragraphLines.Count):
	tempLine = paragraphLines[i]
	paragraphLine = FixedLayoutLine(tempLine)
	result += (paragraphLine.Text).strip()
	result += "\n"
	result += paragraphLine.Rectangle.ToString()
	result += "\n"
	result += "\n"
# Write the extracted text to a file
WriteAllText(outputFile, result)

# Dispose of the document resources
doc.Dispose()
Click the link below to get Spire.Doc for Python 12.2.1: