As team collaboration becomes more common, the Track Changes feature in Word has become a cornerstone of version control and content review. For developers who want to automate their workflows, however, extracting this revision information from Word documents programmatically remains a challenge. This article shows how to use Spire.Doc for Python to obtain revision information from Word documents.
Install Spire.Doc for Python
This scenario requires Spire.Doc for Python and plum-dispatch v1.7.4. Both can be installed on Windows with the following pip command.
pip install Spire.Doc
If you are unsure how to install it, refer to this tutorial: How to Install Spire.Doc for Python on Windows
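After running the pip command, it can be useful to confirm that the packages are actually visible to the interpreter you plan to use. The following is a minimal sketch using only the standard library. The helper names is_importable and installed_version are hypothetical, and the distribution names passed to them ("Spire.Doc", "plum-dispatch") are assumptions based on the pip command and the requirement stated above.

```python
import importlib.util
from importlib import metadata


def is_importable(module_name: str) -> bool:
    """Return True if a top-level module can be located (without importing it)."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # Raised for some malformed or partially initialized modules
        return False


def installed_version(dist_name: str):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "spire" is the top-level package used by `from spire.doc import *`
    print("spire importable:", is_importable("spire"))
    # Distribution names below are assumptions taken from the pip command
    print("Spire.Doc version:", installed_version("Spire.Doc"))
    print("plum-dispatch version:", installed_version("plum-dispatch"))
```

If the first line prints False or a version comes back as None, the package was most likely installed into a different Python environment than the one running the script.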
Get Revisions of Word Document in Python
Spire.Doc for Python provides the IsInsertRevision and IsDeleteRevision properties for determining whether an element in a Word document is an insertion revision or a deletion revision. The detailed steps are as follows:
- Create an instance of the Document class and load the Word document that contains revisions.
- Initialize lists to collect insertion and deletion revision information.
- Iterate through the sections of the document and their body elements.
- Obtain the paragraphs in the body and use the IsInsertRevision property to determine if the paragraph is an insertion revision.
- Get the type, author, and associated text of the insertion revision.
- Use the IsDeleteRevision property to determine if the paragraph is a deletion revision, and obtain its revision type, author, and associated text.
- Iterate through the child elements of the paragraph, similarly checking if the TextRange is an insertion or deletion revision, and retrieve the revision type, author, and associated text.
- Define a WriteAllText function to save the insertion and deletion revision information to TXT documents.
from spire.doc import *

# Function to write text to a file
def WriteAllText(fname: str, text: str):
    with open(fname, "w", encoding='utf-8') as fp:
        fp.write(text)

# Input and output file names
inputFile = "sample.docx"
outputFile1 = "InsertRevision.txt"
outputFile2 = "DeleteRevision.txt"

# Create a Document object
document = Document()

# Load the Word document
document.LoadFromFile(inputFile)

# Initialize lists to store insert and delete revisions
insert_revisions = []
delete_revisions = []

# Iterate through sections in the document
for k in range(document.Sections.Count):
    sec = document.Sections.get_Item(k)
    # Iterate through body elements in the section
    for m in range(sec.Body.ChildObjects.Count):
        # Check if the item is a Paragraph
        docItem = sec.Body.ChildObjects.get_Item(m)
        if isinstance(docItem, Paragraph):
            para = docItem
            para.AppendField("", FieldType.FieldDocVariable)
            # Check if the paragraph is an insertion revision
            if para.IsInsertRevision:
                insRevision = para.InsertRevision
                insType = insRevision.Type
                insAuthor = insRevision.Author
                # Add insertion revision details to the list
                insert_revisions.append(f"Revision Type: {insType.name}\n")
                insert_revisions.append(f"Revision Author: {insAuthor}\n")
                insert_revisions.append(f"Insertion Text: {para.Text}\n")
            # Check if the paragraph is a deletion revision
            elif para.IsDeleteRevision:
                delRevision = para.DeleteRevision
                delType = delRevision.Type
                delAuthor = delRevision.Author
                # Add deletion revision details to the list
                delete_revisions.append(f"Revision Type: {delType.name}\n")
                delete_revisions.append(f"Revision Author: {delAuthor}\n")
                delete_revisions.append(f"Deletion Text: {para.Text}\n")
            else:
                # Iterate through all child objects of Paragraph
                for j in range(para.ChildObjects.Count):
                    obj = para.ChildObjects.get_Item(j)
                    # Check if the current object is an instance of TextRange
                    if isinstance(obj, TextRange):
                        textRange = obj
                        # Check if the text range is an insertion revision
                        if textRange.IsInsertRevision:
                            insRevision = textRange.InsertRevision
                            insType = insRevision.Type
                            insAuthor = insRevision.Author
                            # Add insertion revision details to the list
                            insert_revisions.append(f"Revision Type: {insType.name}\n")
                            insert_revisions.append(f"Revision Author: {insAuthor}\n")
                            insert_revisions.append(f"Insertion Text: {textRange.Text}\n")
                        # Check if the text range is a deletion revision
                        elif textRange.IsDeleteRevision:
                            delRevision = textRange.DeleteRevision
                            delType = delRevision.Type
                            delAuthor = delRevision.Author
                            # Add deletion revision details to the list
                            delete_revisions.append(f"Revision Type: {delType.name}\n")
                            delete_revisions.append(f"Revision Author: {delAuthor}\n")
                            delete_revisions.append(f"Deletion Text: {textRange.Text}\n")

# Write all the insertion revision details to the 'outputFile1' file
WriteAllText(outputFile1, ''.join(insert_revisions))

# Write all the deletion revision details to the 'outputFile2' file
WriteAllText(outputFile2, ''.join(delete_revisions))

# Dispose the document
document.Dispose()
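The script above repeats the same three append calls in four places. One possible way to reduce that repetition is to collect each revision into a small record and format it in a single function. The sketch below is plain Python and does not call Spire.Doc. The RevisionRecord class and the format_revision and format_all helpers are hypothetical names introduced for illustration, and the sample values are made up.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class RevisionRecord:
    """Hypothetical container for one revision; not part of Spire.Doc."""
    kind: str      # "Insertion" or "Deletion"
    rev_type: str  # e.g. the value of insType.name in the script above
    author: str    # e.g. insRevision.Author
    text: str      # e.g. para.Text or textRange.Text


def format_revision(rec: RevisionRecord) -> str:
    """Render one record in the three-line layout the script writes to the TXT files."""
    return (
        f"Revision Type: {rec.rev_type}\n"
        f"Revision Author: {rec.author}\n"
        f"{rec.kind} Text: {rec.text}\n"
    )


def format_all(records: Iterable[RevisionRecord], kind: str) -> str:
    """Concatenate the formatted output of every record of the given kind."""
    return "".join(format_revision(r) for r in records if r.kind == kind)


if __name__ == "__main__":
    # Made-up sample data standing in for values read from a document
    records = [
        RevisionRecord("Insertion", "Insertion", "Alice", "new sentence"),
        RevisionRecord("Deletion", "Deletion", "Bob", "old sentence"),
    ]
    print(format_all(records, "Insertion"))
    print(format_all(records, "Deletion"))
```

With this approach, each branch of the loop would only need to append one RevisionRecord, and the two WriteAllText calls would receive format_all(records, "Insertion") and format_all(records, "Deletion").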
Apply for a Temporary License
To remove the evaluation message from the generated documents or lift the functional limitations, please request a 30-day trial license.