Monday, 18 December 2023 03:28

Python: Merge Word Documents

Working with a large number of Word documents can be tedious. Editing or reviewing them means repeatedly opening and closing files, and sharing them means sending and receiving each file separately. Merging related Word documents into a single file saves time for both the sender and the recipient. In this article, you will learn how to merge Word documents in Python using Spire.Doc for Python.

Install Spire.Doc for Python

This scenario requires Spire.Doc for Python and plum-dispatch v1.7.4. Both can be installed from the VS Code terminal with the following pip command.

pip install Spire.Doc

If you are unsure how to install, please refer to this tutorial: How to Install Spire.Doc for Python in VS Code

Merge Word Documents by Inserting Files with Python

The Document.InsertTextFromFile() method inserts the content of another Word document into the current one, and the inserted content starts on a new page. The detailed steps for merging Word documents by inserting are as follows:

  • Create an object of Document class and load a Word document using Document.LoadFromFile() method.
  • Insert the content of another document into it using Document.InsertTextFromFile() method.
  • Save the document using Document.SaveToFile() method.
  • Python
    from spire.doc import *
    from spire.doc.common import *
    
    # Create an object of Document class and load a Word document
    doc = Document()
    doc.LoadFromFile("Sample1.docx")
    
    # Insert the content from another Word document to this one
    doc.InsertTextFromFile("Sample2.docx", FileFormat.Auto)
    
    # Save the document
    doc.SaveToFile("output/InsertDocuments.docx")
    doc.Close()
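
The same call extends to more than two files. The sketch below is only an illustration under stated assumptions: list_word_files and merge_by_inserting are hypothetical helper names, not part of Spire.Doc, and the folder layout is assumed. It relies only on the Document methods shown above, and it imports spire.doc inside the merge function so that the file-listing helper can be used on its own.

```python
from pathlib import Path


def list_word_files(folder):
    """Return the .docx files in `folder`, sorted by path."""
    return sorted(p for p in Path(folder).iterdir()
                  if p.is_file() and p.suffix.lower() == ".docx")


def merge_by_inserting(paths, output_path):
    """Load the first file, insert the others in order, and save the result."""
    paths = [str(p) for p in paths]
    if not paths:
        raise ValueError("at least one document is required")

    # Imported here so that list_word_files works without Spire.Doc installed
    from spire.doc import Document, FileFormat

    doc = Document()
    doc.LoadFromFile(paths[0])
    for path in paths[1:]:
        # Each inserted document starts on a new page
        doc.InsertTextFromFile(path, FileFormat.Auto)
    doc.SaveToFile(str(output_path))
    doc.Close()
```

For example, merge_by_inserting(list_word_files("reports"), "output/Merged.docx") would merge every .docx file in a hypothetical reports folder in file-name order.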

Merge Word Documents by Cloning Contents with Python

Merging Word documents can also be achieved by cloning contents from one Word document to another. This method maintains the formatting of the original document, and content cloned from another document continues at the end of the current document without starting a new page. The detailed steps are as follows:

  • Create two objects of Document class and load two Word documents using Document.LoadFromFile() method.
  • Get the last section of the destination document using Document.Sections.get_Item() method.
  • Loop through the sections in the document to be cloned and then loop through the child objects of the sections.
  • Get a section child object using Section.Body.ChildObjects.get_Item() method.
  • Add the child object to the last section of the destination document using Section.Body.ChildObjects.Add() method.
  • Save the result document using Document.SaveToFile() method.
  • Python
    from spire.doc import *
    from spire.doc.common import *
    
    # Create two objects of Document class and load two Word documents
    doc1 = Document()
    doc1.LoadFromFile("Sample1.docx")
    doc2 = Document()
    doc2.LoadFromFile("Sample2.docx")
    
    # Get the last section of the first document
    lastSection = doc1.Sections.get_Item(doc1.Sections.Count - 1)
    
    # Loop through the sections in the second document
    for i in range(doc2.Sections.Count):
        section = doc2.Sections.get_Item(i)
        # Loop through the child objects in the sections
        for j in range(section.Body.ChildObjects.Count):
            obj = section.Body.ChildObjects.get_Item(j)
            # Add the child objects from the second document to the last section of the first document
            lastSection.Body.ChildObjects.Add(obj.Clone())
    
    # Save the result document
    doc1.SaveToFile("output/MergeByCloning.docx")
    doc1.Close()
    doc2.Close()
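
The nested loop can be wrapped in a small function so that it is easy to reuse for several source documents. This is a sketch, not library code: append_by_cloning is a hypothetical name, and it calls only the members used in the example above (Sections.Count, get_Item(), Body.ChildObjects, Add() and Clone()). It assumes the destination document already contains at least one section, which is the case for a loaded Word file.

```python
def append_by_cloning(destination, source):
    """Clone every body child object of `source` into the last section of `destination`.

    Both arguments are expected to be loaded Document objects.
    Returns the number of objects copied.
    """
    last_section = destination.Sections.get_Item(destination.Sections.Count - 1)
    copied = 0
    for i in range(source.Sections.Count):
        children = source.Sections.get_Item(i).Body.ChildObjects
        for j in range(children.Count):
            # Clone the object so the source document is left unchanged
            last_section.Body.ChildObjects.Add(children.get_Item(j).Clone())
            copied += 1
    return copied
```

With doc1 and doc2 loaded as above, append_by_cloning(doc1, doc2) stands in for the nested loop, and calling it once per source document merges several files into doc1.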

Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.

Published in doc

Python: Create, Read, or Update a Word Document

Creating, reading, and updating Word documents is a common need for many developers working with the Python programming language. Whether it's generating reports, manipulating existing documents, or automating document creation processes, having the ability to work with Word documents programmatically can greatly enhance productivity and efficiency. In this article, you will learn how to create, read, or update Word documents in Python using Spire.Doc for Python.

Install Spire.Doc for Python

This scenario requires Spire.Doc for Python and plum-dispatch v1.7.4. Both can be installed from the VS Code terminal with the following pip command.

pip install Spire.Doc

If you are unsure how to install, please refer to this tutorial: How to Install Spire.Doc for Python in VS Code

Create a Word Document from Scratch in Python

Spire.Doc for Python offers the Document class to represent a Word document model. A document must contain at least one section (represented by the Section class) and each section is a container for various elements such as paragraphs, tables, charts, and images. This example shows you how to create a simple Word document containing several paragraphs using Spire.Doc for Python.

  • Create a Document object.
  • Add a section using Document.AddSection() method.
  • Set the page margins through Section.PageSetup.Margins property.
  • Add several paragraphs to the section using Section.AddParagraph() method.
  • Add text to the paragraphs using Paragraph.AppendText() method.
  • Create a ParagraphStyle object, and apply it to a specific paragraph using Paragraph.ApplyStyle() method.
  • Save the document to a Word file using Document.SaveToFile() method.
  • Python
    from spire.doc import *
    from spire.doc.common import *
    
    # Create a Document object
    doc = Document()
    
    # Add a section
    section = doc.AddSection()
    
    # Set the page margins
    section.PageSetup.Margins.All = 40
    
    # Add a title
    titleParagraph = section.AddParagraph()
    titleParagraph.AppendText("Introduction of Spire.Doc for Python")
    
    # Add two paragraphs
    bodyParagraph_1 = section.AddParagraph()
    bodyParagraph_1.AppendText("Spire.Doc for Python is a professional Python library designed for developers to " +
                               "create, read, write, convert, compare and print Word documents in any Python application " +
                               "with fast and high-quality performance.")
    
    bodyParagraph_2 = section.AddParagraph()
    bodyParagraph_2.AppendText("As an independent Word Python API, Spire.Doc for Python doesn't need Microsoft Word to " +
                               "be installed on either the development or target systems. However, it can incorporate Microsoft Word " +
                               "document creation capabilities into any developer's Python applications.")
    
    # Apply heading1 to the title
    titleParagraph.ApplyStyle(BuiltinStyle.Heading1)
    
    # Create a style for the paragraphs
    style2 = ParagraphStyle(doc)
    style2.Name = "paraStyle"
    style2.CharacterFormat.FontName = "Arial"
    style2.CharacterFormat.FontSize = 13
    doc.Styles.Add(style2)
    bodyParagraph_1.ApplyStyle("paraStyle")
    bodyParagraph_2.ApplyStyle("paraStyle")
    
    # Set the horizontal alignment of the paragraphs
    titleParagraph.Format.HorizontalAlignment = HorizontalAlignment.Center
    bodyParagraph_1.Format.HorizontalAlignment = HorizontalAlignment.Left
    bodyParagraph_2.Format.HorizontalAlignment = HorizontalAlignment.Left
    
    # Set the after spacing
    titleParagraph.Format.AfterSpacing = 10
    bodyParagraph_1.Format.AfterSpacing = 10
    
    # Save to file
    doc.SaveToFile("output/WordDocument.docx", FileFormat.Docx2019)
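
When the body text comes from a list rather than hand-written variables, the AddParagraph() and AppendText() calls can be put in a loop. The helper below is a sketch with a hypothetical name (add_paragraphs); it calls only those two methods from the example above and returns the new paragraphs so that styles and spacing can still be applied afterwards.

```python
def add_paragraphs(section, texts):
    """Add one paragraph per string in `texts` and return the new paragraphs."""
    paragraphs = []
    for text in texts:
        paragraph = section.AddParagraph()
        paragraph.AppendText(text)
        paragraphs.append(paragraph)
    return paragraphs
```

With the section created above, a call such as add_paragraphs(section, ["First paragraph.", "Second paragraph."]) would take the place of the separate bodyParagraph_1 and bodyParagraph_2 blocks.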

Read Text of a Word Document in Python

To get the text of an entire Word document, you can use the Document.GetText() method. The following are the detailed steps.

  • Create a Document object.
  • Load a Word document using Document.LoadFromFile() method.
  • Get text from the entire document using Document.GetText() method.
  • Python
    from spire.doc import *
    from spire.doc.common import *
    
    # Create a Document object
    doc = Document()
    
    # Load a Word file
    doc.LoadFromFile("C:\\Users\\Administrator\\Desktop\\WordDocument.docx")
    
    # Get text from the entire document
    text = doc.GetText()
    
    # Print text
    print(text)
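
Since GetText() returns the whole document as one string, ordinary string handling can be applied to the result. The sketch below uses a hypothetical helper name (summarize_text) and assumes that paragraphs in the extracted text are separated by line breaks, which this article does not confirm.

```python
def summarize_text(text):
    """Split extracted text into non-empty lines and count its words."""
    # splitlines() handles \n, \r\n and \r line endings alike
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    word_count = sum(len(line.split()) for line in lines)
    return lines, word_count
```

With the text variable from the example above, lines, word_count = summarize_text(text) gives a list of non-empty lines and a simple whitespace-based word count.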

Update a Word Document in Python

To access a specific paragraph, you can use the Section.Paragraphs[index] property. If you want to modify the text of the paragraph, you can reassign text to the paragraph through the Paragraph.Text property. The following are the detailed steps.

  • Create a Document object.
  • Load a Word document using Document.LoadFromFile() method.
  • Get a specific section through Document.Sections[index] property.
  • Get a specific paragraph through Section.Paragraphs[index] property.
  • Change the text of the paragraph through Paragraph.Text property.
  • Save the document to another Word file using Document.SaveToFile() method.
  • Python
    from spire.doc import *
    from spire.doc.common import *
    
    # Create a Document object
    doc = Document()
    
    # Load a Word file
    doc.LoadFromFile("C:\\Users\\Administrator\\Desktop\\WordDocument.docx")
    
    # Get a specific section
    section = doc.Sections[0]
    
    # Get a specific paragraph
    paragraph = section.Paragraphs[1]
    
    # Change the text of the paragraph
    paragraph.Text = "The title has been changed"
    
    # Save to file
    doc.SaveToFile("output/Updated.docx", FileFormat.Docx2019)
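
To change text in more than one paragraph, the same Paragraphs[index] and Paragraph.Text accessors can be used in a loop. This is a sketch under assumptions: replace_in_paragraphs is a hypothetical helper, and Paragraphs.Count is assumed to exist by analogy with Sections.Count; it is not shown in this article.

```python
def replace_in_paragraphs(section, old, new):
    """Replace `old` with `new` in every paragraph of `section`.

    Returns the number of paragraphs whose text changed.
    """
    changed = 0
    # Paragraphs.Count is an assumption, by analogy with Sections.Count
    for i in range(section.Paragraphs.Count):
        paragraph = section.Paragraphs[i]
        text = paragraph.Text
        if old in text:
            paragraph.Text = text.replace(old, new)
            changed += 1
    return changed
```

With the section from the example above, replace_in_paragraphs(section, "Draft", "Final") would update every paragraph containing "Draft" before the document is saved.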

Monday, 18 December 2023 03:01

Python: Convert HTML to Word

While HTML is designed for online viewing, Word documents are commonly used for printing and physical documentation. Converting HTML to Word ensures that the content is optimized for printing, allowing for accurate page breaks, headers, footers, and other necessary elements for professional documentation purposes. In this article, we will explain how to convert HTML to Word in Python using Spire.Doc for Python.

Install Spire.Doc for Python

This scenario requires Spire.Doc for Python and plum-dispatch v1.7.4. Both can be installed from the VS Code terminal with the following pip command.

pip install Spire.Doc

If you are unsure how to install, please refer to this tutorial: How to Install Spire.Doc for Python in VS Code

Convert an HTML File to Word with Python

You can easily convert an HTML file to Word format by using the Document.SaveToFile() method provided by Spire.Doc for Python. The detailed steps are as follows.

  • Create an object of the Document class.
  • Load an HTML file using Document.LoadFromFile() method.
  • Save the HTML file to Word format using Document.SaveToFile() method.
  • Python
from spire.doc import *
from spire.doc.common import *

# Specify the input and output file paths
inputFile = "Input.html"
outputFile = "HtmlToWord.docx"

# Create an object of the Document class
document = Document()
# Load an HTML file
document.LoadFromFile(inputFile, FileFormat.Html, XHTMLValidationType.none)

# Save the HTML file to a .docx file
document.SaveToFile(outputFile, FileFormat.Docx2016)
document.Close()

Convert an HTML String to Word with Python

To convert an HTML string to Word, you can use the Paragraph.AppendHTML() method. The detailed steps are as follows.

  • Create an object of the Document class.
  • Add a section to the document using Document.AddSection() method.
  • Add a paragraph to the section using Section.AddParagraph() method.
  • Append an HTML string to the paragraph using Paragraph.AppendHTML() method.
  • Save the result document using Document.SaveToFile() method.
  • Python
from spire.doc import *
from spire.doc.common import *

# Specify the output file path
outputFile = "HtmlStringToWord.docx"

# Create an object of the Document class
document = Document()
# Add a section to the document
sec = document.AddSection()

# Add a paragraph to the section
paragraph = sec.AddParagraph()

# Specify the HTML string
htmlString = """
<html>
<head>
    <title>HTML to Word Example</title>
    <style>
        body {
            font-family: Arial, sans-serif;
        }
        h1 {
            color: #FF5733;
            font-size: 24px;
            margin-bottom: 20px;
        }
        p {
            color: #333333;
            font-size: 16px;
            margin-bottom: 10px;
        }
        ul {
            list-style-type: disc;
            margin-left: 20px;
            margin-bottom: 15px;
        }
        li {
            font-size: 14px;
            margin-bottom: 5px;
        }
        table {
            border-collapse: collapse;
            width: 100%;
            margin-bottom: 20px;
        }
        th, td {
            border: 1px solid #CCCCCC;
            padding: 8px;
            text-align: left;
        }
        th {
            background-color: #F2F2F2;
            font-weight: bold;
        }
        td {
            color: #0000FF;
        }
    </style>
</head>
<body>
    <h1>This is a Heading</h1>
    <p>This is a paragraph demonstrating the conversion of HTML to Word document.</p>
    <p>Here's an example of an unordered list:</p>
    <ul>
        <li>Item 1</li>
        <li>Item 2</li>
        <li>Item 3</li>
    </ul>
    <p>And here's a table:</p>
    <table>
        <tr>
            <th>Product</th>
            <th>Quantity</th>
            <th>Price</th>
        </tr>
        <tr>
            <td>Jacket</td>
            <td>30</td>
            <td>$150</td>
        </tr>
        <tr>
            <td>Sweater</td>
            <td>25</td>
            <td>$99</td>
        </tr>
    </table>
</body>
</html>
"""

# Append the HTML string to the paragraph
paragraph.AppendHTML(htmlString)

# Save the result document
document.SaveToFile(outputFile, FileFormat.Docx2016)
document.Close()
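
The HTML string does not have to be written by hand. As an illustration, the sketch below builds a table fragment from Python data with the standard library only; rows_to_html_table is a hypothetical helper name. Cell values are escaped so that characters such as < and & do not break the markup.

```python
from html import escape


def rows_to_html_table(headers, rows):
    """Build an HTML table string from a header list and a list of rows."""
    head = "".join(f"<th>{escape(str(h))}</th>" for h in headers)
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(str(cell))}</td>" for cell in row) + "</tr>"
        for row in rows
    )
    return f"<table><tr>{head}</tr>{body}</table>"
```

The returned string can then be passed to the call shown above, for example paragraph.AppendHTML(rows_to_html_table(["Product", "Price"], [["Jacket", "$150"]])).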

Monday, 18 December 2023 02:49

Python: Convert Text to Word or Word to Text

Text files are a common file type that contains only plain text without any formatting or styles. If you want to apply formatting or add images, charts, tables, and other media elements to text files, one of the recommended solutions is to convert them to Word files.

Conversely, if you want to efficiently extract content or reduce the file size of Word documents, you can convert them to text format. This article will demonstrate how to programmatically convert text files to Word format and convert Word files to text format using Spire.Doc for Python.

Install Spire.Doc for Python

This scenario requires Spire.Doc for Python and plum-dispatch v1.7.4. Both can be installed from the VS Code terminal with the following pip command.

pip install Spire.Doc

If you are unsure how to install, please refer to this tutorial: How to Install Spire.Doc for Python in VS Code

Convert Text (TXT) to Word in Python

Converting TXT to Word is simple and requires only a few lines of code. The following are the detailed steps.

  • Create a Document object.
  • Load a text file using Document.LoadFromFile(string fileName) method.
  • Save the text file as a Word file using Document.SaveToFile(string fileName, FileFormat fileFormat) method.
  • Python
from spire.doc import *
from spire.doc.common import *

# Create a Document object
document = Document()

# Load a TXT file
document.LoadFromFile("input.txt")

# Save the TXT file as Word
document.SaveToFile("TxtToWord.docx", FileFormat.Docx2016)
document.Close()

Convert Word to Text (TXT) in Python

The Document.SaveToFile(string fileName, FileFormat.Txt) method provided by Spire.Doc for Python allows you to export a Word file to text format. The following are the detailed steps.

  • Create a Document object.
  • Load a Word file using Document.LoadFromFile(string fileName) method.
  • Save the Word file in txt format using Document.SaveToFile(string fileName, FileFormat.Txt) method.
  • Python
from spire.doc import *
from spire.doc.common import *

# Create a Document object
document = Document()

# Load a Word file from disk
document.LoadFromFile("Input.docx")

# Save the Word file in txt format
document.SaveToFile("WordToTxt.txt", FileFormat.Txt)
document.Close()
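
When converting many files in either direction, it helps to derive the output name from the input name. The sketch below uses only the standard library; converted_path is a hypothetical helper, and the rule it applies (.txt becomes .docx and .docx becomes .txt) is an assumption made for this illustration.

```python
from pathlib import Path


def converted_path(source, output_dir):
    """Return the output path for a conversion: .txt becomes .docx and vice versa."""
    source = Path(source)
    swap = {".txt": ".docx", ".docx": ".txt"}
    suffix = source.suffix.lower()
    if suffix not in swap:
        raise ValueError(f"unsupported file type: {source.name}")
    return Path(output_dir) / (source.stem + swap[suffix])
```

The suffix of the returned path indicates which format to pass to SaveToFile(): FileFormat.Docx2016 for a .docx target and FileFormat.Txt for a .txt target, as in the two examples above.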

Monday, 18 December 2023 02:34

Python: Convert Word to PDF

Nowadays, digital documents play a significant role in our daily lives, both in personal and professional settings. One such common format is Microsoft Word - used for creating and editing textual documents. However, there may come a time when you need to convert your Word files into a more universally accessible format, such as PDF. PDFs offer advantages like preserving formatting, ensuring compatibility, and maintaining document integrity across different devices and operating systems. In this article, you will learn how to convert Word to PDF in Python using Spire.Doc for Python.

Install Spire.Doc for Python

This scenario requires Spire.Doc for Python and plum-dispatch v1.7.4. Both can be installed from the VS Code terminal with the following pip command.

pip install Spire.Doc

If you are unsure how to install, please refer to this tutorial: How to Install Spire.Doc for Python in VS Code

Convert Doc or Docx to PDF in Python

Spire.Doc for Python offers the Document.SaveToFile(string fileName, FileFormat fileFormat) method, which lets you save a Word document as PDF, XPS, HTML, RTF, etc. If you just want to save your Word documents as regular PDFs without additional settings, follow the steps below.

  • Create a Document object.
  • Load a sample Word document using Document.LoadFromFile() method.
  • Save the document to PDF using Document.SaveToFile() method.
  • Python
from spire.doc import *
from spire.doc.common import *

# Create a Document object
document = Document()

# Load a doc or docx file
document.LoadFromFile("C:\\Users\\Administrator\\Desktop\\input.docx")

# Save the document to PDF
document.SaveToFile("output/ToPDF.pdf", FileFormat.PDF)
document.Close()
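
The examples in this article save to an output/ folder. Whether SaveToFile() creates a missing folder is not stated here, so a cautious sketch creates it first with the standard library. prepare_output_path is a hypothetical helper name.

```python
from pathlib import Path


def prepare_output_path(path):
    """Create the parent folder of `path` if needed and return the path as a string."""
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    return str(target)
```

The save call above could then be written as document.SaveToFile(prepare_output_path("output/ToPDF.pdf"), FileFormat.PDF).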

Convert Word to Password-Protected PDF in Python

To convert Word to a Password-Protected PDF, you can utilize the Document.SaveToFile(string fileName, ToPdfParameterList paramList) method, where the ToPdfParameterList parameter allows you to control the conversion process of a Word document into a PDF format. This includes options such as encrypting the document during the conversion. Here are the specific steps to accomplish this task.

  • Create a Document object.
  • Load a sample Word document using Document.LoadFromFile() method.
  • Create a ToPdfParameterList object, which is used to set conversion options.
  • Specify the open password and permission password and then set both passwords for the generated PDF using ToPdfParameterList.PdfSecurity.Encrypt() method.
  • Save the Word document to PDF with password using Document.SaveToFile(string fileName, ToPdfParameterList paramList) method.
  • Python
from spire.doc import *
from spire.doc.common import *

# Create a Document object
document = Document()

# Load a Word file
document.LoadFromFile("C:\\Users\\Administrator\\Desktop\\input.docx")

# Create a ToPdfParameterList object
parameter = ToPdfParameterList()

# Specify open password and permission password
openPsd = "abc-123"
permissionPsd = "permission"

# Protect the PDF to be generated with open password and permission password
parameter.PdfSecurity.Encrypt(openPsd, permissionPsd, PdfPermissionsFlags.Default, PdfEncryptionKeySize.Key128Bit)

# Save the Word document to PDF
document.SaveToFile("output/ToPdfWithPassword.pdf", parameter)
document.Close()
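
Hard-coded passwords are fine for a demo but are easy to leak through version control. One alternative, sketched here with the standard library only, is to read them from environment variables. read_pdf_passwords and the variable names PDF_OPEN_PASSWORD and PDF_PERMISSION_PASSWORD are hypothetical choices for this illustration.

```python
import os


def read_pdf_passwords(env=None):
    """Return (open_password, permission_password) read from environment variables.

    PDF_OPEN_PASSWORD and PDF_PERMISSION_PASSWORD are hypothetical variable names.
    """
    env = os.environ if env is None else env
    names = ("PDF_OPEN_PASSWORD", "PDF_PERMISSION_PASSWORD")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise ValueError("missing environment variable(s): " + ", ".join(missing))
    return env[names[0]], env[names[1]]
```

The two values can replace openPsd and permissionPsd in the example above: openPsd, permissionPsd = read_pdf_passwords().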

Convert Word to PDF with Bookmarks in Python

Adding bookmarks to a document can improve its readability. When creating PDF from Word, you may want to keep the existing bookmarks or create new ones based on the headings. Here are the steps to convert Word to PDF while maintaining bookmarks.

  • Create a Document object.
  • Load a Word file using Document.LoadFromFile() method.
  • Create a ToPdfParameterList object, which is used to set conversion options.
  • Create bookmarks in PDF from the headings in Word by setting ToPdfParameterList.CreateWordBookmarksUsingHeadings to True.
  • Save the document to PDF with bookmarks using Document.SaveToFile(string fileName, ToPdfParameterList paramList) method.
  • Python
from spire.doc import *
from spire.doc.common import *

# Create a Document object
document = Document()

# Load a Word file
document.LoadFromFile("C:\\Users\\Administrator\\Desktop\\input.docx")

# Create a ToPdfParameterList object
parames = ToPdfParameterList()

# Create bookmarks from Word headings
parames.CreateWordBookmarksUsingHeadings = True

# Create bookmarks in PDF from existing bookmarks in Word
# parames.CreateWordBookmarks = True

# Save the document to PDF, passing the parameter list so the bookmark setting is applied
document.SaveToFile("output/ToPdfWithBookmarks.pdf", parames)
document.Close()

Convert Word to PDF with Fonts Embedded in Python

To ensure consistent appearance of a PDF document on any device, you probably need to embed fonts in the generated PDF document. The following are the steps to embed the fonts used in a Word document into the resulting PDF.

  • Create a Document object.
  • Load a sample Word file using Document.LoadFromFile() method.
  • Create a ToPdfParameterList object, which is used to set conversion options.
  • Embed fonts in generated PDF through ToPdfParameterList.IsEmbeddedAllFonts property.
  • Save the document to PDF using Document.SaveToFile(string fileName, ToPdfParameterList paramList) method.
  • Python
from spire.doc import *
from spire.doc.common import *

# Create a Document object
document = Document()

# Load a Word file
document.LoadFromFile("C:\\Users\\Administrator\\Desktop\\input.docx")

# Create a ToPdfParameterList object
parameter = ToPdfParameterList()

# Embed fonts in PDF
parameter.IsEmbeddedAllFonts = True

# Save the Word document to PDF
document.SaveToFile("output/EmbedFonts.pdf", parameter)
document.Close()

Set Image Quality When Converting Word to PDF in Python

When converting a Word document to PDF, it is important to consider the size of the resulting file, especially if it contains numerous high-quality images. You can reduce the image quality during the conversion to keep the file smaller. To do this, follow the steps below.

  • Create a Document object.
  • Load a sample Word file using Document.LoadFromFile() method.
  • Set the image quality through Document.JPEGQuality property.
  • Save the document to PDF using Document.SaveToFile() method.
  • Python
from spire.doc import *
from spire.doc.common import *

# Create a Document object
document = Document()

# Load a Word file
document.LoadFromFile("C:\\Users\\Administrator\\Desktop\\input.docx")

# Compress image to 40% of its original quality
document.JPEGQuality = 40

# Preserve original image quality
# document.JPEGQuality = 100

# Save the Word document to PDF
document.SaveToFile("output/SetImageQuality.pdf", FileFormat.PDF)
document.Close()
