While HTML is designed for online viewing, Word documents are commonly used for printing and physical documentation. Converting HTML to Word makes the content easier to prepare for print, since Word gives you control over page breaks, headers, footers, and other elements of professional documents. In this article, we will explain how to convert HTML to Word in Python using Spire.Doc for Python.
Install Spire.Doc for Python
This scenario requires Spire.Doc for Python and plum-dispatch v1.7.4. Both can be installed on Windows with the following pip command.
pip install Spire.Doc
If you are unsure how to install it, refer to this tutorial: How to Install Spire.Doc for Python on Windows
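After running the command, it can help to confirm that both packages are visible to the interpreter you plan to use. The following is a minimal sketch that relies only on the standard library. The helper name installed_version is hypothetical, and the sketch assumes the packages are registered under the same names used in this article.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package_name: str) -> Optional[str]:
    """Return the installed version of a pip package, or None if it is missing."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Assumption: these are the distribution names pip registered on install.
    for name in ("Spire.Doc", "plum-dispatch"):
        found = installed_version(name)
        print(f"{name}: {found if found else 'not installed'}")
```

If either line reports "not installed", the package is probably missing from the active environment, or pip installed it for a different Python interpreter.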
Convert an HTML File to Word with Python
You can convert an HTML file to Word format by loading it with the Document.LoadFromFile() method and saving it with the Document.SaveToFile() method, both provided by Spire.Doc for Python. The detailed steps are as follows.
- Create an object of the Document class.
- Load an HTML file using the Document.LoadFromFile() method.
- Save the document in Word format using the Document.SaveToFile() method.
from spire.doc import *
from spire.doc.common import *

# Specify the input and output file paths
inputFile = "Input.html"
outputFile = "HtmlToWord.docx"

# Create an object of the Document class
document = Document()

# Load an HTML file
document.LoadFromFile(inputFile, FileFormat.Html, XHTMLValidationType.none)

# Save the HTML file to a .docx file
document.SaveToFile(outputFile, FileFormat.Docx2016)
document.Close()
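When there are several HTML files to convert, the same three calls can be wrapped in a loop. The sketch below is one possible arrangement, not part of Spire.Doc itself: plan_conversions and convert_all are hypothetical helper names, the folder names are placeholders, and the explicit import of Document, FileFormat, and XHTMLValidationType assumes those names can be imported directly from spire.doc (the example above obtains them through wildcard imports).

```python
from pathlib import Path


def plan_conversions(input_dir, output_dir):
    """Pair every .html file in input_dir with a .docx path in output_dir."""
    in_dir, out_dir = Path(input_dir), Path(output_dir)
    return [
        (src, out_dir / (src.stem + ".docx"))
        for src in sorted(in_dir.glob("*.html"))
    ]


def convert_all(input_dir, output_dir):
    """Convert each planned pair with the load/save calls shown in the article."""
    # Imported here so the path planning works even without Spire.Doc installed.
    # Assumption: these names are importable directly from spire.doc.
    from spire.doc import Document, FileFormat, XHTMLValidationType

    Path(output_dir).mkdir(parents=True, exist_ok=True)
    for src, dst in plan_conversions(input_dir, output_dir):
        document = Document()
        try:
            document.LoadFromFile(str(src), FileFormat.Html, XHTMLValidationType.none)
            document.SaveToFile(str(dst), FileFormat.Docx2016)
        finally:
            # Release the document even if loading or saving fails.
            document.Close()


if __name__ == "__main__":
    # Placeholder folder names; adjust to your own layout.
    convert_all("html_pages", "word_output")
```

Keeping the path planning separate from the conversion makes it possible to check which files would be produced before any document is written.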
Convert an HTML String to Word with Python
To convert an HTML string to Word, you can use the Paragraph.AppendHTML() method. The detailed steps are as follows.
- Create an object of the Document class.
- Add a section to the document using the Document.AddSection() method.
- Add a paragraph to the section using the Section.AddParagraph() method.
- Append an HTML string to the paragraph using the Paragraph.AppendHTML() method.
- Save the resulting document using the Document.SaveToFile() method.
from spire.doc import *
from spire.doc.common import *

# Specify the output file path
outputFile = "HtmlStringToWord.docx"

# Create an object of the Document class
document = Document()

# Add a section to the document
sec = document.AddSection()

# Add a paragraph to the section
paragraph = sec.AddParagraph()

# Specify the HTML string
htmlString = """
<html>
<head>
    <title>HTML to Word Example</title>
    <style>
        body { font-family: Arial, sans-serif; }
        h1 { color: #FF5733; font-size: 24px; margin-bottom: 20px; }
        p { color: #333333; font-size: 16px; margin-bottom: 10px; }
        ul { list-style-type: disc; margin-left: 20px; margin-bottom: 15px; }
        li { font-size: 14px; margin-bottom: 5px; }
        table { border-collapse: collapse; width: 100%; margin-bottom: 20px; }
        th, td { border: 1px solid #CCCCCC; padding: 8px; text-align: left; }
        th { background-color: #F2F2F2; font-weight: bold; }
        td { color: #0000FF; }
    </style>
</head>
<body>
    <h1>This is a Heading</h1>
    <p>This is a paragraph demonstrating the conversion of HTML to Word document.</p>
    <p>Here's an example of an unordered list:</p>
    <ul>
        <li>Item 1</li>
        <li>Item 2</li>
        <li>Item 3</li>
    </ul>
    <p>And here's a table:</p>
    <table>
        <tr>
            <th>Product</th>
            <th>Quantity</th>
            <th>Price</th>
        </tr>
        <tr>
            <td>Jacket</td>
            <td>30</td>
            <td>$150</td>
        </tr>
        <tr>
            <td>Sweater</td>
            <td>25</td>
            <td>$99</td>
        </tr>
    </table>
</body>
</html>
"""

# Append the HTML string to the paragraph
paragraph.AppendHTML(htmlString)

# Save the result document
document.SaveToFile(outputFile, FileFormat.Docx2016)
document.Close()
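In practice the HTML string is often generated from data rather than typed by hand. The following sketch shows one way to build a table string and pass it to Paragraph.AppendHTML(). The helpers rows_to_html_table and save_html_as_docx are hypothetical names introduced for illustration, the output file name is a placeholder, and the explicit import assumes Document and FileFormat can be imported directly from spire.doc.

```python
from html import escape


def rows_to_html_table(headers, rows):
    """Build an HTML table string, escaping every header and cell value."""
    head = "".join(f"<th>{escape(str(h))}</th>" for h in headers)
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(str(cell))}</td>" for cell in row) + "</tr>"
        for row in rows
    )
    return f"<table><tr>{head}</tr>{body}</table>"


def save_html_as_docx(html_string, output_path):
    """Append an HTML string to a new document and save it as .docx."""
    # Imported here so the table builder works without Spire.Doc installed.
    # Assumption: these names are importable directly from spire.doc.
    from spire.doc import Document, FileFormat

    document = Document()
    try:
        paragraph = document.AddSection().AddParagraph()
        paragraph.AppendHTML(html_string)
        document.SaveToFile(output_path, FileFormat.Docx2016)
    finally:
        document.Close()


if __name__ == "__main__":
    table = rows_to_html_table(
        ["Product", "Quantity", "Price"],
        [["Jacket", 30, "$150"], ["Sweater", 25, "$99"]],
    )
    save_html_as_docx(f"<html><body>{table}</body></html>", "GeneratedTable.docx")
```

Escaping each value with html.escape keeps characters such as & and < in the data from being read as markup when the string is parsed.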
Apply for a Temporary License
If you'd like to remove the evaluation message from the generated documents or lift the functional limitations, please request a 30-day trial license.