Converting Word documents to HTML makes content easy to share and publish online. HTML is also straightforward for search engines to crawl and index, which can improve the visibility of your content in search results. In this article, you will learn how to programmatically convert Word to HTML using Spire.Doc for Python.
Install Spire.Doc for Python
This scenario requires Spire.Doc for Python and plum-dispatch v1.7.4. They can be installed on Windows with the following pip command.
pip install Spire.Doc
If you are unsure how to install, please refer to this tutorial: How to Install Spire.Doc for Python on Windows
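After running the pip command, it can be useful to confirm what was actually installed. The following is a minimal sketch that uses only the Python standard library; the helper name installed_version is hypothetical, and the distribution names are taken from this article, so adjust them if pip lists the packages differently on your machine.

```python
from importlib import metadata
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a pip distribution, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


# Distribution names as given in this article (assumed to match what pip reports).
for name in ("Spire.Doc", "plum-dispatch"):
    version = installed_version(name)
    print(f"{name}: {version if version else 'not installed'}")
```

If either line prints "not installed", rerun the pip command in the same Python environment that you use to run the conversion script.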
Convert Word Doc/Docx to HTML in Python
Spire.Doc for Python provides the Document.SaveToFile(fileName, FileFormat.Html) method, which saves a .doc or .docx document as an HTML file. The detailed steps are as follows.
- Create a Document object.
- Load a Word document using the Document.LoadFromFile() method.
- Save the document as an HTML file using the Document.SaveToFile() method.
from spire.doc import *
from spire.doc.common import *

# Create a Document instance
document = Document()

# Load a doc or docx document
document.LoadFromFile("Statement.docx")

# Save to HTML
document.SaveToFile("WordToHtml.html", FileFormat.Html)
document.Close()
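The same three steps can be applied to a whole folder of Word files. The sketch below is an illustration rather than part of the Spire.Doc API: html_path_for and convert_folder are hypothetical helper names, and it assumes Document and FileFormat can be imported by name from spire.doc, as the wildcard import in the example implies. It only uses the LoadFromFile, SaveToFile, and Close calls shown in this article.

```python
from pathlib import Path


def html_path_for(word_file: Path, output_dir: Path) -> Path:
    """Map e.g. reports/Statement.docx to <output_dir>/Statement.html."""
    return output_dir / (word_file.stem + ".html")


def convert_folder(input_dir: str, output_dir: str) -> list:
    """Convert every .doc/.docx file in input_dir to HTML and return the output paths."""
    # Imported lazily so the path helper above works without Spire.Doc installed.
    # Assumption: these names are importable directly from spire.doc.
    from spire.doc import Document, FileFormat

    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for word_file in sorted(Path(input_dir).iterdir()):
        if word_file.suffix.lower() not in (".doc", ".docx"):
            continue
        target = html_path_for(word_file, out)
        document = Document()
        try:
            document.LoadFromFile(str(word_file))
            document.SaveToFile(str(target), FileFormat.Html)
        finally:
            # Release the document even if loading or saving fails.
            document.Close()
        written.append(target)
    return written
```

Calling convert_folder("WordFiles", "HtmlOutput") would write one HTML file per Word document; both folder names are placeholders. Note that two inputs sharing a stem, such as Report.doc and Report.docx, would map to the same output file.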
Convert Word to HTML with Export Options in Python
Spire.Doc for Python also offers the HtmlExportOptions class to control the Word to HTML conversion, such as how CSS styles are exported, whether images are embedded, and whether text form fields are exported as plain text. The detailed steps are as follows.
- Create a Document object.
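- Load a Word document using the Document.LoadFromFile() method.
- Set how CSS styles are exported using the Document.HtmlExportOptions.CssStyleSheetType property. With CssStyleSheetType.External, the styles are written to the file named by Document.HtmlExportOptions.CssStyleSheetFileName.
- Set whether to embed images using the Document.HtmlExportOptions.ImageEmbedded property. When images are not embedded, set the folder they are saved to using the Document.HtmlExportOptions.ImagesPath property.
- Set whether to export form fields as plain text using the Document.HtmlExportOptions.IsTextInputFormFieldAsText property.
- Save the result document using the Document.SaveToFile() method.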
from spire.doc import *
from spire.doc.common import *

# Create a Document instance
document = Document()

# Load a Word document
document.LoadFromFile("Statement.docx")

# Write CSS styles to an external style sheet file
document.HtmlExportOptions.CssStyleSheetFileName = "sample.css"
document.HtmlExportOptions.CssStyleSheetType = CssStyleSheetType.External

# Do not embed images; save them to the Images/ folder instead
document.HtmlExportOptions.ImageEmbedded = False
document.HtmlExportOptions.ImagesPath = "Images/"

# Export text form fields as plain text
document.HtmlExportOptions.IsTextInputFormFieldAsText = True

# Save the document as an HTML file
document.SaveToFile("ToHtmlExportOption.html", FileFormat.Html)
document.Close()
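With an external style sheet and non-embedded images, the HTML file depends on other files that must be published alongside it. One way to see which files those are is to scan the exported HTML with the standard library. This is a sketch under an assumption: that the output references its style sheet through a link element with rel="stylesheet" and its images through img elements, which is common HTML practice but is not confirmed here for Spire.Doc. The names ExternalResourceCollector and external_resources are hypothetical.

```python
from html.parser import HTMLParser


class ExternalResourceCollector(HTMLParser):
    """Collect stylesheet links and image sources from an HTML document."""

    def __init__(self):
        super().__init__()
        self.stylesheets = []
        self.images = []

    def handle_starttag(self, tag, attrs):
        # Attribute values can be None for valueless attributes, hence the "or ''".
        attributes = dict(attrs)
        if tag == "link" and "stylesheet" in (attributes.get("rel") or "").lower():
            if attributes.get("href"):
                self.stylesheets.append(attributes["href"])
        elif tag == "img" and attributes.get("src"):
            self.images.append(attributes["src"])


def external_resources(html_text: str):
    """Return (stylesheets, images) that point to files, skipping embedded data: URIs."""
    collector = ExternalResourceCollector()
    collector.feed(html_text)
    collector.close()
    images = [src for src in collector.images if not src.startswith("data:")]
    return collector.stylesheets, images
```

To use it, read ToHtmlExportOption.html as text and pass the contents to external_resources. Every path it returns should exist next to the HTML file before you upload the result.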
Apply for a Temporary License
If you'd like to remove the evaluation message from the generated documents, or to lift the feature limitations, please request a 30-day trial license.