Python: Convert Word to HTML

2023-10-12 02:58:11 Written by support iceblue

Converting Word documents to HTML makes content easy to share and publish online. HTML is also easier for search engines to index than a Word file, which can improve the visibility of your content in search results. In this article, you will learn how to convert Word to HTML programmatically using Spire.Doc for Python.

Install Spire.Doc for Python

This scenario requires Spire.Doc for Python and plum-dispatch v1.7.4. They can be installed on Windows with the following pip command.

pip install Spire.Doc

If you are unsure how to install, please refer to this tutorial: How to Install Spire.Doc for Python on Windows
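
After installation, a quick check can confirm that the packages are visible to the interpreter you plan to use. The following is a minimal sketch using only the standard library; the helper name is_installed is hypothetical, and the import name "plum" for the plum-dispatch package is an assumption.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if the named module can be located by the import system."""
    try:
        # For dotted names such as "spire.doc", the parent package is imported first
        return importlib.util.find_spec(module_name) is not None
    except ImportError:
        # Raised when a parent package (for example "spire") is itself missing
        return False


if __name__ == "__main__":
    # "spire.doc" matches the import used in this article; "plum" is assumed
    for name in ("spire.doc", "plum"):
        print(f"{name}: {'found' if is_installed(name) else 'missing'}")
```

If either name is reported as missing, the pip command was most likely run against a different Python installation or virtual environment.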

Convert Word Doc/Docx to HTML in Python

Spire.Doc for Python offers the Document.SaveToFile(fileName, FileFormat.Html) method to save a .doc or .docx document as an HTML file. The detailed steps are as follows.

  • Create a Document object.
  • Load a Word document using Document.LoadFromFile() method.
  • Save the document as an HTML file using Document.SaveToFile() method.
from spire.doc import *
from spire.doc.common import *
     
# Create a Document instance
document = Document()

# Load a doc or docx document 
document.LoadFromFile("Statement.docx")

# Save to HTML
document.SaveToFile("WordToHtml.html", FileFormat.Html)
document.Close()
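
The same three calls can be wrapped in a reusable function that derives the output name from the input name. This is only a sketch: the helper names html_path_for and convert_to_html are hypothetical, and it assumes that Document and FileFormat can be imported by name from spire.doc, as the star import above suggests.

```python
from pathlib import Path


def html_path_for(word_path: str, out_dir: str = ".") -> Path:
    """Map a .doc/.docx path to an .html path with the same stem inside out_dir."""
    source = Path(word_path)
    if source.suffix.lower() not in {".doc", ".docx"}:
        raise ValueError(f"Not a Word document: {word_path}")
    return Path(out_dir) / (source.stem + ".html")


def convert_to_html(word_path: str, out_dir: str = ".") -> Path:
    """Convert one Word file to HTML with the calls shown in the steps above."""
    # Imported here so html_path_for() stays usable without Spire.Doc installed
    from spire.doc import Document, FileFormat

    target = html_path_for(word_path, out_dir)
    document = Document()
    try:
        document.LoadFromFile(word_path)
        document.SaveToFile(str(target), FileFormat.Html)
    finally:
        # Release the document even if loading or saving fails
        document.Close()
    return target
```

Calling convert_to_html("Statement.docx") would write Statement.html to the current directory. The try/finally block is a design choice of this sketch so that Close() still runs when an error occurs.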

Convert Word to HTML with Export Options in Python

Spire.Doc for Python also offers the HtmlExportOptions class to control the conversion, such as how CSS styles are exported, whether images are embedded in the HTML, and whether text input form fields are exported as plain text. The detailed steps are as follows.

  • Create a Document object.
  • Load a Word document using Document.LoadFromFile() method.
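  • Set how CSS styles are exported using Document.HtmlExportOptions.CssStyleSheetType property. The example below writes them to an external file named by the CssStyleSheetFileName property.
  • Set whether to embed images using Document.HtmlExportOptions.ImageEmbedded property. When images are not embedded, set where they are saved using the ImagesPath property.
  • Set whether to export text input form fields as plain text using Document.HtmlExportOptions.IsTextInputFormFieldAsText property.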
  • Save the result document using Document.SaveToFile() method.
from spire.doc import *
from spire.doc.common import *

# Create a Document instance
document = Document()

# Load a Word document
document.LoadFromFile("Statement.docx")

# Export CSS styles to an external style sheet file
document.HtmlExportOptions.CssStyleSheetFileName = "sample.css"
document.HtmlExportOptions.CssStyleSheetType = CssStyleSheetType.External

# Do not embed images in the HTML; write them to the Images/ path instead
document.HtmlExportOptions.ImageEmbedded = False
document.HtmlExportOptions.ImagesPath = "Images/"

# Set whether to export form fields as plain text
document.HtmlExportOptions.IsTextInputFormFieldAsText = True

# Save the document as an html file
document.SaveToFile("ToHtmlExportOption.html", FileFormat.Html)
document.Close()
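
To see what the export options changed, it can help to list the files the generated HTML refers to. The sketch below uses only the standard library and is independent of Spire.Doc. It assumes the exported page links its style sheet with a link element and its pictures with img elements, which is not confirmed by this article; the names ReferenceCollector and list_references are hypothetical.

```python
from html.parser import HTMLParser
from pathlib import Path


class ReferenceCollector(HTMLParser):
    """Collect style sheet and image references found in HTML text."""

    def __init__(self):
        super().__init__()
        self.stylesheets = []
        self.images = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        rel = (attributes.get("rel") or "").lower()
        if tag == "link" and "stylesheet" in rel and attributes.get("href"):
            self.stylesheets.append(attributes["href"])
        elif tag == "img" and attributes.get("src"):
            self.images.append(attributes["src"])


def list_references(html_text: str):
    """Return (stylesheet hrefs, image srcs) in document order."""
    collector = ReferenceCollector()
    collector.feed(html_text)
    collector.close()
    return collector.stylesheets, collector.images


if __name__ == "__main__":
    output = Path("ToHtmlExportOption.html")
    if output.exists():
        # The file encoding is an assumption; undecodable bytes are replaced
        css, images = list_references(output.read_text(encoding="utf-8", errors="replace"))
        print("Style sheets:", css)
        print("Images:", images)
```

With ImageEmbedded set to False, the image entries would be expected to be file paths; entries that start with "data:" would indicate images embedded directly in the page.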

Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents or lift the feature limitations, please request a 30-day trial license.

Last modified on Thursday, 25 April 2024 02:05