Python: Convert Word to HTML
Converting Word documents to HTML enables easy sharing and publishing of content online. Additionally, HTML content is more search engine friendly, thus converting to HTML also allows search engines to better index and rank your content, increasing its visibility in search results. In this article, you will learn how to programmatically convert Word to HTML using Spire.Doc for Python.
Install Spire.Doc for Python
This scenario requires Spire.Doc for Python and plum-dispatch v1.7.4. They can be easily installed in your Windows through the following pip commands.
pip install Spire.Doc
If you are unsure how to install, please refer to this tutorial: How to Install Spire.Doc for Python on Windows
Convert Word Doc/Docx to HTML in Python
Spire.Doc for Python offers the Document.SaveToFile(fileName string, FileFormat.Html) method to simply save a doc or docx document as an HTML file. The following are the detailed steps.
- Create a Document object.
- Load a Word document using Document.LoadFromFile() method.
- Save the document as an HTML file using Document.SaveToFile() method.
- Python
from spire.doc import * from spire.doc.common import * # Create a Document instance document = Document() # Load a doc or docx document document.LoadFromFile("Statement.docx") # Save to HTML document.SaveToFile("WordToHtml.html", FileFormat.Html) document.Close()
Convert Word to HTML with Export Options in Python
Spire.Doc for Python also offers the HtmlExportOptions class to set Word to HTML export options during conversion, such as whether to embed CSS styles, images, and whether to export form fields as plain text. The following are the detailed steps.
- Create a Document object.
- Load a Word document using Document.LoadFromFile() method.
- Embed CSS styles during conversion using Document.HtmlExportOptions.CssStyleSheetType property.
- Set whether to embed images using Document.HtmlExportOptions.ImageEmbedded property.
- Set whether to export form fields as plain text using Document.HtmlExportOptions.IsTextInputFormFieldAsText property.
- Save the result document using Document.SaveToFile() method.
- Python
from spire.doc import * from spire.doc.common import * # Create a Document instance document = Document() # Load a Word document document.LoadFromFile("Statement.docx") # Embed css styles document.HtmlExportOptions.CssStyleSheetFileName = "sample.css" document.HtmlExportOptions.CssStyleSheetType = CssStyleSheetType.External # Set whether to embed images document.HtmlExportOptions.ImageEmbedded = False document.HtmlExportOptions.ImagesPath = "Images/" # Set whether to export form fields as plain text document.HtmlExportOptions.IsTextInputFormFieldAsText = True # Save the document as an html file document.SaveToFile("ToHtmlExportOption.html", FileFormat.Html) document.Close()
Apply for a Temporary License
If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.
Python: Convert Word to Images
Converting a Word document into images can be a useful and convenient option when you want to share or present the content without worrying about formatting issues or compatibility across devices. By converting a Word document into images, you can ensure that the text, images, and formatting remain intact, making it an ideal solution for sharing documents on social media, websites, or through email. In this article, you will learn how to convert Word to PNG, JPEG or SVG in Python using Spire.Doc for Python.
Install Spire.Doc for Python
This scenario requires Spire.Doc for Python and plum-dispatch v1.7.4. They can be easily installed in your Windows through the following pip command.
pip install Spire.Doc
If you are unsure how to install, please refer to this tutorial: How to Install Spire.Doc for Python on Windows
Convert Word to PNG or JPEG in Python
Spire.Doc for Python offers the Document.SaveImageToStream() method to convert a certain page into a bitmap image. Afterwards, you can save the bitmap image to a popular image format such as PNG, JPEG, or BMP. The detailed steps are as follows.
- Create a Document object.
- Load a Word file using Document.LoadFromFile() method.
- Retrieve each page in the document, and convert a specific page into a bitmap image using Document.SaveImageToStreams() method.
- Save the bitmap image into a PNG or JPEG file.
- Python
from spire.doc import * from spire.doc.common import * # Create a Document object document = Document() # Load a Word file document.LoadFromFile("C:\\Users\\Administrator\\Desktop\\input.docx") # Loop through the pages in the document for i in range(document.GetPageCount()): # Convert a specific page to bitmap image imageStream = document.SaveImageToStreams(i, ImageType.Bitmap) # Save the bitmap to a PNG file with open('Output/ToImage-{0}.png'.format(i),'wb') as imageFile: imageFile.write(imageStream.ToArray()) document.Close()
Convert Word to SVG in Python
To convert a Word document into multiple SVG files, you can simply use the Document.SaveToFile() method. Here are the steps.
- Create a Document object.
- Load a Word file using Document.LoadFromFile() method.
- Convert it to individual SVG files using Document.SaveToFile() method.
- Python
from spire.doc import * from spire.doc.common import * # Create a Document object document = Document() # Load a Word file document.LoadFromFile("C:\\Users\\Administrator\\Desktop\\input.docx") # Convert it to SVG files document.SaveToFile("output/ToSVG.svg", FileFormat.SVG) document.Close()
Apply for a Temporary License
If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.
Python: Convert Text to Word or Word to Text
Text files are a common file type that contain only plain text without any formatting or styles. If you want to apply formatting or add images, charts, tables, and other media elements to text files, one of the recommended solutions is to convert them to Word files.
Conversely, if you want to efficiently extract content or reduce the file size of Word documents, you can convert them to text format. This article will demonstrate how to programmatically convert text files to Word format and convert Word files to text format using Spire.Doc for Python.
Install Spire.Doc for Python
This scenario requires Spire.Doc for Python and plum-dispatch v1.7.4. They can be easily installed in your Windows through the following pip commands.
pip install Spire.Doc
If you are unsure how to install, please refer to this tutorial: How to Install Spire.Doc for Python on Windows
Convert Text (TXT) to Word in Python
Conversion from TXT to Word is quite simple that requires only a few lines of code. The following are the detailed steps.
- Create a Document object.
- Load a text file using Document.LoadFromFile(string fileName) method.
- Save the text file as a Word file using Document.SaveToFile(string fileName, FileFormat fileFormat) method.
- Python
from spire.doc import * from spire.doc.common import * # Create a Document object document = Document() # Load a TXT file document.LoadFromFile("input.txt") # Save the TXT file as Word document.SaveToFile("TxtToWord.docx", FileFormat.Docx2016) document.Close()
Convert Word to Text (TXT) in Python
The Document.SaveToFile(string fileName, FileFormat.Txt) method provided by Spire.Doc for Python allows you to export a Word file to text format. The following are the detailed steps.
- Create a Document object.
- Load a Word file using Document.LoadFromFile(string fileName) method.
- Save the Word file in txt format using Document.SaveToFile(string fileName, FileFormat.Txt) method.
- Python
from spire.doc import * from spire.doc.common import * # Create a Document object document = Document() # Load a Word file from disk document.LoadFromFile("Input.docx") # Save the Word file in txt format document.SaveToFile("WordToTxt.txt", FileFormat.Txt) document.Close()
Apply for a Temporary License
If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.
Python: Convert Word to PDF
Nowadays, digital documents play a significant role in our daily lives, both in personal and professional settings. One such common format is Microsoft Word - used for creating and editing textual documents. However, there may come a time when you need to convert your Word files into a more universally accessible format, such as PDF. PDFs offer advantages like preserving formatting, ensuring compatibility, and maintaining document integrity across different devices and operating systems. In this article, you will learn how to convert Word to PDF in Python using Spire.Doc for Python.
- Convert Doc or Docx to PDF in Python
- Convert Word to Password-Protected PDF in Python
- Convert Word to PDF with Bookmarks in Python
- Convert Word to PDF with Fonts Embedded in Python
- Set Image Quality When Converting Word to PDF in Python
Install Spire.Doc for Python
This scenario requires Spire.Doc for Python and plum-dispatch v1.7.4. They can be easily installed in your Windows through the following pip command.
pip install Spire.Doc
If you are unsure how to install, please refer to this tutorial: How to Install Spire.Doc for Python on Windows
Convert Doc or Docx to PDF in Python
Spire.Doc for Python offers the Document.SaveToFile(string fileName, FileFormat fileFormat) method that allows to save Word as PDF, XPS, HTML, RTF, etc. If you just want to save your Word documents as regular PDFs without additional settings, follow the steps below.
- Create a Document object.
- Load a sample Word document using Document.LoadFromFile() method.
- Save the document to PDF using Doucment.SaveToFile() method.
- Python
from spire.doc import * from spire.doc.common import * # Create word document document = Document() # Load a doc or docx file document.LoadFromFile("C:\\Users\\Administrator\\Desktop\\input.docx") #Save the document to PDF document.SaveToFile("output/ToPDF.pdf", FileFormat.PDF) document.Close()
Convert Word to Password-Protected PDF in Python
To convert Word to a Password-Protected PDF, you can utilize the Document.SaveToFile(string fileName, ToPdfParameterList paramList) method, where the ToPdfParameterList parameter allows you to control the conversion process of a Word document into a PDF format. This includes options such as encrypting the document during the conversion. Here are the specific steps to accomplish this task.
- Create a Document object.
- Load a sample Word document using Document.LoadFromFile() method.
- Create a ToPdfParameterList object, which is used to set conversion options.
- Specify the open password and permission password and then set both passwords for the generated PDF using ToPdfParameterList.PdfSecurity.Encrypt() method.
- Save the Word document to PDF with password using Doucment.SaveToFile(string fileName, ToPdfParameterList paramList) method.
- Python
from spire.doc import * from spire.doc.common import * # Create a Document object document = Document() # Load a Word file document.LoadFromFile("C:\\Users\\Administrator\\Desktop\\input.docx") # Create a ToPdfParameterList object parameter = ToPdfParameterList() # Specify open password and permission password openPsd = "abc-123" permissionPsd = "permission" # Protect the PDF to be generated with open password and permission password parameter.PdfSecurity.Encrypt(openPsd, permissionPsd, PdfPermissionsFlags.Default, PdfEncryptionKeySize.Key128Bit) # Save the Word document to PDF document.SaveToFile("output/ToPdfWithPassword.pdf", parameter) document.Close()
Convert Word to PDF with Bookmarks in Python
Adding bookmarks to a document can improve its readability. When creating PDF from Word, you may want to keep the existing bookmarks or create new ones based on the headings. Here are the steps to convert Word to PDF while maintaining bookmarks.
- Create a Document object.
- Load a Word file using Document.LoadFromFile() method.
- Create a ToPdfParameterList object, which is used to set conversion options.
- Create bookmarks in PDF from the headings in Word by setting ToPdfParameterList.CreateWordBookmarksUsingHeadings to true.
- Save the document to PDF with bookmarks using Doucment.SaveToFile(string fileName, ToPdfParameterList paramList) method.
- Python
from spire.doc import * from spire.doc.common import * # Create a Document object document = Document() # Load a Word file document.LoadFromFile("C:\\Users\\Administrator\\Desktop\\input.docx") # Create a ToPdfParameterList object parames = ToPdfParameterList() # Create bookmarks from Word headings parames.CreateWordBookmarksUsingHeadings = True # Create bookmarks in PDF from existing bookmarks in Word # parames.CreateWordBookmarks = True # Save the document to PDF document.SaveToFile("output/ToPdfWithBookmarks.pdf", FileFormat.PDF) document.Close()
Convert Word to PDF with Fonts Embedded in Python
To ensure consistent appearance of a PDF document on any device, you probably need to embed fonts in the generated PDF document. The following are the steps to embed the fonts used in a Word document into the resulting PDF.
- Create a Document object.
- Load a sample Word file using Document.LoadFromFile() method.
- Create a ToPdfParameterList object, which is used to set conversion options.
- Embed fonts in generated PDF through ToPdfParameterList.IsEmbeddedAllFonts property.
- Save the document to PDF using Doucment.SaveToFile(string fileName, ToPdfParameterList paramList) method.
- Python
from spire.doc import * from spire.doc.common import * # Create a Document object document = Document() # Load a Word file document.LoadFromFile("C:\\Users\\Administrator\\Desktop\\input.docx") # Create a ToPdfParameterList object parameter = ToPdfParameterList() # Embed fonts in PDF parameter.IsEmbeddedAllFonts = True # Save the Word document to PDF document.SaveToFile("output/EmbedFonts.pdf", parameter) document.Close()
Set Image Quality When Converting Word to PDF in Python
When converting a Word document to PDF, it is important to consider the size of the resulting file, especially if it contains numerous high-quality images. You have the option to compress the image quality during the conversion process. To do this, follow the steps below.
- Create a Document object.
- Load a sample Word file using Document.LoadFromFile() method.
- Set the image quality through Document.JPEGQuality property.
- Save the document to PDF using Doucment.SaveToFile() method.
- Python
from spire.doc import * from spire.doc.common import * # Create a Document object document = Document() # Load a Word file document.LoadFromFile("C:\\Users\\Administrator\\Desktop\\input.docx") # Compress image to 40% of its original quality document.JPEGQuality = 40 # Preserve original image quality # document.JPEGQuality = 100 # Save the Word document to PDF document.SaveToFile("output/SetImageQuality.pdf", FileFormat.PDF) document.Close()
Apply for a Temporary License
If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.