How to Convert PDF to Word in Python

2024-01-22 05:31:06

The main advantage of PDF files is that they preserve the format and layout of the original document, which makes them ideal for sharing and printing. However, they are difficult to edit without specialized software. Converting a PDF to Word gives you greater flexibility: you can modify, add, or delete text and adjust formatting and styles to meet your needs. In this article, I will show you a simple but effective way to convert PDF to Word in Python.

Python Library for PDF Conversion

Spire.PDF for Python is a PDF manipulation API that allows you to create, modify, or convert PDF files in Python. With it, you can convert PDF to Word in a few lines of Python code and set document properties during the conversion. Before you begin, install Spire.PDF for Python and plum-dispatch v1.7.4 using the following pip commands.

pip install Spire.PDF
pip install plum-dispatch==1.7.4

This article covers more details of the installation: How to Install Spire.PDF for Python in VS Code
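
To confirm that the installation worked, a quick check along the following lines may help. This is a minimal sketch that uses only the Python standard library; the helper name installed_version is hypothetical, and the two package names are the ones mentioned in this article.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version of a pip package, or None if it is missing."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # The two distribution names below are the ones mentioned in this article
    for name in ("Spire.PDF", "plum-dispatch"):
        found = installed_version(name)
        print(f"{name}: {found if found else 'not installed'}")
```

If either line reports "not installed", rerun the corresponding pip command in the same Python environment that runs your script.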

Convert PDF to DOC in Python

If you want to edit the content of a PDF, converting it to Word format first is a good choice. Take PDF to DOC conversion as an example: you only need to load the PDF and save it in DOC format to the desired location.

Steps

  1. Import the necessary library modules.
  2. Create a PdfDocument object.
  3. Use the PdfDocument.LoadFromFile() method to load a PDF file from the specified path.
  4. Call the PdfDocument.SaveToFile() method to save the PDF in Word format, specifying FileFormat as DOC.
  5. Close the PdfDocument object.

Sample Code

  • Python
from spire.pdf.common import *
from spire.pdf import *

# Create a PdfDocument object
pdf = PdfDocument()

# Load a PDF file from the specified path
pdf.LoadFromFile("C:/Users/Administrator/Desktop/Sample.pdf")

# Save the PDF in DOC format
pdf.SaveToFile("C:/Users/Administrator/Desktop/ToDoc.doc", FileFormat.DOC)

# Close the PdfDocument object
pdf.Close()
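
In a longer script, it can be useful to make sure the document is closed even when loading or saving fails. The following is a minimal sketch of one way to do that, not part of the library itself: convert_pdf is a hypothetical helper, and it assumes that PdfDocument and FileFormat can be imported directly from spire.pdf, as the star imports in the sample code suggest.

```python
import os


def convert_pdf(pdf_path, output_path, format_name="DOC"):
    """Convert a PDF to Word, always closing the document afterwards.

    convert_pdf is a hypothetical helper; format_name is "DOC" or "DOCX".
    """
    if format_name not in ("DOC", "DOCX"):
        raise ValueError(f"Unsupported format: {format_name}")
    if not os.path.isfile(pdf_path):
        raise FileNotFoundError(f"PDF not found: {pdf_path}")

    # Imported lazily so the checks above work even without Spire.PDF installed.
    # Assumption: both names are importable directly from spire.pdf.
    from spire.pdf import PdfDocument, FileFormat

    pdf = PdfDocument()
    try:
        pdf.LoadFromFile(pdf_path)
        pdf.SaveToFile(output_path, getattr(FileFormat, format_name))
    finally:
        # Runs even if loading or saving raises an exception
        pdf.Close()
    return output_path
```

Checking the input path first gives a clearer error message than letting the library fail on a missing file.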

Convert PDF to DOCX in Python

This method is the same as the one above, except that you specify the format as DOCX when saving the generated file.

Steps

  1. Import the necessary library modules.
  2. Create a PdfDocument object.
  3. Use the PdfDocument.LoadFromFile() method to load a PDF file from the specified path.
  4. Call the PdfDocument.SaveToFile() method to save the PDF in Word format, specifying FileFormat as DOCX.
  5. Close the PdfDocument object.

Sample Code

  • Python
from spire.pdf.common import *
from spire.pdf import *

# Create a PdfDocument object
pdf = PdfDocument()

# Load a PDF file from the specified path
pdf.LoadFromFile("C:/Users/Administrator/Desktop/Sample.pdf")

# Save the PDF in DOCX format
pdf.SaveToFile("C:/Users/Administrator/Desktop/ToDocx.docx", FileFormat.DOCX)

# Close the PdfDocument object
pdf.Close()
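
Since the DOC and DOCX conversions differ only in the format argument, a script could pick the format from the output file's extension. The sketch below shows that idea using only the standard library; format_name_for and word_path_for are hypothetical helper names, and the returned strings are the FileFormat member names used in this article.

```python
from pathlib import Path

# Maps an output extension to the name of the FileFormat member used in this article
FORMAT_BY_EXTENSION = {".doc": "DOC", ".docx": "DOCX"}


def format_name_for(output_path):
    """Return "DOC" or "DOCX" based on the output file's extension."""
    suffix = Path(output_path).suffix.lower()
    try:
        return FORMAT_BY_EXTENSION[suffix]
    except KeyError:
        raise ValueError(f"Expected a .doc or .docx path, got: {output_path}") from None


def word_path_for(pdf_path, extension=".docx"):
    """Build an output path next to the PDF, e.g. Sample.pdf -> Sample.docx."""
    return str(Path(pdf_path).with_suffix(extension))
```

With the library imported, getattr(FileFormat, format_name_for("ToDocx.docx")) would then select FileFormat.DOCX for the SaveToFile() call.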

Set Document Properties at Conversion in Python

In addition to regular conversions, you can also customize document properties during PDF to Word conversion. This can help you better categorize and manage your documents.

Steps

  1. Import the required library modules.
  2. Create a PdfToDocConverter object and pass in the path of the PDF file to be converted as a parameter.
  3. Customize the properties of the converted Word document through the DocxOptions property of the PdfToDocConverter object.
  4. Call the PdfToDocConverter.SaveToDocx() method to save the PDF in Word format.

Sample Code

  • Python
from spire.pdf.common import *
from spire.pdf import *

# Create a PdfToDocConverter object
converter = PdfToDocConverter("C:/Users/Administrator/Desktop/Sample.pdf")

# Customize the properties for the file
converter.DocxOptions.Title = "World Environment Day"
converter.DocxOptions.Subject = "Promoting Sustainable Actions for a Greener Future."
converter.DocxOptions.Tags = "Environmental Protection"
converter.DocxOptions.Categories = "Environment"
converter.DocxOptions.Commments = "This is an article about environmental protection."
converter.DocxOptions.Authors = "Mark"
converter.DocxOptions.LastSavedBy = "Johnny"
converter.DocxOptions.Revision = 5
converter.DocxOptions.Version = "V4.0"
converter.DocxOptions.ProgramName = "Green Development"
converter.DocxOptions.Company = "New Technology"
converter.DocxOptions.Manager = "Andy"

# Save the PDF in DOCX format
converter.SaveToDocx("C:/Users/Administrator/Desktop/SetProperties.docx")
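
When the property values come from a configuration file or a database, setting them in a loop may be more convenient than writing one assignment per property. The following is a minimal sketch under that assumption: apply_properties is a hypothetical helper, the property names are copied exactly as they are spelled in the sample code above, and a stand-in object replaces converter.DocxOptions so the sketch runs without the library.

```python
from types import SimpleNamespace

# Property names exactly as they appear in the sample code above,
# including the "Commments" spelling
KNOWN_PROPERTIES = {
    "Title", "Subject", "Tags", "Categories", "Commments", "Authors",
    "LastSavedBy", "Revision", "Version", "ProgramName", "Company", "Manager",
}


def apply_properties(options, properties):
    """Copy each key/value pair onto the options object, rejecting unknown names."""
    unknown = sorted(set(properties) - KNOWN_PROPERTIES)
    if unknown:
        raise KeyError(f"Unknown document properties: {unknown}")
    for name, value in properties.items():
        setattr(options, name, value)
    return options


# Stand-in object so the sketch runs without Spire.PDF;
# with the library, pass converter.DocxOptions instead
demo_options = apply_properties(
    SimpleNamespace(),
    {"Title": "World Environment Day", "Authors": "Mark", "Revision": 5},
)
print(demo_options.Title)
```

Rejecting unknown names up front catches misspelled keys that would otherwise be set silently and never reach the Word document.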

Get a Free License for the Library to Convert PDF Files

You can get a free 30-day temporary license for Spire.PDF for Python to convert PDF to Word with Python scripts without any evaluation limitations.

Conclusion

In this article, you have learned how to convert PDF to Word with Python. With the Spire.PDF for Python library, you can also create PDFs from scratch or edit them as needed. In short, this library simplifies the process and lets developers focus on building applications that perform PDF manipulation tasks.
