Python: Convert PDF to Excel

PDF files present information in a fixed layout, which makes them ideal for preserving document integrity. However, that fixed layout makes the data they contain hard to analyze or manipulate. By converting PDF to Excel, you can use Excel's data manipulation features, such as formulas, conditional formatting, pivot tables, and charts, to analyze, manipulate, and visualize your data and derive meaningful insights from it. In this article, you will learn how to convert PDF to Excel in Python using Spire.PDF for Python.

Install Spire.PDF for Python

This scenario requires Spire.PDF for Python and plum-dispatch v1.7.4. Both can be installed on Windows with the following pip command.

pip install Spire.PDF

If you are unsure how to install it, please refer to this tutorial: How to Install Spire.PDF for Python on Windows
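
After installation, a quick check can confirm that both packages are visible to the interpreter you plan to use. The snippet below is a minimal sketch that relies only on the standard library; the helper name installed_version is made up for illustration, and the distribution names are assumed to match the pip command and the requirement stated above.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is missing.

    Hypothetical helper for illustration; it is not part of Spire.PDF.
    """
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Distribution names are assumptions taken from the text above
for name in ("Spire.PDF", "plum-dispatch"):
    version = installed_version(name)
    print(f"{name}: {version if version else 'not installed'}")
```

If either line reports "not installed", rerun the pip command in the same environment that runs your script.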

Convert PDF to Excel in Python

To convert a PDF document to Excel with Spire.PDF for Python, use the PdfDocument.SaveToFile() method. Before converting, you can specify conversion options by creating an XlsxLineLayoutOptions object and applying it with the PdfDocument.ConvertOptions.SetPdfToXlsxOptions() method. The XlsxLineLayoutOptions constructor accepts the following five parameters, which control how the PDF is converted to Excel:

  • convertToMultipleSheet (bool): Specifies whether to convert each page to a different worksheet in the same Excel workbook. If set to False, only the first page will be converted.
  • rotatedText (bool): Specifies whether to display rotated text.
  • splitCell (bool): Specifies whether to convert text in a PDF cell (spanning more than two lines) to one Excel cell or to multiple cells.
  • wrapText (bool): Specifies whether to wrap text in an Excel cell.
  • overlapText (bool): Specifies whether to display overlapping text.
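
Because the constructor takes five positional booleans, it is easy to mix up their order. One way to keep call sites readable is to collect the values under names first and expand them in the documented order. The sketch below assumes nothing about Spire.PDF beyond the parameter order listed above; the XlsxLayoutSettings class and its as_args method are hypothetical helpers, not part of the library, and the defaults simply mirror the values used in this article's sample.

```python
from dataclasses import dataclass, astuple


@dataclass(frozen=True)
class XlsxLayoutSettings:
    """Hypothetical helper (not part of Spire.PDF) that names the five options.

    Field order follows the constructor parameter order listed above.
    """
    convert_to_multiple_sheet: bool = True
    rotated_text: bool = True
    split_cell: bool = False
    wrap_text: bool = True
    overlap_text: bool = False

    def as_args(self):
        """Return the values as a tuple in constructor order."""
        return astuple(self)


# Override a single option by name and keep the other defaults
settings = XlsxLayoutSettings(split_cell=True)
args = settings.as_args()

# With Spire.PDF installed, the tuple would be expanded like this:
#   convertOptions = XlsxLineLayoutOptions(*args)
```

Naming the values this way documents intent at the call site without changing what is passed to the constructor.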

The following steps explain how to convert a PDF document to Excel XLSX format with specific conversion options using Spire.PDF for Python:

  • Create a PdfDocument object.
  • Load a PDF document using the PdfDocument.LoadFromFile() method.
  • Create an XlsxLineLayoutOptions object, passing the desired conversion options to its constructor.
  • Apply the conversion options using the PdfDocument.ConvertOptions.SetPdfToXlsxOptions() method.
  • Save the PDF document to Excel XLSX format using the PdfDocument.SaveToFile() method.
from spire.pdf.common import *
from spire.pdf import *

# Create a PdfDocument object
pdf = PdfDocument()
# Load a PDF document
pdf.LoadFromFile("Sample.pdf")

# Create an XlsxLineLayoutOptions object to specify the conversion options
# Parameters: convertToMultipleSheet, rotatedText, splitCell, wrapText, overlapText
convertOptions = XlsxLineLayoutOptions(True, True, False, True, False)

# Set the conversion options
pdf.ConvertOptions.SetPdfToXlsxOptions(convertOptions)

# Save the PDF document to Excel XLSX format
pdf.SaveToFile("PdfToExcel.xlsx", FileFormat.XLSX)
pdf.Close()
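
The same calls can be wrapped in a loop to convert every PDF in a folder. The following is a rough sketch rather than an official recipe: the function names plan_outputs and convert_folder and the folder names are invented for illustration, and only the Spire.PDF calls already shown in the example above are used. The imports are guarded so that the path-planning helper still works on a machine where Spire.PDF is not installed.

```python
try:
    # Same imports as the example above
    from spire.pdf.common import *
    from spire.pdf import *
    SPIRE_AVAILABLE = True
except ImportError:
    SPIRE_AVAILABLE = False

# Imported after the star imports so the name Path cannot be shadowed by them
from pathlib import Path


def plan_outputs(input_dir, output_dir):
    """Pair each .pdf file in input_dir with an .xlsx path in output_dir."""
    in_dir, out_dir = Path(input_dir), Path(output_dir)
    return [(src, out_dir / (src.stem + ".xlsx"))
            for src in sorted(in_dir.glob("*.pdf"))]


def convert_folder(input_dir, output_dir):
    """Convert every PDF in input_dir to XLSX (hypothetical helper)."""
    if not SPIRE_AVAILABLE:
        raise RuntimeError("Spire.PDF for Python is not installed")
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    for src, dst in plan_outputs(input_dir, output_dir):
        pdf = PdfDocument()
        try:
            pdf.LoadFromFile(str(src))
            # Parameters: convertToMultipleSheet, rotatedText, splitCell, wrapText, overlapText
            options = XlsxLineLayoutOptions(True, True, False, True, False)
            pdf.ConvertOptions.SetPdfToXlsxOptions(options)
            pdf.SaveToFile(str(dst), FileFormat.XLSX)
        finally:
            # Close the document even if loading or saving fails
            pdf.Close()


# Example call (folder names are placeholders):
# convert_folder("pdfs", "excel_output")
```

Closing the document in a finally block means that one failing file does not leave earlier documents open while the loop is interrupted.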


Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents or lift the feature limitations, please request a 30-day trial license.