Python: Extract Tables from PowerPoint Presentations

PowerPoint presentations often hold important data shared in meetings, lectures, and conferences, and they frequently present that data in tables. To analyze the data further or reuse it in reports and spreadsheets, the tables first have to be extracted and saved in another format. With Python, this extraction can be automated, turning static slides into data sets ready for processing.

This article demonstrates how to extract tables from PowerPoint presentations and write them to text files and Excel worksheets using Spire.Presentation for Python.

Install Spire.Presentation for Python

This scenario requires Spire.Presentation for Python and plum-dispatch v1.7.4. Both can be installed on Windows with the following pip command.

pip install Spire.Presentation

If you are unsure how to install it, refer to the guide "How to Install Spire.Presentation for Python on Windows".
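
To confirm that the installation succeeded before running the examples, a quick check along the following lines can help. This is a minimal sketch that uses only the standard library; the helper name installed_version is hypothetical, and it assumes the installed distribution names match the pip package names used above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version of a pip package, or None if it is missing."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Distribution names are assumed to match the pip package names in this article
for name in ("Spire.Presentation", "plum-dispatch"):
    found = installed_version(name)
    print(f"{name}: {found if found else 'not installed'}")
```

If either package is reported as not installed, rerun the pip command in the same Python environment that will run the scripts.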

Extract Table Data from PowerPoint Presentations to Text Files

Spire.Presentation for Python provides the ITable class, which represents a table on a presentation slide. By iterating through the shapes on each slide and checking whether each one is an instance of ITable, developers can retrieve all the tables in a presentation file and read their data.

The detailed steps for extracting tables from PowerPoint presentations and writing them to text files are as follows:

  • Create an instance of the Presentation class and load a PowerPoint file using the Presentation.LoadFromFile() method.
  • Iterate through all the slides in the file and then through all the shapes on each slide.
  • Check whether each shape is an instance of the ITable class. If it is, iterate through its rows and then through the cells in each row. Get each cell value using the TableRow[].TextFrame.Text property and append it to a string.
  • Write the table data to text files.

import os

from spire.presentation import *
from spire.presentation.common import *

# Create an instance of Presentation
presentation = Presentation()

# Load a PowerPoint file
presentation.LoadFromFile("Sample.pptx")

tables = []
# Iterate through all the slides
for slide in presentation.Slides:
    # Iterate through all the shapes
    for shape in slide.Shapes:
        # Check whether the shape is a table
        if isinstance(shape, ITable):
            tableData = ""
            # Iterate through all the rows
            for row in shape.TableRows:
                # Collect the cell values and join them with tabs
                cells = [row[i].TextFrame.Text for i in range(row.Count)]
                tableData += "\t".join(cells) + "\n"
            tables.append(tableData)

# Make sure the output folder exists
os.makedirs("output/Tables", exist_ok=True)

# Write each table to its own text file
for idx, table in enumerate(tables, start=1):
    fileName = f"output/Tables/Table-{idx}.txt"
    # Use UTF-8 so that non-ASCII cell text is written correctly
    with open(fileName, "w", encoding="utf-8") as f:
        f.write(table)

# Release the resources held by the presentation
presentation.Dispose()
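
Joining cells with a tab works as long as no cell contains a tab or a line break itself. Where that cannot be guaranteed, the standard csv module can take care of the quoting. The sketch below is independent of Spire.Presentation: it assumes the cell texts have already been collected into a list of rows (lists of strings), and the function name rows_to_tsv is hypothetical.

```python
import csv
import io


def rows_to_tsv(rows):
    """Serialize rows (lists of cell strings) as tab-separated text.

    Cells that contain tabs, quotes, or line breaks are quoted by the csv
    module, so the text can be parsed back without losing cell boundaries.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


# Example data standing in for cell texts extracted from a slide table
rows = [["Name", "Notes"], ["Alice", "line one\nline two"], ["Bob", "tab\tinside"]]
tsv_text = rows_to_tsv(rows)

# Reading the text back with the same delimiter recovers the original rows
parsed = list(csv.reader(io.StringIO(tsv_text, newline=""), delimiter="\t"))
print(parsed == rows)
```

The trade-off is that quoted cells are no longer plain tab-separated values, so whatever reads the file needs to understand CSV-style quoting as well.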

Extract Table Data from PowerPoint Presentations to Excel Worksheets

After extracting table data from presentations with Spire.Presentation for Python, developers can use Spire.XLS for Python to write the data to Excel worksheets for further analysis, reference, and format conversion.

Install Spire.XLS for Python via PyPI:

pip install Spire.XLS

The detailed steps for extracting tables from PowerPoint presentations and writing them to Excel worksheets are as follows:

  • Create an instance of Presentation class and load a PowerPoint file using Presentation.LoadFromFile() method.
  • Create an instance of Workbook class and clear the default worksheets.
  • Iterate through the slides in the presentation and then the shapes in the slides to check if the shapes are instances of ITable class. Append all the ITable instances to a list.
  • Iterate through the tables in the list and add a worksheet to the workbook for each table using Workbook.Worksheets.Add() method.
  • Iterate through the rows of each table and then the cells in each row to get the cell values through the TableRow[].TextFrame.Text property. Write the values to the corresponding cells in the worksheet through the Worksheet.Range[].Value property.
  • Save the workbook using Workbook.SaveToFile() method.

import os

from spire.presentation import *
from spire.presentation.common import *
# Import spire.xls last so that names shared by both libraries,
# such as FileFormat, refer to the Spire.XLS versions used below
from spire.xls import *
from spire.xls.common import *

# Create an instance of Presentation
presentation = Presentation()

# Load a PowerPoint file
presentation.LoadFromFile("Sample.pptx")

# Create an Excel file and clear the default worksheets
workbook = Workbook()
workbook.Worksheets.Clear()

tables = []
# Iterate through all the slides
for slide in presentation.Slides:
    # Iterate through all the shapes
    for shape in slide.Shapes:
        # Check whether the shape is a table
        if isinstance(shape, ITable):
            tables.append(shape)

# Write each table to its own worksheet
for t, table in enumerate(tables, start=1):
    sheet = workbook.Worksheets.Add(f"Sheet-{t}")
    for i in range(table.TableRows.Count):
        row = table.TableRows[i]
        for j in range(row.Count):
            # Worksheet rows and columns are 1-based
            sheet.Range[i + 1, j + 1].Value = row[j].TextFrame.Text
    # Autofit rows and columns
    sheet.AllocatedRange.AutoFitColumns()
    sheet.AllocatedRange.AutoFitRows()

# Make sure the output folder exists, then save the Excel file
os.makedirs("output", exist_ok=True)
workbook.SaveToFile("output/PresentationTables.xlsx", FileFormat.Version2016)

# Release the resources held by both documents
presentation.Dispose()
workbook.Dispose()
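
Cell texts extracted from slides are strings, so numeric columns may end up stored as text in the worksheet, which can get in the way of sorting or summing. One option is to detect numeric-looking cell texts before writing them. The helper below is a hypothetical, library-independent sketch; it assumes plain decimal notation with optional "," thousands separators. How the parsed number is then assigned to the worksheet (for example, through a numeric cell property in Spire.XLS) is an assumption to check against the library documentation.

```python
import re

# Digits with optional "," thousands groups and an optional decimal part
_NUMBER = re.compile(r"[+-]?(\d{1,3}(,\d{3})+|\d+)(\.\d+)?")


def parse_cell(text):
    """Return a float for numeric-looking text, otherwise the stripped text.

    Percentages, currencies, and locale-specific formats are left as text.
    """
    stripped = text.strip()
    if _NUMBER.fullmatch(stripped):
        return float(stripped.replace(",", ""))
    return stripped


# Example cell texts standing in for values read from a slide table
samples = ["1,250", " 3.5 ", "Q3 total", ""]
converted = [parse_cell(s) for s in samples]
print(converted)
```

Keeping the conversion in a separate function makes it easy to adjust the accepted number formats without touching the extraction loop.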

Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents or lift the function limitations, you can request a 30-day trial license.