PowerPoint presentations often serve as repositories of essential data and information shared during meetings, lectures, and conferences. They frequently include tables for data presentation and basic analysis. However, to further analyze the data or integrate it into reports and spreadsheets, it becomes necessary to extract these tables and save them in other formats. By leveraging Python, users can efficiently extract tables from PowerPoint presentations, transforming static slides into dynamic data sets ready for processing.
This article demonstrates how to extract tables from PowerPoint presentations and write them to text files and Excel worksheets using Spire.Presentation for Python, making the data in presentations easier to reuse and streamlining the extraction process.
- Extract Table Data from PowerPoint Presentations to Text Files
- Extract Table Data from PowerPoint Presentations to Excel Worksheets
Install Spire.Presentation for Python
This scenario requires Spire.Presentation for Python and plum-dispatch v1.7.4. Both can be installed on Windows with the following pip command.
pip install Spire.Presentation
If you are unsure how to install it, please refer to: How to Install Spire.Presentation for Python on Windows
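To confirm that the installation worked, one option is to ask Python which versions it can see. The following is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical, and it assumes the distributions are registered under the names `Spire.Presentation` and `plum-dispatch`.

```python
from importlib import metadata


def installed_version(distribution):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


# Distribution names are assumptions based on the pip command above.
for name in ("Spire.Presentation", "plum-dispatch"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```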
Extract Table Data from PowerPoint Presentations to Text Files
Spire.Presentation for Python provides the ITable class, which represents a table in a presentation slide. By iterating through the shapes in each slide and checking whether each one is an instance of the ITable class, developers can retrieve all the tables in the presentation file and read their data.
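The slide-and-shape traversal described here can be sketched as a small generator. This is an illustration only: `iter_tables`, `is_table`, and the stand-in `deck` object are hypothetical names, and the stand-ins exist so the sketch runs without a .pptx file. It relies on nothing beyond iterating `Slides` and `Shapes`.

```python
from types import SimpleNamespace


def iter_tables(presentation, is_table):
    """Yield (slide_number, shape) for every shape accepted by is_table.

    Only presentation.Slides and slide.Shapes are used, so any object
    exposing those two iterables works.
    """
    for slide_number, slide in enumerate(presentation.Slides, start=1):
        for shape in slide.Shapes:
            if is_table(shape):
                yield slide_number, shape


# Stand-in objects (not Spire classes): dicts play the role of tables.
deck = SimpleNamespace(Slides=[
    SimpleNamespace(Shapes=["title", {"kind": "table", "id": 1}]),
    SimpleNamespace(Shapes=[{"kind": "table", "id": 2}, "picture"]),
])

found = list(iter_tables(deck, lambda shape: isinstance(shape, dict)))
print(found)
```

With a real presentation, the predicate would presumably be `lambda shape: isinstance(shape, ITable)`. Keeping the slide number alongside each table makes it easier to trace extracted data back to its slide.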
The detailed steps for extracting tables from PowerPoint presentations and writing them to text files are as follows:
- Create an instance of Presentation class and load a PowerPoint file using Presentation.LoadFromFile() method.
- Iterate through all the slides in the file and then all the shapes in the slides.
- Check whether each shape is an instance of the ITable class. If it is, iterate through its rows and then the cells in each row. Get each cell value through the TableRow[].TextFrame.Text property and append it to a string, separating cells with tabs and rows with line breaks.
- Write the table data to text files.
- Python
from spire.presentation import *
from spire.presentation.common import *
import os

# Create an instance of Presentation
presentation = Presentation()

# Load a PowerPoint file
presentation.LoadFromFile("Sample.pptx")

tables = []

# Iterate through all the slides
for slide in presentation.Slides:
    # Iterate through all the shapes
    for shape in slide.Shapes:
        # Check whether the shape is a table
        if isinstance(shape, ITable):
            tableData = ""
            # Iterate through all the rows
            for row in shape.TableRows:
                rowData = ""
                # Iterate through all the cells in the row
                for i in range(0, row.Count):
                    # Get the cell value
                    cellValue = row[i].TextFrame.Text
                    rowData += (cellValue + "\t" if i < row.Count - 1 else cellValue)
                tableData += (rowData + "\n")
            tables.append(tableData)

# Create the output folder if it does not exist yet
os.makedirs("output/Tables", exist_ok=True)

# Write the tables to text files (UTF-8 so non-ASCII cell text is preserved)
for idx, table in enumerate(tables, start=1):
    fileName = f"output/Tables/Table-{idx}.txt"
    with open(fileName, "w", encoding="utf-8") as f:
        f.write(table)

presentation.Dispose()
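Tab-separated output becomes ambiguous when a cell itself contains a tab or a line break. As a possible variation, and not part of the original workflow, the sketch below collects cell text into lists and serializes it with Python's standard csv module, which quotes such cells. The helper names `table_to_rows` and `rows_to_csv` are hypothetical; `table_to_rows` assumes only the access pattern used in the code above (`TableRows`, `row.Count`, `row[i].TextFrame.Text`).

```python
import csv
import io


def table_to_rows(table):
    """Return the table as a list of rows, each a list of cell strings."""
    return [
        [row[i].TextFrame.Text for i in range(row.Count)]
        for row in table.TableRows
    ]


def rows_to_csv(rows):
    """Serialize rows to CSV text, quoting cells that contain special characters."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


# Sample data standing in for extracted cells (illustrative values only).
sample_rows = [
    ["Name", "Note"],
    ["Widget", "first line\nsecond line"],
    ["Gadget", "red, large"],
]
csv_text = rows_to_csv(sample_rows)
print(csv_text)
```

Reading the text back with `csv.reader` recovers the original cells, including the embedded line break and comma.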
Extract Table Data from PowerPoint Presentations to Excel Worksheets
After extracting table data from presentations using Spire.Presentation for Python, developers can further utilize Spire.XLS for Python to write this data into Excel worksheets, facilitating further analysis, referencing, and format conversion.
Install Spire.XLS for Python via PyPI:
pip install Spire.XLS
The detailed steps for extracting tables from PowerPoint presentations and writing them to Excel worksheets are as follows:
- Create an instance of Presentation class and load a PowerPoint file using Presentation.LoadFromFile() method.
- Create an instance of Workbook class and clear the default worksheets.
- Iterate through the slides in the presentation and then the shapes in the slides to check if the shapes are instances of ITable class. Append all the ITable instances to a list.
- Iterate through the tables in the list and add a worksheet to the workbook for each table using Workbook.Worksheets.Add() method.
- Iterate through the rows of each table and then the cells in each row, reading the cell values through the TableRow[].TextFrame.Text property. Write the values to the corresponding cells in the worksheet through the Worksheet.Range[].Value property. Range indexes are 1-based, so the table cell at row i, column j is written to Range[i + 1, j + 1].
- Save the workbook using Workbook.SaveToFile() method.
- Python
from spire.presentation import *
from spire.presentation.common import *
from spire.xls import *
from spire.xls.common import *

# Create an instance of Presentation
presentation = Presentation()

# Load a PowerPoint file
presentation.LoadFromFile("Sample.pptx")

# Create an Excel file and clear the default worksheets
workbook = Workbook()
workbook.Worksheets.Clear()

tables = []

# Iterate through all the slides
for slide in presentation.Slides:
    # Iterate through all the shapes
    for shape in slide.Shapes:
        # Check whether the shape is a table
        if isinstance(shape, ITable):
            tables.append(shape)

# Iterate through all the tables
for t in range(len(tables)):
    table = tables[t]
    sheet = workbook.Worksheets.Add(f"Sheet-{t+1}")
    for i in range(0, table.TableRows.Count):
        row = table.TableRows[i]
        for j in range(0, row.Count):
            sheet.Range[i + 1, j + 1].Value = row[j].TextFrame.Text
    # Autofit rows and columns
    sheet.AllocatedRange.AutoFitColumns()
    sheet.AllocatedRange.AutoFitRows()

# Save the Excel file
workbook.SaveToFile("output/PresentationTables.xlsx", FileFormat.Version2016)

presentation.Dispose()
workbook.Dispose()
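The `i + 1, j + 1` offsets in the code above shift the 0-based loop counters to the 1-based positions used by the worksheet range. To check where a given table cell ends up, the mapping can be sketched in plain Python. The helpers `column_letters` and `cell_address` are hypothetical and are not part of Spire.XLS.

```python
def column_letters(column_number):
    """Convert a 1-based column number to Excel letters (1 -> A, 27 -> AA)."""
    if column_number < 1:
        raise ValueError("column_number must be 1 or greater")
    letters = ""
    while column_number > 0:
        column_number, remainder = divmod(column_number - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def cell_address(row_index, column_index):
    """Map 0-based table indexes to the A1-style address they are written to."""
    return f"{column_letters(column_index + 1)}{row_index + 1}"


print(cell_address(0, 0))   # first table cell
print(cell_address(2, 27))  # third row, twenty-eighth column
```

The first table cell lands in A1, so each worksheet mirrors the layout of its source table starting from the top-left corner.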
Apply for a Temporary License
If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.