Python: Extract Text from PowerPoint Presentations

2024-03-08 02:51:29 Written by support iceblue

Extracting text directly is a practical way to get the textual content out of information-dense PowerPoint presentations. With a short Python program, you can read the content of every slide quickly, which makes it easy to collect information and process it further. This article shows how to use Spire.Presentation for Python to extract text from PowerPoint presentations, including text in slides, speaker notes, and comments.

Install Spire.Presentation for Python

This scenario requires Spire.Presentation for Python and plum-dispatch v1.7.4. They can be installed on Windows with the following pip command.

pip install Spire.Presentation

If you are unsure how to install it, please refer to: How to Install Spire.Presentation for Python on Windows
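After installing, it can help to confirm which versions pip actually put in place. The following is a minimal sketch that uses only the standard library; the helper name installed_version is hypothetical, and the two distribution names are taken from the requirement and pip command above.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version of a pip package, or None if it is missing."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Distribution names as written in this article; adjust if yours differ.
for name in ("Spire.Presentation", "plum-dispatch"):
    version = installed_version(name)
    print(f"{name}: {version if version else 'not installed'}")
```

If either line reports "not installed", rerun the pip command in the same Python environment that runs your script.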

Extract Text from Presentation Slides with Python

The text within PowerPoint presentation slides is placed within shapes. Therefore, developers can extract the text from the presentation by accessing all the shapes within each slide and extracting the text contained within them. The detailed steps are as follows:

  • Create an object of the Presentation class and load a PowerPoint presentation using the Presentation.LoadFromFile() method.
  • Iterate through the slides in the presentation and then iterate through the shapes in each slide.
  • Check if a shape is an IAutoShape instance. If it is, get the paragraphs in the shape through the IAutoShape.TextFrame.Paragraphs property and then get the text of each paragraph through the Paragraph.Text property.
  • Write the slide text to a text file.

from spire.presentation import *
from spire.presentation.common import *

# Create an object of Presentation class
pres = Presentation()

# Load a PowerPoint presentation
pres.LoadFromFile("Sample.pptx")

text = []
# Loop through each slide
for slide in pres.Slides:
    # Loop through each shape
    for shape in slide.Shapes:
        # Check if the shape is an IAutoShape instance
        if isinstance(shape, IAutoShape):
            # Extract the text from each paragraph in the shape
            for paragraph in shape.TextFrame.Paragraphs:
                text.append(paragraph.Text)

# Write the text to a text file (the "output" folder must already exist)
with open("output/SlideText.txt", "w", encoding="utf-8") as f:
    for s in text:
        f.write(s + "\n")

# Release the resources held by the presentation
pres.Dispose()
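The script above flattens all text into one list, so the slide each line came from is lost. As a rough sketch of one way to keep that link, the function below groups paragraph text by slide number. The names collect_slide_text and is_text_shape are hypothetical, and the demo uses stand-in objects rather than the real library, so it only assumes the attribute names described in this section (Shapes, TextFrame, Paragraphs, Text).

```python
from types import SimpleNamespace


def collect_slide_text(slides, is_text_shape):
    """Return a list of (slide_number, [paragraph texts]) pairs.

    `slides` is any iterable of slide-like objects exposing `Shapes`;
    `is_text_shape` is a callable that decides which shapes are read.
    """
    result = []
    for number, slide in enumerate(slides, start=1):
        texts = []
        for shape in slide.Shapes:
            if not is_text_shape(shape):
                continue
            for paragraph in shape.TextFrame.Paragraphs:
                texts.append(paragraph.Text)
        result.append((number, texts))
    return result


# Stand-in objects (hypothetical, for demonstration only) that mimic the
# attribute names used in this article.
def fake_shape(*lines):
    paragraphs = [SimpleNamespace(Text=line) for line in lines]
    return SimpleNamespace(TextFrame=SimpleNamespace(Paragraphs=paragraphs))


picture = SimpleNamespace(TextFrame=None)  # a shape that carries no text
demo_slides = [
    SimpleNamespace(Shapes=[fake_shape("Title", "Subtitle"), picture]),
    SimpleNamespace(Shapes=[fake_shape("Agenda")]),
]

grouped = collect_slide_text(demo_slides, lambda s: s.TextFrame is not None)
for number, texts in grouped:
    print(f"Slide {number}: {texts}")
```

With a loaded presentation, the same function could presumably be called as collect_slide_text(pres.Slides, lambda shape: isinstance(shape, IAutoShape)), which mirrors the check used in the script above.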

Extract Text from Speaker Notes with Python

Speaker notes are additional information that guides the presenter and is not visible to the audience. The speaker notes of each slide are stored in its notes slide, and developers can extract their text through the NotesSlide.NotesTextFrame.Text property. The detailed steps for extracting text from speaker notes are as follows:

  • Create an object of the Presentation class and load a PowerPoint presentation using the Presentation.LoadFromFile() method.
  • Iterate through each slide.
  • Get the notes slide through the ISlide.NotesSlide property and retrieve the text through the NotesSlide.NotesTextFrame.Text property.
  • Write the speaker note text to a text file.

from spire.presentation import *
from spire.presentation.common import *

# Create an object of Presentation class
pres = Presentation()

# Load a PowerPoint presentation
pres.LoadFromFile("Sample.pptx")

notes_list = []
# Iterate through each slide
for slide in pres.Slides:
    # Get the notes slide
    notes_slide = slide.NotesSlide
    # Skip slides without speaker notes (assumes NotesSlide is None in that case)
    if notes_slide is None:
        continue
    # Get the notes text
    notes_list.append(notes_slide.NotesTextFrame.Text)

# Write the notes to a text file (the "output" folder must already exist)
with open("output/SpeakerNoteText.txt", "w", encoding="utf-8") as f:
    for note in notes_list:
        f.write(note + "\n")

# Release the resources held by the presentation
pres.Dispose()
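Not every slide necessarily has speaker notes, and it is often useful to know which slide a note belongs to. The sketch below maps slide numbers to note text and skips slides with no notes. It assumes that NotesSlide is None when a slide has no notes, which you should verify against your version of the library; collect_notes and the stand-in objects are hypothetical and exist only to make the example runnable without a presentation file.

```python
from types import SimpleNamespace


def collect_notes(slides):
    """Map 1-based slide numbers to their speaker-note text.

    Slides whose notes slide is missing (assumed to be None) or whose
    notes are blank are left out of the result.
    """
    notes = {}
    for number, slide in enumerate(slides, start=1):
        notes_slide = slide.NotesSlide
        if notes_slide is None:
            continue
        text = notes_slide.NotesTextFrame.Text
        if text and text.strip():
            notes[number] = text.strip()
    return notes


# Stand-in slides (hypothetical) mimicking NotesSlide.NotesTextFrame.Text.
def fake_slide(note_text=None):
    if note_text is None:
        return SimpleNamespace(NotesSlide=None)
    frame = SimpleNamespace(Text=note_text)
    return SimpleNamespace(NotesSlide=SimpleNamespace(NotesTextFrame=frame))


demo_slides = [
    fake_slide("Welcome the audience."),
    fake_slide(),                # no notes slide at all
    fake_slide("   "),           # notes slide present but blank
    fake_slide("Summarize key points.\n"),
]

notes_by_slide = collect_notes(demo_slides)
for number, note in notes_by_slide.items():
    print(f"Slide {number}: {note}")
```

Keeping the slide number as the dictionary key makes it straightforward to write output such as "Slide 4: ..." instead of an unlabeled list of notes.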

Extract Text from Presentation Comments with Python

With Spire.Presentation for Python, developers can also extract the text of comments in PowerPoint presentations by getting the comments of each slide with the ISlide.Comments property and retrieving the text of each comment with the Comment.Text property. The detailed steps are as follows:

  • Create an object of the Presentation class and load a PowerPoint presentation using the Presentation.LoadFromFile() method.
  • Iterate through each slide and get the comments on each slide through the ISlide.Comments property.
  • Iterate through each comment and retrieve the text from each comment through the Comment.Text property.
  • Write the comment text to a text file.

from spire.presentation import *
from spire.presentation.common import *

# Create an object of Presentation class
pres = Presentation()

# Load a PowerPoint presentation
pres.LoadFromFile("Sample.pptx")

comment_texts = []
# Iterate through all slides
for slide in pres.Slides:
    # Iterate through all comments on the slide
    for comment in slide.Comments:
        # Get the comment text
        comment_texts.append(comment.Text)

# Write the comments to a text file (the "output" folder must already exist)
with open("output/CommentText.txt", "w", encoding="utf-8") as f:
    for comment_text in comment_texts:
        f.write(comment_text + "\n")

# Release the resources held by the presentation
pres.Dispose()
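The scripts in this article write into an output folder, and Python's open() raises FileNotFoundError if that folder does not exist. A small helper along the following lines can create the folder first. This is a sketch using only the standard library; the name write_lines is hypothetical, and the demo writes to a temporary directory so it leaves nothing behind.

```python
import tempfile
from pathlib import Path


def write_lines(path, lines):
    """Write each string in `lines` on its own line, creating parent folders."""
    target = Path(path)
    # Create "output/" (or any missing parent folder) before opening the file.
    target.parent.mkdir(parents=True, exist_ok=True)
    with target.open("w", encoding="utf-8") as f:
        for line in lines:
            f.write(line + "\n")
    return target


# Demo in a temporary directory; the "output" subfolder does not exist yet.
with tempfile.TemporaryDirectory() as tmp:
    out_file = write_lines(
        Path(tmp) / "output" / "CommentText.txt",
        ["First comment", "Second comment"],
    )
    content = out_file.read_text(encoding="utf-8")

print(content)
```

In the extraction scripts, a call such as write_lines("output/CommentText.txt", comment_texts) could stand in for the manual open and write loop, where comment_texts is whatever list of strings you collected.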

Apply for a Temporary License

To remove the evaluation message from the generated documents or lift the functional limitations, please request a 30-day trial license.

Last modified on Thursday, 25 April 2024 02:20