MS Word lets users view hyperlinks but has no built-in feature for extracting them in a single click, so collecting multiple links from a document by hand is time-consuming. Python can streamline this process significantly. In this article, we'll show you how to use Spire.Doc for Python to extract hyperlinks from Word documents, either a specified link or all links at once, saving you time and effort.

Install Spire.Doc for Python

This scenario requires Spire.Doc for Python and plum-dispatch v1.7.4. Both can be easily installed on Windows with the following pip command.

pip install Spire.Doc

If you are unsure how to install, please refer to this tutorial: How to Install Spire.Doc for Python on Windows.

Extract Hyperlinks from Word Documents: Specified Links

Whether you're looking to retrieve just one important link or filter out certain URLs, this section will guide you through the process step by step. Using the Field.FieldText and Field.Code properties provided by Spire.Doc, you can target and extract specified hyperlinks, making it easier to access the information you need.
Steps to extract specified hyperlinks from Word documents:

  • Create an instance of the Document class.
  • Load a Word document from a file using the Document.LoadFromFile() method.
  • Iterate through the document's elements to find all hyperlinks.
  • Get a specific hyperlink from the hyperlink collection.
  • Retrieve the hyperlink text with the Field.FieldText property.
  • Extract the URL from the hyperlink using the Field.Code property.

Here is the code example of extracting the first hyperlink in a Word document:

  • Python
from spire.doc import *
from spire.doc.common import *

# Create a Document object
doc = Document()

# Load a Word file
doc.LoadFromFile("/sample.docx")

# Find all hyperlinks in the Word document
hyperlinks = []
for i in range(doc.Sections.Count):
    section = doc.Sections.get_Item(i)
    for j in range(section.Body.ChildObjects.Count):
        body_child = section.Body.ChildObjects.get_Item(j)
        # Only paragraphs are scanned for hyperlink fields
        if body_child.DocumentObjectType == DocumentObjectType.Paragraph:
            for k in range(body_child.ChildObjects.Count):
                item = body_child.ChildObjects.get_Item(k)
                if item.DocumentObjectType == DocumentObjectType.Field:
                    if item.Type == FieldType.FieldHyperlink:
                        hyperlinks.append(item)

# Get the first hyperlink text and URL
if hyperlinks:
    first_hyperlink = hyperlinks[0]
    hyperlink_text = first_hyperlink.FieldText
    # Strip surrounding whitespace first, then the quotes around the URL
    hyperlink_url = first_hyperlink.Code.split('HYPERLINK ')[1].strip().strip('"')

    # Save to a text file
    with open("/FirstHyperlink.txt", "w") as file:
        file.write(f"Text: {hyperlink_text}\nURL: {hyperlink_url}\n")

# Close the document
doc.Close()
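
The split('HYPERLINK ') expression above assumes the field code holds nothing but a quoted address. Word field codes can also carry switches such as \l (a location within the target) or \o (ScreenTip text), and in that case the switch text would end up inside the extracted URL. The following is a minimal sketch of a more tolerant parser written in plain Python. The helper name parse_hyperlink_code is hypothetical and not part of Spire.Doc, and it assumes the usual HYPERLINK "address" \switch "value" layout.

```python
import re

# Matches either a double-quoted string or a run of non-space characters.
_TOKEN = re.compile(r'"([^"]*)"|(\S+)')

# Switches whose argument is kept. Any other switch is skipped together
# with the next non-switch token (a simplification for this sketch).
_SWITCHES = {"\\l": "bookmark", "\\o": "tooltip"}


def parse_hyperlink_code(code):
    """Split a Word HYPERLINK field code into address, bookmark and tooltip.

    Hypothetical helper for illustration; not part of Spire.Doc.
    """
    text = code.strip()
    if not text.upper().startswith("HYPERLINK"):
        raise ValueError(f"not a HYPERLINK field code: {code!r}")

    result = {"address": "", "bookmark": "", "tooltip": ""}
    target = "address"
    for match in _TOKEN.finditer(text[len("HYPERLINK"):]):
        quoted, bare = match.group(1), match.group(2)
        if quoted is None and bare.startswith("\\"):
            # A switch: remember where its argument should be stored.
            target = _SWITCHES.get(bare.lower(), "ignored")
            continue
        value = quoted if quoted is not None else bare
        if target != "ignored" and not result[target]:
            result[target] = value
        target = "address"
    return result
```

With a helper like this, parse_hyperlink_code(first_hyperlink.Code)["address"] could stand in for the split expression when a document contains links with ScreenTips or bookmarks.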


Extract All Hyperlinks from Word Documents

After seeing how to extract specified hyperlinks, let's move on to extracting all hyperlinks from a Word document. This is especially helpful when you need a list of every link, whether to check for broken ones or for other purposes. Automating this process with Spire.Doc (short for Spire.Doc for Python) saves time and reduces manual errors. Let's take a closer look at the steps and code example.
Steps to extract all hyperlinks from Word documents:

  • Create a Document object.
  • Load a Word document from the local storage with Document.LoadFromFile() method.
  • Loop through elements to find all hyperlinks in the Word document.
  • Iterate through all hyperlinks in the collection.
  • Use Field.FieldText property to extract the hyperlink text from each link.
  • Use Field.Code property to get URLs from hyperlinks.

Below is a code example of extracting all hyperlinks from a Word document:

  • Python
from spire.doc import *
from spire.doc.common import *

# Create a Document object
doc = Document()

# Load a Word file
doc.LoadFromFile("/sample.docx")

# Find all hyperlinks in the Word document
hyperlinks = []
for i in range(doc.Sections.Count):
    section = doc.Sections.get_Item(i)
    for j in range(section.Body.ChildObjects.Count):
        body_child = section.Body.ChildObjects.get_Item(j)
        # Only paragraphs are scanned for hyperlink fields
        if body_child.DocumentObjectType == DocumentObjectType.Paragraph:
            for k in range(body_child.ChildObjects.Count):
                item = body_child.ChildObjects.get_Item(k)
                if item.DocumentObjectType == DocumentObjectType.Field:
                    if item.Type == FieldType.FieldHyperlink:
                        hyperlinks.append(item)

# Save all hyperlinks text and URL to a text file
with open("/AllHyperlinks.txt", "w") as file:
    for i, hyperlink in enumerate(hyperlinks):
        hyperlink_text = hyperlink.FieldText
        hyperlink_url = hyperlink.Code.split('HYPERLINK ')[1].strip().strip('"')
        file.write(f"Hyperlink {i+1}:\nText: {hyperlink_text}\nURL: {hyperlink_url}\n\n")

# Close the document
doc.Close()
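
Once the links are collected, a quick syntactic pass can deduplicate them and set aside entries that are clearly not web addresses before any further checking. The sketch below uses only the standard library. The function name triage_links and its list of (text, url) pairs are assumptions made for illustration, and passing this check says nothing about whether a link is actually reachable.

```python
from urllib.parse import urlsplit


def triage_links(links):
    """Deduplicate (text, url) pairs and separate well-formed web URLs.

    Returns (web, other); both lists keep first-seen order.
    """
    seen = set()
    web, other = [], []
    for text, url in links:
        url = url.strip()
        if url in seen:
            continue  # this target was already recorded
        seen.add(url)
        try:
            parts = urlsplit(url)
        except ValueError:
            # urlsplit rejects some malformed hosts, e.g. a broken IPv6 literal
            other.append((text, url))
            continue
        if parts.scheme in ("http", "https") and parts.netloc:
            web.append((text, url))
        else:
            other.append((text, url))
    return web, other
```

The pairs could be built in the loop above from hyperlink_text and hyperlink_url. Entries in the second list (mail links, bookmarks, empty or malformed addresses) are the ones worth reviewing by hand.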


Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.

Published in Hyperlink

PowerPoint presentations often contain hyperlinks that guide audiences to additional resources or to locations within the presentation. While these links are useful for providing further information and easy navigation, they can sometimes detract from the presentation's flow or compromise its professional appearance. Such invalid or unnecessary links can be easily removed using Python, improving the overall quality of the presentation.

This article will show how Spire.Presentation for Python can be utilized to remove hyperlinks from PowerPoint presentations efficiently.

Install Spire.Presentation for Python

This scenario requires Spire.Presentation for Python and plum-dispatch v1.7.4. Both can be easily installed on Windows with the following pip command.

pip install Spire.Presentation

If you are unsure how to install, please refer to: How to Install Spire.Presentation for Python on Windows

Remove Hyperlinks from Text in PowerPoint Slides

The normal text in a PowerPoint presentation is contained in auto shapes. Developers can access the text ranges within these shapes using the IAutoShape.TextFrame.Paragraphs[].TextRanges[] property and read or set the hyperlinks on them using the TextRange.ClickAction property. Hyperlinks on text can be removed by setting the TextRange.ClickAction property to None.

The detailed steps are as follows:

  • Create an instance of Presentation class and load a PowerPoint presentation using Presentation.LoadFromFile() method.
  • Iterate through the slides in the presentation and then iterate through the shapes in the slides.
  • Check if the shape is an instance of IAutoShape. If it is, iterate through the paragraphs in the shape, and then the text ranges in the paragraphs.
  • Check if the TextRange.ClickAction property of a text range is None. If it is not, remove the hyperlink by setting TextRange.ClickAction property to None.
  • Save the presentation using Presentation.SaveToFile() method.
  • Python
from spire.presentation import Presentation, IAutoShape, FileFormat

# Create an instance of Presentation
presentation = Presentation()

# Load a PowerPoint presentation
presentation.LoadFromFile("Sample.pptx")

# Iterate through the slides in the presentation
for slide in presentation.Slides:
    # Iterate through the shapes
    for shape in slide.Shapes:
        # Check if the shape is an AutoShape instance
        if isinstance(shape, IAutoShape):
            # Iterate through the paragraphs in the shape
            for paragraph in shape.TextFrame.Paragraphs:
                # Iterate through the text ranges in the paragraph
                for textRange in paragraph.TextRanges:
                    # Check if the text range has a hyperlink
                    if textRange.ClickAction is not None:
                        # Remove the hyperlink
                        textRange.ClickAction = None

# Save the presentation
presentation.SaveToFile("output/RemoveSlideTextHyperlink.pptx", FileFormat.Pptx2013)
presentation.Dispose()
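
To confirm the result, one option is to inspect the saved file directly. A .pptx file is a ZIP package, and in the Open XML format click hyperlinks on both text runs and shapes appear as a:hlinkClick elements in the slide XML. The sketch below counts them per slide using only the standard library. The function name is hypothetical, it does not rely on Spire.Presentation, and it assumes that a link marked with the ppaction://noaction action should not be counted. How Spire.Presentation serializes a removed link is not covered here, so treat this as a rough check.

```python
import re
import zipfile
import xml.etree.ElementTree as ET

# DrawingML main namespace, where hlinkClick is defined.
A_NS = "http://schemas.openxmlformats.org/drawingml/2006/main"
_SLIDE_PART = re.compile(r"ppt/slides/slide\d+\.xml")


def count_click_hyperlinks(pptx):
    """Return {slide part name: number of active a:hlinkClick elements}.

    `pptx` may be a file path or a binary file-like object.
    """
    counts = {}
    with zipfile.ZipFile(pptx) as package:
        for name in sorted(package.namelist()):
            if not _SLIDE_PART.fullmatch(name):
                continue  # skip relationships, layouts, media, etc.
            root = ET.fromstring(package.read(name))
            counts[name] = sum(
                1
                for link in root.iter(f"{{{A_NS}}}hlinkClick")
                if link.get("action") != "ppaction://noaction"
            )
    return counts
```

Calling count_click_hyperlinks("output/RemoveSlideTextHyperlink.pptx") and comparing the numbers with those for Sample.pptx gives a quick before-and-after view.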

Python: Remove Hyperlinks from PowerPoint Presentations

Remove Hyperlinks from All Shapes in PowerPoint Slides

The IShape class represents all types of shapes in a presentation slide, such as auto shapes, images, tables, and more. The hyperlinks on all these shapes can be removed by assigning the value returned by the IShape.Click.get_NoAction() method to each shape's IShape.Click property. The detailed steps are as follows:

  • Create an instance of Presentation class and load a PowerPoint presentation using Presentation.LoadFromFile() method.
  • Iterate through the slides in the presentation and then iterate through the shapes in the slides.
  • Check if the IShape.Click property is None. If it is not, remove the hyperlink by setting the property to the result of IShape.Click.get_NoAction() method.
  • Save the presentation using Presentation.SaveToFile() method.
  • Python
from spire.presentation import Presentation, FileFormat

# Create an instance of Presentation
presentation = Presentation()

# Load a PowerPoint presentation
presentation.LoadFromFile("Sample.pptx")

# Iterate through the slides in the presentation
for slide in presentation.Slides:
    # Iterate through the shapes in the slide
    for shape in slide.Shapes:
        # Check if the shape has a hyperlink
        if shape.Click is not None:
            # Remove the click action
            shape.Click = shape.Click.get_NoAction()

# Save the presentation
presentation.SaveToFile("output/RemoveSlideShapeHyperlink.pptx", FileFormat.Pptx2013)
presentation.Dispose()


Remove Hyperlinks from Specific Types of Shapes in PowerPoint Slides

Instead of removing hyperlinks from every shape, we can check each shape's type first and remove hyperlinks only from shapes of the specified types. The detailed steps are as follows:

  • Create an instance of Presentation class and load a PowerPoint presentation using Presentation.LoadFromFile() method.
  • Iterate through the slides in the presentation and then iterate through the shapes in the slides.
  • Check if the shape is an instance of IEmbedImage, ITable, or IChart. If it is, check if the IShape.Click property of the shape is None. If it is not, remove the hyperlink by setting the property to the result of IShape.Click.get_NoAction() method.
  • Save the presentation using Presentation.SaveToFile() method.
  • Python
from spire.presentation import Presentation, FileFormat, IEmbedImage, ITable, IChart

# Create an instance of Presentation
presentation = Presentation()

# Load a PowerPoint presentation
presentation.LoadFromFile("Sample.pptx")

# Iterate through the slides in the presentation
for slide in presentation.Slides:
    # Iterate through the shapes in the slide
    for shape in slide.Shapes:
        # Check if the shape is an embedded image, a table, or a chart
        if isinstance(shape, (IEmbedImage, ITable, IChart)):
            # Check if the shape has a click action
            if shape.Click is not None:
                # Remove the click action
                shape.Click = shape.Click.get_NoAction()

# Save the presentation
presentation.SaveToFile("output/RemoveSlideShapeTypeHyperlink.pptx", FileFormat.Pptx2013)
presentation.Dispose()


Remove Hyperlinks from Table Text in PowerPoint Slides

To remove hyperlinks from text within a table, it is necessary to iterate through the table's cells and the text ranges within each cell. Afterward, the hyperlinks on the text ranges in each cell can be removed by setting the TextRange.ClickAction property to None. The detailed steps are as follows:

  • Create an instance of Presentation class and load a PowerPoint presentation using Presentation.LoadFromFile() method.
  • Iterate through the slides in the presentation and then iterate through the shapes in the slides.
  • Check if a shape is an instance of ITable class. If it is, iterate through the rows in the table and then the cells in the rows.
  • Iterate through the paragraphs in the cells and then the text ranges in the paragraphs.
  • Check if the TextRange.ClickAction property of a text range is None. If it is not, remove the hyperlink by setting the value of the property to None.
  • Save the presentation using Presentation.SaveToFile() method.
  • Python
from spire.presentation import Presentation, ITable, FileFormat

# Create an instance of Presentation
presentation = Presentation()

# Load a PowerPoint presentation
presentation.LoadFromFile("Sample.pptx")

# Iterate through the slides in the presentation
for slide in presentation.Slides:
    # Iterate through the shapes in the slide
    for shape in slide.Shapes:
        # Check if the shape is a table
        if isinstance(shape, ITable):
            # Get the table
            table = ITable(shape)
            # Iterate through the rows in the table
            for row in table.TableRows:
                # Iterate through the cells in the row
                for cell in row:
                    # Iterate through the paragraphs in the cell
                    for para in cell.TextFrame.Paragraphs:
                        # Iterate through the text ranges in the paragraph
                        for text_range in para.TextRanges:
                            # Check if the text range contains a hyperlink
                            if text_range.ClickAction is not None:
                                # Remove the hyperlink
                                text_range.ClickAction = None

# Save the presentation
presentation.SaveToFile("output/RemoveSlideTableTextHyperlink.pptx", FileFormat.Pptx2013)
presentation.Dispose()



A hyperlink is a clickable element, usually embedded in text or an image. It can direct users from their current location to a specific location on another web page or document. By adding hyperlinks in PowerPoint presentations, users can easily visit other related pages or slides while presenting slides. In this article, we will demonstrate how to add hyperlinks to PowerPoint presentations in Python using Spire.Presentation for Python.

Install Spire.Presentation for Python

This scenario requires Spire.Presentation for Python and plum-dispatch v1.7.4. Both can be easily installed on Windows with the following pip command.

pip install Spire.Presentation

If you are unsure how to install, please refer to this tutorial: How to Install Spire.Presentation for Python on Windows

Add Hyperlink to Text on Slide in Python

Spire.Presentation for Python allows users to add hyperlinks to text on slides by using the TextRange.ClickAction.Address property. The detailed steps are as follows.

  • Create a new PowerPoint presentation.
  • Set the background for the first slide of the presentation by using Presentation.Slides[].Shapes.AppendEmbedImageByPath() method.
  • Add a new shape to this slide using Presentation.Slides[].Shapes.AppendShape() method.
  • Add some paragraphs to it by calling TextParagraph.TextRanges.Append() method.
  • Create another TextRange instance to represent a text range and set link address for it by TextRange.ClickAction.Address property.
  • Set the font for these paragraphs.
  • Save the result file using Presentation.SaveToFile() method.
  • Python
from spire.presentation.common import *
from spire.presentation import *

outputFile = "C:/Users/Administrator/Desktop/AddHyperlinkToText.pptx"

#Create a new PowerPoint presentation
presentation = Presentation()

#Set the background for the first slide
ImageFile = "C:/Users/Administrator/Desktop/background.png"
rect = RectangleF.FromLTRB(0, 0, presentation.SlideSize.Size.Width, presentation.SlideSize.Size.Height)
presentation.Slides[0].Shapes.AppendEmbedImageByPath(ShapeType.Rectangle, ImageFile, rect)

#Add a new shape to the first slide
shape = presentation.Slides[0].Shapes.AppendShape(ShapeType.Rectangle, RectangleF.FromLTRB(80, 250, 650, 400))
shape.Fill.FillType = FillFormatType.none
shape.ShapeStyle.LineColor.Color = Color.get_White()

#Add some paragraphs to the shape
para1 = TextParagraph()
tr = TextRange("Spire.Presentation for Python")
tr.Fill.FillType = FillFormatType.Solid
tr.Fill.SolidColor.Color = Color.get_Black()
para1.TextRanges.Append(tr)
para1.Alignment = TextAlignmentType.Left
shape.TextFrame.Paragraphs.Append(para1)
shape.TextFrame.Paragraphs.Append(TextParagraph())

para2 = TextParagraph()
tr1 = TextRange("This is a professional presentation processing API that is highly compatible with PowerPoint. "
                + "It supports developers to process PowerPoint presentations efficiently without installing Microsoft PowerPoint.")
tr1.Fill.FillType = FillFormatType.Solid
tr1.Fill.SolidColor.Color = Color.get_Black()
para2.TextRanges.Append(tr1)
shape.TextFrame.Paragraphs.Append(para2)
shape.TextFrame.Paragraphs.Append(TextParagraph())

#Add text with a hyperlink
para3 = TextParagraph()
tr2 = TextRange("Click to know more about Spire.Presentation for Python.")
tr2.ClickAction.Address = "https://www.e-iceblue.com/Introduce/presentation-for-python.html"
para3.TextRanges.Append(tr2)
shape.TextFrame.Paragraphs.Append(para3)
shape.TextFrame.Paragraphs.Append(TextParagraph())

#Set the font for those paragraphs
for para in shape.TextFrame.Paragraphs:
    if len(para.Text) != 0:
        para.TextRanges[0].LatinFont = TextFont("Arial")
        para.TextRanges[0].FontHeight = 16

#Save the result file
presentation.SaveToFile(outputFile, FileFormat.Pptx2010)
presentation.Dispose()
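
The RectangleF.FromLTRB() calls above take the left, top, right, and bottom edges of a rectangle rather than a position plus a width and height, which is easy to mix up. The small stand-alone sketch below is plain Python with a hypothetical Rect class that only mirrors the arithmetic. It shows the size implied by the text box coordinates used in the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Hypothetical stand-in used only to illustrate LTRB coordinates."""
    left: float
    top: float
    right: float
    bottom: float

    @property
    def width(self):
        return self.right - self.left

    @property
    def height(self):
        return self.bottom - self.top


box = Rect(80, 250, 650, 400)  # same numbers as the text box above
print(box.width, box.height)   # 570 150
```

So the text box spans 570 units horizontally and 150 vertically, starting 80 from the left edge and 250 from the top of the slide.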

Python: Add Hyperlinks to PowerPoint Presentations

Add Hyperlink to Image on Slide in Python

Spire.Presentation for Python also supports adding a hyperlink to an image. You can create a hyperlink using the ClickHyperlink class and then assign it to the image through the IEmbedImage.Click property. The related steps are as follows.

  • Create an instance of the Presentation class.
  • Load a PowerPoint file using Presentation.LoadFromFile() method.
  • Get the first slide by using Presentation.Slides[] property.
  • Add an image to this slide using ISlide.Shapes.AppendEmbedImageByPath() method.
  • Create a ClickHyperlink object and append the hyperlink to the added image using IEmbedImage.Click property.
  • Save the result file using Presentation.SaveToFile() method.
  • Python
from spire.presentation.common import *
from spire.presentation import *

inputFile = "C:/Users/Administrator/Desktop/AddHyperlinkToText.pptx"
outputFile = "C:/Users/Administrator/Desktop/AddHyperlinkToImage.pptx"

#Create a Presentation instance
presentation = Presentation()

#Load a sample file from disk
presentation.LoadFromFile(inputFile)

#Get the first slide
slide = presentation.Slides[0]

#Add an image to this slide
rect = RectangleF.FromLTRB(80, 80, 240, 240)
image = slide.Shapes.AppendEmbedImageByPath(ShapeType.Rectangle, "image.png", rect)

#Add a hyperlink to the image
hyperlink = ClickHyperlink("https://www.e-iceblue.com/Introduce/presentation-for-python.html")
image.Click = hyperlink

#Save the result file
presentation.SaveToFile(outputFile, FileFormat.Pptx2013)
presentation.Dispose()

