Python: Modify or Remove Hyperlinks in PDF

Hyperlinks in PDF documents are commonly used tools for navigating to internal or external related information. However, these links need to be accurate and up-to-date in order to be effective. Document editors are supposed to have the power to change or delete hyperlinks to update outdated references, rectify errors, comply with evolving web standards, or enhance accessibility. This article will demonstrate how to use Spire.PDF for Python to modify or remove hyperlinks in PDF documents to ensure accurate information dissemination, seamless navigation, and inclusive user experience.

Install Spire.PDF for Python

This scenario requires Spire.PDF for Python and plum-dispatch v1.7.4. They can be easily installed in your Windows through the following pip command.

pip install Spire.PDF

If you are unsure how to install, please refer to this tutorial: How to Install Spire.PDF for Python on Windows

Update the Target Address of Hyperlinks in PDF Using Python

In PDF documents, hyperlinks are annotations displayed on the linked content on a page. Therefore, to modify hyperlinks in PDF documents, it is needed to retrieve all annotations on a page through PdfPageBase.AnnotationsWidget property. Then, a specific hyperlink annotation can be obtained from the annotation list and the target address can be updated through PdfTextWebLinkAnnotationWidget.Url property. The detailed steps are as follows:

  • Create an object of PdfDocument class and load a PDF document using PdfDocument.LoadFromFile() method.
  • Get a page of the document using PdfDocument.Pages.get_Item() method.
  • Get all annotations on the page through PdfPageBase.AnnotationsWidget property.
  • Get a hyperlink annotation and cast it to a PdfTextWebLinkAnnotationWidget object.
  • Set a new target address for the hyperlink annotation through PdfTextWebLinkAnnotationWidget.Url property.
  • Save the document using PdfDocument.SaveToFile() method.
  • Python
from spire.pdf.common import *
from spire.pdf import *

# Create an object of PdfDocument class and load a PDF document
pdf = PdfDocument()
pdf.LoadFromFile("Sample.pdf")

# Get the first page of the document
page = pdf.Pages.get_Item(0)

# Get all annotations on the page
widgetCollection = page.AnnotationsWidget

# Get the second hyperlink annotation
annotation = widgetCollection.get_Item(1)

# Cast the hyperlink annotation to a PdfTextWebLinkAnnotationWidget object
link = PdfTextWebLinkAnnotationWidget(annotation)

# Set a new target address for the second hyperlink
link.Url = "https://www.mcafee.com/learn/understanding-trojan-viruses-and-how-to-get-rid-of-them/"

#Save the document
pdf.SaveToFile("output/ModifyPDFHyperlink.pdf")
pdf.Close()

Python: Modify or Remove Hyperlinks in PDF

Remove Hyperlinks in PDF Documents Using Python

Spire.PDF for Python enables developers to effortlessly remove specific hyperlinks on a page using the PdfPageBase.AnnotationsWidget.RemoveAt() method. Additionally, developers can also iterate through each page and its annotations to identify and eliminate all hyperlink annotations in the entire PDF document with the help of Spire.PDF for Python. The detailed steps are as follows:

  • Create an object of PdfDocument class and load a PDF document using PdfDocument.LoadFromFile() method.
  • To remove a specific hyperlink, get a page in the document using PdfDocument.Pages.get_Item() method and remove the hyperlink annotation using PdfPageBase.AnnotationsWidget.RemoveAt() method.
  • To remove all hyperlinks in the document, loop through the pages in the document to get the annotations on each page through PdfPageBase.AnnotationsWidget property.
  • Loop through the annotations to check if each annotation is an instance of PdfTextWebLinkAnnotationWidget class. If it is, remove it using PdfAnnotationCollection.Remove() method.
  • Save the document using PdfDocument.SaveToFile() method.
  • Python
from spire.pdf import *
from spire.pdf.common import *

# # Create an object of PdfDocument class and load a PDF document
pdf = PdfDocument()
pdf.LoadFromFile("Sample.pdf")

# Remove the first hyperlink on the first page
#page = pdf.Pages.get_Item(0)
#page.AnnotationsWidget.RemoveAt(0)

# Remove all hyperlinks
# Loop through the pages in the document
for j in range(pdf.Pages.Count):
    # Get each page
    page = pdf.Pages.get_Item(j)
    # Get the annotations on each page
    annotations = page.AnnotationsWidget
    # Check if there is any annotations on a page
    if annotations.Count > 0:
        # Loop through the annotations
        i = annotations.Count - 1
        while i >=0:
            # Get an annotation
            annotation = annotations.get_Item(i)
            # Check if each annotation is a hyperlink
            if isinstance(annotation, PdfTextWebLinkAnnotationWidget):
                # Remove hyperlink annotations
                annotations.Remove(annotation)
            i -= 1

# Save the document
pdf.SaveToFile("output/RemovePDFHyperlink.pdf")
pdf.Close()

Python: Modify or Remove Hyperlinks in PDF

Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.