Python: Remove Attachments from a PDF Document

2024-01-25 01:05:34 Written by  support iceblue

The inclusion of attachments in a PDF can be useful for sharing related files or providing additional context and resources alongside the main document. However, there may be instances when you need to remove attachments from a PDF for reasons like reducing file size, protecting sensitive information, or simply decluttering the document. In this article, you will learn how to remove attachments from a PDF document in Python using Spire.PDF for Python.

Install Spire.PDF for Python

This scenario requires Spire.PDF for Python and plum-dispatch v1.7.4. Both can be installed on Windows with the following pip command.

pip install Spire.PDF

If you are unsure how to install it, refer to this tutorial: How to Install Spire.PDF for Python on Windows
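
After installation, it can be useful to confirm that the packages are visible to the Python interpreter you intend to use. The following is a minimal sketch using the standard library's importlib.metadata; the helper name installed_version is hypothetical, and the distribution names are taken from the pip command and package list above.

```python
from importlib import metadata


def installed_version(distribution_name):
    """Return the installed version string, or None if the package is not installed."""
    try:
        return metadata.version(distribution_name)
    except metadata.PackageNotFoundError:
        return None


# Report what the current interpreter can see
for name in ("Spire.PDF", "plum-dispatch"):
    version = installed_version(name)
    print(f"{name}: {version or 'not installed'}")
```

If either package is reported as not installed, the pip command was probably run against a different Python environment.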

Prerequisite Knowledge

A PDF typically contains two types of attachments: document-level attachments and annotation attachments. The following table lists the differences between them and how each is represented in Spire.PDF.

Attachment type | Represented by | Definition
Document-level attachment | PdfAttachment class | A file attached to a PDF at the document level. It does not appear on a page, but can be viewed in the "Attachments" panel of a PDF reader.
Annotation attachment | PdfAttachmentAnnotationWidget class | A file attached as an annotation. It can be found on a page or in the "Attachments" panel, and is shown as a paper clip icon on the page; reviewers can double-click the icon to open the file.

Remove Document-Level Attachments from PDF in Python

To obtain all document-level attachments of a PDF document, use the PdfDocument.Attachments property. You can then remove all of them using the Clear() method, or remove a single attachment by its index using the RemoveAt() method. The following are the steps to remove document-level attachments from a PDF in Python.

  • Create a PdfDocument object.
  • Load a PDF document using the PdfDocument.LoadFromFile() method.
  • Get the attachment collection from the document using the PdfDocument.Attachments property.
  • Remove all attachments using the PdfAttachmentCollection.Clear() method. To remove a specific attachment, use the PdfAttachmentCollection.RemoveAt() method.
  • Save the changes to a different PDF file using the PdfDocument.SaveToFile() method.

from spire.pdf import *
from spire.pdf.common import *

# Create a PdfDocument object
doc = PdfDocument()

# Load a PDF file
doc.LoadFromFile("C:\\Users\\Administrator\\Desktop\\Attachments.pdf")

# Get the attachment collection from the document
attachments = doc.Attachments

# Remove all attachments
attachments.Clear()

# Alternatively, remove a single attachment by its zero-based index
# attachments.RemoveAt(0)

# Save the changes to file
doc.SaveToFile("output/DeleteAttachments.pdf")

# Close the document
doc.Close()
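
RemoveAt() takes an index, so removing several attachments selectively means working out which indexes to remove and in what order. The following is a minimal sketch, not a definitive implementation, that removes attachments by file extension. The helper names are hypothetical, and it assumes that the attachment collection exposes get_Item() and that each attachment has a FileName property.

```python
from pathlib import Path


def indexes_to_remove(file_names, extensions):
    """Return the indexes of names with a matching extension, highest index first."""
    wanted = {ext.lower() for ext in extensions}
    matches = [i for i, name in enumerate(file_names)
               if Path(name).suffix.lower() in wanted]
    # Highest index first, so each RemoveAt() leaves the remaining indexes valid
    return sorted(matches, reverse=True)


def remove_attachments_by_extension(input_path, output_path, extensions):
    """Remove document-level attachments with the given extensions (hypothetical helper)."""
    # Imported here so the index helper above can be used without Spire.PDF installed
    from spire.pdf import PdfDocument

    doc = PdfDocument()
    doc.LoadFromFile(input_path)
    attachments = doc.Attachments

    # Assumption: get_Item(i) and FileName are available on the collection and its items
    names = [attachments.get_Item(i).FileName for i in range(attachments.Count)]
    for index in indexes_to_remove(names, extensions):
        attachments.RemoveAt(index)

    doc.SaveToFile(output_path)
    doc.Close()
```

A call such as remove_attachments_by_extension("Attachments.pdf", "output/Filtered.pdf", [".png"]) would then drop only the PNG attachments, provided the assumptions above hold for your version of the library.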

Remove Annotation Attachments from PDF in Python

Annotations are page-based elements, so to retrieve all annotations from a document you need to iterate through the pages and get the annotations from each page. Then check whether each annotation is an attachment annotation and, if so, remove it from the annotation collection using the RemoveAt() method.

The following are the steps to remove annotation attachments from PDF in Python.

  • Create a PdfDocument object.
  • Load a PDF document using the PdfDocument.LoadFromFile() method.
  • Iterate through the pages in the document.
  • Get the annotation collection from each page using the PdfPageBase.AnnotationsWidget property.
  • Iterate through the annotations in the collection.
  • Determine whether each annotation is an instance of PdfAttachmentAnnotationWidget.
  • Remove the attachment annotation using the PdfAnnotationCollection.RemoveAt() method.
  • Save the changes to a different PDF file using the PdfDocument.SaveToFile() method.

from spire.pdf import *
from spire.pdf.common import *

# Create a PdfDocument object
doc = PdfDocument()

# Load a PDF file
doc.LoadFromFile("C:\\Users\\Administrator\\Desktop\\AnnotationAttachment.pdf")

# Iterate through the pages in the document
for i in range(doc.Pages.Count):

    # Get annotation collection from a certain page
    annotationCollection = doc.Pages.get_Item(i).AnnotationsWidget

    # Iterate through the annotations in reverse order, so that removing
    # one does not shift the indexes of those still to be checked
    for j in range(annotationCollection.Count - 1, -1, -1):

        # Get a specific annotation
        annotation = annotationCollection.get_Item(j)

        # Determine if it is an attachment annotation
        if isinstance(annotation, PdfAttachmentAnnotationWidget):

            # Remove the annotation
            annotationCollection.RemoveAt(j)

# Save the changes to file
doc.SaveToFile("output/DeleteAnnotationAttachment.pdf")

# Close the document
doc.Close()
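
When items are removed by index inside a loop, the direction of traversal matters: each removal shifts the later items down by one position. The following is a small illustration using a plain Python list as a stand-in for the annotation collection, with del standing in for RemoveAt(). The function names and sample data are made up for the example.

```python
def remove_matching_reverse(items, should_remove):
    """Remove matching items in place, walking from the last index to the first."""
    for j in range(len(items) - 1, -1, -1):
        if should_remove(items[j]):
            del items[j]  # stand-in for annotationCollection.RemoveAt(j)
    return items


def remove_matching_forward(items, should_remove):
    """Forward walk over a fixed range, shown only to illustrate the pitfall."""
    for j in range(len(items)):
        if j >= len(items):
            break  # the list shrank; a real collection might raise an error here
        if should_remove(items[j]):
            del items[j]  # the next item slides into slot j and is never checked
    return items


def is_attachment(kind):
    return kind == "attachment"


annotations = ["link", "attachment", "attachment", "text"]

reverse_result = remove_matching_reverse(list(annotations), is_attachment)
forward_result = remove_matching_forward(list(annotations), is_attachment)

print(reverse_result)  # ['link', 'text']
print(forward_result)  # ['link', 'attachment', 'text']
```

The forward walk leaves one attachment behind whenever two sit next to each other, which is why walking the collection from the last index to the first is the safer pattern for index-based removal.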

Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to lift the function limitations, you can request a 30-day trial license.

Last modified on Thursday, 25 April 2024 02:27