Python: Load and Save PDFs with Byte Streams

2024-09-06 00:58:47 Written by  support iceblue
Rate this item
(0 votes)

Handling PDF documents using bytes and bytearray provides an efficient and flexible approach within applications. By processing PDFs directly as byte streams, developers can manage documents in memory or transfer them over networks without the need for temporary file storage, optimizing space and improving overall application performance. This method also facilitates seamless integration with web services and APIs. Additionally, using bytearray allows developers to make precise byte-level modifications to PDF documents.

This article will demonstrate how to save PDFs as bytes and bytearray and load PDFs from bytes and bytearray using Spire.PDF for Python, offering practical examples for Python developers.

Install Spire.PDF for Python

This scenario requires Spire.PDF for Python and plum-dispatch v1.7.4. They can be easily installed in your Windows through the following pip command.

pip install Spire.PDF

If you are unsure how to install, please refer to: How to Install Spire.PDF for Python on Windows

Create a PDF Document and Save It to Bytes and Bytearray

Developers can create PDF documents using the classes and methods provided by Spire.PDF for Python, save them to a Stream object, and then convert it to an immutable bytes object or a mutable bytearray object. The Stream object can also be used to perform byte-level operations.

The detailed steps are as follows:

  • Create an object of PdfDocument class to create a PDF document.
  • Add a page to the document and draw text on the page.
  • Save the document to a Stream object using PdfDocument.SaveToStream() method.
  • Convert the Stream object to a bytes object using Stream.ToArray() method.
  • The bytes object can be directly converted to a bytearray object.
  • Afterward, the byte streams can be used for further operations, such as writing them to a file using the BinaryIO.write() method.
  • Python
from spire.pdf import *

# Create an instance of PdfDocument class
pdf = PdfDocument()

# Set the page size and margins of the document
pageSettings = pdf.PageSettings
pageSettings.Size = PdfPageSize.A4()
pageSettings.Margins.Top = 50
pageSettings.Margins.Bottom = 50
pageSettings.Margins.Left = 40
pageSettings.Margins.Right = 40

# Add a new page to the document
page = pdf.Pages.Add()

# Create fonts and brushes for the document content
titleFont = PdfTrueTypeFont("HarmonyOS Sans SC", 16.0, PdfFontStyle.Bold, True)
titleBrush = PdfBrushes.get_Brown()
contentFont = PdfTrueTypeFont("HarmonyOS Sans SC", 13.0, PdfFontStyle.Regular, True)
contentBrush = PdfBrushes.get_Black()

# Draw the title on the page
titleText = "Brief Introduction to Cloud Services"
titleSize = titleFont.MeasureString(titleText)
page.Canvas.DrawString(titleText, titleFont, titleBrush, PointF(0.0, 30.0))

# Draw the body text on the page
contentText = ("Cloud computing is a service model where computing resources are provided over the internet on a pay-as-you-go basis. "
               "It is a type of infrastructure-as-a-service (IaaS), platform-as-a-service (PaaS), and software-as-a-service (SaaS) model. "
               "Cloud computing is typically offered througha subscription-based model, where users pay for access to the cloud resources on a monthly, yearly, or other basis.")
# Set the string format of the body text
contentFormat = PdfStringFormat()
contentFormat.Alignment = PdfTextAlignment.Justify
contentFormat.LineSpacing = 20.0
# Create a TextWidget object with the body text and apply the string format
textWidget = PdfTextWidget(contentText, contentFont, contentBrush)
textWidget.StringFormat = contentFormat
# Create a TextLayout object and set the layout options
textLayout = PdfTextLayout()
textLayout.Layout = PdfLayoutType.Paginate
textLayout.Break = PdfLayoutBreakType.FitPage
# Draw the TextWidget on the page
rect = RectangleF(PointF(0.0, titleSize.Height + 50.0), page.Canvas.ClientSize)
textWidget.Draw(page, rect, textLayout)

# Save the PDF document to a Stream object
pdfStream = Stream()
pdf.SaveToStream(pdfStream)

# Convert the Stream object to a bytes object
pdfBytes = pdfStream.ToArray()

# Convert the Stream object to a bytearray object
pdfBytearray = bytearray(pdfStream.ToArray())

# Write the byte stream to a file
with open("output/PDFBytearray.pdf", "wb") as f:
    f.write(pdfBytearray)

Python: Load and Save PDFs with Byte Streams

Load a PDF Document from Byte Streams

Developers can use a bytes object of a PDF file to create a stream and load it using the PdfDocument.LoadFromStream() method. Once the PDF document is loaded, various operations such as reading, modifying, and converting the PDF can be performed. The following is an example of the steps:

  • Create a bytes object with a PDF file.
  • Create a Stream object using the bytes object.
  • Load the Stream object as a PDF document using PdfDocument.LoadFromStream() method.
  • Extract the text from the first page of the document and print the text.
  • Python
from spire.pdf import *

# Create a byte array from a PDF file
with open("Sample.pdf", "rb") as f:
    byteData = f.read()

# Create a Stream object from the byte array
stream = Stream(byteData)

# Load the Stream object as a PDF document
pdf = PdfDocument(stream)

# Get the text from the first page
page = pdf.Pages.get_Item(0)
textExtractor = PdfTextExtractor(page)
extractOptions = PdfTextExtractOptions()
extractOptions.IsExtractAllText = True
text = textExtractor.ExtractText(extractOptions)

# Print the text
print(text)

Python: Load and Save PDFs with Byte Streams

Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.

Additional Info

  • tutorial_title:
Last modified on Friday, 06 September 2024 01:07