Handling PDF documents as bytes and bytearray objects is an efficient and flexible approach. By processing PDFs directly as byte streams, developers can manage documents in memory or transfer them over networks without temporary file storage, which saves disk space and avoids unnecessary file I/O. This also simplifies integration with web services and APIs. In addition, a bytearray is mutable, so it allows precise byte-level modifications to a PDF document.
This article will demonstrate how to save PDFs as bytes and bytearray and load PDFs from bytes and bytearray using Spire.PDF for Python, offering practical examples for Python developers.
Install Spire.PDF for Python
This scenario requires Spire.PDF for Python and plum-dispatch v1.7.4. Both can be installed on Windows with the following pip command.
pip install Spire.PDF
If you are unsure how to install it, refer to: How to Install Spire.PDF for Python on Windows
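To confirm that both packages are present after running pip, a check along the following lines may help. It is a minimal sketch that uses only the standard library's importlib.metadata. The helper name installed_version is hypothetical, and the distribution names are assumed from the pip command and package name mentioned above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


# Distribution names assumed from the article: "Spire.PDF" and "plum-dispatch".
for name in ("Spire.PDF", "plum-dispatch"):
    found = installed_version(name)
    print(f"{name}: {found if found is not None else 'not installed'}")
```

If plum-dispatch is reported with a version other than 1.7.4, that may be worth checking against the requirement stated above.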
Create a PDF Document and Save It to Bytes and Bytearray
Developers can create PDF documents using the classes and methods provided by Spire.PDF for Python, save them to a Stream object, and then convert it to an immutable bytes object or a mutable bytearray object. The Stream object can also be used to perform byte-level operations.
The detailed steps are as follows:
- Create an instance of the PdfDocument class.
- Add a page to the document and draw text on the page.
- Save the document to a Stream object using the PdfDocument.SaveToStream() method.
- Convert the Stream object to a bytes object using the Stream.ToArray() method.
- Convert the bytes object to a bytearray object by passing it to bytearray().
- Use the resulting byte data for further operations, such as writing it to a file with the BinaryIO.write() method.
- Python
import os

from spire.pdf import *

# Create an instance of PdfDocument class
pdf = PdfDocument()

# Set the page size and margins of the document
pageSettings = pdf.PageSettings
pageSettings.Size = PdfPageSize.A4()
pageSettings.Margins.Top = 50
pageSettings.Margins.Bottom = 50
pageSettings.Margins.Left = 40
pageSettings.Margins.Right = 40

# Add a new page to the document
page = pdf.Pages.Add()

# Create fonts and brushes for the document content
titleFont = PdfTrueTypeFont("HarmonyOS Sans SC", 16.0, PdfFontStyle.Bold, True)
titleBrush = PdfBrushes.get_Brown()
contentFont = PdfTrueTypeFont("HarmonyOS Sans SC", 13.0, PdfFontStyle.Regular, True)
contentBrush = PdfBrushes.get_Black()

# Draw the title on the page
titleText = "Brief Introduction to Cloud Services"
titleSize = titleFont.MeasureString(titleText)
page.Canvas.DrawString(titleText, titleFont, titleBrush, PointF(0.0, 30.0))

# Define the body text
contentText = ("Cloud computing is a service model where computing resources are provided over the internet on a pay-as-you-go basis. "
               "It is a type of infrastructure-as-a-service (IaaS), platform-as-a-service (PaaS), and software-as-a-service (SaaS) model. "
               "Cloud computing is typically offered through a subscription-based model, where users pay for access to the cloud resources on a monthly, yearly, or other basis.")

# Set the string format of the body text
contentFormat = PdfStringFormat()
contentFormat.Alignment = PdfTextAlignment.Justify
contentFormat.LineSpacing = 20.0

# Create a TextWidget object with the body text and apply the string format
textWidget = PdfTextWidget(contentText, contentFont, contentBrush)
textWidget.StringFormat = contentFormat

# Create a TextLayout object and set the layout options
textLayout = PdfTextLayout()
textLayout.Layout = PdfLayoutType.Paginate
textLayout.Break = PdfLayoutBreakType.FitPage

# Draw the TextWidget on the page, below the title
rect = RectangleF(PointF(0.0, titleSize.Height + 50.0), page.Canvas.ClientSize)
textWidget.Draw(page, rect, textLayout)

# Save the PDF document to a Stream object
pdfStream = Stream()
pdf.SaveToStream(pdfStream)

# Convert the Stream object to an immutable bytes object
pdfBytes = pdfStream.ToArray()

# Convert the bytes object to a mutable bytearray object
pdfBytearray = bytearray(pdfBytes)

# Make sure the output folder exists, then write the byte data to a file
os.makedirs("output", exist_ok=True)
with open("output/PDFBytearray.pdf", "wb") as f:
    f.write(pdfBytearray)
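The practical difference between the two resulting objects can be seen without Spire.PDF at all. The following sketch uses a short placeholder byte string in place of the real output of Stream.ToArray(). The placeholder is not a valid PDF and the variable names are illustrative. It shows that bytes rejects in-place changes, that bytearray accepts them, and that io.BytesIO lets the data be read like a file without touching the disk.

```python
import io

# Placeholder standing in for the bytes returned by pdfStream.ToArray().
# It is NOT a valid PDF; it only mimics the header and trailer markers.
pdf_bytes = b"%PDF-1.7\n% placeholder body\n%%EOF\n"

# bytes is immutable: assigning to an index raises TypeError.
try:
    pdf_bytes[5] = ord("2")
    bytes_accepts_edit = True
except TypeError:
    bytes_accepts_edit = False

# bytearray is a mutable copy of the same data, so slices can be overwritten.
pdf_bytearray = bytearray(pdf_bytes)
pdf_bytearray[5:8] = b"1.4"  # illustrative edit of the version digits only

# io.BytesIO wraps the data as an in-memory binary file (no temporary file).
buffer = io.BytesIO(pdf_bytearray)
header = buffer.readline()

print(bytes_accepts_edit)  # False
print(header)              # b'%PDF-1.4\n'
```

Because bytearray(pdf_bytes) makes a copy, the original bytes object is left unchanged by the edit.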
Load a PDF Document from Byte Streams
Developers can use a bytes object of a PDF file to create a stream and load it using the PdfDocument.LoadFromStream() method. Once the PDF document is loaded, various operations such as reading, modifying, and converting the PDF can be performed. The following is an example of the steps:
- Read a PDF file into a bytes object.
- Create a Stream object from the bytes object.
- Load the Stream object as a PDF document using the PdfDocument.LoadFromStream() method.
- Extract the text from the first page of the document and print it.
- Python
from spire.pdf import *

# Read the PDF file into a bytes object
with open("Sample.pdf", "rb") as f:
    byteData = f.read()

# Create a Stream object from the bytes object
stream = Stream(byteData)

# Load the PDF document from the Stream object (passed to the PdfDocument constructor)
pdf = PdfDocument(stream)

# Get the text from the first page
page = pdf.Pages.get_Item(0)
textExtractor = PdfTextExtractor(page)
extractOptions = PdfTextExtractOptions()
extractOptions.IsExtractAllText = True
text = textExtractor.ExtractText(extractOptions)

# Print the text
print(text)
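When the byte data comes from a network response or another untrusted source rather than a local file, it can be worth checking it before building the Stream. The helper below is a hypothetical sketch and is not part of Spire.PDF. It accepts bytes, bytearray, or memoryview, returns an immutable bytes copy, and rejects data that lacks the %PDF- marker near the start. The 1024-byte search window is a lenient heuristic chosen for this example.

```python
from typing import Union

BytesLike = Union[bytes, bytearray, memoryview]


def to_pdf_bytes(data: BytesLike) -> bytes:
    """Return an immutable bytes copy of data, or raise ValueError if it does not look like a PDF."""
    raw = bytes(data)
    # Heuristic check (an assumption of this sketch): the header marker
    # should appear somewhere within the first 1024 bytes.
    if b"%PDF-" not in raw[:1024]:
        raise ValueError("data does not contain a %PDF- header near the start")
    return raw


# Usage with placeholder data; with Spire.PDF the result would be passed on,
# for example as Stream(to_pdf_bytes(byteData)).
checked = to_pdf_bytes(bytearray(b"%PDF-1.7\n% placeholder\n"))
print(type(checked).__name__)  # bytes
```

The check only looks for the marker, so it catches obviously wrong input such as an HTML error page but does not validate the document structure.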
Apply for a Temporary License
If you'd like to remove the evaluation message from the generated documents, or to lift the function limitations, you can request a 30-day trial license.