Document Operation (9)
Barcodes in PDFs can facilitate quicker data retrieval and processing. You can add barcodes to PDF files that contain detailed information such as the document's unique identifier, version number, creator, or even the entire document content. When scanned, all information is decoded immediately. This instant access is invaluable for businesses dealing with large volumes of documents, as it minimizes the time and effort required for manual searching and data entry. In this article, you will learn how to add barcodes to PDF in Python using Spire.PDF for Python and Spire.Barcode for Python.
Install Spire.PDF for Python
This scenario requires Spire.PDF for Python and Spire.Barcode for Python. They can be easily installed in your Windows through the following pip command.
pip install Spire.PDF pip install Spire.Barcode
If you are unsure how to install, please refer to this tutorial: How to Install Spire.PDF for Python on Windows
Add Barcodes to PDF in Python
Spire.PDF for Python support several 1D barcode types represented by different classes, such as PdfCodabarBarcode, PdfCode11Barcode, PdfCode32Barcode, PdfCode39Barcode, PdfCode93Barcode.
Each class provides corresponding properties for setting the barcode text, size, color, etc. The following are the steps to draw the common Codabar, Code39 and Code93 barcodes at the specified locations on a PDF page.
- Create a PdfDocument object.
- Add a PDF page using PdfDocument.Pages.Add() method.
- Create a PdfTextWidget object and draw text on the page using PdfTextWidget.Draw() method.
- Create PdfCodabarBarcode, PdfCode39Barcode, PdfCode93Barcode objects.
- Set the gap between the barcode and the displayed text through the BarcodeToTextGapHeight property of the corresponding classes.
- Sets the barcode text display location through the TextDisplayLocation property of the corresponding classes.
- Set the barcode text color through the TextColor property of the corresponding classes.
- Draw the barcodes at specified locations on the PDF page using the Draw(page: PdfPageBase, location: PointF) method of the corresponding classes.
- Save the result PDF file using PdfDocument.SaveToFile() method.
- Python
from spire.pdf.common import * from spire.pdf import * # Create a PDF document pdf = PdfDocument() # Add a page page = pdf.Pages.Add(PdfPageSize.A4()) # Initialize y-coordinate y = 20.0 # Create a true type font font = PdfTrueTypeFont("Arial", 12.0, PdfFontStyle.Bold, True) # Draw text on the page text = PdfTextWidget() text.Font = font text.Text = "Codabar:" result = text.Draw(page, 0.0, y) page = result.Page y = result.Bounds.Bottom + 2 # Draw Codabar barcode on the page Codabar = PdfCodabarBarcode("00:12-3456/7890") Codabar.BarcodeToTextGapHeight = 1.0 Codabar.EnableCheckDigit = True Codabar.ShowCheckDigit = True Codabar.TextDisplayLocation = TextLocation.Bottom Codabar.TextColor = PdfRGBColor(Color.get_Blue()) Codabar.Draw(page, PointF(0.0, y)) y = Codabar.Bounds.Bottom + 6 # Draw text on the page text.Text = "Code39:" result = text.Draw(page, 0.0, y) page = result.Page y = result.Bounds.Bottom + 2 # Draw Code39 barcode on the page Code39 = PdfCode39Barcode("16-273849") Code39.BarcodeToTextGapHeight = 1.0 Code39.TextDisplayLocation = TextLocation.Bottom Code39.TextColor = PdfRGBColor(Color.get_Blue()) Code39.Draw(page, PointF(0.0, y)) y = Code39.Bounds.Bottom + 6 # Draw text on the page text.Text = "Code93:" result = text.Draw(page, 0.0, y) page = result.Page y = result.Bounds.Bottom + 2 # Draw Code93 barcode on the page Code93 = PdfCode93Barcode("16-273849") Code93.BarcodeToTextGapHeight = 1.0 Code93.TextDisplayLocation = TextLocation.Bottom Code93.TextColor = PdfRGBColor(Color.get_Blue()) Code93.QuietZone.Bottom = 5.0 Code93.Draw(page, PointF(0.0, y)) # Save the document pdf.SaveToFile("AddBarcodes.pdf") pdf.Close()
Add QR Codes to PDF in Python
To add 2D barcodes to a PDF file, the Spire.Barcode for Python library is required to generate QR code first, and then you can add the QR code image to the PDF file with the Spire.PDF for Python library. The following are the detailed steps.
- Create a PdfDocument object.
- Add a PDF page using PdfDocument.Pages.Add() method.
- Create a BarcodeSettings object.
- Call the corresponding properties of the BarcodeSettings class to set the barcode type, data, error correction level and width, etc.
- Create a BarCodeGenerator object based on the settings.
- Generate QR code image using BarCodeGenerator.GenerateImage() method.
- Save the QR code image to a PNG file.
- Draw the QR code image at a specified location on the PDF page using PdfPageBase.Canvas.DrawImage() method.
- Save the result PDF file using PdfDocument.SaveToFile() method.
- Python
from spire.pdf.common import * from spire.pdf import * from spire.barcode import * # Create a PdfDocument instance pdf = PdfDocument() # Add a page page = pdf.Pages.Add() # Create a BarcodeSettings object settings = BarcodeSettings() # Set the barcode type to QR code settings.Type = BarCodeType.QRCode # Set the data of the QR code settings.Data = "E-iceblue" settings.Data2D = "E-iceblue" # Set the width of the QR code settings.X = 2 # Set the error correction level of the QR code settings.QRCodeECL = QRCodeECL.M # Set to show QR code text at the bottom settings.ShowTextOnBottom = True # Generate QR code image based on the settings barCodeGenerator = BarCodeGenerator(settings) QRimage = barCodeGenerator.GenerateImage() # Save the QR code image to a .png file with open("QRCode.png", "wb") as file: file.write(QRimage) # Initialize y-coordinate y = 20.0 # Create a true type font font = PdfTrueTypeFont("Arial", 12.0, PdfFontStyle.Bold, True) # Draw text on the PDF page text = PdfTextWidget() text.Font = font text.Text = "QRCode:" result = text.Draw(page, 0.0, y) page = result.Page y = result.Bounds.Bottom + 2 # Draw QR code image on the PDF page pdfImage = PdfImage.FromFile("QRCode.png") page.Canvas.DrawImage(pdfImage, 0.0, y) # Save the document pdf.SaveToFile("PdfQRCode.pdf") pdf.Close()
Apply for a Temporary License
If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.
Comparing PDF documents is a common task when collaborating on projects or tracking changes. This allows users to quickly review and understand what has been modified, added, or removed between revisions. Effective PDF comparison streamlines the review process and ensures all stakeholders are aligned on the latest document content.
In this article, you will learn how to compare two PDF documents using Python and the Spire.PDF for Python library.
Install Spire.PDF for Python
This scenario requires Spire.PDF for Python and plum-dispatch v1.7.4. They can be easily installed in your Windows through the following pip command.
pip install Spire.PDF
If you are unsure how to install, please refer to this tutorial: How to Install Spire.PDF for Python on Windows
Compare Two PDF Documents in Python
Spire.PDF for Python provides the PdfComparer.Compare() method allowing developers to compare two PDF documents and save the comparison result to another PDF document. Here are the detailed steps.
- Load the first PDF document while initializing the PdfDocument object.
- Load the second PDF document while initializing another PdfDocument object.
- Initialize an instance of PdfComparer class, passing the two PdfDocument objects are the parameter.
- Call Compare() method of the PdfComparer object to compare the two PDF documents and save the result to a different PDF document.
- Python
from spire.pdf.common import * from spire.pdf import * # Load the first document doc_one = PdfDocument("C:\\Users\\Administrator\\Desktop\\PDF_ONE.pdf") # Load the section document doc_two = PdfDocument("C:\\Users\\Administrator\\Desktop\\PDF_TWO.pdf") # Create a PdfComparer object comparer = PdfComparer(doc_two, doc_one) # Compare two documents and save the comparison result in a pdf document comparer.Compare("output/CompareResult.pdf") # Dispose resources doc_one.Dispose() doc_two.Dispose()
Compare Selected Pages in PDF Documents in Python
Instead of comparing two entire documents, you can specify the pages to compare using the PdfComparer.PdfCompareOptions.SetPageRanges() method. The following are the detailed steps.
- Load the first PDF document while initializing the PdfDocument object.
- Load the second PDF document while initializing another PdfDocument object.
- Initialize an instance of PdfComparer class, passing the two PdfDocument objects are the parameter.
- Specify the page range to compare using PdfComparer.PdfCompareOptions.SetPageRanges() method
- Call PdfComparer.Compare() method to compare the selected pages and save the result to a different PDF document.
- Python
from spire.pdf.common import * from spire.pdf import * # Load the first document doc_one = PdfDocument("C:\\Users\\Administrator\\Desktop\\PDF_ONE.pdf") # Load the section document doc_two = PdfDocument("C:\\Users\\Administrator\\Desktop\\PDF_TWO.pdf") # Create a PdfComparer object comparer = PdfComparer(doc_two, doc_one) # Set page range for comparison comparer.PdfCompareOptions.SetPageRanges(1, 3, 1, 3) # Compare the selected pages and save the comparison result in a pdf document comparer.Compare("output/CompareResult.pdf") # Dispose resources doc_one.Dispose() doc_two.Dispose()
Apply for a Temporary License
If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.
PDF has become the standard for sharing documents across various platforms and devices. However, large PDF files can be cumbersome to share, especially when dealing with limited storage space or slow internet connections. Compressing PDF files is an essential skill to optimize file size and ensure seamless document distribution. In this article, we will demonstrate how to compress PDF documents in Python using Spire.PDF for Python.
Install Spire.PDF for Python
This scenario requires Spire.PDF for Python. It can be easily installed in your Windows through the following pip command.
pip install Spire.PDF
If you are unsure how to install, please refer to this tutorial: How to Install Spire.PDF for Python on Windows
Compress PDF Documents in Python
PDF files are often large when they include high-resolution images or embedded fonts. To reduce the size of PDF files, consider compressing the images and fonts contained within them. The following steps explain how to compress PDF documents by compressing fonts and images using Spire.PDF for Python:
- Create a PdfCompressor object to compress a specified PDF file at a given path.
- Get the OptimizationOptions object using PdfCompressor.OptimizationOptions property.
- Enable font compression using OptimizationOptions.SetIsCompressFonts(True) method.
- Set image quality using OptimizationOptions.SetImageQuality(imageQuality:ImageQuality) method.
- Enable image resizing using OptimizationOptions.SetResizeImages(True) method.
- Enable image compression using OptimizationOptions.SetIsCompressImage(True) method.
- Compress the PDF file and save the compressed version to a new file using PdfCompressor.CompressToFile() method.
- Python
from spire.pdf.common import * from spire.pdf import * # Create a PdfCompressor object to compress the specified PDF file at the given path compressor = PdfCompressor("C:/Users/Administrator/Documents/Example.pdf") # Get the OptimizationOptions object compression_options = compressor.OptimizationOptions # Enable font compression compression_options.SetIsCompressFonts(True) # Enable font unembedding # compression_options.SetIsUnembedFonts(True) # Set image quality compression_options.SetImageQuality(ImageQuality.Medium) # Enable image resizing compression_options.SetResizeImages(True) # Enable image compression compression_options.SetIsCompressImage(True) # Compress the PDF file and save the compressed version to a new file compressor.CompressToFile("Compressed.pdf")
Apply for a Temporary License
If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.
Sometimes, when dealing with PDF documents, there is a need to split a page into different sections based on content or layout. For instance, splitting a mixed-layout page with both horizontal and vertical content into two separate parts. This type of splitting is not commonly available in basic PDF management functions but can be important for academic papers, magazine ads, or mixed-layout designs. This article explains how to use Spire.PDF for Python to perform horizontal or vertical PDF page splitting.
Install Spire.PDF for Python
This scenario requires Spire.PDF for Python and plum-dispatch v1.7.4. They can be easily installed in your Windows through the following pip command.
pip install Spire.PDF
If you are unsure how to install, please refer to this tutorial: How to Install Spire.PDF for Python on Windows
Split PDF Page Horizontally or Vertically with Python
Spire.PDF for Python not only supports splitting a PDF document into multiple PDF documents, but also allows splitting a specific page within a PDF into two or more pages. Here are the detailed steps to split a page:
- Create an instance of the PdfDocument class.
- Load the source PDF document using the PdfDocument.LoadFromFile() method.
- Retrieve the page(s) to be split using PdfDocument.Pages[].
- Create a new PDF document and set its page margins to 0.
- Set the width or height of the new document to half of the source document.
- Add a page to the new PDF document using the PdfDocument.Pages.Add() method.
- Create a template for the source document's page using the PdfPageBase.CreateTemplate() method.
- Draw the content of the source page onto the new page using the PdfTemplate.Draw() method.
- Save the split document using the PdfDocument.SaveToFile() method.
- Python
from spire.pdf.common import * from spire.pdf import * # Create a PdfDocument object pdf = PdfDocument() # Load the PDF document pdf.LoadFromFile("Terms of service.pdf") # Get the first page page = pdf.Pages[0] # Create a new PDF document and remove the page margins newpdf = PdfDocument() newpdf.PageSettings.Margins.All=0 # Horizontal splitting: Set the width of the new document's page to be the same as the width of the first page of the original document, and the height to half of the first page's height newpdf.PageSettings.Width=page.Size.Width newpdf.PageSettings.Height=page.Size.Height/2 ''' # Vertical splitting: Set the width of the new document's page to be half of the width of the first page of the original document, and the height to the same as the first page's height newpdf.PageSettings.Width=page.Size.Width/2 newpdf.PageSettings.Height=page.Size.Height ''' # Add a new page to the new PDF document newPage = newpdf.Pages.Add() # Set the text layout format format = PdfTextLayout() format.Break=PdfLayoutBreakType.FitPage format.Layout=PdfLayoutType.Paginate # Create a template based on the first page of the original document and draw it onto the new page of the new document, automatically paginating when the page is filled page.CreateTemplate().Draw(newPage, PointF(0.0, 0.0), format) # Save the document newpdf.SaveToFile("HorizontalSplitting.pdf") # Close the objects newpdf.Close() pdf.Close()
The result of horizontal splitting is as follows:
The result of vertical splitting is as follows:
Apply for a Temporary License
If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.
A PDF portfolio is a collection of files assembled into a single PDF document. It serves as a comprehensive and interactive showcase of various types of content, such as documents, images, presentations, videos, and more. Unlike a traditional PDF document, a PDF portfolio allows you to present multiple files in a cohesive and organized manner, providing a seamless browsing experience for the viewer. In this article, we will demonstrate how to create a PDF portfolio and how to identify if a PDF is a portfolio in Python using Spire.PDF for Python.
Install Spire.PDF for Python
This scenario requires Spire.PDF for Python and plum-dispatch v1.7.4. They can be easily installed in your Windows through the following pip command.
pip install Spire.PDF
If you are unsure how to install, please refer to this tutorial: How to Install Spire.PDF for Python on Windows
Create a PDF Portfolio with Python
Spire.PDF for Python allows you to generate a PDF portfolio by adding files to a PDF using the PdfDocument.Collection.AddFile() method. Furthermore, you can organize the files within the PDF portfolio by adding folders using the PdfDocument.Collection.Folders.CreateSubfolder() method. The detailed steps are as follows.
- Specify the output file path and the folders where the files to be included in the PDF portfolio are located.
- Create a PdfDocument object.
- Iterate through the files in the first folder and add them to the PDF portfolio using the PdfDocument.Collection.AddFile() method.
- Iterate through the files in the second folder. For each file, create a separate folder within the PDF portfolio using the PdfDocument.Collection.Folders.CreateSubfolder() method, and then add the file to the corresponding folder using the PdfFolder.AddFile() method.
- Save the resulting PDF portfolio using the PdfDocument.SaveToFile() method.
- Python
from spire.pdf.common import * from spire.pdf import * import glob # Specify the folders where the files to be included in the PDF portfolio are located input_folder1 = "Folder1/*" input_folder2 = "Folder2/*" # Specify the output file path output_file = "CreatePDFPortfolio.pdf" # Create a PdfDocument object doc = PdfDocument() # Get the list of file paths in the first folder files1 = glob.glob(input_folder1) # Loop through the files in the list for i, file in enumerate(files1): # Add each file to the PDF portfolio doc.Collection.AddFile(file) # Get the list of file paths in the second folder files2 = glob.glob(input_folder2) # Loop through the files in the list for j, file in enumerate(files2): # Create a separate folder for each file folder = doc.Collection.Folders.CreateSubfolder(f"SubFolder{j + 1}") # Add the file to the folder folder.AddFile(file) # Save the resulting PDF portfolio to the specified file path doc.SaveToFile(output_file) # Close the PdfDocument object doc.Close()
Identify if a PDF is a Portfolio with Python
You can use the PdfDocument.IsPortfolio property to easily identify whether a PDF document is a portfolio or not. The detailed steps are as follows.
- Specify the input and output file paths.
- Create a PdfDocument object.
- Load a PDF document using the PdfDocument.LoadFromFile() method.
- Identify whether the document is a portfolio or not using the PdfDocument.IsPortfolio property.
- Save the result to a text file.
- Python
from spire.pdf.common import * from spire.pdf import * # Specify the input and output file paths input_file = "CreatePDFPortfolio.pdf" output_file = "IsPDFPortfolio.txt" # Create a PdfDocument object doc = PdfDocument() # Load a PDF document doc.LoadFromFile(input_file) # Identify whether the document is a portfolio or not if doc.IsPortfolio: st = "The document is a portfolio" else: st = "The document is not a portfolio" # Save the result to a text file with open(output_file, "w") as text_file: text_file.write(st) # Close the PdfDocument object doc.Close()
Apply for a Temporary License
If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.
Layers in PDF are similar to layers in image editing software, where different elements of a document can be organized and managed separately. Each layer can contain different content, such as text, images, graphics, or annotations, and can be shown or hidden independently. PDF layers are often used to control the visibility and positioning of specific elements within a document, making it easier to manage complex layouts, create dynamic designs, or control the display of information. In this article, you will learn how to add, hide, remove layers in a PDF document in Python using Spire.PDF for Python.
- Add a Layer to PDF in Python
- Set Visibility of a Layer in PDF in Python
- Remove a Layer from PDF in Python
Install Spire.PDF for Python
This scenario requires Spire.PDF for Python and plum-dispatch v1.7.4. They can be easily installed in your Windows through the following pip commands.
pip install Spire.PDF
If you are unsure how to install, please refer to this tutorial: How to Install Spire.PDF for Python on Windows
Add a Layer to PDF in Python
A layer can be added to a PDF document using the Document.Layers.AddLayer() method. After the layer object is created, you can draw text, images, fields, or other elements on it to form its appearance. The detailed steps to add a layer to PDF using Spire.PDF for Java are as follows.
- Create a PdfDocument object.
- Load a PDF file using PdfDocument.LoadFromFile() method.
- Create a layer using Document.Layers.AddLayer() method.
- Get a specific page through PdfDocument.Pages[index] property.
- Create a canvas for the layer based on the page using PdfLayer.CreateGraphics() method.
- Draw text on the canvas using PdfCanvas.DrawString() method.
- Save the document to a different PDF file using PdfDocument.SaveToFile() method.
- Python
from spire.pdf.common import * from spire.pdf import * def AddLayerWatermark(doc): # Create a layer named "Watermark" layer = doc.Layers.AddLayer("Watermark") # Create a font font = PdfTrueTypeFont("Bodoni MT Black", 50.0, 1, True) # Specify watermark text watermarkText = "DO NOT COPY" # Get text size fontSize = font.MeasureString(watermarkText) # Get page count pageCount = doc.Pages.Count # Loop through the pages for i in range(0, pageCount): # Get a specific page page = doc.Pages[i] # Create canvas for layer canvas = layer.CreateGraphics(page.Canvas) # Draw sting on the graphics canvas.DrawString(watermarkText, font, PdfBrushes.get_Gray(), (canvas.Size.Width - fontSize.Width)/2, (canvas.Size.Height - fontSize.Height)/2 ) # Create a PdfDocument instance doc = PdfDocument() # Load a PDF file doc.LoadFromFile("C:\\Users\\Administrator\\Desktop\\input.pdf") # Invoke AddLayerWatermark method to add a layer AddLayerWatermark(doc) # Save to file doc.SaveToFile("output/AddLayer.pdf", FileFormat.PDF) doc.Close()
Set Visibility of a Layer in PDF in Python
To control the visibility of layers in a PDF document, you can use the PdfDocument.Layers[index].Visibility property. Set it to off to hide a layer, or set it to on to unhide a layer. The detailed steps are as follows.
- Create a PdfDocument object.
- Load a PDF file using PdfDocument.LoadFromFile() method.
- Set the visibility of a certain layer through Document.Layers[index].Visibility property.
- Save the document to a different PDF file using PdfDocument.SaveToFile() method.
- Python
from spire.pdf.common import * from spire.pdf import * # Create a PdfDocument instance doc = PdfDocument() # Load a PDF file doc.LoadFromFile("C:\\Users\\Administrator\\Desktop\\Layer.pdf") # Hide a layer by setting the visibility to off doc.Layers[0].Visibility = PdfVisibility.Off # Save to file doc.SaveToFile("output/HideLayer.pdf", FileFormat.PDF) doc.Close()
Remove a Layer from PDF in Python
If a layer is no more wanted, you can remove it using the PdfDocument.Layers.RmoveLayer() method. The following are the detailed steps.
- Create a PdfDocument object.
- Load a PDF file using PdfDocument.LoadFromFile() method.
- Get a specific layer through PdfDocument.Layers[index] property.
- Remove the layer from the document using PdfDcument.Layers.RemoveLayer(PdfLayer.Name) method.
- Save the document to a different PDF file using PdfDocument.SaveToFile() method.
- Python
from spire.pdf.common import * from spire.pdf import * # Create a PdfDocument instance doc = PdfDocument() # Load a PDF file doc.LoadFromFile("C:\\Users\\Administrator\\Desktop\\Layer.pdf") # Delete the specific layer doc.Layers.RemoveLayer(doc.Layers[0].Name) # Save to file doc.SaveToFile("output/RemoveLayer.pdf", FileFormat.PDF) doc.Close()
Apply for a Temporary License
If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.
PDF properties refer to the information embedded within the document that provides detailed information about the documents, such as author, creation date, last modification date, etc. Users can check the properties of a PDF document in PDF viewers to quickly grasp the key information of the document. Apart from the built-in properties, PDF documents also offer the feature of customizing properties to help provide additional information about the document. Understanding how to specify and access this document information facilitates the creation of user-friendly documents and the processing of documents in large quantities. In this article, we will explore how to set and retrieve PDF properties through Python programs using Spire.PDF for Python.
Install Spire.PDF for Python
This scenario requires Spire.PDF for Python and plum-dispatch v1.7.4. They can be easily installed in your Windows through the following pip commands.
pip install Spire.PDF
If you are unsure how to install, please refer to this tutorial: How to Install Spire.PDF for Python on Windows
Set PDF Properties with Python
Spire.PDF for Python provides several properties under the PdfDocumentInformation class for setting built-in document properties, such as author, subject, keywords. Besides, it also provides the PdfDocumentInformation.SetCustomProperty() method to set custom properties. The following are the detailed steps to set PDF properties:
- Create an object of PdfDocument class and load a PDF document using PdfDocument.LoadFromFile() method.
- Get the properties of the document through PdfDocument.DocumentInformation property.
- Set the built-in properties through properties under PdfDocumentInformation class.
- Set custom properties using PdfDocumentInformation.SetCustomProperty() method.
- Save the document using PdfDocument.SaveToFile() method.
- Python
from spire.pdf import * from spire.pdf.common import * # Create an object of PdfDocument class and load a PDF document pdf = PdfDocument() pdf.LoadFromFile("Sample.pdf") # Get the properties of the document properties = pdf.DocumentInformation # Set built-in properties properties.Author = "Tim Taylor" properties.Creator = "Spire.PDF" properties.Keywords = "cloud service; digital business" properties.Subject = "The introduction of cloud service and its advantages" properties.Title = "The Power of Cloud Services: Empowering Businesses in the Digital Age" properties.Producer = "Spire.PDF for Python" # Set custom properties properties.SetCustomProperty("Company", "E-iceblue") properties.SetCustomProperty("Tags", "Cloud; Business; Server") # Save the document pdf.SaveToFile("output/SetPDFProperties.pdf") pdf.Close()
Retrieve PDF Properties with Python
Information in built-in PDF properties can be obtained using the properties under the PdfDocumentInformation class, while that in custom PDF properties can be obtained using PdfDocumentInformation.GetCustomProperty() method. The detailed steps are as follows:
- Create an object of PdfDocument class and load a PDF document using PdfDocument.LoadFromFile() method.
- Get the properties of the document through PdfDocument.DocumentInformation property.
- Retrieve the built-in properties through properties under PdfDocumentInformation class and custom properties using PdfDocumentInformation.GetCustomProperty() method and print them.
- Python
from spire.pdf import * from spire.pdf.common import * # Create an object of PdfDocument class and load a PDF document pdf = PdfDocument() pdf.LoadFromFile("output\SetPDFProperties.pdf") # Get the properties of the document properties = pdf.DocumentInformation # Create a StringBuilder object information = "" # Retrieve the built-in properties information += "Author: " + properties.Author information += "\nTitle: " + properties.Title information += "\nSubject: " + properties.Subject information += "\nKeywords: " + properties.Keywords information += "\nCreator: " + properties.Creator information += "\nProducer: " + properties.Producer # Retrieve the custom properties information += "\nCompany: " + properties.GetCustomProperty("Company") information += "\nTags: " + properties.GetCustomProperty("Tags") # Print the document properties print(information) pdf.Close()
Apply for a Temporary License
If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.
Large PDF files can sometimes be cumbersome to handle, especially when sharing or uploading them. Splitting a large PDF file into multiple smaller PDFs reduces the file size, making it more manageable and quicker to open and process. In this article, we will demonstrate how to split PDF documents in Python using Spire.PDF for Python.
Install Spire.PDF for Python
This scenario requires Spire.PDF for Python and plum-dispatch v1.7.4. They can be easily installed in your Windows through the following pip command.
pip install Spire.PDF
If you are unsure how to install, please refer to this tutorial: How to Install Spire.PDF for Python on Windows
Split a PDF File into Multiple Single-Page PDFs in Python
Spire.PDF for Python offers the PdfDocument.Split() method to divide a multi-page PDF document into multiple single-page PDF files. The following are the detailed steps.
- Create a PdfDocument object.
- Load a PDF document using PdfDocument.LoadFromFile() method.
- Split the document into multiple single-page PDFs using PdfDocument.Split() method.
- Python
from spire.pdf.common import * from spire.pdf import * # Create a PdfDocument object doc = PdfDocument() # Load a PDF file doc.LoadFromFile("Sample.pdf") # Split the PDF file into multiple single-page PDFs doc.Split("Output/SplitDocument-{0}.pdf", 1) # Close the PdfDocument object doc.Close()
Split a PDF File by Page Ranges in Python
To split a PDF file into two or more PDF files by page ranges, you need to create two or more new PDF files, and then import the specific page or range of pages from the source PDF into the newly created PDF files. The following are the detailed steps.
- Create a PdfDocument object.
- Load a PDF document using PdfDocument.LoadFromFile() method.
- Create three PdfDocument objects.
- Import the first page from the source file into the first document using PdfDocument.InsertPage() method.
- Import pages 2-4 from the source file into the second document using PdfDocument.InsertPageRange() method.
- Import the remaining pages from the source file into the third document using PdfDocument.InsertPageRange() method.
- Save the three documents using PdfDocument.SaveToFile() method.
- Python
from spire.pdf.common import * from spire.pdf import * # Create a PdfDocument object doc = PdfDocument() # Load a PDF file doc.LoadFromFile("Sample.pdf") # Create three PdfDocument objects newDoc_1 = PdfDocument() newDoc_2 = PdfDocument() newDoc_3 = PdfDocument() # Insert the first page of the source file into the first document newDoc_1.InsertPage(doc, 0) # Insert pages 2-4 of the source file into the second document newDoc_2.InsertPageRange(doc, 1, 3) # Insert the rest pages of the source file into the third document newDoc_3.InsertPageRange(doc, 4, doc.Pages.Count - 1) # Save the three documents newDoc_1.SaveToFile("Output1/Split-1.pdf") newDoc_2.SaveToFile("Output1/Split-2.pdf") newDoc_3.SaveToFile("Output1/Split-3.pdf") # Close the PdfDocument objects doc.Close() newDoc_1.Close() newDoc_2.Close() newDoc_3.Close()
Apply for a Temporary License
If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.
A large number of scattered but content-related PDF documents will be very inefficient to deal with and difficult to manage. Merging them into a single PDF file is a good solution. By consolidating these files, not only can you streamline your document handling process, but you can also share and view them conveniently, significantly increasing efficiency. This article will show how to merge PDF files in Python programs using Spire.PDF for Python.
- Merge PDF Documents in Python
- Merge PDF Documents by Cloning Pages
- Merge Selected Pages of PDF Documents
Install Spire.PDF for Python
This scenario requires Spire.PDF for Python and plum-dispatch v1.7.4. They can be easily installed in your Windows through the following pip commands.
pip install Spire.PDF
If you are unsure how to install, please refer to this tutorial: How to Install Spire.PDF for Python on Windows
Merge PDF Documents in Python
Spire.PDF for Python provides the PdfDocument.MergeFiles() method to easily merge multiple PDF documents into one PDF document. The detailed steps are as follows:
- Create a list of the PDF file paths.
- Merge the PDF files using Document.MergeFiles(inputFiles: List[str]) method.
- Save the merged document using PdfDocumentBase.Save(filename: str, FileFormat.PDF) method.
- Python
from spire.pdf.common import * from spire.pdf import * # Create a list of the PDF file paths inputFile1 = "Sample1.pdf" inputFile2 = "Sample2.pdf" inputFile3 = "Sample3.pdf" files = [inputFile1, inputFile2, inputFile3] # Merge the PDF documents pdf = PdfDocument.MergeFiles(files) # Save the result document pdf.Save("output/MergePDF.pdf", FileFormat.PDF) pdf.Close()
Merge PDF Documents by Cloning Pages
Merging PDF files can also be done by cloning pages from PDF files to a new PDF file using PdfDocument.AppendPage(PdfDocument) method. The detailed steps are as follows:
- Create a list of the PDF file paths.
- Load each PDF document as an object of PdfDocument class and add them to a list.
- Create an object of PdfDocument class to create a new PDF document.
- Loop through each loaded PDF document and insert their pages to the new PDF document using PdfDocument.appendPage() method.
- Save the new PDF document using PdfDocument.SaveToFile() method.
- Python
from spire.pdf.common import * from spire.pdf import * # Create a list of the PDF file paths file1 = "Sample1.pdf" file2 = "Sample2.pdf" file3 = "Sample3.pdf" files = [file1, file2, file3] # Load each PDF file as an PdfDocument object and add them to a list pdfs = [] for file in files: pdfs.append(PdfDocument(file)) # Create an object of PdfDocument class newPdf = PdfDocument() # Insert the pages of the loaded PDF documents into the new PDF document for pdf in pdfs: newPdf.AppendPage(pdf) # Save the new PDF document newPdf.SaveToFile("output/ClonePage.pdf")
Merge Selected Pages of PDF Documents
Spire.PDF for Python also allows users to select and insert pages from one PDF document to another using PdfDocument.InsertPage() method and PdfDocument.InsertPageRange() method, enabling the merging of specified PDF pages. The detailed steps are as follows:
- Create a list of the PDF file paths.
- Load each PDF document as an object of PdfDocument class and add them to a list.
- Create an object of PdfDocument class to create a new PDF document.
- Insert the selected pages of the loaded documents into the new PDF document using PdfDocument.InsertPage(PdfDocument, pageIndex: int) method and PdfDocument.InsertPageRange(PdfDocument, startIndex: int, endIndex: int) method.
- Save the new PDF document using PdfDocument.SaveToFile() method.
- Python
from spire.pdf import * from spire.pdf.common import * # Create a list of the PDF file paths file1 = "Sample1.pdf" file2 = "Sample2.pdf" file3 = "Sample3.pdf" files = [file1, file2, file3] # Load each PDF file as an PdfDocument object and add them to a list pdfs = [] for file in files: pdfs.append(PdfDocument(file)) # Create an object of PdfDocument class newPdf = PdfDocument() # Insert the selected pages from the loaded PDF documents into the new document newPdf.InsertPage(pdfs[0], 0) newPdf.InsertPage(pdfs[1], 1) newPdf.InsertPageRange(pdfs[2], 0, 1) # Save the new PDF document newPdf.SaveToFile("output/SelectedPages.pdf")
Apply for a Temporary License
If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.