Displaying items by tag: enpython

Thursday, 28 December 2023 03:47

Converting Image to PDF with Python: 4 Code Examples

PDF Convertor API for Python
Steps to Convert an Image to PDF
Convert an Image to a PDF Document
Convert Multiple Images to a PDF Document
Create a PDF from Multiple Images Customizing Page Margins
Create a PDF with Several Images per Page
Conclusion
See Also

Install with Pip

pip install Spire.PDF

Convert an Image to a PDF Document in Python

This code example converts an image file to a PDF document using the Spire.PDF for Python library by creating a blank document, adding a page with the same dimensions as the image, and drawing the image onto the page.

Python

from spire.pdf.common import *
from spire.pdf import *

# Create a PdfDocument object
doc = PdfDocument()

# Set the page margins to 0
doc.PageSettings.SetMargins(0.0)

# Load a particular image 
image = PdfImage.FromFile("C:\\Users\\Administrator\\Desktop\\Images\\img-1.jpg")
    
# Get the image width and height
width = image.PhysicalDimension.Width
height = image.PhysicalDimension.Height

# Add a page that has the same size as the image
page = doc.Pages.Add(SizeF(width, height))

# Draw image at (0, 0) of the page
page.Canvas.DrawImage(image, 0.0, 0.0, width, height)
      
# Save to file
doc.SaveToFile("output/ImageToPdf.pdf")
doc.Dispose()

Convert Image to PDF with Python

Convert Multiple Images to a PDF Document in python

This example illustrates how to convert a collection of images into a PDF document using Spire.PDF for Python. The following code snippet reads images from a specified folder, creates a PDF document, adds each image to a separate page in the PDF, and saves the resulting PDF file.

Python

from spire.pdf.common import *
from spire.pdf import *
import os

# Create a PdfDocument object
doc = PdfDocument()

# Set the page margins to 0
doc.PageSettings.SetMargins(0.0)

# Get the folder where the images are stored
path = "C:\\Users\\Administrator\\Desktop\\Images\\"
files = os.listdir(path)

# Iterate through the files in the folder
for root, dirs, files in os.walk(path):
    for file in files:

        # Load a particular image 
        image = PdfImage.FromFile(os.path.join(root, file))
    
        # Get the image width and height
        width = image.PhysicalDimension.Width
        height = image.PhysicalDimension.Height

        # Add a page that has the same size as the image
        page = doc.Pages.Add(SizeF(width, height))

        # Draw image at (0, 0) of the page
        page.Canvas.DrawImage(image, 0.0, 0.0, width, height)
      
# Save to file
doc.SaveToFile('output/ImagesToPdf.pdf')
doc.Dispose()

Convert Image to PDF with Python

Create a PDF from Multiple Images Customizing Page Margins in Python

This code example creates a PDF document and populates it with images from a specified folder, adjusts the page margins and saves the resulting document to a file.

Python

from spire.pdf.common import *
from spire.pdf import *
import os

# Create a PdfDocument object
doc = PdfDocument()

# Set the left, top, right, bottom page margin
doc.PageSettings.SetMargins(30.0, 30.0, 30.0, 30.0)

# Get the folder where the images are stored
path = "C:\\Users\\Administrator\\Desktop\\Images\\"
files = os.listdir(path)

# Iterate through the files in the folder
for root, dirs, files in os.walk(path):
    for file in files:

        # Load a particular image 
        image = PdfImage.FromFile(os.path.join(root, file))
    
        # Get the image width and height
        width = image.PhysicalDimension.Width
        height = image.PhysicalDimension.Height

        # Specify page size
        size = SizeF(width + doc.PageSettings.Margins.Left + doc.PageSettings.Margins.Right, height + doc.PageSettings.Margins.Top+ doc.PageSettings.Margins.Bottom)

        # Add a page with the specified size
        page = doc.Pages.Add(size)

        # Draw image on the page at (0, 0)
        page.Canvas.DrawImage(image, 0.0, 0.0, width, height)
      
# Save to file
doc.SaveToFile('output/CustomizeMargins.pdf')
doc.Dispose()

Convert Image to PDF with Python

Create a PDF with Several Images per Page in Python

This code demonstrates how to use the Spire.PDF library in Python to create a PDF document with two images per page. The images in this example are the same size, if your image size is not consistent, then you need to adjust the code to achieve a desired result.

Python

from spire.pdf.common import *
from spire.pdf import *
import os

# Create a PdfDocument object
doc = PdfDocument()

# Set the left, top, right, bottom page margins
doc.PageSettings.SetMargins(15.0, 15.0, 15.0, 15.0)

# Get the folder where the images are stored
path = "C:\\Users\\Administrator\\Desktop\\Images\\"
files = os.listdir(path)

# Iterate through the files in the folder
for root, dirs, files in os.walk(path):

    for i in range(len(files)):
        
        # Load a particular image 
        image = PdfImage.FromFile(os.path.join(root, files[i]))

        # Get the image width and height
        width = image.PhysicalDimension.Width
        height = image.PhysicalDimension.Height

        # Specify page size
        size = SizeF(width + doc.PageSettings.Margins.Left + doc.PageSettings.Margins.Right, height*2 + doc.PageSettings.Margins.Top+ doc.PageSettings.Margins.Bottom + 15.0)

        if i % 2 == 0:

            # Add a page with the specified size
            page = doc.Pages.Add(size)
            
            # Draw first image on the page at (0, 0)
            page.Canvas.DrawImage(image, 0.0, 0.0, width, height)
        else :

            # Draw second image on the page at (0, height + 15)
            page.Canvas.DrawImage(image, 0.0, height + 15.0, width, height)
      
# Save to file
doc.SaveToFile('output/SeveralImagesPerPage.pdf')
doc.Dispose()

Convert Image to PDF with Python

Conclusion

In this blog post, we explored how to use Spire.PDF for python to create PDF documents from images, containing one or more images per page. Additionally, we demonstrated how to customize the PDF page size and the margins around the images. For more tutorials, please check out our online documentation. If you have any questions, feel free to contact us by email or on the forum.

Python PDF to Text Conversion: Retrieve Text from PDFs

Python API for PDF to Text
Convert PDF to Text in Python
PDF to Text Not Keeping Layout
PDF to Text Keep Layout
PDF Page Area to Text
Get a Free License
Learn More
Conclusion
See Also

Install with Pip

pip install Spire.PDF

Python to Convert PDF to Text Without Keeping Layout

When using the simple extraction method to extract text from PDF, the program will not retain the blank areas and keep track of the current Y position of each string and insert a line break into the output if the Y position has changed.

Python

from spire.pdf import PdfDocument
from spire.pdf import PdfTextExtractOptions
from spire.pdf import PdfTextExtractor

# Create an object of PdfDocument class and load a PDF file
pdf = PdfDocument()
pdf.LoadFromFile("Sample.pdf")

# Create a string object to store the text
extracted_text = ""

# Create an object of PdfExtractor
extract_options = PdfTextExtractOptions()
# Set to use simple extraction method
extract_options.IsSimpleExtraction = True

# Loop through the pages in the document
for i in range(pdf.Pages.Count):
    # Get a page
    page = pdf.Pages.get_Item(i)
    # Create an object of PdfTextExtractor passing the page as paramter
    text_extractor = PdfTextExtractor(page)
    # Extract the text from the page
    text = text_extractor.ExtractText(extract_options)
    # Add the extracted text to the string object
    extracted_text += text

# Write the extracted text to a text file
with open("output/ExtractedText.txt", "w") as file:
    file.write(extracted_text)
pdf.Close()

Python PDF to Text Conversion: Retrieve Text from PDFs

Python to Convert PDF to Text and Keep Layout

When using the default extraction method to extract text from PDF, the program will extract text line by line including blanks.

Python

from spire.pdf import PdfDocument
from spire.pdf import PdfTextExtractOptions
from spire.pdf import PdfTextExtractor

# Create an object of PdfDocument class and load a PDF file
pdf = PdfDocument()
pdf.LoadFromFile("Sample.pdf")

# Create a string object to store the text
extracted_text = ""

# Create an object of PdfExtractor
extract_options = PdfTextExtractOptions()

# Loop through the pages in the document
for i in range(pdf.Pages.Count):
    # Get a page
    page = pdf.Pages.get_Item(i)
    # Create an object of PdfTextExtractor passing the page as paramter
    text_extractor = PdfTextExtractor(page)
    # Extract the text from the page
    text = text_extractor.ExtractText(extract_options)
    # Add the extracted text to the string object
    extracted_text += text

# Write the extracted text to a text file
with open("output/ExtractedText.txt", "w") as file:
    file.write(extracted_text)
pdf.Close()

Python PDF to Text Conversion: Retrieve Text from PDFs

Python to Convert a Specified PDF Page Area to Text

Python

from spire.pdf import PdfDocument
from spire.pdf import PdfTextExtractOptions
from spire.pdf import PdfTextExtractor
from spire.pdf import RectangleF

# Create an object of PdfDocument class and load a PDF file
pdf = PdfDocument()
pdf.LoadFromFile("Sample.pdf")

# Create an object of PdfExtractor
extract_options = PdfTextExtractOptions()

# Set to extract specific page area
extract_options.ExtractArea = RectangleF(50.0, 220.0, 700.0, 230.0)

# Get a page
page = pdf.Pages.get_Item(0)

# Create an object of PdfTextExtractor passing the page as paramter
text_extractor = PdfTextExtractor(page)

# Extract the text from the page
extracted_text = text_extractor.ExtractText(extract_options)

# Write the extracted text to a text file
with open("output/ExtractedText.txt", "w") as file:
    file.write(extracted_text)
pdf.Close()

Python PDF to Text Conversion: Retrieve Text from PDFs

Get a Free License for the API to Convert PDF to Text in Python

Users can apply for a free temporary license to try Spire.PDF for Python and evaluate the Python PDF to Text Conversion features without any limitations.

Learn More About PDF Processing with Python

Apart from converting PDF to text with Python, we can also explore more PDF processing features of this API through the following sources:

Conclusion

In this blog post, we have explored Python in PDF to text conversion. By following the operational steps and referring to the code examples in the article, we can achieve fast PDF to text conversion in Python programs. Additionally, the article provides insights into the benefits of converting PDF documents to text files. More importantly, we can gain further knowledge on handling PDF documents with Python and methods to convert image-based PDF documents to text through OCR tools from the references in the article. If any issues arise during the usage of Spire.PDF for Python, technical support can be obtained by reaching out to our team via the Spire.PDF forum or email.

Read Excel Files with Python

Python Library for Reading Excel
Classes and Properties in Spire.XLS for Python API
Read Data of a Particular Cell
Read Data from a Cell Range
Read Data from an Excel Worksheet
Read Value Rather than Formula in a Cell
Conclusion
See Also

Install with Pip

pip install Spire.XLS

Read Data of a Particular Cell in Python

With Spire.XLS for Python, you can easily get the value of a certain cell by using the CellRange.Value property. The steps to read data of a particular Excel cell in Python are as follows.

Instantiate Workbook class
Load an Excel document using LoadFromFile method.
Get a specific worksheet using Workbook.Worksheets[index] property.
Get a specific cell using Worksheet.Range property.
Get the value of the cell using CellRange.Value property

Python

from spire.xls import *
from spire.xls.common import *

# Create a Workbook object
wb = Workbook()

# Load an Excel file
wb.LoadFromFile("C:\\Users\\Administrator\\Desktop\\Data.xlsx")

# Get a specific worksheet
sheet = wb.Worksheets[0]

# Get a specific cell
certainCell = sheet.Range["D9"]

# Get the value of the cell
print("D9 has the value: " + certainCell.Value)

Read Excel Files with Python

Read Data from a Cell Range in Python

We already know how to get the value of a cell, to get the values of a range of cells, such as certain rows or columns, we just need to use loop statements to iterate through the cells, and then extract them one by one. The steps to read data from an Excel cell range in Python are as follows.

Instantiate Workbook class
Load an Excel document using LoadFromFile method.
Get a specific worksheet using Workbook.Worksheets[index] property.
Get a specific cell range using Worksheet.Range property.
Use for loop statements to retrieve each cell in the range, and get the value of a specific cell using CellRange.Value property

Python

from spire.xls import *
from spire.xls.common import *

# Create a Workbook object
wb = Workbook()

# Load an existing Excel file
wb.LoadFromFile("C:\\Users\\Administrator\\Desktop\\Data.xlsx")

# Get a specific worksheet
sheet = wb.Worksheets[0]

# Get a cell range
cellRange = sheet.Range["A2:H5"]

# Iterate through the rows
for i in range(len(cellRange.Rows)):

    # Iterate through the columns
    for j in range(len(cellRange.Rows[i].Columns)):

        # Get data of a specific cell
        print(cellRange[i + 2, j + 1].Value + "  ", end='')

    print("")

Read Excel Files with Python

Read Data from an Excel Worksheet in Python

Spire.XLS for Python offers the Worksheet.AllocatedRange property to automatically obtain the cell range that contains data from a worksheet. Then, we traverse the cells within the cell range rather than the entire worksheet, and retrieve cell values one by one. The following are the steps to read data from an Excel worksheet in Python.

Instantiate Workbook class
Load an Excel document using LoadFromFile method.
Get a specific worksheet using Workbook.Worksheets[index] property.
Get the cell range containing data from the worksheet using Worksheet.AllocatedRange property.
Use for loop statements to retrieve each cell in the range, and get the value of a specific cell using CellRange.Value property

Python

from spire.xls import *
from spire.xls.common import *

# Create a Workbook object
wb = Workbook()

# Load an existing Excel file
wb.LoadFromFile("C:\\Users\\Administrator\\Desktop\\Data.xlsx")

# Get the first worksheet
sheet = wb.Worksheets[0]

# Get the cell range containing data
locatedRange = sheet.AllocatedRange

# Iterate through the rows
for i in range(len(sheet.Rows)):

    # Iterate through the columns
    for j in range(len(locatedRange.Rows[i].Columns)):

        # Get data of a specific cell
        print(locatedRange[i + 1, j + 1].Value + "  ", end='')

    print("")

Read Excel Files with Python

Read Value Rather than Formula in a Cell in Python

As mentioned earlier, when a cell contains a formula, the CellRange.Value property returns the formula itself, not the value of the formula. If you want to get the value, you need to use the str(CellRange.FormulaValue) method. The following are the steps to read value rather than formula in an Excel cell in Python.

Instantiate Workbook class
Load an Excel document using LoadFromFile method.
Get a specific worksheet using Workbook.Worksheets[index] property.
Get a specific cell using Worksheet.Range property.
Determine if the cell has a formula using CellRange.HasFormula property.
Get the formula value of the cell using str(CellRange.FormulaValue) method

Python

from spire.xls import *
from spire.xls.common import *

# Create a Workbook object
wb = Workbook()

# Load an Excel file
wb.LoadFromFile("C:\\Users\\Administrator\\Desktop\\Formula.xlsx")

# Get a specific worksheet
sheet = wb.Worksheets[0]

# Get a specific cell
certainCell = sheet.Range["D4"]

# Determine if the cell has formula
if(certainCell.HasFormula):

    # Get the formula value of the cell
    print(str(certainCell.FormulaValue))

Read Excel Files with Python

Conclusion

In this blog post, we learned how to read data from cells, cell regions, and worksheets in Python with the help of Spire.XLS for Python API. We also discussed how to determine if a cell has a formula and how to get the value of the formula. This library supports extraction of many other elements in Excel such as images, hyperlinks and OEL objects. Check out our online documentation for more tutorials. If you have any questions, please contact us by email or on the forum.

Python: Merge Word Documents

Install Spire.Doc for Python
Merge Word Documents by Inserting Files with Python
Merge Word Documents by Cloning Contents with Python
See Also

Install with Pip

pip install Spire.Doc

Merge Word Documents by Inserting Files with Python

The method Document.insertTextFromFile() is used to insert other Word documents to the current one, and the inserted content will start from a new page. The detailed steps for merging Word documents by inserting are as follows:

Create an object of Document class and load a Word document using Document.LoadFromFile() method.
Insert the content from another document to it using Document.InsertTextFromFile() method.
Save the document using Document.SaveToFile() method.

Python

from spire.doc import *
    from spire.doc.common import *
    
    # Create an object of Document class and load a Word document
    doc = Document()
    doc.LoadFromFile("Sample1.docx")
    
    # Insert the content from another Word document to this one
    doc.InsertTextFromFile("Sample2.docx", FileFormat.Auto)
    
    # Save the document
    doc.SaveToFile("output/InsertDocuments.docx")
    doc.Close()

Python: Merge Word Documents

Merge Word Documents by Cloning Contents with Python

Merging Word documents can also be achieved by cloning contents from one Word document to another. This method maintains the formatting of the original document, and content cloned from another document continues at the end of the current document without starting a new Page. The detailed steps are as follows:

Create two objects of Document class and load two Word documents using Document.LoadFromFile() method.
Get the last section of the destination document using Document.Sections.get_Item() method.
Loop through the sections in the document to be cloned and then loop through the child objects of the sections.
Get a section child object using Section.Body.ChildObjects.get_Item() method.
Add the child object to the last section of the destination document using Section.Body.ChildObjects.Add() method.
Save the result document using Document.SaveToFile() method.

Python

from spire.doc import *
    from spire.doc.common import *
    
    # Create two objects of Document class and load two Word documents
    doc1 = Document()
    doc1.LoadFromFile("Sample1.docx")
    doc2 = Document()
    doc2.LoadFromFile("Sample2.docx")
    
    # Get the last section of the first document
    lastSection = doc1.Sections.get_Item(doc1.Sections.Count - 1)
    
    # Loop through the sections in the second document
    for i in range(doc2.Sections.Count):
        section = doc2.Sections.get_Item(i)
        # Loop through the child objects in the sections
        for j in range(section.Body.ChildObjects.Count):
            obj = section.Body.ChildObjects.get_Item(j)
            # Add the child objects from the second document to the last section of the first document
            lastSection.Body.ChildObjects.Add(obj.Clone())
    
    # Save the result document
    doc1.SaveToFile("output/MergeByCloning.docx")
    doc1.Close()
    doc2.Close()

Python: Merge Word Documents

Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.

Converting Image to PDF with Python: 4 Code Examples

Table of Contents

Install with Pip

Related Links

PDF Converter API for Python

Steps to Convert an Image to PDF in Python

Convert an Image to a PDF Document in Python

Convert Multiple Images to a PDF Document in python

Create a PDF from Multiple Images Customizing Page Margins in Python

Create a PDF with Several Images per Page in Python

Conclusion

See Also

Python PDF to Text Conversion: Retrieve Text from PDFs

Table of Contents

Install with Pip

Related Links

Python API for PDF to Text Conversion

Guide for Converting PDF to Text in Python

Python to Convert PDF to Text Without Keeping Layout

Python to Convert PDF to Text and Keep Layout

Python to Convert a Specified PDF Page Area to Text

Get a Free License for the API to Convert PDF to Text in Python

Learn More About PDF Processing with Python

Conclusion

See Also

Read Excel Files with Python

Table of Contents

Install with Pip

Related Links

Python Library for Reading Excel

Classes and Properties in Spire.XLS for Python API

Read Data of a Particular Cell in Python

Read Data from a Cell Range in Python

Read Data from an Excel Worksheet in Python

Read Value Rather than Formula in a Cell in Python

Conclusion

See Also

Python: Merge Word Documents

Table of Contents

Install with Pip

Related Links

Install Spire.Doc for Python

Merge Word Documents by Inserting Files with Python

Merge Word Documents by Cloning Contents with Python

Apply for a Temporary License

See Also