Text

PDFs often use a variety of fonts, and there are situations where you may need to inspect or replace them. Getting the fonts lets you check details such as font name, size, type, and style, which is useful for maintaining design consistency or adhering to specific standards. Replacing fonts can address compatibility issues, particularly when the original fonts are not supported on certain devices or software. This article explains how to get and replace the fonts used in a PDF in Python using Spire.PDF for Python.

Install Spire.PDF for Python

This scenario requires Spire.PDF for Python and plum-dispatch v1.7.4. They can be installed on Windows with the following pip command.

pip install Spire.PDF

If you are unsure how to install it, please refer to this tutorial: How to Install Spire.PDF for Python on Windows
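
To confirm the installation, the installed versions can be read back from Python. This is a minimal sketch using only the standard library. It assumes the distribution names are exactly Spire.PDF (as in the pip command) and plum-dispatch (as named above); installed_version is a helper name introduced here for illustration.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Distribution names are assumptions taken from the text above
for name in ("Spire.PDF", "plum-dispatch"):
    print(name, installed_version(name) or "not installed")
```

If either line prints "not installed", rerun the pip command in the same environment that runs your script.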

Get Used Fonts in PDF in Python

Spire.PDF for Python provides the PdfDocument.UsedFonts property to retrieve a list of all fonts used in a PDF. By iterating through this list, you can easily access detailed font information such as the font name, size, type and style using the PdfUsedFont.Name, PdfUsedFont.Size, PdfUsedFont.Type and PdfUsedFont.Style properties. The detailed steps are as follows.

  • Create an object of the PdfDocument class.
  • Load a PDF document using the PdfDocument.LoadFromFile() method.
  • Get the list of fonts used in this document using the PdfDocument.UsedFonts property.
  • Create a text file to save the extracted font information.
  • Iterate through the font list.
  • Get the information of each font, such as font name, size, type and style using the PdfUsedFont.Name, PdfUsedFont.Size, PdfUsedFont.Type and PdfUsedFont.Style properties, and save it to the text file.
Python
from spire.pdf.common import *
from spire.pdf import *

# Create an object of the PdfDocument class
pdf = PdfDocument()
# Load a PDF document
pdf.LoadFromFile("Input1.pdf")

# Get the list of fonts used in this document 
usedFonts = pdf.UsedFonts

# Create a text file to save the extracted font information
with open("font_info.txt", "w") as file:
    # Iterate through the font list
    for font in usedFonts:
        # Get the information of each font, such as font name, size, type and style
        font_info = f"Name: {font.Name}, Size: {font.Size}, Type: {font.Type}, Style: {font.Style}\n"
        file.write(font_info)

pdf.Close()
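
The formatting step in the loop above can also be factored into a small helper, which makes the report easy to test without a PDF. The following is a sketch only: format_font_line and build_font_report are hypothetical helper names, and the sample objects are stand-ins with made-up values. Only the attribute names Name, Size, Type and Style come from the steps above.

```python
from types import SimpleNamespace


def format_font_line(font):
    """Build one report line from an object exposing Name, Size, Type and Style."""
    return f"Name: {font.Name}, Size: {font.Size}, Type: {font.Type}, Style: {font.Style}"


def build_font_report(fonts):
    """Join one line per font; an empty font list gives an empty string."""
    return "\n".join(format_font_line(f) for f in fonts)


# Stand-in objects (hypothetical values) used in place of pdf.UsedFonts
sample_fonts = [
    SimpleNamespace(Name="Arial", Size=12.0, Type="TrueType", Style="Regular"),
    SimpleNamespace(Name="Calibri", Size=10.5, Type="TrueType", Style="Bold"),
]

report = build_font_report(sample_fonts)
print(report)
```

In a real script, the list returned by pdf.UsedFonts would be passed in place of sample_fonts, and the returned string written to the text file.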

Replace Used Fonts in PDF in Python

You can replace the fonts used in a PDF with the desired font using the PdfUsedFont.Replace() method. The detailed steps are as follows.

  • Create an object of the PdfDocument class.
  • Load a PDF document using the PdfDocument.LoadFromFile() method.
  • Get the list of fonts used in this document using the PdfDocument.UsedFonts property.
  • Create a new font using the PdfTrueTypeFont class.
  • Iterate through the font list.
  • Replace each used font with the new font using the PdfUsedFont.Replace() method.
  • Save the resulting document to a new PDF using the PdfDocument.SaveToFile() method.
Python
from spire.pdf.common import *
from spire.pdf import *

# Create an object of the PdfDocument class
pdf = PdfDocument()
# Load a PDF document
pdf.LoadFromFile("Input2.pdf")

# Get the list of fonts used in this document 
usedFonts = pdf.UsedFonts

# Create a new font 
newFont = PdfTrueTypeFont("Arial", 13.0, PdfFontStyle.Italic, True)

# Iterate through the font list
for font in usedFonts:
    # Replace each font with the new font
    font.Replace(newFont)

# Save the resulting document to a new PDF
pdf.SaveToFile("ReplaceFonts.pdf")
pdf.Close()
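
The loop above replaces every font in the document. When only some fonts should change, the list can be filtered by name first. The sketch below is one possible approach, not part of the library: replace_matching_fonts and base_font_name are hypothetical helpers, and FakeUsedFont is a stand-in so the logic runs without a PDF. It assumes the Name property may carry a subset tag (six uppercase letters and a plus sign, as PDF uses for embedded font subsets), which is stripped before comparing.

```python
import re

# A PDF font subset tag: six uppercase letters followed by "+"
_SUBSET_TAG = re.compile(r"^[A-Z]{6}\+")


def base_font_name(name):
    """Strip a leading subset tag such as 'ABCDEF+' if present."""
    return _SUBSET_TAG.sub("", name)


def replace_matching_fonts(used_fonts, new_font, targets):
    """Call Replace(new_font) only on fonts whose base name is in targets."""
    wanted = {t.lower() for t in targets}
    replaced = []
    for font in used_fonts:
        if base_font_name(font.Name).lower() in wanted:
            font.Replace(new_font)
            replaced.append(font.Name)
    return replaced


class FakeUsedFont:
    """Stand-in for a used-font object: a Name plus a Replace() method."""

    def __init__(self, name):
        self.Name = name
        self.replacement = None

    def Replace(self, new_font):
        self.replacement = new_font


fonts = [FakeUsedFont("ABCDEF+Calibri"), FakeUsedFont("Arial"), FakeUsedFont("Calibri-Bold")]
done = replace_matching_fonts(fonts, "NEW_FONT", ["calibri"])
print(done)
```

The comparison is an exact match on the base name, so a variant such as "Calibri-Bold" is left untouched unless it is listed explicitly.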

Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.

Using Python to manipulate text formatting in PDFs is a practical way to automate and customize documents. With Spire.PDF for Python, developers can find text using advanced search options, then retrieve and modify properties such as font, size, color, and style. This makes it possible to update text formatting across large document sets, saving time and reducing manual work. This article demonstrates how to use Spire.PDF for Python to retrieve and modify text formatting in PDF documents.

Find Text and Retrieve Formatting Information in PDFs

Developers can use the PdfTextFinder and PdfTextFindOptions classes provided by Spire.PDF for Python to search for specific text in a PDF document and obtain a collection of PdfTextFragment objects representing the search results. The formatting of a search result is then available through the PdfTextFragment.TextStates property, whose items expose properties such as FontName, FontSize, and FontFamily.

The detailed steps for finding text in PDF and retrieving its font information are as follows:

  • Create an instance of PdfDocument class and load a PDF document using PdfDocument.LoadFromFile() method.
  • Get a page using PdfDocument.Pages.get_Item() method.
  • Create a PdfTextFinder object using the page.
  • Create a PdfTextFindOptions object, set the search options, and apply the search options through PdfTextFinder.Options property.
  • Find specific text on the page using PdfTextFinder.Find() method and get a collection of PdfTextFragment objects.
  • Get the formatting of the first search result through the PdfTextFragment.TextStates property.
  • Get the font name, font size, and font family of the result through the FontName, FontSize, and FontFamily properties of the first item in TextStates.
  • Print the result.
Python
from spire.pdf import *

# Create a PdfDocument instance
pdf = PdfDocument()
# Load a PDF file
pdf.LoadFromFile("Sample.pdf")

# Get the first page
page = pdf.Pages.get_Item(0)

# Create a PdfTextFinder instance
finder = PdfTextFinder(page)

# Create a PdfTextFindOptions instance and set the search options
options = PdfTextFindOptions()
options.CaseSensitive = True
options.WholeWords = True

# Apply the options
finder.Options = options

# Find the specified text
fragments = finder.Find("History and Cultural Significance:")

# Get the formatting of the first fragment
formatting = fragments[0].TextStates

# Get the formatting information
fontInfo = ""
fontInfo += "Text: " + fragments[0].Text
fontInfo += "\nFont: " + formatting[0].FontName
fontInfo += "\nFont Size: " + str(formatting[0].FontSize)
fontInfo += "\nFont Family: " + formatting[0].FontFamily

# Output font information
print(fontInfo)

# Release resources
pdf.Dispose()
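
To get a feel for what the CaseSensitive and WholeWords options filter out, the same idea can be tried on a plain string with Python's re module. This is an analogy only: find_spans is a hypothetical helper, the sample text is made up, and the exact word-boundary rules that Spire.PDF applies are an assumption here.

```python
import re


def find_spans(text, phrase, case_sensitive=True, whole_words=True):
    """Return (start, end) spans of phrase in text under the two search options."""
    pattern = re.escape(phrase)
    if whole_words:
        # Reject matches glued to a letter, digit or underscore on either side
        pattern = r"(?<!\w)" + pattern + r"(?!\w)"
    flags = 0 if case_sensitive else re.IGNORECASE
    return [m.span() for m in re.finditer(pattern, text, flags)]


page_text = (
    "History and Cultural Significance: ... "
    "history and cultural significance: ... "
    "PreHistory and Cultural Significance: ..."
)
target = "History and Cultural Significance:"

strict = find_spans(page_text, target)
any_case = find_spans(page_text, target, case_sensitive=False)
substrings = find_spans(page_text, target, whole_words=False)
print(len(strict), len(any_case), len(substrings))
```

With both options on, only the first occurrence qualifies. Because a search can also return no results at all, it is worth checking that the result collection is not empty before indexing its first item.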

Find and Modify Text Formatting in PDF Documents

After finding specific text, developers can cover it with a rectangle in the same color as the background and then redraw the text in a new format at the same position. This approach changes the formatting of simple text fragments and works only on pages with a solid-color background. The detailed steps are as follows:

  • Create an instance of PdfDocument class and load a PDF document using PdfDocument.LoadFromFile() method.
  • Get a page using PdfDocument.Pages.get_Item() method.
  • Create a PdfTextFinder object using the page.
  • Create a PdfTextFindOptions object, set the search options, and apply the search options through PdfTextFinder.Options property.
  • Find specific text on the page using PdfTextFinder.Find() method and get the first result.
  • Get the page background color through the PdfPageBase.BackgroundColor property, and fall back to white if no background color is set.
  • Draw rectangles with the obtained color in the position of the found text using PdfPageBase.Canvas.DrawRectangle() method.
  • Create a new font, brush, and string format and calculate the text frame.
  • Draw the text in the new format in the same position using PdfPageBase.Canvas.DrawString() method.
  • Save the document using PdfDocument.SaveToFile() method.
Python
from spire.pdf import *

# Create a PdfDocument instance
pdf = PdfDocument()
# Load a PDF file
pdf.LoadFromFile("Sample.pdf")

# Get the first page
page = pdf.Pages.get_Item(0)

# Create a PdfTextFinder instance
finder = PdfTextFinder(page)

# Create a PdfTextFindOptions instance and set the search options
options = PdfTextFindOptions()
options.CaseSensitive = True
options.WholeWords = True
finder.Options = options

# Find the specified text
fragments = finder.Find("History and Cultural Significance:")
# Get the first result
fragment = fragments[0]

# Get the background color and change it to white if it is empty
backColor = page.BackgroundColor
if backColor.ToArgb() == 0:
    backColor = Color.get_White()
# Draw a rectangle with the background color to cover the text
for i in range(len(fragment.Bounds)):
    page.Canvas.DrawRectangle(PdfSolidBrush(PdfRGBColor(backColor)), fragment.Bounds[i])

# Create a new font and a new brush
font = PdfTrueTypeFont("Times New Roman", 16.0, 3, True)
brush = PdfBrushes.get_Brown()
# Create a PdfStringFormat instance
stringFormat = PdfStringFormat()
stringFormat.Alignment = PdfTextAlignment.Left
# Calculate the rectangle that contains the text
point = fragment.Bounds[0].Location
size = SizeF(fragment.Bounds[-1].Right, fragment.Bounds[-1].Bottom)
rect = RectangleF(point, size)

# Draw the text with the specified format in the same rectangle
page.Canvas.DrawString("History and Cultural Significance", font, brush, rect, stringFormat)

# Save the document
pdf.SaveToFile("output/FindModifyTextFormat.pdf")

# Release resources
pdf.Close()
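
The "calculate the text frame" step deserves care when a match wraps across several lines, because the fragment then has more than one bounding rectangle. One general way to get a single frame that encloses all of them is to take their union. The sketch below uses plain (x, y, width, height) tuples with made-up numbers rather than the library's RectangleF, and union_bounds is a hypothetical helper name.

```python
def union_bounds(rects):
    """Return the smallest (x, y, width, height) rectangle enclosing all rects."""
    if not rects:
        raise ValueError("at least one rectangle is required")
    left = min(x for x, y, w, h in rects)
    top = min(y for x, y, w, h in rects)
    right = max(x + w for x, y, w, h in rects)
    bottom = max(y + h for x, y, w, h in rects)
    return (left, top, right - left, bottom - top)


# Two line rectangles of a wrapped match (hypothetical coordinates, in points)
line_rects = [
    (100.0, 50.0, 200.0, 12.0),  # first line, starts mid-line
    (72.0, 62.0, 150.0, 12.0),   # second line, starts at the left margin
]
frame = union_bounds(line_rects)
print(frame)
```

Note that the width and height are differences between edges, not the right and bottom coordinates themselves. A larger font may still need a wider frame than the original text occupied.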

Finding and replacing text is a common need in document editing, as it helps users correct minor errors or make adjustments to terms appearing in the document. Although PDF documents have a fixed layout and editing can be challenging, users can still perform small modifications such as replacing text with Python, and achieve a satisfactory editing result. In this article, we will explore how to utilize Spire.PDF for Python to find and replace text in PDF documents within a Python program.

Find Text and Replace the First Match in PDF with Python

Spire.PDF for Python enables users to find text and replace the first match in PDF documents with the PdfTextReplacer.ReplaceText(string originalText, string newText) method. This method suits simple replacements of words or phrases that appear only once on a page.

The detailed steps for finding text and replacing the first match are as follows:

  • Create an object of PdfDocument class and load a PDF document using PdfDocument.LoadFromFile() method.
  • Get a page of the document using PdfDocument.Pages.get_Item() method.
  • Create an object of PdfTextReplacer class based on the page.
  • Find specific text and replace the first match on the page using PdfTextReplacer.ReplaceText() method.
  • Save the document using PdfDocument.SaveToFile() method.
Python
from spire.pdf import *
from spire.pdf.common import *

# Create an object of PdfDocument
pdf = PdfDocument()

# Load a PDF document
pdf.LoadFromFile("Sample.pdf")

# Get a page
page = pdf.Pages.get_Item(0)

# Create an object of PdfTextReplacer class
replacer = PdfTextReplacer(page)

# Find and replace the first matched text
replacer.ReplaceText("compressing", "comparing")

# Save the document
pdf.SaveToFile("output/ReplaceFirstMatch.pdf")
pdf.Close()

Find Text and Replace All Matches in PDF with Python

Spire.PDF for Python also provides the PdfTextReplacer.ReplaceAllText(string originalText, string newText, Color textColor) method to find specific text and replace all matches with new text (optionally resetting the text color). The detailed steps are as follows:

  • Create an object of PdfDocument class and load a PDF document using PdfDocument.LoadFromFile() method.
  • Loop through the pages in the document.
  • Get a page using PdfDocument.Pages.get_Item() method.
  • Create an object of PdfTextReplacer class based on the page.
  • Find specific text and replace all the matches with new text in a new color using PdfTextReplacer.ReplaceAllText() method.
  • Save the document using PdfDocument.SaveToFile() method.
Python
from spire.pdf import *
from spire.pdf.common import *

# Create an object of PdfDocument
pdf = PdfDocument()

# Load a PDF document
pdf.LoadFromFile("Sample.pdf")

# Loop through the pages in the document
for i in range(pdf.Pages.Count):
    # Get the current page
    page = pdf.Pages.get_Item(i)
    # Create an object of PdfTextReplacer class based on the page
    replacer = PdfTextReplacer(page)
    # Find and replace all matched text with a new color
    replacer.ReplaceAllText("PYTHON", "Python", Color.get_Red())

# Save the document
pdf.SaveToFile("output/ReplaceAllMatches.pdf")
pdf.Close()
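
The difference between replacing the first match and replacing all matches can be illustrated with ordinary Python strings. This is only an analogy on made-up text, using str.replace rather than the PdfTextReplacer methods, with each string standing in for the text of one page.

```python
# Each string stands in for the text of one page (hypothetical content)
pages = [
    "PYTHON basics. PYTHON is popular.",
    "Advanced PYTHON topics.",
]

# First match only, on a single page
first_only = pages[0].replace("PYTHON", "Python", 1)

# All matches, repeated for every page in the document
all_on_every_page = [p.replace("PYTHON", "Python") for p in pages]

# Counting matches per page shows how much each pass will change
counts = [p.count("PYTHON") for p in pages]

print(first_only)
print(all_on_every_page)
print(counts)
```

Since a replacer works on one page at a time, covering a whole document means running the replacement once per page, each time on that page's own content.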

Highlighting important text with a distinct color is a common way to emphasize content and help readers navigate PDF documents. In lengthy PDFs especially, highlighting key information helps readers grasp the content quickly. With Python, document creators can automate the highlighting process. This article explains how to use Spire.PDF for Python to find and highlight text in PDF documents.

Find and Highlight Specific Text in PDF with Python

Spire.PDF for Python enables developers to find all occurrences of specific text on a page with the PdfPageBase.FindText() method and apply a highlight color to each occurrence with the ApplyHighLight() method. The steps for highlighting all occurrences of specific text are as follows:

  • Create an object of PdfDocument class and load a PDF document using PdfDocument.LoadFromFile() method.
  • Loop through the pages in the document.
  • Get a page using PdfDocument.Pages.get_Item() method.
  • Find all occurrences of specific text on the page using PdfPageBase.FindText() method.
  • Loop through the occurrences and apply a highlight color to each occurrence using ApplyHighLight() method.
  • Save the document using PdfDocument.SaveToFile() method.
Python
from spire.pdf import *
from spire.pdf.common import *

# Create an object of PdfDocument class and load a PDF document
pdf = PdfDocument()
pdf.LoadFromFile("Sample.pdf")

# Loop through the pages in the PDF document
for i in range(pdf.Pages.Count):
    # Get a page
    page = pdf.Pages.get_Item(i)
    # Find all occurrences of specific text on the page
    result = page.FindText("cloud server", TextFindParameter.IgnoreCase).Finds
    # Highlight all the occurrences
    for text in result:
        text.ApplyHighLight(Color.get_Cyan())

# Save the document
pdf.SaveToFile("output/FindHighlight.pdf")
pdf.Close()

Find and Highlight Text in a Specified PDF Page Area with Python

In addition to finding and highlighting text on the entire PDF page, Spire.PDF for Python also supports finding and highlighting text in a specified area of the page by passing a RectangleF instance as a parameter to the PdfPageBase.FindText() method. The detailed steps are as follows:

  • Create an object of PdfDocument class and load a PDF document using PdfDocument.LoadFromFile() method.
  • Get the first page of the document using PdfDocument.Pages.get_Item() method.
  • Define a rectangular area.
  • Find all occurrences of specific text in the specified rectangular area on the first page using PdfPageBase.FindText() method.
  • Loop through the occurrences and apply a highlight color to each occurrence using ApplyHighLight() method.
  • Save the document using PdfDocument.SaveToFile() method.
Python
from spire.pdf.common import *
from spire.pdf import *

# Create an object of PdfDocument and load a PDF document
pdf = PdfDocument()
pdf.LoadFromFile("Sample.pdf")

# Get a page
pdfPageBase = pdf.Pages.get_Item(0)

# Define a rectangular area
rctg = RectangleF(0.0, 0.0, pdfPageBase.ActualSize.Width, 300.0)

# Find all the occurrences of specified text in the rectangular area
findCollection = pdfPageBase.FindText(rctg, "cloud server", TextFindParameter.IgnoreCase)

# Highlight each occurrence found in the rectangle
for find in findCollection.Finds:
    find.ApplyHighLight(Color.get_Green())

# Save the document
pdf.SaveToFile("output/FindHighlightArea.pdf")
pdf.Close()
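
The search area in the example above is a band across the top of the page. The arithmetic behind such a region can be sketched with plain tuples. Everything here is illustrative: top_band and intersects are hypothetical helpers, the page size and match rectangles are made-up values, and the sketch assumes coordinates are measured from the top-left corner of the page. Whether the library requires a match to lie fully inside the area or only to overlap it is not stated above, so the overlap test is an assumption.

```python
def top_band(page_width, page_height, band_height):
    """Return (x, y, width, height) for a band at the top of the page."""
    if band_height <= 0:
        raise ValueError("band_height must be positive")
    # Clamp the band so it never extends past the bottom of the page
    return (0.0, 0.0, float(page_width), float(min(band_height, page_height)))


def intersects(a, b):
    """Return True if two (x, y, width, height) rectangles overlap."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


# Roughly A4-sized page in points, with a 300-point band at the top
region = top_band(595.0, 842.0, 300.0)
match_inside = (72.0, 120.0, 80.0, 12.0)
match_below = (72.0, 500.0, 80.0, 12.0)

print(region)
print(intersects(region, match_inside), intersects(region, match_below))
```

Clamping the band height to the page height keeps the region valid on short pages.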

Find and Highlight Text in PDF using Regular Expression with Python

Sometimes the text to highlight varies from one occurrence to the next. In this case, regular expressions allow a more flexible search. By passing TextFindParameter.Regex as a parameter to the PdfPageBase.FindText() method, we can find text in PDF documents using a regular expression. The detailed steps are as follows:

  • Create an object of PdfDocument class and load a PDF document using PdfDocument.LoadFromFile() method.
  • Specify the regular expression.
  • Get a page using PdfDocument.Pages.get_Item() method.
  • Find matched text with the regular expression on the page using PdfPageBase.FindText() method.
  • Loop through the matched text and apply a highlight color to it using the ApplyHighLight() method.
  • Save the document using PdfDocument.SaveToFile() method.
Python
from spire.pdf import *
from spire.pdf.common import *

# Create an object of PdfDocument class and load a PDF document
pdf = PdfDocument()
pdf.LoadFromFile("Sample.pdf")

# Specify the regular expression that matches two words after *
regex = "\\*(\\w+\\s+\\w+)"

# Get the second page
page = pdf.Pages.get_Item(1)

# Find matched text on the page using regular expression
result = page.FindText(regex, TextFindParameter.Regex)

# Highlight the matched text
for text in result.Finds:
    text.ApplyHighLight(Color.get_DeepPink())

# Save the document
pdf.SaveToFile("output/FindHighlightRegex.pdf")
pdf.Close()
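
Before running a pattern against a PDF, it can help to try it on a sample string with Python's built-in re module. This is a quick sanity check on made-up text, not a guarantee: the regular expression engine used by Spire.PDF may differ from Python's in some details, which is an assumption to verify against your own documents.

```python
import re

# Same pattern as above: an asterisk followed by two words
pattern = re.compile(r"\*(\w+\s+\w+)")

sample = "Options: *cloud server pricing, *virtual machine setup, and plain text."

# Full matches include the asterisk; group 1 holds only the two words
full_matches = [m.group(0) for m in pattern.finditer(sample)]
two_words = [m.group(1) for m in pattern.finditer(sample)]

print(full_matches)
print(two_words)
```

The asterisk is part of the full match, so if the whole match is highlighted, the highlight would be expected to cover the asterisk as well as the two words.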
