C#: Find and Highlight Text in PDF

Searching a PDF document for specific text and highlighting it is useful in many situations. Whether you want to locate critical information, annotate significant details, or extract specific content, finding and highlighting text makes a document quicker to review and easier to understand.

This article shows how to find and highlight text in a PDF document in C# using Spire.PDF for .NET.

Install Spire.PDF for .NET

To begin with, add the DLL files included in the Spire.PDF for .NET package as references in your .NET project. The DLL files can be either downloaded manually or installed via NuGet.

PM> Install-Package Spire.PDF

Find and Highlight Text in a Specific PDF Page in C#

Spire.PDF provides the PdfTextFinder class, which searches for specific text within a page. Through the Options property of this class, you can set search options such as WholeWord, IgnoreCase, and Regex. The Find method then returns all occurrences of the searched text within the page.
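When more than one search option should apply at the same time, the values are normally combined into a single value rather than assigned one after another. The following is a minimal sketch of that idea using a stand-in enum, DemoFindOptions, which is hypothetical and not part of Spire.PDF. It assumes TextFindParameter behaves like a .NET flags enum, which is worth confirming against the API reference for your version.

```csharp
using System;

// Hypothetical stand-in for a flags-style options enum (not a Spire.PDF type).
[Flags]
public enum DemoFindOptions
{
    None = 0,
    WholeWord = 1,
    IgnoreCase = 2,
    Regex = 4
}

public static class FindOptionsDemo
{
    // Two successive assignments: the second value replaces the first.
    public static DemoFindOptions AssignTwice()
    {
        DemoFindOptions options = DemoFindOptions.WholeWord;
        options = DemoFindOptions.IgnoreCase;
        return options;
    }

    // Bitwise OR: both options remain set in one value.
    public static DemoFindOptions Combine()
    {
        return DemoFindOptions.WholeWord | DemoFindOptions.IgnoreCase;
    }

    public static void Main()
    {
        Console.WriteLine(AssignTwice()); // prints "IgnoreCase"
        Console.WriteLine(Combine());     // prints "WholeWord, IgnoreCase"
    }
}
```

The same reasoning suggests writing the options as one expression joined with the | operator when both whole-word and case-insensitive matching are wanted.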

The following are the steps to find and highlight text in a specific PDF page in C#.

  • Create a PdfDocument object.
  • Load a PDF file from a given path.
  • Get a specific page from the document.
  • Create a PdfTextFinder object based on the page.
  • Specify search options using PdfTextFinder.Options property.
  • Find all instances of the searched text using PdfTextFinder.Find() method.
  • Iterate through the find results, and highlight each instance using PdfTextFragment.HighLight() method.
  • Save the document to a different PDF file.
using Spire.Pdf;
using Spire.Pdf.Texts;
using System.Collections.Generic;
using System.Drawing;

namespace FindAndHighlightTextInPage
{
    class Program
    {
        static void Main(string[] args)
        {
            // Create a PdfDocument object
            PdfDocument doc = new PdfDocument();

            // Load a PDF file
            doc.LoadFromFile("C:\\Users\\Administrator\\Desktop\\Input.pdf");

            // Get a specific page (the index is zero-based, so 1 is the second page)
            PdfPageBase page = doc.Pages[1];

            // Create a PdfTextFinder object based on the page
            PdfTextFinder finder = new PdfTextFinder(page);

            // Specify the find options. Combine them with a bitwise OR;
            // assigning Parameter twice would keep only the last value.
            finder.Options.Parameter = TextFindParameter.WholeWord | TextFindParameter.IgnoreCase;

            // Find the instances of the specified text
            List<PdfTextFragment> finds = finder.Find(".NET Framework");

            // Iterate through the find results
            foreach (PdfTextFragment fragment in finds)
            {
                // Highlight text
                fragment.HighLight(Color.LightYellow);
            }

            // Save to a different PDF file
            doc.SaveToFile("HighlightTextInPage.pdf", FileFormat.PDF);

            // Dispose resources
            doc.Dispose();
        }
    }
}

Find and Highlight Text in a Rectangular Area in C#

By highlighting text within a rectangular area of a page, users can draw attention to a specific section or piece of information within the document. To specify a rectangular area, you can use the Options.Area property.
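As an illustration of how such an area might be chosen, the sketch below builds a rectangle covering the top band of a page from the page dimensions. The helper name TopBand is hypothetical, and the sketch assumes that coordinates are measured in points from the top-left corner of the page; check both assumptions against the library documentation before relying on them.

```csharp
using System;
using System.Drawing;

public static class SearchAreaDemo
{
    // Returns a rectangle covering the top `bandHeight` units of a page,
    // clamped so that it never extends past the bottom of the page.
    public static RectangleF TopBand(SizeF pageSize, float bandHeight)
    {
        float height = Math.Min(bandHeight, pageSize.Height);
        return new RectangleF(0f, 0f, pageSize.Width, height);
    }

    public static void Main()
    {
        // Example page size: A4 portrait is roughly 595 x 842 points.
        RectangleF area = TopBand(new SizeF(595f, 842f), 200f);
        Console.WriteLine(area);
    }
}
```

The resulting RectangleF is the kind of value that would be assigned to Options.Area, with the page size taken from the loaded page rather than hard-coded.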

The following are the steps to find and highlight text in a rectangular area in C#.

  • Create a PdfDocument object.
  • Load a PDF file from a given path.
  • Get a specific page from the document.
  • Create a PdfTextFinder object based on the page.
  • Specify a rectangular area to search text using PdfTextFinder.Options.Area property.
  • Find all instances of the searched text within the rectangular area using PdfTextFinder.Find() method.
  • Iterate through the find results, and highlight each instance using PdfTextFragment.HighLight() method.
  • Save the document to a different PDF file.
using Spire.Pdf;
using Spire.Pdf.Texts;
using System.Collections.Generic;
using System.Drawing;

namespace FindAndHighlightTextInRectangularArea
{
    class Program
    {
        static void Main(string[] args)
        {
            // Create a PdfDocument object
            PdfDocument doc = new PdfDocument();

            // Load a PDF file
            doc.LoadFromFile("C:\\Users\\Administrator\\Desktop\\Input.pdf");

            // Get a specific page (the index is zero-based, so 1 is the second page)
            PdfPageBase page = doc.Pages[1];

            // Create a PdfTextFinder object based on the page
            PdfTextFinder finder = new PdfTextFinder(page);

            // Specify a rectangular area for searching text
            finder.Options.Area = new RectangleF(0, 0, 841, 200);

            // Specify other options. Combine them with a bitwise OR;
            // assigning Parameter twice would keep only the last value.
            finder.Options.Parameter = TextFindParameter.WholeWord | TextFindParameter.IgnoreCase;

            // Find the instances of the specified text
            List<PdfTextFragment> finds = finder.Find(".NET Framework");

            // Iterate through the find results
            foreach (PdfTextFragment fragment in finds)
            {
                // Highlight text
                fragment.HighLight(Color.LightYellow);
            }

            // Save to a different PDF file
            doc.SaveToFile("HighlightTextInRectangularArea.pdf", FileFormat.PDF);

            // Dispose resources
            doc.Dispose();
        }
    }
}

Find and Highlight Text in an Entire PDF Document in C#

The first code example shows how to highlight text in a specific page. To find and highlight text throughout the entire document, iterate through each page of the document and apply the same search and highlighting to every page in turn.

The steps to find and highlight text in an entire PDF document using C# are as follows.

  • Create a PdfDocument object.
  • Load a PDF file from a given path.
  • Iterate through each page in the document.
    • Create a PdfTextFinder object based on a certain page.
    • Specify search options using PdfTextFinder.Options property.
    • Find all instances of the searched text using PdfTextFinder.Find() method.
    • Iterate through the find results, and highlight each instance using PdfTextFragment.HighLight() method.
  • Save the document to a different PDF file.
using Spire.Pdf;
using Spire.Pdf.Texts;
using System.Collections.Generic;
using System.Drawing;

namespace FindAndHighlightTextInDocument
{
    class Program
    {
        static void Main(string[] args)
        {
            // Create a PdfDocument object
            PdfDocument doc = new PdfDocument();

            // Load a PDF file
            doc.LoadFromFile("C:\\Users\\Administrator\\Desktop\\Input.pdf");

            // Iterate through each page of the document
            foreach (PdfPageBase page in doc.Pages)
            {
                // Create a PdfTextFinder object for the current page
                PdfTextFinder finder = new PdfTextFinder(page);

                // Specify the find options. Combine them with a bitwise OR;
                // assigning Parameter twice would keep only the last value.
                finder.Options.Parameter = TextFindParameter.WholeWord | TextFindParameter.IgnoreCase;

                // Find the instances of the specified text
                List<PdfTextFragment> finds = finder.Find(".NET Framework");

                // Iterate through the find results
                foreach (PdfTextFragment fragment in finds)
                {
                    // Highlight text
                    fragment.HighLight(Color.LightYellow);
                }
            }

            // Save to a different PDF file
            doc.SaveToFile("HighlightAll.pdf", FileFormat.PDF);

            // Dispose resources
            doc.Dispose();
        }
    }
}

Find and Highlight Text in PDF Using a Regular Expression in C#

Regular expressions give you more flexibility and control over the search criteria than a literal string. To use one, set the PdfTextFinder.Options.Parameter property to TextFindParameter.Regex and pass the regular expression pattern to the Find() method.
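Before running a pattern against a PDF, it can help to try it on plain strings. The sketch below does this with the standard System.Text.RegularExpressions classes and the pattern \bM\w*t\b, which matches whole words that start with "M" and end with "t". The helper name FindMatches and the sample sentence are illustrative only, and the sketch assumes the library interprets patterns with the usual .NET regular expression semantics.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class RegexPreview
{
    // Collects the text of every match of `pattern` in `text`.
    public static List<string> FindMatches(string text, string pattern)
    {
        var results = new List<string>();
        foreach (Match match in Regex.Matches(text, pattern))
        {
            results.Add(match.Value);
        }
        return results;
    }

    public static void Main()
    {
        string pattern = @"\bM\w*t\b";
        string sample = "Microsoft released a Management toolkit for the Mac.";

        // Prints "Microsoft" and "Management", one per line.
        // "Mac" does not end in "t" and "toolkit" does not start with "M".
        foreach (string word in FindMatches(sample, pattern))
        {
            Console.WriteLine(word);
        }
    }
}
```

Note that the pattern is case-sensitive as written, so a word such as "management" would not match unless the pattern or options are adjusted.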

Here are the steps to find and highlight text in PDF using a regular expression in C#.

  • Create a PdfDocument object.
  • Load a PDF file from a given path.
  • Iterate through each page in the document.
    • Create a PdfTextFinder object based on a certain page.
    • Set the PdfTextFinder.Options.Parameter property to TextFindParameter.Regex.
    • Create a regular expression pattern that matches the specific text patterns you are seeking.
    • Find all instances of the searched text using PdfTextFinder.Find() method.
    • Iterate through the find results, and highlight each instance using PdfTextFragment.HighLight() method.
  • Save the document to a different PDF file.
using Spire.Pdf;
using Spire.Pdf.Texts;
using System.Collections.Generic;
using System.Drawing;

namespace FindAndHighlightUsingRegularExpression
{
    class Program
    {
        static void Main(string[] args)
        {
            // Create a PdfDocument object
            PdfDocument doc = new PdfDocument();

            // Load a PDF file
            doc.LoadFromFile("C:\\Users\\Administrator\\Desktop\\Input.pdf");

            // Iterate through each page of the document
            foreach (PdfPageBase page in doc.Pages)
            {
                // Create a PdfTextFinder object based on the page
                PdfTextFinder finder = new PdfTextFinder(page);

                // Set the search mode to Regex
                finder.Options.Parameter = TextFindParameter.Regex;

                // Find the text that conforms to a regular expression
                string pattern = @"\bM\w*t\b";
                List<PdfTextFragment> finds = finder.Find(pattern);

                // Iterate through the find results
                foreach (PdfTextFragment fragment in finds)
                {
                    // Highlight text
                    fragment.HighLight(Color.LightYellow);
                }
            }

            // Save to a different PDF file
            doc.SaveToFile("HighlightTextUsingRegex.pdf", FileFormat.PDF);

            // Dispose resources
            doc.Dispose();
        }
    }
}

Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, you can request a 30-day trial license.