Extract and Insert PDF Images and Text in WPF

2012-06-20 02:33:32 Written by  support iceblue

Images and text are the most basic elements of a PDF document, and in most cases people need to insert both into a PDF file. In practice, this is not always easy. For example, suppose the image you want to insert exists only inside another PDF file: you cannot find the same picture on the internet, and you cannot paste it directly into your own PDF. In that situation, you have to extract the image from the source PDF first and then insert it into your own document.

This article shares a method for extracting images and text from a PDF document in WPF using Spire.PDF for WPF. With Spire.PDF for WPF, you can quickly extract the images and text from a PDF and then add any of the extracted images to another PDF. Follow the steps below.

Download Spire.PDF (Spire.Office) and install it together with .NET Framework 2.0 (or above), then follow the guide below.

Step 1: Create a new project

  • Create a new WPF Application project.
  • Add a button to MainWindow and set the button's Content property to "Run".
  • Add Spire.Pdf.Wpf.dll and System.Drawing as references. After adding the namespaces, the code looks like this:
[C#]
using System.Collections.Generic;
using System.Drawing;
using System.Drawing.Imaging;
using System.IO;
using System.Text;
using System.Windows;
using Spire.Pdf;
using Spire.Pdf.Graphics;

namespace pdfextractwpf
{
    /// <summary>
    /// Interaction logic for MainWindow.xaml
    /// </summary>
    public partial class MainWindow : Window
    {
        public MainWindow()
        {
            InitializeComponent();
        }

        private void button1_Click(object sender, RoutedEventArgs e)
        {

        }
    }
}
[VB.NET]
Imports System.Collections.Generic
Imports System.Drawing
Imports System.Drawing.Imaging
Imports System.IO
Imports System.Text
Imports System.Windows
Imports Spire.Pdf
Imports Spire.Pdf.Graphics

Namespace pdfextractwpf
	''' <summary>
	''' Interaction logic for MainWindow.xaml
	''' </summary>
	Public Partial Class MainWindow
		Inherits Window
		Public Sub New()
			InitializeComponent()
		End Sub

		Private Sub button1_Click(sender As Object, e As RoutedEventArgs)

		End Sub
	End Class
End Namespace

Step 2: Extract images and text from PDF document

Load a PDF file from disk

[C#]
            //Create a pdf document.
            PdfDocument doc = new PdfDocument();                
            doc.LoadFromFile(@"D:\e-iceblue\Spire.PDF\Demos\Data\Sample2.pdf");
[VB.NET]
	'Create a pdf document.
	Dim doc As New PdfDocument()
	doc.LoadFromFile("D:\e-iceblue\Spire.PDF\Demos\Data\Sample2.pdf")

Extract images and text from PDF document

[C#]
            StringBuilder buffer = new StringBuilder();
            IList<System.Drawing.Image> images = new List<System.Drawing.Image>();

            foreach (PdfPageBase page in doc.Pages)
            {
                buffer.Append(page.ExtractText());
                foreach (System.Drawing.Image image in page.ExtractImages())
                {
                    images.Add(image);
                }
            }

            doc.Close();
[VB.NET]
	Dim buffer As New StringBuilder()
	Dim images As IList(Of System.Drawing.Image) = New List(Of System.Drawing.Image)()

	For Each page As PdfPageBase In doc.Pages
		buffer.Append(page.ExtractText())
		For Each image As System.Drawing.Image In page.ExtractImages()
			images.Add(image)
		Next
	Next

	doc.Close()

Save the extracted images and text.

[C#]
            //save text
            String fileName = "TextInPdf.txt";
            File.WriteAllText(fileName, buffer.ToString());

            //save image
            int index = 0;
            foreach (System.Drawing.Image image in images)
            {
                String imageFileName
                    = String.Format("Image-{0}.png", index++);
                image.Save(imageFileName, ImageFormat.Png);
            }
[VB.NET]
	'save text
	Dim fileName As String = "TextInPdf.txt"
	File.WriteAllText(fileName, buffer.ToString())

	'save image
	Dim index As Integer = 0
	For Each image As System.Drawing.Image In images
		'Number the files from 0, matching the C# version
		Dim imageFileName As String = String.Format("Image-{0}.png", index)
		index += 1
		image.Save(imageFileName, ImageFormat.Png)
	Next
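
The snippets above save to relative file names, so the files end up in whatever the current working directory of the application happens to be. As a minimal sketch, and not part of the original sample, the helper below builds the image path inside a dedicated folder using only standard .NET calls. The class name OutputPaths, the method ImagePath and the folder name "Extracted" are hypothetical names chosen for illustration.

```csharp
using System;
using System.IO;

static class OutputPaths
{
    // Hypothetical helper: returns "<folder>\Image-<index>.png" and makes
    // sure the folder exists before anything is saved into it.
    public static string ImagePath(string folder, int index)
    {
        // CreateDirectory does nothing if the folder already exists.
        Directory.CreateDirectory(folder);
        return Path.Combine(folder, String.Format("Image-{0}.png", index));
    }
}
```

Inside the save loop, the call could then read image.Save(OutputPaths.ImagePath("Extracted", index++), ImageFormat.Png), which keeps the extracted files together in one place.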

Step 3: Insert the extracted image to a newly built PDF file

Create a new PDF document and add a page to it

[C#]
            PdfDocument newDoc = new PdfDocument();
            PdfPageBase newPage = newDoc.Pages.Add();
[VB.NET]
	Dim newDoc As New PdfDocument()
	Dim newPage As PdfPageBase = newDoc.Pages.Add()

Draw a title string on the page, then insert the third extracted image (index 2, so the source PDF must contain at least three images) into the newly built PDF document.

[C#]
            newPage.Canvas.DrawString("Extract PDF images & text and insert an extracted image to a newly built PDF",
                                   new PdfFont(PdfFontFamily.Helvetica, 14.5f),
                                   new PdfSolidBrush(new PdfRGBColor(0,100,200)),
                                   10, 40);
            PdfImage img = PdfImage.FromImage(images[2]);
            float width = img.Width * 0.75f;
            float height = img.Height * 0.75f;
            float x = (newPage.Canvas.ClientSize.Width - width) / 2;

            newPage.Canvas.DrawImage(img, x, 100, width, height);
[VB.NET]
	newPage.Canvas.DrawString("Extract PDF images & text and insert an extracted image to a newly built PDF",
	                          New PdfFont(PdfFontFamily.Helvetica, 14.5F),
	                          New PdfSolidBrush(New PdfRGBColor(0, 100, 200)), 10, 40)
	Dim img As PdfImage = PdfImage.FromImage(images(2))
	Dim width As Single = img.Width * 0.75F
	Dim height As Single = img.Height * 0.75F
	Dim x As Single = (newPage.Canvas.ClientSize.Width - width) / 2
	newPage.Canvas.DrawImage(img, x, 100, width, height)
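
The code above has two fragile spots: indexing the list with 2 throws an ArgumentOutOfRangeException when fewer than three images were extracted, and a fixed 0.75 scale can make the image wider than the page, which turns x negative. The following is a minimal sketch of two guards for these cases, written with plain .NET only. The class Placement and its methods PickOrLast and FitScale are hypothetical helpers, not part of Spire.PDF.

```csharp
using System;
using System.Collections.Generic;

static class Placement
{
    // Hypothetical helper: returns the item at preferredIndex, or the last
    // item when the list is shorter. Throws if the list is empty.
    public static T PickOrLast<T>(IList<T> items, int preferredIndex)
    {
        if (items == null || items.Count == 0)
            throw new InvalidOperationException("No images were extracted.");
        return items[Math.Min(preferredIndex, items.Count - 1)];
    }

    // Hypothetical helper: returns preferredScale, reduced if necessary so
    // that imageWidth * scale never exceeds maxWidth.
    public static float FitScale(float imageWidth, float maxWidth, float preferredScale)
    {
        if (imageWidth <= 0f)
            return preferredScale;
        return Math.Min(preferredScale, maxWidth / imageWidth);
    }
}
```

With these helpers, the image could be chosen with Placement.PickOrLast(images, 2), and the factor 0.75f could be replaced by Placement.FitScale(img.Width, newPage.Canvas.ClientSize.Width, 0.75f) for both width and height, so the aspect ratio is preserved.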

Save and launch the PDF file

[C#]
            newDoc.SaveToFile("Image.pdf");
            newDoc.Close();
            System.Diagnostics.Process.Start("Image.pdf");
[VB.NET]
	newDoc.SaveToFile("Image.pdf")
	newDoc.Close()
	System.Diagnostics.Process.Start("Image.pdf")
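
Launching a document by file name relies on the operating system shell to find the registered PDF viewer. On .NET Framework, which this article targets, ProcessStartInfo.UseShellExecute defaults to true, so the call above works as written. On .NET Core and .NET 5 or later the default is false, and the same call fails. As a hedged sketch for projects on those newer runtimes, the flag can be set explicitly. The class Launcher and the method ForDocument are hypothetical names.

```csharp
using System.Diagnostics;

static class Launcher
{
    // Hypothetical helper: builds start info that asks the OS shell to open
    // the file with its registered viewer, regardless of the runtime default.
    public static ProcessStartInfo ForDocument(string path)
    {
        ProcessStartInfo info = new ProcessStartInfo(path);
        info.UseShellExecute = true;
        return info;
    }
}
```

The launch line then becomes Process.Start(Launcher.ForDocument("Image.pdf")).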

Effective Screenshot:

[Screenshot: Read PDF Text and Images]

Spire.PDF for WPF lets you not only extract images and text from a PDF document but also save the images in popular formats such as PNG, JPG, BMP and GIF.

Last modified on Friday, 24 September 2021 09:56