Images and text are the most basic elements of a PDF document, and most PDF files need both. Working with them is not always straightforward, though. Suppose you want to insert an image into your PDF document, but the image exists only inside another PDF file: you cannot find the same picture on the internet, and you cannot paste it directly into your own PDF. In that situation, you have to extract the image from the source PDF first and then insert it into your own document.
This article shows how to extract images and text from a PDF document in a WPF application using Spire.PDF for WPF, and then add an extracted image to another PDF. Follow the steps below.
Download Spire.PDF (Spire.Office) and make sure .NET Framework 2.0 (or above) is installed. Install the package and follow the guide below.
Step 1: Create a new project
- Create a new WPF Application project.
- Add a button to MainWindow and set its Content property to "Run".
- Add Spire.Pdf.Wpf.dll and System.Drawing as references. After adding the namespaces, the code-behind looks like this:
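using System.Collections.Generic;
using System.Drawing;
using System.Drawing.Imaging;
using System.IO;
using System.Text;
using System.Windows;
using Spire.Pdf;
using Spire.Pdf.Graphics;

namespace pdfextractwpf
{
    /// <summary>
    /// Interaction logic for MainWindow.xaml
    /// </summary>
    public partial class MainWindow : Window
    {
        public MainWindow()
        {
            InitializeComponent();
        }

        // Click handler for the "Run" button; the code from the following steps goes here.
        private void button1_Click(object sender, RoutedEventArgs e)
        {
        }
    }
}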
Imports System.Collections.Generic
Imports System.Drawing
Imports System.Drawing.Imaging
Imports System.IO
Imports System.Text
Imports System.Windows
Imports Spire.Pdf
Imports Spire.Pdf.Graphics

Namespace pdfextractwpf
    ''' <summary>
    ''' Interaction logic for MainWindow.xaml
    ''' </summary>
    Public Partial Class MainWindow
        Inherits Window

        Public Sub New()
            InitializeComponent()
        End Sub

        ' Click handler for the "Run" button; the code from the following steps goes here.
        Private Sub button1_Click(sender As Object, e As RoutedEventArgs)
        End Sub
    End Class
End Namespace
Step 2: Extract images and text from a PDF document
Load a PDF file from disk
//Create a pdf document.
PdfDocument doc = new PdfDocument();
doc.LoadFromFile(@"D:\e-iceblue\Spire.PDF\Demos\Data\Sample2.pdf");
'Create a pdf document.
Dim doc As New PdfDocument()
doc.LoadFromFile("D:\e-iceblue\Spire.PDF\Demos\Data\Sample2.pdf")
Extract the images and text from the PDF document
// Collect the text of every page and every embedded image.
StringBuilder buffer = new StringBuilder();
IList<System.Drawing.Image> images = new List<System.Drawing.Image>();
foreach (PdfPageBase page in doc.Pages)
{
    buffer.Append(page.ExtractText());
    foreach (System.Drawing.Image image in page.ExtractImages())
    {
        images.Add(image);
    }
}
doc.Close();
Dim buffer As New StringBuilder()
Dim images As IList(Of System.Drawing.Image) = New List(Of System.Drawing.Image)()
For Each page As PdfPageBase In doc.Pages
    buffer.Append(page.ExtractText())
    For Each image As System.Drawing.Image In page.ExtractImages()
        images.Add(image)
    Next
Next
doc.Close()
Save the extracted images and text.
//save text
String fileName = "TextInPdf.txt";
File.WriteAllText(fileName, buffer.ToString());
//save image
int index = 0;
foreach (System.Drawing.Image image in images)
{
    String imageFileName = String.Format("Image-{0}.png", index++);
    image.Save(imageFileName, ImageFormat.Png);
}
'save text
Dim fileName As String = "TextInPdf.txt"
File.WriteAllText(fileName, buffer.ToString())
'save image
Dim index As Integer = 0
For Each image As System.Drawing.Image In images
    'Use the current index in the file name, then advance it (matches index++ in C#).
    Dim imageFileName As String = String.Format("Image-{0}.png", index)
    index += 1
    image.Save(imageFileName, ImageFormat.Png)
Next
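The save step above always writes PNG files. Because the extracted images are ordinary System.Drawing.Image objects, the same loop can write other formats by passing a different ImageFormat to Save. The sketch below shows one possible way to arrange this; the ImageNaming class and its method names are hypothetical helpers introduced for illustration and are not part of Spire.PDF.

```csharp
using System;
using System.Drawing.Imaging;

// Hypothetical helper (not part of Spire.PDF): maps a file extension to the
// System.Drawing format accepted by Image.Save and builds numbered file names.
public static class ImageNaming
{
    // Returns the ImageFormat that corresponds to an extension such as "jpg" or ".PNG".
    public static ImageFormat FormatFor(string extension)
    {
        switch (extension.TrimStart('.').ToLowerInvariant())
        {
            case "jpg":
            case "jpeg":
                return ImageFormat.Jpeg;
            case "bmp":
                return ImageFormat.Bmp;
            case "gif":
                return ImageFormat.Gif;
            case "png":
                return ImageFormat.Png;
            default:
                throw new ArgumentException("Unsupported extension: " + extension);
        }
    }

    // Builds a name in the same "Image-N" pattern used by the save step.
    public static string FileNameFor(int index, string extension)
    {
        return string.Format("Image-{0}.{1}", index,
            extension.TrimStart('.').ToLowerInvariant());
    }
}
```

Inside the save loop, a call such as image.Save(ImageNaming.FileNameFor(index, "jpg"), ImageNaming.FormatFor("jpg")) would then write JPEG files instead of PNG.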
Step 3: Insert the extracted image into a newly built PDF file
Create a new PDF document and add a page to it
PdfDocument newDoc = new PdfDocument();
PdfPageBase newPage = newDoc.Pages.Add();
Dim newDoc As New PdfDocument()
Dim newPage As PdfPageBase = newDoc.Pages.Add()
Draw the text, then insert the third extracted image into the newly built PDF document.
// Draw a caption at the top of the page.
newPage.Canvas.DrawString("Extract PDF images & text and insert an extracted image to a newly built PDF",
    new PdfFont(PdfFontFamily.Helvetica, 14.5f),
    new PdfSolidBrush(new PdfRGBColor(0, 100, 200)), 10, 40);
// images[2] is the third extracted image; the source PDF must contain at least three images.
PdfImage img = PdfImage.FromImage(images[2]);
// Scale to 75% and center horizontally on the page.
float width = img.Width * 0.75f;
float height = img.Height * 0.75f;
float x = (newPage.Canvas.ClientSize.Width - width) / 2;
newPage.Canvas.DrawImage(img, x, 100, width, height);
'Draw a caption at the top of the page.
newPage.Canvas.DrawString("Extract PDF images & text and insert an extracted image to a newly built PDF", _
    New PdfFont(PdfFontFamily.Helvetica, 14.5F), _
    New PdfSolidBrush(New PdfRGBColor(0, 100, 200)), 10, 40)
'images(2) is the third extracted image; the source PDF must contain at least three images.
Dim img As PdfImage = PdfImage.FromImage(images(2))
'Scale to 75% and center horizontally on the page.
Dim width As Single = img.Width * 0.75F
Dim height As Single = img.Height * 0.75F
Dim x As Single = (newPage.Canvas.ClientSize.Width - width) / 2
newPage.Canvas.DrawImage(img, x, 100, width, height)
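The placement arithmetic above (scale the image to 75 percent, then center it horizontally) and the hard-coded index 2 can both be checked separately from the PDF library. The following is a minimal sketch in plain C#; the ImagePlacement class and its members are hypothetical names introduced for illustration, and the 0.75 factor is simply the value used in the step above.

```csharp
using System.Collections.Generic;

// Hypothetical helper for illustration; not part of Spire.PDF.
public static class ImagePlacement
{
    // Scales an image by `scale` and centers it horizontally on a canvas.
    // Outputs the left offset and the scaled size to pass to a draw call.
    public static void Compute(float canvasWidth, float imageWidth, float imageHeight,
                               float scale, out float x, out float width, out float height)
    {
        width = imageWidth * scale;
        height = imageHeight * scale;
        x = (canvasWidth - width) / 2;
    }

    // Returns true and the element at `index` only if the list is long enough,
    // so a PDF with fewer images than expected does not cause an exception.
    public static bool TryGet<T>(IList<T> items, int index, out T item)
    {
        if (items != null && index >= 0 && index < items.Count)
        {
            item = items[index];
            return true;
        }
        item = default(T);
        return false;
    }
}
```

With a helper like TryGet, the handler could skip the drawing step, or report a message, when the source PDF contains fewer than three images instead of failing on images[2].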
Save and launch the PDF file
newDoc.SaveToFile("Image.pdf");
newDoc.Close();
System.Diagnostics.Process.Start("Image.pdf");
newDoc.SaveToFile("Image.pdf")
newDoc.Close()
System.Diagnostics.Process.Start("Image.pdf")
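Process.Start("Image.pdf") relies on the operating system shell to choose a PDF viewer. On .NET Framework this works as written because ProcessStartInfo.UseShellExecute defaults to true. On .NET Core and .NET 5 or later the default is false, so the flag has to be set explicitly. The sketch below assumes the project might be retargeted to one of those runtimes; PdfLauncher is a hypothetical helper name and is not part of Spire.PDF.

```csharp
using System.Diagnostics;
using System.IO;

// Hypothetical helper for illustration; not part of Spire.PDF.
public static class PdfLauncher
{
    // Builds start info that asks the OS shell to open the file
    // with its registered default application.
    public static ProcessStartInfo BuildStartInfo(string path)
    {
        return new ProcessStartInfo(Path.GetFullPath(path))
        {
            UseShellExecute = true
        };
    }

    // Opens the file in the default viewer.
    public static void Open(string path)
    {
        Process.Start(BuildStartInfo(path));
    }
}
```

Calling PdfLauncher.Open("Image.pdf") after SaveToFile would then behave the same way on either runtime.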
Effective screenshot:
Spire.PDF for WPF not only extracts images and text from PDF documents, but also lets you save the extracted images in popular formats such as PNG, JPG, BMP, and GIF.