Extract Image from PDF and Save it to New PDF File in C#
Extracting image from PDF is not a complex operation, but sometimes we need to save the image to new PDF file. So the article describes the simple c# code to extract image from PDF and save it to new PDF file through a professional PDF .NET library Spire.PDF.
First we need to complete the preparatory work before the procedure:
- Download the Spire.PDF and install it on your machine.
- Add the Spire.PDF.dll files as reference.
- Open bin folder and select the three dll files under .NET 4.0.
- Right click property and select properties in its menu.
- Set the target framework as .NET 4.
- Add Spire.PDF as namespace.
The following steps will show you how to do this with ease:
Step 1: Create a PDF document.
PdfDocument doc = new PdfDocument(); doc.LoadFromFile(@"..\..\Sample.pdf");
Step 2: Extract the image from PDF document.
doc.ExtractImages();
Step 3: Save image file.
image.Save("image.png",System.Drawing.Imaging.ImageFormat.Png); PdfImage image2 = PdfImage.FromFile(@"image.png"); PdfDocument doc2=new PdfDocument (); PdfPageBase page=doc2.Pages.Add();
Step 4: Draw image to new PDF file.
float width = image.Width * 0.75f; float height = image.Height * 0.75f; float x = (page.Canvas.ClientSize.Width - width) / 2; page.Canvas.DrawImage(image2,x,60,width,height); doc2.SaveToFile(@"..\..\Image.pdf");
Here is the whole code:
static void Main(string[] args) { PdfDocument doc = new PdfDocument(); doc.LoadFromFile(@"..\..\Sample.pdf"); doc.ExtractImages(); image.Save("image.png",System.Drawing.Imaging.ImageFormat.Png); PdfImage image2 = PdfImage.FromFile(@"image.png"); PdfDocument doc2=new PdfDocument (); PdfPageBase page=doc2.Pages.Add(); float width = image.Width * 0.75f; float height = image.Height * 0.75f; float x = (page.Canvas.ClientSize.Width - width) / 2; page.Canvas.DrawImage(image2,x,60,width,height); doc2.SaveToFile(@"..\..\Image.pdf"); }
Here comes to the preview of the effect picture:
C#/VB.NET: Extract Images from PDF
Images are often used in PDF documents to present information in an easily understandable manner. In certain cases, you may need to extract images from PDF documents. For example, when you want to use a chart image from a PDF report in a presentation or another document. This article will demonstrate how to extract images from PDF in C# and VB.NET using Spire.PDF for .NET.
Install Spire.PDF for .NET
To begin with, you need to add the DLL files included in the Spire.PDF for.NET package as references in your .NET project. The DLL files can be either downloaded from this link or installed via NuGet.
PM> Install-Package Spire.PDF
Extract Images from PDF in C# and VB.NET
The following are the main steps to extract images from a PDF document using Spire.PDF for .NET:
- Create a PdfDocument object.
- Load a PDF document using PdfDocument.LoadFromFile() method.
- Loop through all the pages in the document.
- Extract images from each page using PdfPageBase.ExtractImages() method and save them to a specified file path.
- C#
- VB.NET
using Spire.Pdf; using System.Drawing; namespace ExtractImages { class Program { static void Main(string[] args) { //Create a PdfDocument instance PdfDocument pdf = new PdfDocument(); //Load a PDF document pdf.LoadFromFile("Input.pdf"); int i = 1; //Loop through all pages in the document foreach (PdfPageBase page in pdf.Pages) { //Extract images from each page and save them to a specified file path foreach (Image image in page.ExtractImages()) { image.Save(@"C:/Users/Administrator/Desktop/Images/" + "image" + i + ".png", System.Drawing.Imaging.ImageFormat.Png); i++; } } } } }
Apply for a Temporary License
If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.
C#/VB.NET: Extract Text from PDF Documents
PDF documents are fixed in layout and do not allow users to perform modifications in them. To make the PDF content editable again, you can convert PDF to Word or extract text from PDF. In this article, you will learn how to extract text from a specific PDF page, how to extract text from a particular rectangle area, and how to extract text by SimpleTextExtractionStrategy in C# and VB.NET using Spire.PDF for .NET.
- Extract Text from a Specified Page
- Extract Text from a Rectangle
- Extract Text using SimpleTextExtractionStrategy
Install Spire.PDF for .NET
To begin with, you need to add the DLL files included in the Spire.PDF for.NET package as references in your .NET project. The DLL files can be either downloaded from this link or installed via NuGet.
PM> Install-Package Spire.PDF
Extract Text from a Specified Page
The following are the steps to extract text from a certain page of a PDF document using Spire.PDF for .NET.
- Create a PdfDocument object.
- Load a PDF file using PdfDocument.LoadFromFile() method.
- Get the specific page through PdfDocument.Pages[index] property.
- Create a PdfTextExtractor object.
- Create a PdfTextExtractOptions object, and set the IsExtractAllText property to true.
- Extract text from the selected page using PdfTextExtractor.ExtractText() method.
- Write the extracted text to a TXT file.
- C#
- VB.NET
using System; using System.IO; using Spire.Pdf; using Spire.Pdf.Texts; namespace ExtractTextFromPage { class Program { static void Main(string[] args) { //Create a PdfDocument object PdfDocument doc = new PdfDocument(); //Load a PDF file doc.LoadFromFile(@"C:\Users\Administrator\Desktop\Terms of Service.pdf"); //Get the second page PdfPageBase page = doc.Pages[1]; //Create a PdfTextExtractot object PdfTextExtractor textExtractor = new PdfTextExtractor(page); //Create a PdfTextExtractOptions object PdfTextExtractOptions extractOptions = new PdfTextExtractOptions(); //Set isExtractAllText to true extractOptions.IsExtractAllText = true; //Extract text from the page string text = textExtractor.ExtractText(extractOptions); //Write to a txt file File.WriteAllText("Extracted.txt", text); } } }
Extract Text from a Rectangle
The following are the steps to extract text from a rectangle area of a page using Spire.PDF for .NET.
- Create a PdfDocument object.
- Load a PDF file using PdfDocument.LoadFromFile() method.
- Get the specific page through PdfDocument.Pages[index] property.
- Create a PdfTextExtractor object.
- Create a PdfTextExtractOptions object, and specify the rectangle area through the ExtractArea property of it.
- Extract text from the rectangle using PdfTextExtractor.ExtractText() method.
- Write the extracted text to a TXT file.
- C#
- VB.NET
using Spire.Pdf; using Spire.Pdf.Texts; using System.IO; using System.Drawing; namespace ExtractTextFromRectangleArea { class Program { static void Main(string[] args) { //Create a PdfDocument object PdfDocument doc = new PdfDocument(); //Load a PDF file doc.LoadFromFile(@"C:\Users\Administrator\Desktop\Terms of Service.pdf"); //Get the second page PdfPageBase page = doc.Pages[1]; //Create a PdfTextExtractot object PdfTextExtractor textExtractor = new PdfTextExtractor(page); //Create a PdfTextExtractOptions object PdfTextExtractOptions extractOptions = new PdfTextExtractOptions(); //Set the rectangle area extractOptions.ExtractArea = new RectangleF(0, 0, 890, 170); //Extract text from the rectangle string text = textExtractor.ExtractText(extractOptions); //Write to a txt file File.WriteAllText("Extracted.txt", text); } } }
Extract Text using SimpleTextExtractionStrategy
The above methods extract text line by line. When extracting text using SimpleTextExtractionStrategy, it keeps track of the current Y position of each string and inserts a line break into the output if the Y position has changed. The following are the detailed steps.
- Create a PdfDocument object.
- Load a PDF file using PdfDocument.LoadFromFile() method.
- Get the specific page through PdfDocument.Pages[index] property.
- Create a PdfTextExtractor object.
- Create a PdfTextExtractOptions object and set the IsSimpleExtraction property to true.
- Extract text from the selected page using PdfTextExtractor.ExtractText() method.
- Write the extracted text to a TXT file.
- C#
- VB.NET
using System.IO; using Spire.Pdf; using Spire.Pdf.Texts; namespace SimpleExtraction { class Program { static void Main(string[] args) { //Create a PdfDocument object PdfDocument doc = new PdfDocument(); //Load a PDF file doc.LoadFromFile(@"C:\Users\Administrator\Desktop\Invoice.pdf"); //Get the first page PdfPageBase page = doc.Pages[0]; //Create a PdfTextExtractor object PdfTextExtractor textExtractor = new PdfTextExtractor(page); //Create a PdfTextExtractOptions object PdfTextExtractOptions extractOptions = new PdfTextExtractOptions(); //Set IsSimpleExtraction to true extractOptions.IsSimpleExtraction = true; //Extract text from the selected page string text = textExtractor.ExtractText(extractOptions); //Write to a txt file File.WriteAllText("Extracted.txt", text); } } }
Apply for a Temporary License
If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.