PdfTextExtractOptions.IsExtractAllText

Wed Apr 17, 2024 7:22 am

Hi

I am reading the text of PDF invoices to extract various pieces of information (e.g. customer, ID, amount) and am currently trying out different variants.

What exactly does

PdfTextExtractOptions.IsExtractAllText

do?

Regards Peter

Thu Apr 18, 2024 1:45 am

Hi,

Thanks for your inquiry.
PdfTextExtractOptions.IsExtractAllText is a property rather than a method. Set it to true to extract all the text of a page, then process that text to get the content you need. I put the complete code below for your reference.

Code:
using System;
using System.IO;
using Spire.Pdf;
using Spire.Pdf.Texts;

String input = @"..\..\..\..\..\..\Data\PDFTemplate-Az.pdf";
PdfDocument doc = new PdfDocument();
// Read a pdf file
doc.LoadFromFile(input);
// Get the first page
PdfPageBase page = doc.Pages[0];
// Extract text from page keeping white space
PdfTextExtractOptions options = new PdfTextExtractOptions();
options.IsExtractAllText = true; // false -> Extract text from page without keeping white space
PdfTextExtractor pdfTextExtractor = new PdfTextExtractor(page);
String text = pdfTextExtractor.ExtractText(options);
String result = Path.GetFullPath("ExtractTextFromParticularPage_out.txt");
// Create a writer to put the extracted text
TextWriter tw = new StreamWriter(result);
// Write a line of text to the file
tw.WriteLine(text);
// Flush and close the writer so the text is actually written to disk
tw.Close();
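
For the "process the text" step, a minimal sketch could look like the following. It assumes the extracted text contains labelled lines such as "Customer: ..." and "Total: ..."; those labels, the sample text, and the InvoiceFieldParser.FindField helper are hypothetical and not part of Spire.PDF, so adjust them to your invoice layout.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of Spire.PDF: looks up "Label: value" lines in plain text.
public static class InvoiceFieldParser
{
    // Returns the value that follows "<label>:" on the same line, or null if the label is absent.
    public static string FindField(string text, string label)
    {
        // Multiline makes ^ and $ match at line boundaries; trailing blanks and '\r' are trimmed.
        string pattern = @"^[ \t]*" + Regex.Escape(label) + @"[ \t]*:[ \t]*(.+?)[ \t\r]*$";
        Match m = Regex.Match(text, pattern, RegexOptions.Multiline | RegexOptions.IgnoreCase);
        return m.Success ? m.Groups[1].Value : null;
    }
}

public static class Program
{
    public static void Main()
    {
        // Stand-in for the string returned by ExtractText; the labels are assumptions.
        string text = "Invoice No: 4711\r\nCustomer: ACME Ltd  \r\nTotal: 1,234.50\r\n";

        string customer = InvoiceFieldParser.FindField(text, "Customer");
        string id = InvoiceFieldParser.FindField(text, "Invoice No");
        string totalRaw = InvoiceFieldParser.FindField(text, "Total");

        // Parse the amount with an explicit culture so "," and "." are read consistently.
        decimal total = 0m;
        bool hasTotal = totalRaw != null &&
            decimal.TryParse(totalRaw, NumberStyles.Number, CultureInfo.InvariantCulture, out total);

        Console.WriteLine("Customer: " + (customer ?? "(not found)"));
        Console.WriteLine("Invoice No: " + (id ?? "(not found)"));
        Console.WriteLine("Total: " + (hasTotal ? total.ToString(CultureInfo.InvariantCulture) : "(not found)"));
    }
}
```

Whether a label and its value end up on the same line depends on the PDF and on the extraction options, so it is worth writing the extracted text to a file first and checking the layout before settling on the patterns.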

Sincerely
Abel
E-iceblue support team