Spire.PDF is a professional PDF library applied to creating, writing, editing, handling and reading PDF files without any external dependencies. Get free and professional technical support for Spire.PDF for .NET, Java, Android, C++, Python.

Wed Apr 17, 2024 7:22 am

Hi

I am reading the text of PDF invoices to extract different pieces of information (e.g. customer, ID, amount, etc.) and am currently trying out different variants.

What exactly does

PdfTextExtractOptions.IsExtractAllText

do?

Regards Peter

peterInStIngbert
 
Posts: 14
Joined: Wed Nov 25, 2020 2:08 pm

Thu Apr 18, 2024 1:45 am

Hi,

Thanks for your inquiry.
You can set the PdfTextExtractOptions.IsExtractAllText property to true to extract all of the text, then process that text to get the content you need. The complete code below does this for the first page of a PDF file.
Code:
using System.IO;
using Spire.Pdf;
using Spire.Pdf.Texts;

string input = @"..\..\..\..\..\..\Data\PDFTemplate-Az.pdf";

// Load the PDF file
PdfDocument doc = new PdfDocument();
doc.LoadFromFile(input);

// Get the first page
PdfPageBase page = doc.Pages[0];

// Extract text from the page, keeping white space
PdfTextExtractOptions options = new PdfTextExtractOptions();
options.IsExtractAllText = true; // false -> extract text from the page without keeping white space
PdfTextExtractor pdfTextExtractor = new PdfTextExtractor(page);
string text = pdfTextExtractor.ExtractText(options);

// Write the extracted text to a file; the using block flushes and closes the writer
string result = Path.GetFullPath("ExtractTextFromParticularPage_out.txt");
using (TextWriter tw = new StreamWriter(result))
{
    tw.WriteLine(text);
}
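
For the step of handling the extracted text, here is a minimal sketch that is not part of Spire.PDF. It assumes each field sits on its own line in the form "Label: value", which real invoices may not follow, so the pattern will likely need adjusting. The names InvoiceFields and TryFindField are made up for illustration; only System.Text.RegularExpressions is used.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper (not a Spire.PDF API): pulls "Label: value" pairs
// out of text that has already been extracted from a PDF page.
public static class InvoiceFields
{
    // Looks for "<label>: <value>" on a single line, ignoring case.
    // Returns true and the trimmed value when a non-empty value is found.
    public static bool TryFindField(string text, string label, out string value)
    {
        // Regex.Escape keeps characters such as '.' or spaces in the label literal.
        string pattern = Regex.Escape(label) + @"[ \t]*:[ \t]*(?<value>[^\r\n]+)";
        Match match = Regex.Match(text, pattern, RegexOptions.IgnoreCase);

        value = match.Success ? match.Groups["value"].Value.Trim() : string.Empty;
        return value.Length > 0;
    }
}
```

With text such as "Customer: ACME GmbH" followed on the next line by "Total: 1234.50 EUR", calling InvoiceFields.TryFindField(text, "Customer", out string customer) would set customer to "ACME GmbH". Amounts come back as raw strings (here "1234.50 EUR"), so parsing them into numbers, including the choice of decimal separator, is left to the caller.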



Sincerely
Abel
E-iceblue support team

Abel.He
 
Posts: 1010
Joined: Tue Mar 08, 2022 2:02 am
