Spire.PDF is a professional PDF library for creating, writing, editing, handling, and reading PDF files without any external dependencies. Get free and professional technical support for Spire.PDF for .NET, Java, Android, C++, and Python.

Tue Jul 16, 2024 2:55 am

We're evaluating your Java PDF library to see if it meets our needs.

One of the source documents has a table that looks like this:

EGPD AD2-2.png


When I parse that table, I get an extra column on its right side.
Resulting Text from EGPD AD2-2.png


Do you know why this is?

The source doc is here:

pmaneely
 
Posts: 4
Joined: Mon Jul 15, 2024 6:34 pm

Tue Jul 16, 2024 3:24 am

Hello,

Thanks for your inquiry.
Kindly note that the PDF file does not actually contain a table object. When we extract a table, we use the lines drawn on the page as the table's borders, so lines outside the table can cause the table to be recognized inaccurately. As a workaround, you can remove the part shown in your screenshot that lies outside the table and test again. Alternatively, you can provide your original PDF file, and we will investigate whether the extraction results can be improved. You can attach it to this thread or send it to support@e-iceblue.com. Thanks in advance.
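To illustrate why a stray line can produce an extra column, here is a minimal, self-contained sketch of line-based column detection. It is not Spire.PDF's actual algorithm or API: the `VLine` class and the `columnBoundaries`, `columnCount`, and `dropBlankColumns` methods are hypothetical, and the sketch assumes that every vertical line overlapping the table's vertical band is treated as a column border. Under that assumption, a single line to the right of the table (for example, a page frame) adds one column. The `dropBlankColumns` helper is a possible post-processing workaround, assuming the spurious column comes back with blank text.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

public class RulingLineColumns {

    // Hypothetical model of a vertical ruling line; y grows downward, so top < bottom.
    static final class VLine {
        final double x, top, bottom;

        VLine(double x, double top, double bottom) {
            this.x = x;
            this.top = top;
            this.bottom = bottom;
        }
    }

    // Collects the distinct x positions of every vertical line that overlaps the
    // table's vertical band; each pair of adjacent positions delimits one column.
    // No tolerance is applied, so nearly equal x values count as separate borders.
    static List<Double> columnBoundaries(List<VLine> lines, double tableTop, double tableBottom) {
        TreeSet<Double> xs = new TreeSet<>();
        for (VLine l : lines) {
            if (l.top < tableBottom && l.bottom > tableTop) {
                xs.add(l.x);
            }
        }
        return new ArrayList<>(xs);
    }

    static int columnCount(List<VLine> lines, double tableTop, double tableBottom) {
        return Math.max(0, columnBoundaries(lines, tableTop, tableBottom).size() - 1);
    }

    // Possible workaround: drop columns that are blank in every row.
    // Assumes a rectangular cell grid (all rows the same length).
    static String[][] dropBlankColumns(String[][] cells) {
        if (cells.length == 0) {
            return cells;
        }
        List<Integer> keep = new ArrayList<>();
        for (int c = 0; c < cells[0].length; c++) {
            for (String[] row : cells) {
                if (row[c] != null && !row[c].trim().isEmpty()) {
                    keep.add(c);
                    break;
                }
            }
        }
        String[][] out = new String[cells.length][keep.size()];
        for (int r = 0; r < cells.length; r++) {
            for (int k = 0; k < keep.size(); k++) {
                out[r][k] = cells[r][keep.get(k)];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Three columns bounded by lines at x = 50, 150, 300, 450 (y from 100 to 400).
        List<VLine> table = Arrays.asList(
                new VLine(50, 100, 400), new VLine(150, 100, 400),
                new VLine(300, 100, 400), new VLine(450, 100, 400));
        System.out.println(columnCount(table, 100, 400)); // 3

        // Same table plus a stray line to the right that spans the table's band.
        List<VLine> withStray = new ArrayList<>(table);
        withStray.add(new VLine(560, 50, 800));
        System.out.println(columnCount(withStray, 100, 400)); // 4

        // Hypothetical extracted text with a blank trailing column.
        String[][] cells = {{"a", "b", "c", ""}, {"d", "e", "f", " "}};
        System.out.println(Arrays.deepToString(dropBlankColumns(cells))); // [[a, b, c], [d, e, f]]
    }
}
```

A line that lies entirely above or below the table (for example, one spanning y from 500 to 700 here) is ignored by this sketch, which matches the advice to remove or avoid lines that run alongside the table rather than the content itself.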

Sincerely,
William
E-iceblue support team

William.Zhang
 
Posts: 454
Joined: Mon Dec 27, 2021 2:23 am

Wed Jul 17, 2024 5:37 am

Hello,

Thanks for your message.
We have received the file you provided and reproduced the issue. I have logged it in our tracking system under ticket SPIREPDF-6909. Our development team will investigate whether the extraction results can be improved, and I will let you know as soon as there is any progress.

Sincerely,
William
E-iceblue support team
