
Thu Sep 05, 2024 7:57 am

Hello,

We are using Spire.PDF for Python.

When we read a PDF document containing Greek letters such as θ or Σ, Spire ignores the sentences that include these characters.

Below is an example:

This station is near a bank. <- correct sentence
This staθon is near a bank. <- actual sentence
ti -> θ
This is originally a problem in the PDF document itself.
However, Spire ignores all of the content on this page after "This sta".

We expect "This staon is near a bank." In other words, only the invalid character should be ignored.
Is there any solution?
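
To make the expected result concrete, here is a minimal sketch of the behavior described above, written as a post-processing step on a plain Python string. It is not a Spire.PDF feature, and the helper name drop_greek_letters is hypothetical. It also assumes the full page text is available, so it cannot recover content that the extractor has already discarded.

```python
import unicodedata


def drop_greek_letters(text: str) -> str:
    """Return text without characters whose Unicode name starts with 'GREEK'.

    Hypothetical helper for illustration only; it is not part of Spire.PDF.
    """
    return "".join(
        ch for ch in text
        if not unicodedata.name(ch, "").startswith("GREEK")
    )


# The "actual sentence" from the example, with θ written as an escape.
sample = "This sta\u03b8on is near a bank."
print(drop_greek_letters(sample))
```

Only the unexpected character is removed; the rest of the sentence is kept.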

Thank you for your support.

H.Okamura
 
Posts: 7
Joined: Thu May 23, 2024 11:54 pm

Fri Sep 06, 2024 8:14 am

Hello,

Thank you for your inquiry. I conducted preliminary testing with the latest version (Spire.PDF for Python 10.8.1) and found that sentences containing 'θ' or '∑' characters are extracted normally. If you are not using the latest version, please update and try again. If the issue persists after testing, please provide us with your original input PDF document so that we can investigate the problem accurately. You can upload it here or send it to us via email (support@e-iceblue.com). Thank you in advance.

Sincerely,
Amin
E-iceblue support team

Amin.Gan
 
Posts: 277
Joined: Mon Jul 15, 2024 5:40 am

Wed Sep 18, 2024 6:14 am

Hi Team,
Thank you for the prompt reply.

I have attached the original PDF document (Test.pdf).
Please check and investigate it.

In this case, the original text is as follows.
-----------------------------------------------------------------------------------------------
The bit rate shall be 500 kbits per second for both the arbitration phase as well as the data phase of the frame.
The bit rate tolerance shall be < 0.15%, including possible PLL jitter.
9.2.2 AA
9.2.2.1BB

-----------------------------------------------------------------------------------------------

However, Spire extracted only the text below; the remaining sentences were ignored.
-----------------------------------------------------------------------------------------------
The bit rate shall be 500 kbits per second for both the arbitra
-----------------------------------------------------------------------------------------------

I expect output like the following.
-----------------------------------------------------------------------------------------------
The bit rate shall be 500 kbits per second for both the arbitra on phase as well as the data phase of the frame.
The bit rate tolerance shall be < 0.15%, including possible PLL ji er.
9.2.2 AA
9.2.2.1BB

-----------------------------------------------------------------------------------------------
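
The truncation described above can also be checked mechanically. The following is a minimal sketch that compares an extracted string with the known original text and reports what is missing. The helper name missing_tail is hypothetical, and the strings are copied from this post rather than produced by calling Spire.

```python
from typing import Optional


def missing_tail(original: str, extracted: str) -> Optional[str]:
    """Return the part of original that follows extracted.

    Returns None when extracted is not a prefix of original, which means
    the difference is something other than a simple cut-off.
    Hypothetical helper for illustration only.
    """
    extracted = extracted.rstrip()
    if original.startswith(extracted):
        return original[len(extracted):]
    return None


original = (
    "The bit rate shall be 500 kbits per second for both the "
    "arbitration phase as well as the data phase of the frame."
)
extracted = "The bit rate shall be 500 kbits per second for both the arbitra"

print(repr(missing_tail(original, extracted)))
```

An empty result would mean nothing was lost, while a non-empty result shows the text dropped after the problematic character.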

Thank you so much.

H.Okamura
 

Thu Sep 19, 2024 2:39 am

Hello,

Thank you for your feedback. I have successfully reproduced your issue using the PDF document you provided. The issue has been logged in our bug tracking system as SPIREPDF-7047. Our Dev team will investigate it further, and we will let you know as soon as there is an update.

Sincerely,
Amin
E-iceblue support team

Amin.Gan
 
