Wed Sep 11, 2024 6:16 am

Hello, Team,

I am using Spire.PDF for Python.
I would like to extract the section numbers from a PDF document.
Is there a method for this?

Example:
1. Introduction
xxxxxxxxxxxxxxxxxxxxxx

1.1 Purpose
xxxxxxxxxxxxxxxxx

I only want the section numbers and titles, like this:
1. Introduction
1.1 Purpose

H.Okamura
Posts: 7
Joined: Thu May 23, 2024 11:54 pm

Thu Sep 12, 2024 3:09 am

Hello,

Thanks for your inquiry.
We suggest using a regular expression to match the heading text. Please refer to the following code. If the results are not what you expect, you can provide your input PDF document for a detailed investigation: upload it here or send it to us by email (support@e-iceblue.com). Thank you in advance.
from spire.pdf.common import *
from spire.pdf import *

pdf = PdfDocument()

# Load the PDF document
pdf.LoadFromFile("../content.pdf")

MyFindOptions = PdfTextFindOptions()
# Regular expression: a number followed by a period,
# e.g. the "1." in "1. Introduction" or in "1.1 Purpose"
s_Title = r"\d+\."
# Enable regular-expression search
MyFindOptions.Parameter = TextFindParameter.Regex
# Search only the first page (index 0)
MyPage = pdf.Pages[0]
# Create a text finder for this page
MyFinder = PdfTextFinder(MyPage)
# Apply the regex search options
MyFinder.Options = MyFindOptions
# Find all matching text fragments
L_Find = MyFinder.Find(s_Title)
# Print the full line of text that contains each match
for index, element in enumerate(L_Find):
    print(f"Index: {index}, title: {element.LineText}")

pdf.Close()
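Note that the pattern `\d+\.` also matches numbers in ordinary sentences (for example "2023." at the end of a sentence), and a line such as "1.1.2 Scope" contains "1." twice, so the same line may be reported more than once. As a minimal sketch, the printed `LineText` values can be post-filtered in plain Python. The helper `filter_headings` and its pattern are hypothetical (not part of Spire.PDF) and assume a heading is one or more dot-separated numbers, an optional trailing period, whitespace, and then a title starting with a letter.

```python
import re

# Hypothetical heading pattern (an assumption, not a Spire.PDF feature):
# dot-separated integers, optional trailing period, whitespace, then a
# title that starts with a letter, e.g. "1. Introduction" or "1.1 Purpose".
HEADING_RE = re.compile(r"^(\d+(?:\.\d+)*)\.?\s+[A-Za-z]")


def filter_headings(lines):
    """Keep lines that look like numbered headings, in order, without duplicates."""
    seen = set()
    headings = []
    for line in lines:
        text = line.strip()
        # Skip repeats of a line already kept, and lines that are not headings
        if text in seen or not HEADING_RE.match(text):
            continue
        seen.add(text)
        headings.append(text)
    return headings


if __name__ == "__main__":
    # Sample lines standing in for element.LineText values from the finder
    sample = [
        "1. Introduction",
        "The limit was raised in 2023. Details follow.",
        "1.1 Purpose",
        "1.1 Purpose",
    ]
    for heading in filter_headings(sample):
        print(heading)
```

With the search results above, this could be called as `filter_headings(element.LineText for element in L_Find)`. A purely text-based filter can still accept body lines such as "2.5 kg of flour"; distinguishing those reliably would require extra information such as font size, which this sketch does not attempt.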

Sincerely,
Amin
E-iceblue support team

Amin.Gan
Posts: 164
Joined: Mon Jul 15, 2024 5:40 am

Tue Sep 17, 2024 11:30 pm

Hi, thank you so much for the prompt reply.
I will try it.

H.Okamura

Wed Sep 18, 2024 8:12 am

Hello,

Thank you for your feedback. We look forward to your test results. If the results differ from what you expect, please provide your input PDF document for investigation: upload it here or send it to us by email (support@e-iceblue.com). Thank you in advance.

Sincerely,
Amin
E-iceblue support team

Amin.Gan