Knowing how to remove headers or footers in Word is an essential skill as there may be times you need to change the formatting of your document or collaborate with others who do not need the headers or footers. In this article, you will learn how to remove headers or footers in Word in Python using Spire.Doc for Python.

Install Spire.Doc for Python

This scenario requires Spire.Doc for Python and plum-dispatch v1.7.4. They can be easily installed in your Windows through the following pip commands.

pip install Spire.Doc

If you are unsure how to install, please refer to this tutorial: How to Install Spire.Doc for Python on Windows

Remove Headers in a Word Document in Python

Spire.Doc for Python supports getting different headers in the first pages, odd pages, and even pages, and then delete all of them through the HeaderFooter.ChildObjects.Clear() method. The following are the detailed steps:

  • Create a Document instance.
  • Load a Word document using Document.LoadFromFile() method.
  • Get a specified section using Document.Sections[] property.
  • Iterate through all paragraphs in the section, and then all child objects in each paragraph.
  • Get the headers for the first, odd, and even pages using Section.HeadersFooters[hfType: HeaderFooterType] property, and then delete them using HeaderFooter.ChildObjects.Clear() method.
  • Save the result document using Document.SaveToFile() method.
  • Python
from spire.doc import *
from spire.doc.common import *

inputFile = "HeaderFooter.docx"
outputFile = "RemoveHeaders.docx"

# Create a Document instance
doc = Document()

# Load a Word document
doc.LoadFromFile(inputFile)

# Get the first section
section = doc.Sections[0]

# Iterate through all paragraphs in the section
for i in range(section.Paragraphs.Count):
    para = section.Paragraphs.get_Item(i)

    # Iterate through all child objects in each paragraph
    for j in range(para.ChildObjects.Count):
        obj = para.ChildObjects.get_Item(j)

        # Delete header in the first page
        header = None
        header = section.HeadersFooters[HeaderFooterType.HeaderFirstPage]
        if header is not None:
            header.ChildObjects.Clear()

        # Delete headers in the odd pages
        header = section.HeadersFooters[HeaderFooterType.HeaderOdd]
        if header is not None:
            header.ChildObjects.Clear()

        # Delete headers in the even pages
        header = section.HeadersFooters[HeaderFooterType.HeaderEven]
        if header is not None:
            header.ChildObjects.Clear()

# Save the result document
doc.SaveToFile(outputFile, FileFormat.Docx)
doc.Close()

Python: Remove Headers or Footers in Word

Remove Footers in a Word Document in Python

Deleting footers is similar to that of deleting headers, you can also get the footers on different pages first and then delete them at once. The following are the detailed steps:

  • Create a Document instance.
  • Load a Word document using Document.LoadFromFile() method.
  • Get a specified section using Document.Sections[] property.
  • Iterate through all paragraphs in the section, and then all child objects in each paragraph.
  • Get the footers for the first, odd, and even pages using Section.HeadersFooters[hfType: HeaderFooterType] property, and then delete them using HeaderFooter.ChildObjects.Clear() method.
  • Save the result document using Document.SaveToFile() method.
  • Python
from spire.doc import *
from spire.doc.common import *

inputFile = "HeaderFooter.docx"
outputFile = "RemoveFooters.docx"

# Create a Document instance
doc = Document()

# Load a Word document
doc.LoadFromFile(inputFile)

# Get the first section
section = doc.Sections[0]

# Iterate through all paragraphs in the section
for i in range(section.Paragraphs.Count):
    para = section.Paragraphs.get_Item(i)

    # Iterate through all child objects in each paragraph
    for j in range(para.ChildObjects.Count):
        obj = para.ChildObjects.get_Item(j)

        # Delete footer in the first page
        footer = None
        footer = section.HeadersFooters[HeaderFooterType.FooterFirstPage]
        if footer is not None:
            footer.ChildObjects.Clear()

        # Delete footers in the odd pages
        footer = section.HeadersFooters[HeaderFooterType.FooterOdd]
        if footer is not None:
            footer.ChildObjects.Clear()

        # Delete footers in the even pages
        footer = section.HeadersFooters[HeaderFooterType.FooterEven]
        if footer is not None:
            footer.ChildObjects.Clear()

# Save the result document
doc.SaveToFile(outputFile, FileFormat.Docx)
doc.Close()

Python: Remove Headers or Footers in Word

Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.

Friday, 07 June 2024 07:59

Python: Extract Comments from Word

Comments in Word documents are often used for collaborative review and feedback purposes. They may contain text and images that provide valuable information to guide document improvements. Extracting the text and images from comments allows you to analyze and evaluate the feedback provided by reviewers, helping you gain a comprehensive understanding of the strengths, weaknesses, and suggestions related to the document. In this article, we will demonstrate how to extract text and images from Word comments in Python using Spire.Doc for Python.

Install Spire.Doc for Python

This scenario requires Spire.Doc for Python and plum-dispatch v1.7.4. They can be easily installed in your Windows through the following pip command.

pip install Spire.Doc

If you are unsure how to install, please refer to this tutorial: How to Install Spire.Doc for Python on Windows

Extract Text from Word Comments in Python

You can easily retrieve the author and text of a Word comment using the Comment.Format.Author and Comment.Body.Paragraphs[index].Text properties provided by Spire.Doc for Python. The detailed steps are as follows.

  • Create an object of the Document class.
  • Load a Word document using the Document.LoadFromFile() method.
  • Create a list to store the extracted comment data.
  • Iterate through the comments in the document.
  • For each comment, iterate through the paragraphs of the comment body.
  • For each paragraph, get the text using the Comment.Body.Paragraphs[index].Text property.
  • Get the author of the comment using the Comment.Format.Author property.
  • Add the text and author of the comment to the list.
  • Save the content of the list to a text file.
  • Python
from spire.doc import *
from spire.doc.common import *

# Create an object of the Document class
document = Document()
# Load a Word document containing comments
document.LoadFromFile("Comments.docx")

# Create a list to store the extracted comment data
comments = []

# Iterate through the comments in the document
for i in range(document.Comments.Count):
    comment = document.Comments[i]
    comment_text = ""

    # Iterate through the paragraphs in the comment body
    for j in range(comment.Body.Paragraphs.Count):
        paragraph = comment.Body.Paragraphs[j]
        comment_text += paragraph.Text + "\n"

    # Get the comment author
    comment_author = comment.Format.Author

    # Append the comment data to the list
    comments.append({
        "author": comment_author,
        "text": comment_text
    })

# Write the comment data to a file
with open("comment_data.txt", "w", encoding="utf-8") as file:
    for comment in comments:
        file.write(f"Author: {comment['author']}\nText: {comment['text']}\n\n")

Python: Extract Comments from Word

Extract Images from Word Comments in Python

To extract images from Word comments, you need to iterate through the child objects in the paragraphs of the comments to find the DocPicture objects, then get the image data using DocPicture.ImageBytes property, finally save the image data to image files.

  • Create an object of the Document class.
  • Load a Word document using the Document.LoadFromFile() method.
  • Create a list to store the extracted image data.
  • Iterate through the comments in the document.
  • For each comment, iterate through the paragraphs of the comment body.
  • For each paragraph, iterate through the child objects of the paragraph.
  • Check if the object is a DocPicture object.
  • If the object is a DocPicture, get the image data using the DocPicture.ImageBytes property and add it to the list.
  • Save the image data in the list to individual image files.
  • Python
from spire.doc import *
from spire.doc.common import *
 
# Create an object of the Document class
document = Document()
# Load a Word document containing comments
document.LoadFromFile("Comments.docx")
 
# Create a list to store the extracted image data
images = []
 
# Iterate through the comments in the document
for i in range(document.Comments.Count):
    comment = document.Comments[i]
    # Iterate through the paragraphs in the comment body
    for j in range(comment.Body.Paragraphs.Count):
        paragraph = comment.Body.Paragraphs[j]
        # Iterate through the child objects in the paragraph
        for o in range(paragraph.ChildObjects.Count):
            obj = paragraph.ChildObjects[o]
            # Find the images
            if isinstance(obj, DocPicture):
                picture = obj
                # Get the image data and add it to the list
                data_bytes = picture.ImageBytes
                images.append(data_bytes)
 
# Save the image data to image files
for i, image_data in enumerate(images):
    file_name = f"CommentImage-{i}.png"
    with open(os.path.join("CommentImages/", file_name), 'wb') as image_file:
        image_file.write(image_data)

Python: Extract Comments from Word

Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.

Drop-down lists in Excel worksheets are an indispensable tool for enhancing data accuracy, efficiency, and usability in spreadsheet management. By offering pre-defined options within a cell, they not only streamline data entry processes but also enforce consistency, reducing the likelihood of input errors. This feature is particularly valuable when working with large datasets or collaborative projects where maintaining uniformity across multiple entries is crucial. This article demonstrates how to create customized drop-down lists within Excel worksheets using Spire.XLS for Python, empowering users to create organized and user-friendly worksheets.

Install Spire.XLS for Python

This scenario requires Spire.XLS for Python and plum-dispatch v1.7.4. They can be easily installed in your Windows through the following pip command.

pip install Spire.XLS

If you are unsure how to install, please refer to: How to Install Spire.XLS for Python on Windows

Create Drop-Down Lists Based on Cell Values Using Python

In Excel worksheets, creating drop-down lists is accomplished through the data validation feature. With Spire.XLS for Python, developers can use the CellRange.DataValidation.DataRange property to create drop-down lists within cells and use the data from the specified cell range as list options.

The detailed steps for creating a drop-down list based on cell values are as follows:

  • Create an instance of Workbook class.
  • Load an Excel file using Workbook.LoadFromFile() method.
  • Get a worksheet using Workbook.Worksheets.get_Item() method.
  • Get a specific cell range through Worksheet.Range[] property.
  • Set the data range for data validation of the cell range through CellRange.DataValidation.DataRange property to create drop-down lists with cell values.
  • Save the workbook using Workbook.SaveToFile() method.
  • Python
from spire.xls import *
from spire.xls.common import *

# Create an instance of Workbook
workbook = Workbook()

# Load an Excel file
workbook.LoadFromFile("Sample.xlsx")

# Get the first worksheet
sheet = workbook.Worksheets.get_Item(0)

# Get a specific cell range
cellRange = sheet.Range["C3:C7"]

# Set the data range for data validation to create drop-down lists in the cell range
cellRange.DataValidation.DataRange = sheet.Range["F4:H4"]

# Save the workbook
workbook.SaveToFile("output/DropDownListExcel.xlsx", FileFormat.Version2016)
workbook.Dispose()

Python: Create Drop-Down Lists in Excel Worksheets

Create Drop-Down Lists Based on String Using Python

Spire.XLS for Python also provides the CellRange.DataValidation.Values property to create drop-down lists in cells directly using string lists.

The detailed steps for creating drop-down lists based on values are as follows:

  • Create an instance of Workbook class.
  • Load an Excel file using Workbook.LoadFromFile() method.
  • Get a worksheet using Workbook.Worksheets.get_Item() method.
  • Get a specific cell range through Worksheet.Range[] property.
  • Set a string list as the values of data validation in the cell range through CellRange.DataValidation.Values property to create drop-down lists based on strings.
  • Save the workbook using Workbook.SaveToFile() method.
  • Python
from spire.xls import *
from spire.xls.common import *

# Create an instance of Workbook
workbook = Workbook()

# Load an Excel file
workbook.LoadFromFile("Sample.xlsx")

# Get the first worksheet
sheet = workbook.Worksheets.get_Item(0)

# Get a cell range
cellRange = sheet.Range["D3:D7"]

# Set the value for data validation to create drop-down lists
cellRange.DataValidation.Values = ["Starter", "Technician", "Director", "Executive"]

# Save the workbook
workbook.SaveToFile("output/ValueDropDownListExcel.xlsx", FileFormat.Version2016)
workbook.Dispose()

Python: Create Drop-Down Lists in Excel Worksheets

Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.

Presenting information in a logical sequence is vital for an effective PowerPoint presentation. Reordering slides in a PowerPoint document gives you the flexibility to fine-tune your presentation and ensure that it delivers your message with maximum impact. By organizing your slides strategically, you can create a dynamic and engaging presentation experience.

In this article, you will learn how to reorder slides in a PowerPoint document in Java using the Spire.Presentation for Java library.

Install Spire.Presentation for Java

First of all, you're required to add the Spire.Presentation.jar file as a dependency in your Java program. The JAR file can be downloaded from this link. If you use Maven, you can easily import the JAR file in your application by adding the following code to your project's pom.xml file.

<repositories>
    <repository>
        <id>com.e-iceblue</id>
        <name>e-iceblue</name>
        <url>https://repo.e-iceblue.com/nexus/content/groups/public/</url>
    </repository>
</repositories>
<dependencies>
    <dependency>
        <groupId>e-iceblue</groupId>
        <artifactId>spire.presentation</artifactId>
        <version>9.10.2</version>
    </dependency>
</dependencies>
    

Reorder Slides in a PowerPoint Document in Java

To rearrange the slides in a PowerPoint presentation, you need to create two presentation objects. One object will be used to load the original document while the other will be used to create a new document. By copying the slides from the original document to the new document in the desired order, you can effectively rearrange the slides.

The following are the steps to rearrange slides in a PowerPoint document using Java.

  • Create a Presentationobject.
  • Load a PowerPoint document using Presentation.loadFromFile() method.
  • Specify the slide order within an array.
  • Create another Presentation object for creating a new presentation.
  • Add the slides from the original document to the new presentation in the specified order using Presentation.getSlides().append() method.
  • Save the new presentation to a PPTX file using Presentation.saveToFile() method.
  • Java
import com.spire.presentation.FileFormat;
import com.spire.presentation.Presentation;

public class ReorderSlides {

    public static void main(String[] args) throws Exception {

        // Create a Presentation object
        Presentation presentation = new Presentation();

        // Load a PowerPoint file
        presentation.loadFromFile("C:\\Users\\Administrator\\Desktop\\input.pptx");

        // Specify the new slide order within an array
        int[] newSlideOrder = new int[] { 3, 4, 1, 2 };

        // Create another Presentation object
        Presentation new_presentation = new Presentation();

        // Remove the default slide
        new_presentation.getSlides().removeAt(0);

        // Iterate through the array
        for (int i = 0; i < newSlideOrder.length; i++)
        {
            // Add the slides from the original PowerPoint file to the new PowerPoint document in the new order
            new_presentation.getSlides().append(presentation.getSlides().get(newSlideOrder[i] - 1));
        }

        // Save the new presentation to file
        new_presentation.saveToFile("output/NewOrder.pptx", FileFormat.PPTX_2019);

        // Dispose resources
        presentation.dispose();
        new_presentation.dispose();
    }
}

Java: Reorder Slides in a PowerPoint Document

Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.

Thursday, 06 June 2024 09:12

Spire.Doc 12.6.1 supports AI features

We are excited to announce the release of Spire.Doc 12.6.1. This version supports AI features, such as document generating, translating, polishing, and summarizing. Besides, it also enhances the conversion from Word to HTML, OFD, and PDF. Moreover, some known issues are fixed in this version, such as the issue that the layout was incorrect when printing Word documents. More details are listed below.

Here is a list of changes made in this release

Category ID Description
New feature - Supports AI features: document generation, document polishing, document translation, abstract generation, summary creation, spelling check, object recognition, article continuation, as well as questions and answers.
Bug SPIREDOC-10196 Fixes the issue that the layout was incorrect when printing Word documents.
Bug SPIREDOC-10211 Fixes the issue that the fonts were incorrect after converting Word to HTML.
Bug SPIREDOC-10221 Fixes the issue that the contents were incorrect after converting Word to OFD.
Bug SPIREDOC-10353 Fixes the issue that the program threw a "System.NullReferenceException" when loading a Word document.
Bug SPIREDOC-10515 Fixes the issue that the program threw a "System.NullReferenceException" when converting a Word document to PDF.
Click the link to download Spire.Doc 12.6.1:
More information of Spire.Doc new release or hotfix:

We are pleased to announce the release of Spire.Presentation for Java 9.6.0. This version mainly fixes several issues that occurred when converting PPTX to PDF/images, merging, loading and saving PowerPoint documents. More details are listed below.

Here is a list of changes made in this release

Category ID Description
Bug SPIREPPT-2487 Fixes the issue that the links were missing when merging PowerPoint documents.
Bug SPIREPPT-2499 Fixes the issue that the chart dynamics were lost after loading and saving PowerPoint documents.
Bug SPIREPPT-2500 Fixes the issue that the program hanged when converting PowerPoint to PDF.
Bug SPIREPPT-2517 Fixes the issue that the application threw "DocumentEditException" when merging PowerPoint documents.
Bug SPIREPPT-2518 Fixes the issue that the fonts retrieved in shapes were incorrect.
Bug SPIREPPT-2519 Fixes the issue that the application threw "NullPointerException" when retrieving chart data sources in PowerPoint documents.
Bug SPIREPPT-2525 Fixes the issue that the application threw "outOfMemoryError" when converting PowerPoint to images.
Click the link below to download Spire.Presentation for Java 9.6.0:
Wednesday, 05 June 2024 01:02

Python: Get Revisions of Word Document

With the increasing popularity of team collaboration, the track changes function in Word documents has become the cornerstone of version control and content review. However, for developers who pursue automation and efficiency, how to flexibly extract these revision information from Word documents remains a significant challenge. This article will introduce you to how to use Spire.Doc for Python to obtain revision information in Word documents.

Install Spire.Doc for Python

This scenario requires Spire.Doc for Python and plum-dispatch v1.7.4. They can be easily installed in your Windows through the following pip command.

pip install Spire.Doc

If you are unsure how to install, please refer to this tutorial: How to Install Spire.Doc for Python on Windows

Get Revisions of Word Document in Python

Spire.Doc for Python provides the IsInsertRevision and DeleteRevision properties to support determining whether an element in a Word document is an insertion revision or a deletion revision. Here are the detailed steps:

  • Create an instance of the Document class and load the Word document that contains revisions.
  • Initialize lists to collect insertion and deletion revision information.
  • Iterate through the sections of the document and their body elements.
  • Obtain the paragraphs in the body and use the IsInsertRevision property to determine if the paragraph is an insertion revision.
  • Get the type, author, and associated text of the insertion revision.
  • Use the IsDeleteRevision property to determine if the paragraph is a deletion revision, and obtain its revision type, author, and associated text.
  • Iterate through the child elements of the paragraph, similarly checking if the TextRange is an insertion or deletion revision, and retrieve the revision type, author, and associated text.
  • Define a WriteAllText function to save the insertion and deletion revision information to TXT documents.
  • Python
from spire.doc import *

# Function to write text to a file
def WriteAllText(fname: str, text: str):
    with open(fname, "w", encoding='utf-8') as fp:
        fp.write(text)

# Input and output file names
inputFile = "sample.docx"
outputFile1 = "InsertRevision.txt"
outputFile2 = "DeleteRevision.txt"

# Create a Document object
document = Document()

# Load the Word document
document.LoadFromFile(inputFile)

# Initialize lists to store insert and delete revisions
insert_revisions = []
delete_revisions = []

# Iterate through sections in the document
for k in range(document.Sections.Count):
    sec = document.Sections.get_Item(k)

    # Iterate through body elements in the section
    for m in range(sec.Body.ChildObjects.Count):
        # Check if the item is a Paragraph
        docItem = sec.Body.ChildObjects.get_Item(m)
        if isinstance(docItem, Paragraph):
            para = docItem
            para.AppendField("",FieldType.FieldDocVariable)

            # Check if the paragraph is an insertion revision
            if para.IsInsertRevision:
                insRevison = para.InsertRevision
                insType = insRevison.Type
                insAuthor = insRevison.Author

                # Add insertion revision details to the list
                insert_revisions.append(f"Revision Type: {insType.name}\n")
                insert_revisions.append(f"Revision Author: {insAuthor}\n")
                insert_revisions.append(f"Insertion Text: {para.Text}\n")
            # Check if the paragraph is a deletion revision
            elif para.IsDeleteRevision:
                delRevison = para.DeleteRevision
                delType = delRevison.Type
                delAuthor = delRevison.Author

                # Add deletion revision details to the list
                delete_revisions.append(f"Revision Type:: {delType.name}\n")
                delete_revisions.append(f"Revision Author: {delAuthor}\n")
                delete_revisions.append(f"Deletion Text: {para.Text}\n")
            else:
                # Iterate through all child objects of Paragraph
                for j in range(para.ChildObjects.Count):
                    obj = para.ChildObjects.get_Item(j)
                    # Check if the current object is an instance of TextRange
                    if isinstance(obj, TextRange):
                        textRange = obj

                        # Check if the textrange is an insertion revision
                        if textRange.IsInsertRevision:
                            insRevison = textRange.InsertRevision
                            insType = insRevison.Type
                            insAuthor = insRevison.Author

                            # Add insertion revision details to the list
                            insert_revisions.append(f"Revision Type: {insType.name}\n")
                            insert_revisions.append(f"Revision Author: {insAuthor}\n")
                            insert_revisions.append(f"Insertion Text: {textRange.Text}\n")
                        # Check if the textrange is a deletion revision
                        elif textRange.IsDeleteRevision:
                            delRevison = textRange.DeleteRevision
                            delType = delRevison.Type
                            delAuthor = delRevison.Author

                            # Add deletion revision details to the list
                            delete_revisions.append(f"Revision Type: {delType.name}\n")
                            delete_revisions.append(f"Revision Author: {delAuthor}\n")
                            delete_revisions.append(f"Deletion Text: {textRange.Text}\n")

# Write all the insertion revision details to the 'outputFile1' file
WriteAllText(outputFile1, ''.join(insert_revisions))

# Write all the deletion revision details to the 'outputFile2' file
WriteAllText(outputFile2, ''.join(delete_revisions))

# Dispose the document
document.Dispose()

Python: Get Revisions of Word Document

Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.

We are glad to announce the release of Spire.PDF 10.6.0. This version enhances the conversion from OFD and HTML to PDF as well as PDF to Excel and images. Moreover, many known issues are fixed successfully in this version, such as the issue that the printing results of PDF files were incorrect. More details are listed below.

Here is a list of changes made in this release

Category ID Description
Bug SPIREPDF-6663 Fixes the issue that "System.ArgumentOutOfRangeException" was thrown when converting OFD to PDF.
Bug SPIREPDF-6678 Fixes the issue that lines were lost after converting PDF to Excel.
Bug SPIREPDF-6713 Fixes the issue that "System.NullReferenceException" was thrown when converting OFD to PDF.
Bug SPIREPDF-6719 Fixes the issue that the effect of font display was incorrect after converting OFD to PDF.
Bug SPIREPDF-6720 Fixes the issue that modifying the annotation text of PdfFreeTextAnnotation did not take effect.
Bug SPIREPDF-6739 Fixes the issue that "Schema namespace URI and prefix mismatch" was thrown when adding a watermark to a PDF file.
Bug SPIREPDF-6740 Fixes the issue that the result was incorrect when printing PDF files.
Bug SPIREPDF-6741 Fixes the issue that the text overlapped after converting PDF to images.
Bug SPIREPDF-6743 Fixes the issue that the pictures were not clear after converting OFD to PDF.
Bug SPIREPDF-6748 Fixes the issue that some contents were lost after drawing HTML contents on PDF.
Bug SPIREPDF-6770 Fixes the issue that the drawn text was lost after setting the transparency of PDF and then converting it to OFD.
Bug SPIREPDF-6771 Fixes the issue that "System.NullReferenceException" was thrown when the PdfDocument object was not released and called again.
Click the link to download Spire.PDF 10.6.0:
More information of Spire.PDF new release or hotfix:

We're pleased to announce the release of Spire.PDF for Java 10.6.0. This version mainly fixes some issues that occurred when converting PDF to SVG, Word, OFD, and also optimizes the time spent on merging PDF files. More details are listed below.

Here is a list of changes made in this release

Category ID Description
Bug SPIREPDF-6601 Optimizes the issue of font naming when converting PDF to SVG.
Bug SPIREPDF-6628 Fixes the issue that the application threw an exception when converting PDF to Word.
Bug SPIREPDF-6676 Optimizes the time-consuming for merging PDF files.
Bug SPIREPDF-6705 Fixes the issue that the PdfInkAnnotation added to PDF was lost after conversion to OFD.
Bug SPIREPDF-6709 Fixes the issue that the PdfInkAnnotation added to PDF was not rendered incorrectly.
Bug SPIREPDF-6747 Fixes the issue that the PdfTextExtractOptions.setExtractHiddenText(false) method didn’t take effect.
Click the link below to download Spire.PDF for Java 10.6.0:
Tuesday, 04 June 2024 01:03

Java: Extract Bookmarks from PDF

Bookmarks in PDF function as interactive table of contents, allowing users to quickly jump to specific sections within the document. Extracting these bookmarks not only provides a comprehensive overview of the document's structure, but also reveals its core parts or key information, providing users with a streamlined and intuitive method of accessing content. In this article, you will learn how to extract PDF bookmarks in Java using Spire.PDF for Java.

Install Spire.PDF for Java

First of all, you're required to add the Spire.Pdf.jar file as a dependency in your Java program. The JAR file can be downloaded from this link. If you use Maven, you can easily import the JAR file in your application by adding the following code to your project's pom.xml file.

<repositories>
    <repository>
        <id>com.e-iceblue</id>
        <name>e-iceblue</name>
        <url>https://repo.e-iceblue.com/nexus/content/groups/public/</url>
    </repository>
</repositories>
<dependencies>
    <dependency>
        <groupId>e-iceblue</groupId>
        <artifactId>spire.pdf</artifactId>
        <version>10.10.7</version>
    </dependency>
</dependencies>
    

Extract Bookmarks from PDF in Java

With Spire.PDF for Java, you can create custom methods GetBookmarks() and GetChildBookmark() to get the title and text styles of both parent and child bookmarks in a PDF file, then export them to a TXT file. The following are the detailed steps.

  • Create a PdfDocument instance.
  • Load a PDF file using PdfDocument.loadFromFile() method.
  • Get bookmarks collection in the PDF file using PdfDocument.getBookmarks() method.
  • Call custom methods GetBookmarks() and GetChildBookmark() to get the text content and text style of parent and child bookmarks.
  • Export the extracted PDF bookmarks to a TXT file.
  • Java
import com.spire.pdf.*;
import com.spire.pdf.bookmarks.*;
import java.io.*;

public class getAllPdfBookmarks {
    public static void main(String[] args) throws IOException{
        //Create a PdfDocument instance
        PdfDocument pdf = new PdfDocument();

        //Load a PDF file
        pdf.loadFromFile("AnnualReport.pdf");

        //Get bookmarks collections of the PDF file
        PdfBookmarkCollection bookmarks = pdf.getBookmarks();

        //Get the contents of bookmarks and save them to a TXT file
        GetBookmarks(bookmarks, "GetPdfBookmarks.txt");

    }

    private static void GetBookmarks(PdfBookmarkCollection bookmarks, String result) throws IOException {
        //create a StringBuilder instance
        StringBuilder content = new StringBuilder();

        //Get parent bookmarks information
        if (bookmarks.getCount() > 0) {
            content.append("Pdf bookmarks:");
            for (int i = 0; i < bookmarks.getCount(); i++) {
                PdfBookmark parentBookmark = bookmarks.get(i);
                content.append(parentBookmark.getTitle() + "\r\n");

                //Get the text style
                String textStyle = parentBookmark.getDisplayStyle().toString();
                content.append(textStyle + "\r\n");
                GetChildBookmark(parentBookmark, content);
            }
        }

        writeStringToTxt(content.toString(),result);
    }

    private static void GetChildBookmark(PdfBookmark parentBookmark, StringBuilder content)
    {
        //Get child bookmarks information
        if (parentBookmark.getCount() > 0)
        {
            content.append("Pdf bookmarks:" + "\r\n");
            for (int i = 0; i < parentBookmark.getCount(); i++)
            {
                PdfBookmark childBookmark = parentBookmark.get(i);
                content.append(childBookmark.getTitle() +"\r\n");

                //Get the text style
                String textStyle = childBookmark.getDisplayStyle().toString();
                content.append(textStyle +"\r\n");
                GetChildBookmark(childBookmark, content);
            }
        }
    }
    public static void writeStringToTxt(String content, String txtFileName) throws IOException {
        FileWriter fWriter = new FileWriter(txtFileName, true);
        try {
            fWriter.write(content);
        } catch (IOException ex) {
            ex.printStackTrace();
        } finally {
            try {
                fWriter.flush();
                fWriter.close();
            } catch (IOException ex) {
                ex.printStackTrace();
            }
        }
    }
}

Java: Extract Bookmarks from PDF

Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.