Java: Extract Tables from Word Documents

Java: Extract Tables from Word Documents

2025-01-24 06:34:18 Written by  Koohji
Rate this item
(0 votes)

Extracting tables from Word documents is essential for many applications, as they often contain critical data for analysis, reporting, or system integration. By automating this process with Java, developers can create robust applications that seamlessly access this structured data, enabling efficient conversion into alternative formats suitable for databases, spreadsheets, or web-based visualizations. This article will demonstrate how to use Spire.Doc for Java to efficiently extract tables from Word documents in Java programs.

Install Spire.Doc for Java

First of all, you're required to add the Spire.Doc.jar file as a dependency in your Java program. The JAR file can be downloaded from this link. If you use Maven, you can easily import the JAR file in your application by adding the following code to your project's pom.xml file.

<repositories>
    <repository>
        <id>com.e-iceblue</id>
        <name>e-iceblue</name>
        <url>https://repo.e-iceblue.com/nexus/content/groups/public/</url>
    </repository>
</repositories>
<dependencies>
    <dependency>
        <groupId>e-iceblue</groupId>
        <artifactId>spire.doc</artifactId>
        <version>13.4.1</version>
    </dependency>
</dependencies>
    

Extract Tables from Word Documents with Java

With Spire.Doc for Java, developers can extract tables from Word documents using the Section.getTables() method. Table data can be accessed by iterating through rows and cells. The process for extracting tables is detailed below:

  • Create a Document object.
  • Load a Word document using the Document.loadFromFile() method.
  • Access the sections in the document using the Document.getSections() method and iterate through them.
  • Access the tables in each section using the Section.getTables() method and iterate through them.
  • Access the rows in each table using the Table.getRows() method and iterate through them.
  • Access the cells in each row using the TableRow.getCells() method and iterate through them.
  • Retrieve text from each cell by iterating through its paragraphs using the TableCell.getParagraphs() and Paragraph.getText() methods.
  • Add the extracted table data to a StringBuilder object.
  • Write the StringBuilder object to a text file or use it as needed.
  • Java
import com.spire.doc.*;
import com.spire.doc.documents.Paragraph;

import java.io.FileWriter;
import java.io.IOException;

public class ExtractWordTable {
    public static void main(String[] args) {
        // Create a Document object
        Document doc = new Document();

        try {
            // Load a Word document
            doc.loadFromFile("Sample.docx");

            // Iterate the sections in the document
            for (int i = 0; i < doc.getSections().getCount(); i++) {
                // Get a section
                Section section = doc.getSections().get(i);
                // Iterate the tables in the section
                for (int j = 0; j < section.getTables().getCount(); j++) {
                    // Get a table
                    Table table = section.getTables().get(j);
                    // Collect all table content
                    StringBuilder tableText = new StringBuilder();
                    for (int k = 0; k < table.getRows().getCount(); k++) {
                        // Get a row
                        TableRow row = table.getRows().get(k);
                        // Iterate the cells in the row
                        StringBuilder rowText = new StringBuilder();
                        for (int l = 0; l < row.getCells().getCount(); l++) {
                            // Get a cell
                            TableCell cell = row.getCells().get(l);
                            // Iterate the paragraphs to get the text in the cell
                            String cellText = "";
                            for (int m = 0; m < cell.getParagraphs().getCount(); m++) {
                                Paragraph paragraph = cell.getParagraphs().get(m);
                                cellText += paragraph.getText() + " ";
                            }
                            if (l < row.getCells().getCount() - 1) {
                                rowText.append(cellText).append("\t");
                            } else {
                                rowText.append(cellText).append("\n");
                            }
                        }
                        tableText.append(rowText);
                    }

                    // Write the table text to a file using try-with-resources
                    try (FileWriter writer = new FileWriter("output/Tables/Section-" + (i + 1) + "-Table-" + (j + 1) + ".txt")) {
                        writer.write(tableText.toString());
                    }
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

Extract Word Tables to Text with Java

Extract Tables from Word Documents to Excel Worksheets

Developers can use Spire.Doc for Java with Spire.XLS for Java to extract table data from Word documents and write it to Excel worksheets. To get started, download Spire.XLS for Java or add the following Maven configuration:

<repositories>
    <repository>
        <id>com.e-iceblue</id>
        <name>e-iceblue</name>
        <url>https://repo.e-iceblue.com/nexus/content/groups/public/</url>
    </repository>
</repositories>
<dependencies>
    <dependency>
        <groupId>e-iceblue</groupId>
        <artifactId>spire.xls</artifactId>
        <version>15.4.0</version>
    </dependency>
</dependencies>
    

The detailed steps for extracting tables from Word documents to Excel workbooks are as follows:

  • Create a Document object.
  • Create a Workbook object and remove the default worksheets using the Workbook.getWorksheets().clear() method.
  • Load a Word document using the Document.loadFromFile() method.
  • Access the sections in the document using the Document.getSections() method and iterate through them.
  • Access the tables in each section using the Section.getTables() method and iterate through them.
  • Create a worksheet for each table using the Workbook.getWorksheets().add() method.
  • Access the rows in each table using the Table.getRows() method and iterate through them.
  • Access the cells in each row using the TableRow.getCells() method and iterate through them.
  • Retrieve text from each cell by iterating through its paragraphs using the TableCell.getParagraphs() and Paragraph.getText() methods.
  • Write the extracted cell text to the corresponding cell in the worksheet using the Worksheet.getRange().get(row, column).setValue() method.
  • Format the worksheet as needed.
  • Save the workbook to an Excel file using the Workbook.saveToFile() method.
  • Java
import com.spire.doc.*;
import com.spire.doc.documents.Paragraph;
import com.spire.xls.FileFormat;
import com.spire.xls.Workbook;
import com.spire.xls.Worksheet;

public class ExtractWordTableToExcel {
    public static void main(String[] args) {
        // Create a Document object
        Document doc = new Document();

        // Create a Workbook object
        Workbook workbook = new Workbook();
        // Remove the default worksheets
        workbook.getWorksheets().clear();

        try {
            // Load a Word document
            doc.loadFromFile("Sample.docx");

            // Iterate the sections in the document
            for (int i = 0; i < doc.getSections().getCount(); i++) {
                // Get a section
                Section section = doc.getSections().get(i);
                // Iterate the tables in the section
                for (int j = 0; j < section.getTables().getCount(); j++) {
                    // Get a table
                    Table table = section.getTables().get(j);
                    // Create a worksheet for each table
                    Worksheet sheet = workbook.getWorksheets().add("Section-" + (i + 1) + "-Table-" + (j + 1));
                    for (int k = 0; k < table.getRows().getCount(); k++) {
                        // Get a row
                        TableRow row = table.getRows().get(k);
                        for (int l = 0; l < row.getCells().getCount(); l++) {
                            // Get a cell
                            TableCell cell = row.getCells().get(l);
                            // Iterate the paragraphs to get the text in the cell
                            String cellText = "";
                            for (int m = 0; m < cell.getParagraphs().getCount(); m++) {
                                Paragraph paragraph = cell.getParagraphs().get(m);
                                if (m > 0 && m < cell.getParagraphs().getCount() - 1) {
                                    cellText += paragraph.getText() + "\n";
                                }
                                else {
                                    cellText += paragraph.getText();
                                }
                                // Write the cell text to the corresponding cell in the worksheet
                                sheet.getRange().get(k + 1, l + 1).setValue(cellText);
                            }
                            // Auto-fit columns
                            sheet.autoFitColumn(l + 1);
                        }
                    }
                }
            }
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        workbook.saveToFile("output/WordTableToExcel.xlsx", FileFormat.Version2016);
    }
}

Extract Tables from Word Documents to Excel Worksheets with Java

Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.

Additional Info

  • tutorial_title:
Last modified on Friday, 24 January 2025 06:43