C#: Extract Tables from Word Documents

2024-07-11 01:05:48 Written by  support iceblue
Rate this item
(0 votes)

Tables in Word documents often contain valuable information, ranging from financial data and research results to survey results and statistical records. Extracting the data contained within these tables can unlock a wealth of opportunities, empowering you to leverage it for a variety of purposes, such as in-depth data analysis, trend identification, and seamless integration with other tools or databases. In this article, we will demonstrate how to extract tables from Word documents in C# using Spire.Doc for .NET.

Install Spire.Doc for .NET

To begin with, you need to add the DLL files included in the Spire.Doc for .NET package as references in your .NET project. The DLL files can be either downloaded from this link or installed via NuGet.

PM> Install-Package Spire.Doc

Extract Tables from Word in C#

In Spire.Doc for .NET, the Section.Tables property is used to access the tables contained within a section of a Word document. This property returns a collection of ITable objects, where each object represents a distinct table in the section. Once you have the ITable objects, you can iterate through their rows and cells, and then retrieve the textual content of each cell using cell.Paragraphs[index].Text property.

The detailed steps to extract tables from a Word document are as follows:

  • Create an object of the Document class and load a Word document using Document.LoadFromFile() method.
  • Iterate through the sections in the document and get the table collection of each section through Section.Tables property.
  • Iterate through the tables in each section and create a string object for each table.
  • Iterate through the rows in each table and the cells in each row, then get the text of each cell through TableCell.Paragraphs[index].Text property and add the cell text to the string.
  • Save each string to a text file.
  • C#
using Spire.Doc;
using Spire.Doc.Collections;
using Spire.Doc.Interface;
using System.IO;
using System.Text;

namespace ExtractWordTable
{
    internal class Program
    {
        static void Main(string[] args)
        {
            // Create an object of the Document class
            Document doc = new Document();
            // Load a Word document
            doc.LoadFromFile("Tables.docx");

            // Iterate through the sections in the document
            for (int sectionIndex = 0; sectionIndex < doc.Sections.Count; sectionIndex++)
            {
                // Get the current section
                Section section = doc.Sections[sectionIndex];

                // Get the table collection of the section
                TableCollection tables = section.Tables;

                // Iterate through the tables in the section
                for (int tableIndex = 0; tableIndex < tables.Count; tableIndex++)
                {
                    // Get the current table
                    ITable table = tables[tableIndex];

                    // Initialize a string to store the table data
                    string tableData = "";

                    // Iterate through the rows in the table
                    for (int rowIndex = 0; rowIndex < table.Rows.Count; rowIndex++)
                    {
                        // Get the current row
                        TableRow row = table.Rows[rowIndex];
                        // Iterate through the cells in the row
                        for (int cellIndex = 0; cellIndex < row.Cells.Count; cellIndex++)
                        {
                            // Get the current cell
                            TableCell cell = table.Rows[rowIndex].Cells[cellIndex];

                            // Get the text in the cell
                            string cellText = "";
                            for (int paraIndex = 0; paraIndex < cell.Paragraphs.Count; paraIndex++)
                            {
                                cellText += (cell.Paragraphs[paraIndex].Text.Trim() + " ");
                            }

                            // Add the text to the string
                            tableData += cellText.Trim();
                            if (cellIndex < table.Rows[rowIndex].Cells.Count - 1)
                            {
                                tableData += "\t";
                            }
                        }

                        // Add a new line
                        tableData += "\n";
                    }

                    // Save the table data to a text file
                    string filePath = Path.Combine("Tables", $"Section{sectionIndex + 1}_Table{tableIndex + 1}.txt");
                    
                    File.WriteAllText(filePath, tableData, Encoding.UTF8);
                }
            }

            doc.Close();
        }
    }
}

C#: Extract Tables from Word Documents

Extract Tables from Word to Excel in C#

In addition to saving the extracted table data to text files, you can also write the data directly into Excel worksheets by using the Spire.XLS for .NET library. However, before you can use Spire.XLS, you need to install it via NuGet:

Install-Package Spire.XLS

The detailed steps to extract tables from Word documents to Excel worksheets are as follows:

  • Create an object of the Document class and load a Word document using the Document.LoadFromFile() method.
  • Create an object of the Workbook class and clear the default worksheets using Workbook.Worksheets.Clear() method.
  • Iterate through the sections in the document and get the table collection of each section through Section.Tables property.
  • Iterate through the tables in the section and add a worksheet for each table to the workbook using Workbook.Worksheets.Add() method.
  • Iterate through the rows in each table and the cells in each row, then get the text of each cell through TableCell.Paragraphs[index].Text property and write the text to the worksheet using Worksheet.SetCellValue() method.
  • Save the workbook to an Excel file using Workbook.SaveToFile() method.
  • C#
using Spire.Doc;
using Spire.Doc.Interface;
using Spire.Xls;

namespace ExtractWordTableToExcel
{
    internal class Program
    {
        static void Main(string[] args)
        {
            // Create an object of the Document class
            Document doc = new Document();
            // Load a Word document
            doc.LoadFromFile("Tables.docx");

            // Create an object of the Workbook class
            Workbook wb = new Workbook();
            // Remove the default worksheets
            wb.Worksheets.Clear();

            // Iterate through the sections in the document
            for (int sectionIndex = 0; sectionIndex < doc.Sections.Count; sectionIndex++)
            {
                // Get the current section
                Section section = doc.Sections[sectionIndex];
                // Iterate through the tables in the section
                for (int tableIndex = 0; tableIndex < section.Tables.Count; tableIndex++)
                {
                    // Get the current table
                    ITable table = section.Tables[tableIndex];
                    // Add a worksheet to the workbook
                    Worksheet ws = wb.Worksheets.Add($"Section{sectionIndex + 1}_Table{tableIndex + 1}");

                    // Iterate through the rows in the table
                    for (int rowIndex = 0; rowIndex < table.Rows.Count; rowIndex++)
                    {
                        // Get the current row
                        TableRow row = table.Rows[rowIndex];
                        // Iterate through the cells in the row
                        for (int cellIndex = 0; cellIndex < row.Cells.Count; cellIndex++)
                        {
                            // Get the current cell
                            TableCell cell = row.Cells[cellIndex];
                            // Get the text in the cell
                            string cellText = "";
                            for (int paraIndex = 0; paraIndex < cell.Paragraphs.Count; paraIndex++)
                            {
                                cellText += (cell.Paragraphs[paraIndex].Text.Trim() + " ");
                            }
                            // Write the cell text to the worksheet
                            ws.SetCellValue(rowIndex + 1, cellIndex + 1, cellText);
                        }
                        // Autofit the width of the columns in the worksheet
                        ws.Range.AutoFitColumns();
                    }
                }
            }

            // Save the workbook to an Excel file
            wb.SaveToFile("Tables/WordTableToExcel.xlsx", ExcelVersion.Version2016);
            doc.Close();
            wb.Dispose();
        }
    }
}

C#: Extract Tables from Word Documents

Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.

Additional Info

  • tutorial_title:
Last modified on Thursday, 11 July 2024 01:09