Tables in Word documents often contain valuable information, ranging from financial data and research findings to survey responses and statistical records. Extracting the data in these tables lets you reuse it elsewhere, for example for in-depth data analysis, trend identification, or integration with other tools and databases. In this article, we will demonstrate how to extract tables from Word documents in C# using Spire.Doc for .NET.
Install Spire.Doc for .NET
To begin with, you need to add the DLL files included in the Spire.Doc for .NET package as references in your .NET project. The DLL files can either be downloaded from the official website or installed via NuGet:
PM> Install-Package Spire.Doc
Extract Tables from Word in C#
In Spire.Doc for .NET, the Section.Tables property gives access to the tables contained in a section of a Word document. It returns a TableCollection in which each item is an ITable object representing one table in the section. Once you have an ITable object, you can iterate through its rows and cells and read the text of each cell through the TableCell.Paragraphs[index].Text property. Because a single cell can contain several paragraphs, the text of all of them has to be combined to get the full cell value.
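The paragraph-joining logic can be tried in isolation. The sketch below models each paragraph as a plain string, an assumption made so that it runs without Spire.Doc; in real code the strings would come from cell.Paragraphs[i].Text. JoinParagraphs is a hypothetical helper name, not part of the Spire.Doc API.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: paragraph texts are passed in as plain strings so the
// sketch runs without Spire.Doc. In real code they would be read from
// cell.Paragraphs[i].Text.
public static class CellTextHelper
{
    // Trims each paragraph, skips empty ones, and joins the rest with a space.
    public static string JoinParagraphs(IEnumerable<string> paragraphTexts)
    {
        return string.Join(" ",
            paragraphTexts
                .Select(text => (text ?? string.Empty).Trim())
                .Where(text => text.Length > 0));
    }
}
```

For example, JoinParagraphs(new[] { " Revenue ", "", "2023 " }) returns "Revenue 2023". Skipping empty paragraphs is a design choice: it avoids the double spaces that appear when a blank paragraph sits between two non-empty ones.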
The detailed steps to extract tables from a Word document are as follows:
- Create an object of the Document class and load a Word document using the Document.LoadFromFile() method.
- Iterate through the sections in the document and get the table collection of each section through the Section.Tables property.
- Iterate through the tables in each section and create a string for each table to hold its data.
- Iterate through the rows in each table and the cells in each row, get the text of each cell by combining the TableCell.Paragraphs[index].Text values of its paragraphs, and append the cell text to the string, separating cells with tabs and rows with line breaks.
- Save each table's string to a separate text file using the File.WriteAllText() method.
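The row-and-cell steps above produce tab-separated text: a tab between cells and a line break after each row. A minimal sketch of that formatting on plain strings is shown below; ToTabSeparated is a hypothetical helper name. It adds one precaution the sample code does not take: tabs and line breaks inside a cell are replaced with spaces, so that a single cell cannot shift the columns or split a row (an assumption about what downstream tools expect, not something Spire.Doc requires).

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: formats rows of cell strings as tab-separated text.
public static class TableTextWriter
{
    // Joins cells with '\t' and ends every row with '\n'.
    public static string ToTabSeparated(IEnumerable<IReadOnlyList<string>> rows)
    {
        var builder = new StringBuilder();
        foreach (var row in rows)
        {
            for (int i = 0; i < row.Count; i++)
            {
                if (i > 0) builder.Append('\t');
                builder.Append(Sanitize(row[i]));
            }
            builder.Append('\n');
        }
        return builder.ToString();
    }

    // Replaces embedded line breaks and tabs with spaces, then trims the cell.
    private static string Sanitize(string cell)
    {
        if (cell == null) return string.Empty;
        return cell.Replace("\r\n", " ")
                   .Replace('\r', ' ')
                   .Replace('\n', ' ')
                   .Replace('\t', ' ')
                   .Trim();
    }
}
```

With the rows { "Name", "Score" } and { "A\tB", "9" }, the helper returns "Name\tScore\nA B\t9\n".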
```csharp
using Spire.Doc;
using Spire.Doc.Collections;
using Spire.Doc.Interface;
using System.IO;
using System.Text;

namespace ExtractWordTable
{
    internal class Program
    {
        static void Main(string[] args)
        {
            // Create an object of the Document class
            Document doc = new Document();
            // Load a Word document
            doc.LoadFromFile("Tables.docx");

            // Make sure the output folder exists; File.WriteAllText does not create it
            Directory.CreateDirectory("Tables");

            // Iterate through the sections in the document
            for (int sectionIndex = 0; sectionIndex < doc.Sections.Count; sectionIndex++)
            {
                // Get the current section
                Section section = doc.Sections[sectionIndex];
                // Get the table collection of the section
                TableCollection tables = section.Tables;

                // Iterate through the tables in the section
                for (int tableIndex = 0; tableIndex < tables.Count; tableIndex++)
                {
                    // Get the current table
                    ITable table = tables[tableIndex];
                    // Initialize a buffer to store the table data
                    StringBuilder tableData = new StringBuilder();

                    // Iterate through the rows in the table
                    for (int rowIndex = 0; rowIndex < table.Rows.Count; rowIndex++)
                    {
                        // Get the current row
                        TableRow row = table.Rows[rowIndex];

                        // Iterate through the cells in the row
                        for (int cellIndex = 0; cellIndex < row.Cells.Count; cellIndex++)
                        {
                            // Get the current cell
                            TableCell cell = row.Cells[cellIndex];

                            // Combine the text of all paragraphs in the cell
                            StringBuilder cellText = new StringBuilder();
                            for (int paraIndex = 0; paraIndex < cell.Paragraphs.Count; paraIndex++)
                            {
                                cellText.Append(cell.Paragraphs[paraIndex].Text.Trim()).Append(' ');
                            }

                            // Add the cell text, separating cells with a tab
                            tableData.Append(cellText.ToString().Trim());
                            if (cellIndex < row.Cells.Count - 1)
                            {
                                tableData.Append('\t');
                            }
                        }

                        // End the row with a new line
                        tableData.Append('\n');
                    }

                    // Save the table data to a text file
                    string filePath = Path.Combine("Tables", $"Section{sectionIndex + 1}_Table{tableIndex + 1}.txt");
                    File.WriteAllText(filePath, tableData.ToString(), Encoding.UTF8);
                }
            }

            doc.Close();
        }
    }
}
```
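Each text file written by the code above can be loaded back for analysis. A minimal sketch, assuming the tab-separated layout and UTF-8 encoding used in that code; ReadTable is a hypothetical helper name.

```csharp
using System.IO;
using System.Linq;
using System.Text;

// Hypothetical helper: reads one exported table back into rows and cells.
public static class TableTextReader
{
    // File.ReadAllLines skips the UTF-8 byte order mark written by Encoding.UTF8,
    // accepts both "\n" and "\r\n" line endings, and does not return an extra
    // empty line for the trailing line break. Empty cells are preserved by Split.
    public static string[][] ReadTable(string filePath)
    {
        return File.ReadAllLines(filePath, Encoding.UTF8)
            .Select(line => line.Split('\t'))
            .ToArray();
    }
}
```

For example, ReadTable("Tables/Section1_Table1.txt")[0] would hold the cells of the table's first row.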
Extract Tables from Word to Excel in C#
In addition to saving the extracted table data to text files, you can write the data directly into Excel worksheets using the Spire.XLS for .NET library. Before using Spire.XLS, install it via NuGet:
PM> Install-Package Spire.XLS
The detailed steps to extract tables from Word documents to Excel worksheets are as follows:
- Create an object of the Document class and load a Word document using the Document.LoadFromFile() method.
- Create an object of the Workbook class and remove its default worksheets using the Workbook.Worksheets.Clear() method.
- Iterate through the sections in the document and get the table collection of each section through the Section.Tables property.
- Iterate through the tables in each section and add a worksheet for each table to the workbook using the Workbook.Worksheets.Add() method.
- Iterate through the rows in each table and the cells in each row, get the text of each cell through the TableCell.Paragraphs[index].Text property, and write the text to the worksheet using the Worksheet.SetCellValue() method.
- Autofit the column widths of each worksheet using the Worksheet.Range.AutoFitColumns() method.
- Save the workbook to an Excel file using the Workbook.SaveToFile() method.
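Word table rows and cells are indexed from 0, whereas the row and column arguments of Worksheet.SetCellValue() start at 1, which is why the code below adds 1 to both indices. The hypothetical helper sketched here makes that mapping explicit as an A1-style reference, which can be handy when logging where a Word cell ended up; it does not call Spire.XLS.

```csharp
using System;
using System.Text;

// Hypothetical helper: maps zero-based Word table indices to an Excel A1 reference.
public static class CellAddress
{
    // (0, 0) -> "A1", (4, 27) -> "AB5"
    public static string ToA1(int rowIndex, int cellIndex)
    {
        if (rowIndex < 0) throw new ArgumentOutOfRangeException(nameof(rowIndex));
        if (cellIndex < 0) throw new ArgumentOutOfRangeException(nameof(cellIndex));

        // Excel columns are 1-based and use a bijective base-26 letter scheme (A..Z, AA..)
        var letters = new StringBuilder();
        int column = cellIndex + 1;
        while (column > 0)
        {
            int remainder = (column - 1) % 26;
            letters.Insert(0, (char)('A' + remainder));
            column = (column - 1) / 26;
        }

        // Excel rows are 1-based as well
        return letters.ToString() + (rowIndex + 1);
    }
}
```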
```csharp
using Spire.Doc;
using Spire.Doc.Interface;
using Spire.Xls;
using System.IO;
using System.Text;

namespace ExtractWordTableToExcel
{
    internal class Program
    {
        static void Main(string[] args)
        {
            // Create an object of the Document class
            Document doc = new Document();
            // Load a Word document
            doc.LoadFromFile("Tables.docx");

            // Create an object of the Workbook class
            Workbook wb = new Workbook();
            // Remove the default worksheets
            wb.Worksheets.Clear();

            // Iterate through the sections in the document
            for (int sectionIndex = 0; sectionIndex < doc.Sections.Count; sectionIndex++)
            {
                // Get the current section
                Section section = doc.Sections[sectionIndex];

                // Iterate through the tables in the section
                for (int tableIndex = 0; tableIndex < section.Tables.Count; tableIndex++)
                {
                    // Get the current table
                    ITable table = section.Tables[tableIndex];
                    // Add a worksheet for the table to the workbook
                    Worksheet ws = wb.Worksheets.Add($"Section{sectionIndex + 1}_Table{tableIndex + 1}");

                    // Iterate through the rows in the table
                    for (int rowIndex = 0; rowIndex < table.Rows.Count; rowIndex++)
                    {
                        // Get the current row
                        TableRow row = table.Rows[rowIndex];

                        // Iterate through the cells in the row
                        for (int cellIndex = 0; cellIndex < row.Cells.Count; cellIndex++)
                        {
                            // Get the current cell
                            TableCell cell = row.Cells[cellIndex];

                            // Combine the text of all paragraphs in the cell
                            StringBuilder cellText = new StringBuilder();
                            for (int paraIndex = 0; paraIndex < cell.Paragraphs.Count; paraIndex++)
                            {
                                cellText.Append(cell.Paragraphs[paraIndex].Text.Trim()).Append(' ');
                            }

                            // Write the trimmed text to the worksheet (Excel rows and columns start at 1)
                            ws.SetCellValue(rowIndex + 1, cellIndex + 1, cellText.ToString().Trim());
                        }
                    }

                    // Autofit the column widths once the whole table has been written
                    ws.Range.AutoFitColumns();
                }
            }

            // Make sure the output folder exists, then save the workbook to an Excel file
            Directory.CreateDirectory("Tables");
            wb.SaveToFile("Tables/WordTableToExcel.xlsx", ExcelVersion.Version2016);

            doc.Close();
            wb.Dispose();
        }
    }
}
```
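The worksheet names generated in the code above (for example, Section1_Table1) are always valid, but if you derive names from table captions or headings instead, Excel's naming rules apply: at most 31 characters, none of the characters : \ / ? * [ ], no leading or trailing apostrophe, and no duplicates. A minimal sketch of a hypothetical MakeSafe helper that enforces these rules before the name is passed to Workbook.Worksheets.Add(); whether Spire.XLS validates names itself is not assumed here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: turns an arbitrary string into a usable, unique sheet name.
public static class SheetNames
{
    // Characters Excel does not allow in worksheet names
    private static readonly char[] Invalid = { ':', '\\', '/', '?', '*', '[', ']' };

    // Excel limits worksheet names to 31 characters
    private const int MaxLength = 31;

    // Create `existing` with StringComparer.OrdinalIgnoreCase, because Excel
    // compares sheet names case-insensitively.
    public static string MakeSafe(string proposed, ISet<string> existing)
    {
        // Replace invalid characters and strip leading/trailing apostrophes
        string name = new string((proposed ?? string.Empty)
            .Select(c => Invalid.Contains(c) ? '_' : c)
            .ToArray()).Trim('\'');
        if (name.Length == 0) name = "Sheet";
        if (name.Length > MaxLength) name = name.Substring(0, MaxLength);

        // Append _2, _3, ... until the name is unique, keeping within 31 characters
        string candidate = name;
        for (int n = 2; existing.Contains(candidate); n++)
        {
            string suffix = "_" + n;
            candidate = name.Substring(0, Math.Min(name.Length, MaxLength - suffix.Length)) + suffix;
        }

        existing.Add(candidate);
        return candidate;
    }
}
```

For example, MakeSafe("Q1/Q2 [draft]", used) returns "Q1_Q2 _draft_", and a second call with "q1_q2 _draft_" returns "q1_q2 _draft__2".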
Apply for a Temporary License
If you'd like to remove the evaluation message from the generated documents or get rid of the function limitations, please request a 30-day trial license.