Tuesday, 03 September 2024 00:56

Python: Edit or Modify a Word Document

Programmatic editing of Word documents involves using code to alter or modify the contents of these documents. This approach enables automation and customization, making it particularly advantageous for handling large document collections. Through the use of Spire.Doc library, developers can perform a wide range of operations, including text manipulation, formatting changes, and the addition of images or tables.

The following sections will demonstrate how to edit or modify a Word document in Python using Spire.Doc for Python.

Install Spire.Doc for Python

This scenario requires Spire.Doc for Python and plum-dispatch v1.7.4. They can be easily installed in your Windows through the following pip command.

pip install Spire.Doc

If you are unsure how to install, please refer to this tutorial: How to Install Spire.Doc for Python on Windows

Modify Text in a Word Document in Python

In order to alter the content of a paragraph, the initial step is to obtain the desired paragraph from a specific section through the use of the Section.Paragraphs[index] property. Following this, you can replace the existing text with the new content by assigning it to the Paragraph.Text property of the chosen paragraph.

Here are the steps to edit text in a Word document with Python:

  • Create a Document object.
  • Load a Word file from the given file path.
  • Get a specific section using Document.Sections[index] property.
  • Get a specific paragraph using Section.Paragraphs[index] property.
  • Reset the text of the paragraph using Paragraph.Text property.
  • Save the updated document to a different Word file.
  • Python
from spire.doc import *
from spire.doc.common import *

# Create a Document object
document = Document()

# Load an existing Word file
document.LoadFromFile("C:\\Users\\Administrator\\Desktop\\input.docx");

# Get a specific section
section = document.Sections[0]

# Get a specific paragraph
paragraph = section.Paragraphs[0]

# Modify the text of the paragraph 
paragraph.Text = "The text has been modified"

# Save the document to a different Word file
document.SaveToFile("output/ModifyText.docx", FileFormat.Docx)

# Dispose resource
document.Dispose()

Python: Edit or Modify a Word Document

Change Formatting of Text in a Word Document in Python

To alter the text appearance of a particular paragraph, you first need to obtain the specified paragraph. Next, go through its child objects to find the individual text ranges. The formatting of each text range can then be updated using the TextRange.CharacterFormat property.

The steps to change text formatting in a Word document are as follows:

  • Create a Document object.
  • Load a Word file from the given file path.
  • Get a specific section using Document.Sections[index] property.
  • Get a specific paragraph using Section.Paragraphs[index] property.
  • Iterate through the child objects in the paragraph.
    • Determine if a child object is a text range.
    • Get a specific text range.
    • Reset the text formatting using TextRange.CharacterFormat property.
  • Save the updated document to a different Word file.
  • Python
from spire.doc import *
from spire.doc.common import *

# Create an object of Document
doc = Document()

# Load a Word document
doc.LoadFromFile("C:\\Users\\Administrator\\Desktop\\input.docx")

# Get a specific section
section = doc.Sections[0]

# Get a specific paragraph
paragraph = section.Paragraphs[1];

# Iterate through the child objects in the paragraph
for i in range(paragraph.ChildObjects.Count):
    
    # Determine if a child object is text range
    if isinstance(paragraph.ChildObjects[i], TextRange):

        # Get a specific text range
        textRange = paragraph.ChildObjects[i]

        # Reset font name
        textRange.CharacterFormat.FontName = "Corbel Light"

        # Reset font size
        textRange.CharacterFormat.FontSize = 11.0

        # Reset text color
        textRange.CharacterFormat.TextColor = Color.get_Blue()

        # Apply italic to the text range 
        textRange.CharacterFormat.Italic = True

# Save the document to a different Word file
doc.SaveToFile("output/ChangeFormatting.docx", FileFormat.Docx2019)

# Dispose resource
doc.Dispose()

Python: Edit or Modify a Word Document

Add New Elements to a Word Document in Python

In a Word document, most elements—such as text, images, lists, and charts—are fundamentally organized around the concept of a paragraph. To insert a new paragraph into a specific section, use the Section.AddParagraph() method.

After creating the new paragraph, you can add various elements to it by leveraging the methods and properties of the Paragraph object.

The steps to add new elements (text and images) to a Word document are as follows:

  • Create a Document object.
  • Load a Word file from the given file path.
  • Get a specific section through Document.Sections[index] property.
  • Add a paragraph to the section using Section.AddParagraph() method.
  • Add text to the paragraph using Paragraph.AppendText() method.
  • Add an image to the paragraph using Paragraph.AppendPicture() method.
  • Save the updated document to a different Word file.
  • Python
from spire.doc import *
from spire.doc.common import *

# Create an object of Document
doc = Document()

# Load a Word document
doc.LoadFromFile("C:\\Users\\Administrator\\Desktop\\Input.docx")

# Get the last section
lastSection = doc.LastSection

# Add a paragraph to the section
paragraph = lastSection.AddParagraph()

# Add an image to the paragraph
picture = paragraph.AppendPicture("C:\\Users\\Administrator\\Desktop\\logo.png");

# Set text wrap style 
picture.TextWrappingStyle = TextWrappingStyle.TopAndBottom

# Add text to the paragraph
paragraph.AppendText("This text and the image above are added by Spire.Doc for Python.")

# Create a paragraph style
style = ParagraphStyle(doc)
style.Name = "FontStyle"
style.CharacterFormat.FontName = "Times New Roman"
style.CharacterFormat.FontSize = 12
doc.Styles.Add(style)

# Apply the style to the paragraph
paragraph.ApplyStyle(style.Name)

# Save the document to a different Word file
doc.SaveToFile("output/AddNewElements.docx", FileFormat.Docx2019)

# Dispose resource
doc.Dispose()

Python: Edit or Modify a Word Document

Remove Paragraphs from a Word Document in Python

To eliminate a specific paragraph from a document, simply invoke the ParagraphCollection.RemoveAt() method and supply the index of the paragraph you intend to delete.

The steps to remove paragraphs from a Word document are as follows:

  • Create a Document object.
  • Load a Word file from the given file path.
  • Get a specific section through Document.Sections[index] property.
  • Remove a specific paragraph from the section using Section.Paragraphs.RemoveAt() method.
  • Save the updated document to a different Word file.
  • Python
from spire.doc import *
from spire.doc.common import *

# Create an object of Document
doc = Document()

# Load a Word document
doc.LoadFromFile("C:\\Users\\Administrator\\Desktop\\input.docx")

# Get a specific section
section = doc.Sections[0]

# Remove a specific paragraph
section.Paragraphs.RemoveAt(0)

# Save the document to a different Word file
doc.SaveToFile("output/RemoveParagraph.docx", FileFormat.Docx);

# Dispose resource
doc.Dispose()

Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.

Monday, 02 September 2024 09:02

Spire.XLS for JavaScript is Now Available

In a significant stride towards enhancing the capabilities of web developers, Spire.XLS for JavaScript, a powerful Excel development library, has been released. This cutting-edge solution empowers developers to seamlessly integrate Excel functionalities into their JavaScript applications, simplifying the way data is managed and manipulated within web environments.

Spire.XLS for JavaScript

Spire.XLS for JavaScript is designed to streamline Excel programming tasks, enabling developers to create, read, modify, and convert Excel files directly within their JavaScript environment. As a robust JavaScript Excel API, it allows seamless compatibility with popular frameworks such as Vue, React, Angular, and plain JavaScript.

To learn more about Spire.XLS for JavaScript and its capabilities, click to visit:

Contact Us

Monday, 02 September 2024 00:51

Python: Remove Forms from a PDF Document

Interactive forms in PDFs are valuable tools that allow users to fill in information, complete surveys, or sign documents electronically. However, these forms can also add layers of complexity to a PDF, impacting both file size and the overall user experience. When forms are no longer needed, or when a document needs to be simplified for distribution or archiving, removing these interactive elements can be beneficial. In this article, we will demonstrate how to remove forms from a PDF document in Python using Spire.PDF for Python.

Install Spire.PDF for Python

This scenario requires Spire.PDF for Python and plum-dispatch v1.7.4. They can be easily installed in your Windows through the following pip command.

pip install Spire.PDF

If you are unsure how to install, please refer to this tutorial: How to Install Spire.PDF for Python on Windows

Remove a Specific Form from a PDF Document in Python

Spire.PDF for Python allows you to remove specific form fields from a PDF file by using either the indexes or the names of the form fields. The detailed steps are as follows.

  • Create an instance of the PdfDocument class.
  • Load a PDF document containing form fields using the PdfDocument.LoadFromFile() method.
  • Get the form of the document using the PdfDocument.Form property.
  • Get the form field collection using the PdfFormWidget.FieldsWidget property.
  • Remove a specific form field by its index using the PdfFormFieldWidgetCollection.RemoveAt(index) method. Or retrieve a form field by its name using the PdfFormFieldWidgetCollection[name] property, and then remove it using the PdfFormFieldWidgetCollection.Remove(field) method.
  • Save the resulting document using the PdfDocument.SaveToFile() method.
  • Python
from spire.pdf.common import *
from spire.pdf import *

# Create an instance of the PdfDocument class
doc = PdfDocument()
# Load a PDF file
doc.LoadFromFile("Forms.pdf")

# Get the form of the document
pdfForm = doc.Form
formWidget = PdfFormWidget(pdfForm)

# Get the form field collection
field_collection = formWidget.FieldsWidget

# Remove a specific form field by its index
field_collection.RemoveAt(0)

# Or remove a specific form field by its name
# text_box = field_collection["Name"]
# field_collection.Remove(text_box)

# Save the resulting document
doc.SaveToFile("remove_specific_form.pdf")
doc.Close()

Python: Remove Forms from a PDF Document

Remove All Forms from a PDF Document in Python

To remove all form fields from a PDF document, you need to iterate through the form field collection, and then remove each form field from the collection using the PdfFormFieldWidgetCollection.RemoveAt(index) method. The detailed steps are as follows.

  • Create an instance of the PdfDocument class.
  • Load a PDF document containing form fields using the PdfDocument.LoadFromFile() method.
  • Get the form of the document using the PdfDocument.Form property.
  • Get the form field collection using the PdfFormWidget.FieldsWidget property.
  • Iterate through all form fields in the collection.
  • Remove each form field from the collection using the PdfFormFieldWidgetCollection.RemoveAt(index) method.
  • Save the resulting document using the PdfDocument.SaveToFile() method.
  • Python
from spire.pdf.common import *
from spire.pdf import *

# Create an instance of the PdfDocument class
doc = PdfDocument()
# Load a PDF file
doc.LoadFromFile("Forms.pdf")

# Get the form of the document
pdfForm = doc.Form
formWidget = PdfFormWidget(pdfForm)

# Get the form field collection
field_collection = formWidget.FieldsWidget

# Check if there are any form fields in the collection
if field_collection.Count > 0:
    # Iterate through all form fields in the collection
    for i in range(field_collection.Count - 1, -1, -1):
        # Remove the current form field from the collection
        field_collection.RemoveAt(i)

# Save the resulting document
doc.SaveToFile("remove_all_forms.pdf")
doc.Close()

Python: Remove Forms from a PDF Document

Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.

Monday, 02 September 2024 01:15

Spire.Office 9.8.0 is released

We are excited to announce the release of Spire.Office 9.8.0. In this version, Spire.PDF supports converting HTML to PDF using the Chrome; Spire.XLS supports enabling revision mode and obtaining custom properties of worksheets; Spire.Presentation supports converting PowerPoint documents to Markdown files; Spire.OCR supports configuring OCR models, languages, and dependency libraries. Besides, a lot of known issues are fixed successfully in this version. More details are listed below.

In this version, the most recent versions of Spire.Doc, Spire.PDF, Spire.XLS, Spire.Presentation, Spire.Email, Spire.DocViewer, Spire.PDFViewer, Spire.Spreadsheet, Spire.OfficeViewer, Spire.DataExport, and Spire.Barcode are included.

DLL Versions:

  • Spire.Doc.dll v12.8.12
  • Spire.Pdf.dll v10.8.1
  • Spire.XLS.dll v14.8.2
  • Spire.Presentation.dll v9.8.3
  • Spire.Barcode.dll v7.3.3
  • Spire.Email.dll v6.6.0
  • Spire.DocViewer.Forms.dll v8.8.1
  • Spire.PdfViewer.Asp.dll v7.12.23
  • Spire.PdfViewer.Forms.dll v7.12.23
  • Spire.Spreadsheet.dll v7.5.2
  • Spire.OfficeViewer.Forms.dll v8.7.15
  • Spire.DataExport.dll 4.9.0
  • Spire.DataExport.ResourceMgr.dll v2.1.0
Click the link to get the version Spire.Office 9.8.0:
*More information of Spire.Office new release or hotfix:

Here is a list of changes made in this release

Spire.Doc

Category ID Description
Bug SPIREDOC-10644 Fixes the issue that the image data was failed to be filled during mail merge.
Bug SPIREDOC-10219 Fixes the issue that extra blank pages are generated when converting Word to OFD.
Bug SPIREDOC-10365 Fixes the issue that the VBA was lost after saving Docm.
Bug SPIREDOC-10389 Fixes the issue that the application threw the exception "Wrong Word version" when loading Word documents encrypted using the WPS tool.
Bug SPIREDOC-10425 Fixes the issue that the application threw the exception 'System. ArgumentNullException' when converting Word to PDF.
Bug SPIREDOC-10456 Fixes the issue that the borderless text boxes added using the WPS tool were not successfully parsed.
Bug SPIREDOC-10554 Fixes the issue that the images didn't display when adding HTML that contains the image paths starting with https.
Bug SPIREDOC-10587 Fixes the issue that the application threw the exception "Unknown boolean value" when converting Word to PDF.
Bug SPIREDOC-10738 Fixes the issue that the customXml appeared after replacing text.
Bug SPIREDOC-10743 Optimizes the file size for converting Word to OFD.

Spire.XLS

Category ID Description
New feature SPIREXLS-5274 Supports obtaining custom properties of worksheets.
Workbook workbook = new Workbook ();
workbook.LoadFromFile("funds-test.xlsx");
ICustomPropertiesCollection customProperties = workbook.Worksheets[0].CustomProperties;
for (int i = 0; i < customProperties.Count; i++)
{     
XlsCustomProperty xcp = customProperties[i];     
string name = xcp.Name;    
 string value = xcp.Value; 
}
New feature SPIREXLS-5306 Supports obtaining the original document name of the embedded OLE object.
ole.OleOriginName
New feature SPIREXLS-5254 Supports enabling revision mode.
Workbook.TrackedChanges=true;//default value is false
New feature SPIREXLS-5348 Supports setting global custom font folders.
Workbook.SetGlobalCustomFontsFolders(string[] fontPath);
Bug SPIREXLS-5271 Fixes the issue that the cell data was inaccurate when converting Excel to images.
Bug SPIREXLS-5298 Fixes the issue that the cell contents were lost when converting Excel to PDF.
Bug SPIREXLS-5304 Fixes the issue that the row height was incorrect when converting Excel to PDF.
Bug SPIREXLS-5309 Fixes the issue that the pagination was incorrect when converting Excel to PDF.
Bug SPIREXLS-5320 Fixes the issue that the first 10 data entries were incorrect after adding a "FilterTop10" filter in a worksheet.
Bug SPIREXLS-5196 Fixes the issue that the checkbox was not converted to image format when converting Excel to PDF.
Bug SPIREXLS-5305 Fixes the issue that multiple calls to the ApplyStyleToRange method in pivot tables result in incorrect results.
Bug SPIREXLS-5308 Fixes the issue that setting the color transparency of rich text did not take effect.
Bug SPIREXLS-5317 Fixes the issue that the Filter formula is calculated incorrectly.
Bug SPIREXLS-5330 Fixes the issue that pivot table column names are incorrect when converting Excel to images.
Bug SPIREXLS-5345 Fixes the issue that the text location is incorrect when converting Excel to PDF.
Bug SPIREXLS-5349 Fixes the issue that ChartSheet retrieval is incorrect.
Bug SPIREXLS-5352 Fixes the issue that cell content is incorrect when converting Excel to HTML or images.

Spire.PDF

Category ID Description
New feature SPIREPDF-5742 Adds the PreserveAllowedMetadata property to support preserving XMP data when converting PDF to PDFA format documents.
PdfStandardsConverter converter = new PdfStandardsConverter(stream);
converter.Options.PreserveAllowedMetadata = true;
New feature - Supports converting HTML to PDF using the Chrome plugin.

Save the resulting PDF document to a file:
string chromeLocation = baseDirectory + "chrome\\Chrome-bin\\chrome.exe";
ChromeHtmlConverter converter = new ChromeHtmlConverter(chromeLocation);
ConvertOptions options = new ConvertOptions();
options.Timeout = 10 * 1000;
options.PageSettings = new PageSettings()
{
    PaperWidth = 8.77,
    PaperHeight = 6.20,
    MarginBottom = 0,
    MarginTop = 0,
    MarginLeft = 0,
    MarginRight = 0
    };
converter.ConvertToPdf("https://www.e-iceblue.com/", outputFile, options);
Save the resulting PDF document to a stream:
var pdfAsStream = new MemoryStream();
string chromeLocation = baseDirectory + "chrome\\Chrome-bin\\chrome.exe";
ChromeHtmlConverter converter = new ChromeHtmlConverter(chromeLocation);
ConvertOptions options = new ConvertOptions();
options.Timeout = 10 * 1000;
options.PageSettings = new PageSettings()
{
    PaperWidth = 8.77,
    PaperHeight = 6.2,
    MarginBottom = 0,
    MarginTop = 0,
    MarginLeft = 0,
    MarginRight = 0
    };
converter.ConvertToPdf("https://www.e-iceblue.com/", pdfAsStream, options);
New feature SPIREPDF-6820 Optimizes the layout of Excel documents converted from PDF documents.
Use the new code:
pdfDocument.ConvertOptions.SetPdfToXlsxOptions(new XlsxLineLayoutOptions(true, false, false)); //bool convertToMultipleSheet, bool rotatedText, bool splitCell
Optimization SPIREPDF-5744 Optimizes the time consumption of PDF to image conversion.
Bug SPIREPDF-6789 Fixes the issue that the program reported an error when printing PDF documents.
Bug SPIREPDF-6806 Fixes the issue that images in PDF documents converted from XPS documents were not displayed in the PDF.js viewer.
Bug SPIREPDF-6848 Fixes the issue that the border was lost after filling the text field with content and opening the document in Google Chrome or previewing it in Adobe when printing.
Bug SPIREPDF-6896 Fixes the issue that the program threw System.NullReferenceException when converting a PDF document to a PDFA1B document.
Bug SPIREPDF-6898
SPIREPDF-6899
Fixes the issue that the program threw System.NullReferenceException when extracting text from PDF documents.
Bug SPIREPDF-6906 Fixes the issue that the program threw System.ArgumentNullException when converting PDF documents to images.
Bug SPIREPDF-6832 Fixes the issue that verification of signature validity was inaccurate.
Bug SPIREPDF-6836 Fixes the issue that removing the PdfWatermarkAnnotationWidget object failed.
Bug SPIREPDF-6875 Fixes the issue that printing PDF to B5 paper size was incorrect.
Bug SPIREPDF-2659
SPIREPDF-4454
Fixes the issue that importing and exporting form data (in FDF, XFDF, and XML formats) was incorrect.
Bug SPIREPDF-6797 Fixes the issue that highlighting covered the text after converting PDF to images.
Bug SPIREPDF-6896 Fixes the issue that the program threw an exception System.NullReferenceException when converting an OFD document to a PDF document.
Bug SPIREPDF-6908 Fixes the issue that the contents were blank after converting a PDF document to images.
Bug SPIREPDF-6910 Fixes the issue that it failed to get the action script of annotations.
Bug SPIREPDF-6922 Fixes the issue that the program threw an exception System.ArgumentException when importing FDF file data to PDF.
Bug SPIREPDF-6925 Fixes the issue that spaces were lost when copying content to Notepad after adding content to a PDF document using the PdfTaggedContent interface.
Bug SPIREPDF-6929
SPIREPDF-6940
Fixes the issue that the program threw an exception System.OutOfMemoryException when converting a PDF document to images.
Bug SPIREPDF-6941 Fixes the issue that signatures were reversed by mirroring after flattening form fields.
Bug SPIREPDF-6949 Fixes the issue that the program threw an exception System.NullReferenceException when converting a PDF document to an Excel document.
Bug SPIREPDF-6968 Fixes the issue that the program threw an exception System.NullReferenceException when loading a PDF document.

Spire.Presentation

Category ID Description
New feature - Supports converting PowerPoint documents to Markdown files.
Presentation ppt = new Presentation();
ppt.LoadFromFile("1.pptx");
ppt.SaveToFile("1.md", FileFormat.Markdown);
ppt.Dispose();
Bug SPIREPPT-2522 Fixes the issue that the waterfall chart is displayed incorrectly after modifying its data.
Bug SPIREPPT-2534 Fixes the issue that the program threw System.ArgumentException when setting document property "_MarkAsFinal".
Bug SPIREPPT-2535 Fixes the issue that the tilt angle of text was lost after converting slides to pictures.
Bug SPIREPPT-2547 Fixes the issue that the application threw the exception "System.ArgumentOutOfRangeException" when copying slides to a new PowerPoint presentation.
Bug SPIREPPT-2549 Fixes the issue that the message "PowerPoint found a problem with content" prompted when opening the resulting file generated after copying slides to a new PowerPoint presentation.

Spire.OCR

Category ID Description
New feature - Adds the ConfigureOptions class and new method ConfigureDependencies(ConfigureOptions configureOptions), which supports configuring OCR models, languages, and dependency libraries.
OcrScanner scanner = new OcrScanner();

// Create a new instance of the ConfigureOptions class to set up the scanner configuration
ConfigureOptions configureOptions = new ConfigureOptions();

// Set the path to the model
configureOptions.ModelPath = @"D:\Model&Lib\Model\V4.Server";

// Set the path to the dependency libraries
configureOptions.LibPath = @"D:\Model&Lib\Lib\V2.7\Net\X64";

// Specify the language
configureOptions.Language = "Chinese";

// Apply the configuration options to the scanner using the ConfigureDependencies method
scanner.ConfigureDependencies(configureOptions);

scanner.Scan("test.tif");

Spire.OfficeViewer

Category ID Description
Bug SPIREPPT-2502 Fixes the issue that the warning watermark of presentations cannot be removed after applying a valid license.

We are pleased to announce the release of Spire.Presentation for Java 9.8.3. This version provides a new method to set format when replacing text, and also fixes some known issues occurred when loading PowerPoint files, adding formulas in table cells, and adding LatexMath. More details are listed below.

Here is a list of changes made in this release

Category ID Description
New feature SPIREPPT-2579 Provides the ReplaceAndFormatText(String matchedString, String newValue, PortionFormatEx format) method to support setting format when replacing text.
Presentation ppt = new Presentation();
// Load a PowerPoint presentation from the specified file.
ppt.loadFromFile(inputFile);
// Create a new object to store the default text range formatting properties.
PortionFormatEx format = new PortionFormatEx();
// Set the IsBold property of the text range formatting to true, making the text bold.
format.isBold(TriState.TRUE);
// Set the FillType property of the text range fill to Solid, indicating a solid fill color.
format.getFill().setFillType(FillFormatType.SOLID);
// Set the Color property of the solid fill color to red.
format.getFill().getSolidColor().setColor(Color.red);
// Set the FontHeight property of the text range formatting to 25, indicating the font size.
format.setFontHeight(25);
format.isBold(TriState.TRUE);

// Replace all occurrences of the text "Spire.Presentation for .NET" with "Spire.PPT" and apply the specified formatting.
ppt.ReplaceAndFormatText("Spire.Presentation for .NET", "Spire.PPT", format);

// Save the modified presentation to the specified output file in the PPTX format compatible with PowerPoint 2016.
ppt.saveToFile(outputFile, FileFormat.PPTX_2016);
// Dispose of the Presentation object to free up resources
ppt.dispose();
Bug SPIREPPT-2544 Fixes the issue that the content did not fit automatically after changing the page orientation.
Bug SPIREPPT-2562 Fixes the issue that the formulas were incorrect when adding LatexMath.
Bug SPIREPPT-2577 Fixes the issue where Chinese characters in formula were displayed as "x".
Bug SPIREPPT-2589 Fixes the issue that the application threw the exception "java.lang.ClassCastException" when adding formulas in table cells.
Bug SPIREPPT-2593 Fixes the issue that the application threw the exception "OutOfMemory" when loading PowerPoint files.
Click the link below to download Spire.Presentation for Java 9.8.3:
Friday, 30 August 2024 01:03

Java: Convert HTML to Excel

HTML files often contain valuable datasets embedded within tables. However, analyzing this data directly in HTML can be cumbersome and inefficient. Converting HTML tables to Excel format allows you to take advantage of Excel's powerful data manipulation and analysis tools, making it easier to sort, filter, and visualize the information. Whether you need to analyze data for a report, perform calculations, or simply organize it in a more user-friendly format, converting HTML to Excel streamlines the process. In this article, we will demonstrate how to convert HTML files to Excel format in Java using Spire.XLS for Java.

Install Spire.XLS for Java

First of all, you're required to add the Spire.Xls.jar file as a dependency in your Java program. The JAR file can be downloaded from this link. If you use Maven, you can easily import the JAR file in your application by adding the following code to your project's pom.xml file.

<repositories>
    <repository>
        <id>com.e-iceblue</id>
        <name>e-iceblue</name>
        <url>https://repo.e-iceblue.com/nexus/content/groups/public/</url>
    </repository>
</repositories>
<dependencies>
    <dependency>
        <groupId>e-iceblue</groupId>
        <artifactId>spire.xls</artifactId>
        <version>14.11.0</version>
    </dependency>
</dependencies>
    

Convert HTML to Excel in Java

Spire.XLS for Java provides the Workbook.loadFromHtml() method for loading an HTML file. Once the HTML file is loaded, you can convert it to Excel format using the Workbook.saveToFile() method. The detailed steps are as follows.

  • Create an object of the Workbook class.
  • Load an HTML file using the Workbook.loadFromHtml() method.
  • Save the HTML file in Excel format using the Workbook.saveToFile() method.
  • Java
import com.spire.xls.ExcelVersion;
import com.spire.xls.Workbook;

public class ConvertHtmlToExcel {
    public static void main(String[] args) {
        // Specify the input HTML file path
        String filePath = "C:\\Users\\Administrator\\Desktop\\Sample.html";

        // Create an object of the workbook class
        Workbook workbook = new Workbook();
        // Load the HTML file
        workbook.loadFromHtml(filePath);

        // Save the HTML file in Excel XLSX format
        String result = "C:\\Users\\Administrator\\Desktop\\ToExcel.xlsx";
        workbook.saveToFile(result, ExcelVersion.Version2013);

        workbook.dispose();
    }
}

Java: Convert HTML to Excel

Insert HTML String into Excel in Java

In addition to converting HTML files to Excel, Spire.XLS for Java allows you to insert HTML strings directly into Excel cells using the CellRange.setHtmlString() method. The detailed steps are as follows.

  • Create an object of the Workbook class.
  • Get a specific worksheet by its index (0-based) using the Workbook.getWorksheets().get(index) method.
  • Get the cell that you want to add an HTML string to using the Worksheet.getCellRange() method.
  • Add an HTML sting to the cell using the CellRange.setHtmlString() method.
  • Save the resulting workbook to a new file using the Workbook.saveToFile() method.
  • Java
import com.spire.xls.CellRange;
import com.spire.xls.ExcelVersion;
import com.spire.xls.Workbook;
import com.spire.xls.Worksheet;

public class InsertHtmlStringInExcelCell {
    public static void main(String[] args) {
        // Create an object of the workbook class
        Workbook workbook = new Workbook();
        // Get the first sheet
        Worksheet sheet = workbook.getWorksheets().get(0);

        // Specify the HTML string
        String htmlCode = "<p><font size='12'>This is a <b>paragraph</b> with <span style='color: red;'>colored text</span>.</font></p>";

        // Get the cell that you want to add the HTML string to
        CellRange range = sheet.getCellRange("A1");
        // Add the HTML string to the cell
        range.setHtmlString(htmlCode);

        // Auto-adjust the width of the first column based on its content
        sheet.autoFitColumn(1);

        // Save the resulting workbook to a new file
        String result = "C:\\Users\\Administrator\\Desktop\\InsertHtmlStringIntoCell.xlsx";
        workbook.saveToFile(result, ExcelVersion.Version2013);

        workbook.dispose();
    }
}

Java: Convert HTML to Excel

Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.

Thursday, 29 August 2024 01:01

Python: Create Actions in PDF Documents

One powerful feature that enhances the interactivity and utility of PDF documents is the actions in these documents. By embedding actions such as document jumping, navigation controls, or even media playing, users can transform static documents into dynamic tools that streamline workflows, improve user engagement, and automate routine tasks, making the use of PDFs more efficient and versatile than ever before. This article will show how to use Spire.PDF for Python to create actions in PDF documents with Python code effortlessly.

Install Spire.PDF for Python

This scenario requires Spire.PDF for Python and plum-dispatch v1.7.4. They can be easily installed in your Windows through the following pip command.

pip install Spire.PDF

If you are unsure how to install, please refer to: How to Install Spire.PDF for Python on Windows

Create a Navigation Action in PDF with Python

A navigation button is an action that allows users to jump to a specified position on a designated page within a document. Developers can create a PdfDestination object, use it to create a PdfGoToAction, and then create an annotation based on this object and add it to the page to complete the creation of the navigation button. The following are the detailed steps:

  • Create an object of PdfDocument class and load a PDF document using PdfDocument.LoadFromFile() method.
  • Create a PdfDestination object and set its property.
  • Create a PdfGoToAction object based on the destination.
  • Draw a rectangle on a page using PdfPageBase.Canvas.DrawRectangle() method.
  • Create a PdfActionAnnotation object based on the action and add it to the page using PdfPageBase.Annotations.Add() method.
  • Save the document using PdfDocument.SaveToFile() method.
  • Python
from spire.pdf import *

# Create an instance of PdfDocument class and load a PDF document
pdf = PdfDocument()
pdf.LoadFromFile("Sample.pdf")

# Create a PdfDestination instance and set its properties
destination = PdfDestination(pdf.Pages[0])
destination.Location = PointF(0.0, 0.0)
destination.Mode = PdfDestinationMode.Location
destination.Zoom = 0.8

# Create a rectangle
rect = RectangleF.FromLTRB(70, pdf.PageSettings.Size.Height - 120, 140, pdf.PageSettings.Size.Height - 100)

# Create a PdfGoToAction instance
action = PdfGoToAction(destination)

# Draw a rectangle on the second page
pdf.Pages.get_Item(1).Canvas.DrawRectangle(PdfBrushes.get_LightGray(), rect)

# Draw text of the button
font = PdfFont(PdfFontFamily.TimesRoman, 14.0)
stringFormat = PdfStringFormat(PdfTextAlignment.Center)
pdf.Pages.get_Item(1).Canvas.DrawString("To Page 1", font, PdfBrushes.get_Green(), rect, stringFormat)

# Create a PdfActionAnnotation instance
annotation = PdfActionAnnotation(rect, action)

# Add the annotation to the second page
pdf.Pages.get_Item(1).Annotations.Add(annotation)

# Save the document
pdf.SaveToFile("output/AddPDFNavigationButton.pdf")
pdf.Close()

Python: Create Actions in PDF Documents

Create a Sound Action in PDF with Python

Developers can embed audio as actions in PDF documents, which allows the audio to play when the user performs a specified action, such as playing when the file opens or when a button is clicked. The following are the steps for creating a sound action:

  • Create an instance of PdfDocument class.
  • Load a PDF document using PdfDocument.LoadFromFile() method.
  • Create an instance of PdfSoundAction class with an audio file.
  • Set the audio parameters through properties under PdfSound class.
  • Set the playing parameters through properties under PdfSoundAction class.
  • Get a page using PdfDocument.Pgaes.get_Item(() method.
  • Draw an image on the page using PdfPageBase.Canvas.Draw() method.
  • Create a PdfActionAnnotation object with the sound action at the location of the image.
  • Add the annotation to the page
  • Or you can only set the sound action as the action performed after the document is opened through PdfDocument.AfterOpenAction property. This doesn’t need to add it as an annotation on a PDF page.
  • Save the document using PdfDocument.SaveToFile() method.
  • Python
from spire.pdf import *

# Create a PdfDocument instance and load a PDF file
pdf = PdfDocument()
pdf.LoadFromFile("Sample.pdf")

# Get the first page of the document
page = pdf.Pages.get_Item(0)

# Create an instance of PdfSoundAction with the sound file path
soundAction = PdfSoundAction("Wave.wav")

# Set the audio parameters
soundAction.Sound.Bits = 16
soundAction.Sound.Channels = PdfSoundChannels.Stereo
soundAction.Sound.Encoding = PdfSoundEncoding.Signed
soundAction.Sound.Rate = 44100

# Set the playing parameters
soundAction.Volume = 0.5
soundAction.Repeat = True
soundAction.Mix = True
soundAction.Synchronous = False

# Draw an image on the page
image = PdfImage.FromFile("Sound.png")
page.Canvas.DrawImage(image, PointF(30.0, 30.0))

# Create an instance of PdfActionAnnotation with the sound action
rect = RectangleF.FromLTRB(30.0, 30.0, image.GetBounds().Width + 30.0, image.GetBounds().Height + 30.0)
annotation = PdfActionAnnotation(rect, soundAction)

# Add the annotation to the page
page.Annotations.Add(annotation)

# Set the sound action to play after the document is opened
# pdf.AfterOpenAction = soundAction

# Save the document
pdf.SaveToFile("output/AddMusicPDF.pdf")
pdf.Close()

Python: Create Actions in PDF Documents

Create a File Open Action in PDF with Python

The PdfLaunchAction class represents a file open action in PDF that allows users to open the corresponding file by clicking on a button on a PDF page. Developers can specify the absolute or relative path of the file to be opened and whether to open in a new window when creating a file open action. The detailed steps for creating a file open action in a PDF document are as follows:

  • Create an object of PdfDocument class and load a PDF document using PdfDocument.LoadFromFile() method.
  • Get a page of the document using PdfDocument.Pages.get_Item() method.
  • Draw a rectangle on the page using PdfPageBase.Canvas.DrawRectangle() method.
  • Create an object of PdfLaunchAction class and specify the file path and path type.
  • Set the opening mode to new window through PdfLaunchAction.IsNewWindow property.
  • Create an object of PdfActionAnnotation class based on the action and set its color through PdfActionAnnotation.Color property.
  • Add the annotation to the page using PdfPageBase.Annotations.Add() method.
  • Save the document using PdfDocument.SaveToFile() method.
  • Python
from spire.pdf import *

# Create an instance of PdfDocument class
pdf = PdfDocument()

# Load a PDF file
pdf.LoadFromFile("Sample.pdf")

# Get the first page of the document
page = pdf.Pages.get_Item(0)

# Draw a rectangle on the page
rect = RectangleF.FromLTRB(50, pdf.PageSettings.Size.Height - 100, 200, pdf.PageSettings.Size.Height - 80)
page.Canvas.DrawRectangle(PdfPens.get_LightGray(), rect)

# Draw text in the rectangle
page.Canvas.DrawString("Click to open Sample 2", PdfFont(PdfFontFamily.Helvetica, 14.0), PdfBrushes.get_Green(), rect, PdfStringFormat(PdfTextAlignment.Center))

# Create a PdfLaunchAction object
action = PdfLaunchAction("Sample2.pdf", PdfFilePathType.Relative)
action.IsNewWindow = True

# Create a PdfActionAnnotation object based on the action
annotation = PdfActionAnnotation(rect, action)
annotation.Color = PdfRGBColor(Color.get_Blue())

# Add the annotation to the page
page.Annotations.Add(annotation)

# Save the document
pdf.SaveToFile("output/CreatePDFLaunchAction.pdf")
pdf.Close()

Python: Create Actions in PDF Documents

Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.

Transferring content between Microsoft Word documents is a frequent task for many users. Whether you need to consolidate information spread across multiple files or quickly reuse existing text and other elements, the ability to effectively copy and paste between documents can save you time and effort.

In this article, you will learn how to copy content from one Word document to another using Java and Spire.Doc for Java.

Install Spire.Doc for Java

First of all, you're required to add the Spire.Doc.jar file as a dependency in your Java program. The JAR file can be downloaded from this link. If you use Maven, you can easily import the JAR file in your application by adding the following code to your project's pom.xml file.

<repositories>
    <repository>
        <id>com.e-iceblue</id>
        <name>e-iceblue</name>
        <url>https://repo.e-iceblue.com/nexus/content/groups/public/</url>
    </repository>
</repositories>
<dependencies>
    <dependency>
        <groupId>e-iceblue</groupId>
        <artifactId>spire.doc</artifactId>
        <version>12.11.0</version>
    </dependency>
</dependencies>
    

Copy Specified Paragraphs from One Word Document to Another in Java

Spire.Doc for Java provides a flexible way to copy content between Microsoft Word documents. This is achieved by cloning individual paragraphs and then adding those cloned paragraphs to a different document.

To copy specific paragraphs from one Word document to another, you can follow these steps:

  • Load the source document into a Document object.
  • Load the target document into a separate Document object.
  • Identify the paragraphs you want to copy from the source document.
  • Create copies of those selected paragraphs using Paragraph.deepClone() method
  • Add the cloned paragraphs to the target document using ParagraphCollection.add() method.
  • Save the updated target document to a new Word file.
  • Java
import com.spire.doc.Document;
import com.spire.doc.FileFormat;
import com.spire.doc.Section;
import com.spire.doc.documents.Paragraph;

public class CopyParagraphs {

    public static void main(String[] args) {

        // Create a Document object
        Document sourceDoc = new Document();

        // Load the source file
        sourceDoc.loadFromFile("C:\\Users\\Administrator\\Desktop\\source.docx");

        // Get a specific section
        Section section = sourceDoc.getSections().get(0);

        // Get the specified paragraphs from the source file
        Paragraph p1 = section.getParagraphs().get(2);
        Paragraph p2 = section.getParagraphs().get(3);

        // Create another Document object
        Document targetDoc = new Document();

        // Load the target file
        targetDoc.loadFromFile("C:\\Users\\Administrator\\Desktop\\target.docx");

        // Get the last section
        Section lastSection = targetDoc.getLastSection();

        // Add the paragraphs from the source file to the target file
        lastSection.getParagraphs().add((Paragraph)p1.deepClone());
        lastSection.getParagraphs().add((Paragraph)p2.deepClone());

        // Save the target file to a different Word file
        targetDoc.saveToFile("CopyParagraphs.docx", FileFormat.Docx_2019);

        // Dispose resources
        sourceDoc.dispose();
        targetDoc.dispose();
    }
}

Java: Copy Content from One Word Document to Another

Copy a Section from One Word Document to Another in Java

When copying content between Microsoft Word documents, it's important to consider that a section can contain not only paragraphs, but also other elements like tables. To successfully transfer an entire section from one document to another, you need to iterate through all the child objects within the section and add them individually to a specific section in the target document.

The steps to copy a section between different Word documents are as follows:

  • Create Document objects to load the source file and the target file, respectively.
  • Get the specified section from the source document.
  • Iterate through the child objects within the section.
    • Clone a specific child object using DocumentObject.deepClone() method.
    • Add the cloned child objects to a designated section in the target document using DocumentObjectCollection.add() method.
  • Save the updated target document to a new file.
  • Java
import com.spire.doc.Document;
import com.spire.doc.DocumentObject;
import com.spire.doc.FileFormat;
import com.spire.doc.Section;

public class CopySection {

    public static void main(String[] args) {

        // Create a Document object
        Document sourceDoc = new Document();

        // Load the source file
        sourceDoc.loadFromFile("C:\\Users\\Administrator\\Desktop\\source.docx");

        // Get the specified section from the source file
        Section section = sourceDoc.getSections().get(0);

        // Create another Document object
        Document targetDoc = new Document();

        // Load the target file
        targetDoc.loadFromFile("C:\\Users\\Administrator\\Desktop\\target.docx");

        // Get the last section of the target file
        Section lastSection = targetDoc.getLastSection();

        // Iterate through the child objects in the selected section
        for (int i = 0; i < section.getBody().getChildObjects().getCount(); i++) {

            // Get a specific child object
            DocumentObject childObject = section.getBody().getChildObjects().get(i);

            // Add the child object to the last section of the target file
            lastSection.getBody().getChildObjects().add(childObject.deepClone());
        }

        // Save the target file to a different Word file
        targetDoc.saveToFile("CopySection.docx", FileFormat.Docx_2019);

        // Dispose resources
        sourceDoc.dispose();
        targetDoc.dispose();
    }
}

Java: Copy Content from One Word Document to Another

Copy the Entire Document and Append it to Another in Java

Copying the full contents from one Microsoft Word document into another can be achieved using the Document.insertTextFromFile() method. This method enables you to seamlessly append the contents of a source document to a target document.

The steps to copy an entire document and append it to another are as follows:

  • Create a Document object to represent the target file.
  • Load the target file from the given file path.
  • Insert the content of a different Word document into the target file using Document.insertTextFromFile() method.
  • Save the updated target file to a new Word document.
  • Java
import com.spire.doc.Document;
import com.spire.doc.FileFormat;

public class CopyEntireDocument {

    public static void main(String[] args) {

        // Specify the path of the source document
        String sourceFile = "C:\\Users\\Administrator\\Desktop\\source.docx";

        // Create a Document object
        Document targetDoc = new Document();

        // Load the target file
        targetDoc.loadFromFile("C:\\Users\\Administrator\\Desktop\\target.docx");

        // Insert content of the source file to the target file
        targetDoc.insertTextFromFile(sourceFile, FileFormat.Docx);

        // Save the target file to a different Word file
        targetDoc.saveToFile("CopyEntireDocument.docx", FileFormat.Docx_2019);

        // Dispose resources
        targetDoc.dispose();
    }
}

Create a Copy of a Word Document in Java

Spire.Doc for Java provides a straightforward way to create a duplicate of a Microsoft Word document by using the Document.deepClone() method.

To make a copy of a Word document, follow these steps:

  • Create a Document object to relisent the source document.
  • Load a Word file from the given file path.
  • Create a copy of the document using Document.deepClone() method.
  • Save the cloned document to a new Word file.
  • Java
import com.spire.doc.Document;
import com.spire.doc.FileFormat;

public class DuplicateDocument {

    public static void main(String[] args) {

        // Create a new document object
        Document sourceDoc = new Document();

        // Load a Word file
        sourceDoc.loadFromFile("C:\\Users\\Administrator\\Desktop\\target.docx");

        // Clone the document
        Document newDoc = sourceDoc.deepClone();

        // Save the cloned document as a docx file
        newDoc.saveToFile("Copy.docx", FileFormat.Docx);

        // Dispose resources
        sourceDoc.dispose();
        newDoc.dispose();
    }
}

Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.

Tuesday, 27 August 2024 09:27

Spire.Office for Java 9.8.0 is released

We are pleased to announce the release of Spire.Office for Java 9.8.0. In this new version, Spire.XLS for Java supports embedding images in cells; Spire.Doc for Java optimizes the clarity of images when converting Word documents to HTML. Besides, some known issues are fixed successfully in this update. More details are listed below.

Click the link to download Spire.Office for Java 9.8.0:

Here is a list of changes made in this release

Spire.XLS for Java

Category ID Description
New feature SPIREXLS-5147 Provides the hideCategoriTags method to support hiding category labels.
Chart chart = sheet.getCharts().get(0);
String[] labels = chart.getCategoryLabels();
chart.hideCategoryLabels(new String[]
{ labels [0], labels [1], ...}
); 
New feature SPIREXLS-5338 Supports embedding images in cells.
worksheet.getCellRange("B1").insertOrUpdateCellImage("D:\\vs1.png",true);
Bug SPIREXLS-5331 Fixes the issue that caused incorrect content when saving an Excel document and opening it in Microsoft Excel 2016.
Bug SPIREXLS-5337 Fixes the issue that resulted in incorrect pivot table data calculation.

Spire.PDF for Java

Category ID Description
Bug SPIREPDF-6851 Fixes the issue that the program threw "NullPointerException" when converting PDF to SVG.
Bug SPIREPDF-6881 Fixes the issue that some punctuation marks were missing when extracting table text.
Bug SPIREPDF-6895 Fixes the issue that the content was lost when converting OFD to PDF.
Bug SPIREPDF-6923 Fixes the issue that the program threw "doc-0/res/doc-0/res/res7651308984730378845.png cannot be found!" when converting OFD to PDF.
Bug SPIREPDF-6924 Fixes the issue that the program threw "ArrangStoreException" while reading Tiff files.
Bug SPIREPDF-6939 Fixes the issue that the program threw "NullPointerException" when converting PDF to Word.
Bug SPIREPDF-6947 Fixes the issue that the font styles were incorrect when converting PDF to PPTX.

Spire.Doc for Java

Category ID Description
Optimization SPIREDOC-10600 Improves clarity of images when converting Word documents to HTML.
Bug SPIREDOC-10546
SPIREDOC-10601
Fixes the issue that the images were lost when converting HTML to Word.
Bug SPIREDOC-10562 Fixes the problem that the program threw "Unknown boolex value" exception when converting Word to PDF.
Bug SPIREDOC-10688 Fixes the issue that the table borders were lost when the saved Doc document was opened in WPS tool.

Proper presentation of a PDF document is critical for maintaining its accuracy and professionalism. By checking the orientation and rotation of each PDF page, you can confirm that all elements, including diagrams and images, are displayed correctly as intended on the viewing device or platform, thus avoiding confusion or misinterpretation of content. In this article, you will learn how to detect the orientation and rotation angle of a PDF page in Python using Spire.PDF for Python.

Install Spire.PDF for Python

This scenario requires Spire.PDF for Python. It can be easily installed in your Windows through the following pip command.

pip install Spire.PDF

If you are unsure how to install, please refer to this tutorial: How to Install Spire.PDF for Python on Windows

Detect PDF Page Orientation in Python

Page orientation is determined by the relationship between page width and height. Using Spire.PDF for Python, you can compare these two values to detect whether a page is landscape (width greater than height) or portrait (width less than height). The following are the detailed steps.

  • Create a PdfDocument instance.
  • Load a PDF file using PdfDocument.LoadFromFile() method.
  • Get a specified page using PdfDocument.Pages[] property.
  • Get the width and height of the PDF page using PdfPageBase.Size.Width and PdfPageBase.Size.Height properties.
  • Compare the values of page width and height to detect the page orientation.
  • Print out the result.
  • Python
from spire.pdf.common import *
from spire.pdf import *

# Create a PdfDocument object
pdf = PdfDocument()

# Load a PDF file from disk
pdf.LoadFromFile("SamplePDF.pdf")

# Get the first page
page = pdf.Pages[0]

# Get the width and height of the page
Width = page.Size.Width
Height = page.Size.Height

# Compare the values of page width and height
if Width > Height:
    print("The page orientation is Landscape.")
    
else:
    print("The page orientation is Portrait.")

Python: Detect Page Orientation or Rotation Angle in PDF

Detect PDF Page Rotation Angle in Python

PDF pages can be rotated by a certain angle. To detect the rotation angle of a PDF page, you can use the PdfPageBase.Rotation property. The following are the detailed steps.

  • Create a PdfDocument instance.
  • Load a PDF file using PdfDocument.LoadFromFile() method.
  • Get a specified page using PdfDocument.Pages[] property.
  • Get the rotation angle of the page using PdfPageBase.Rotation property, and then convert it to text string.
  • Print out the result.
  • Python
from spire.pdf.common import *
from spire.pdf import *

# Create a PdfDocument object
pdf = PdfDocument()

# Load a PDF file from disk
pdf.LoadFromFile("Sample.pdf")

# Get the first page
page = pdf.Pages[0]

# Get the rotation angle of the current page
rotationAngle = page.Rotation
rotation = str(rotationAngle)

# Print out the result
print("The rotation angle is: " + rotation)

Python: Detect Page Orientation or Rotation Angle in PDF

Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.

Page 8 of 245