We are excited to announce the release of Spire.XLS for Java 14.9.8. This version supports the revision (track changes) function and speeds up the conversion of Excel documents to HTML. In addition, several known bugs have been fixed, such as the issue that print area settings were not fully copied when duplicating worksheets. More details are listed below.

Here is a list of changes made in this release

Category ID Description
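New feature SPIREXLS-5371 Supports the revision (track changes) function.
Workbook workbook = new Workbook();
// Load an existing workbook
workbook.loadFromFile("input.xlsx");
// Turn on change tracking
workbook.setTrackedChanges(true);
// Accept all tracked changes (revisions) in the workbook
workbook.acceptAllTrackedChanges();
// Save the result and release resources
workbook.saveToFile("output.xlsx", ExcelVersion.Version2013);
workbook.dispose();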
New feature SPIREXLS-5362 Optimizes the speed of converting Excel documents to HTML documents.
Bug SPIREXLS-5149 Fixes the issue that print area settings are not fully copied when duplicating worksheets.
Bug SPIREXLS-5295 Fixes the issue that some data is incorrect when converting Excel documents to PDF documents.
Bug SPIREXLS-5368 Fixes the issue that chart contents are lost when converting worksheets to images.
Bug SPIREXLS-5368 Fixes the issue that the program throws an exception "Input string was not in the correct format." when converting charts to images.
Bug SPIREXLS-5432 Fixes the issue that the program throws java.lang.IllegalArgumentException when loading Excel documents.
Bug SPIREXLS-5435 Fixes the issue that conditional formatting is lost when converting Excel documents to XML documents and then back to Excel documents.
Bug SPIREXLS-5441 Fixes the issue that the program throws java.lang.OutOfMemoryError when converting Excel documents to PDF documents.
Bug SPIREXLS-5442 Fixes the issue that fonts are incorrect when converting Excel documents to PDF documents.
Click the link below to download Spire.XLS for Java 14.9.8:

We are delighted to announce the release of Spire.Presentation for Java 9.9.2. This version supports getting the names of all embedded fonts in a PowerPoint file and enhances the conversion from PPTX to PPT. In addition, some known issues have been fixed, such as the program hanging when loading a PPTX document. More details are listed below.

Here is a list of changes made in this release

Category ID Description
New feature SPIREPPT-2602 Supports getting the names of all embedded fonts in a PowerPoint file.
// ppt is a Presentation object that has been loaded from a PowerPoint file
ArrayList<String> embedFonts = ppt.getEmbedFonts();
Bug SPIREPPT-2597 Fixes the issue that the program threw java.lang.ClassCastException when converting a PPTX document to a PPT document.
Bug SPIREPPT-2599 Fixes the issue that the program threw java.lang.ClassCastException when calling table.distributeRows(0,1) method after adding a formula to a table cell.
Bug SPIREPPT-2601 Fixes the issue that the program hung when loading a PPTX document.
Click the link below to download Spire.Presentation for Java 9.9.2:

Converting text to numbers and vice versa in Excel is crucial for efficient data management. When you convert text to numbers, you enable accurate calculations and data processing, which is essential for tasks like financial reporting and statistical analysis. On the other hand, converting numbers to text can be beneficial for formatting outputs, creating clear and readable labels, and presenting data in a more user-friendly manner.

In this article, you will learn how to convert text to numbers and numbers to text in Excel using Spire.XLS for Python.

Install Spire.XLS for Python

This scenario requires Spire.XLS for Python and plum-dispatch v1.7.4. They can be easily installed on Windows with the following pip command.

pip install Spire.XLS

If you are unsure how to install, please refer to this tutorial: How to Install Spire.XLS for Python on Windows

Convert Text to Numbers in Excel

If you import data from another source into Excel, a small green triangle may appear in the upper-left corner of a cell. This error indicator means that the number is stored as text. Numbers stored as text can cause unexpected results, such as a formula being displayed as-is instead of being calculated.

To convert numbers stored as text to numbers, you can simply use the CellRange.ConvertToNumber() method. The CellRange object can represent a single cell or a range of cells.

The steps to convert text to numbers in Excel are as follows:

  • Create a Workbook object.
  • Load an Excel file using Workbook.LoadFromFile() method.
  • Get a specific worksheet using Workbook.Worksheets[index] property.
  • Get a cell or a range of cells using Worksheet.Range property.
  • Convert the text in the cell(s) into numbers using CellRange.ConvertToNumber() method.
  • Save the document to a different Excel file.
  • Python
from spire.xls import *
from spire.xls.common import *

# Create a Workbook object
workbook = Workbook()

# Load an Excel document
workbook.LoadFromFile("C:\\Users\\Administrator\\Desktop\\Input.xlsx")

# Get a specific worksheet
worksheet = workbook.Worksheets[0]

# Get a cell range
cellRange = worksheet.Range["D2:G13"]

# Convert text to number
cellRange.ConvertToNumber()

# Save the workbook to a different Excel file
workbook.SaveToFile("output/TextToNumbers.xlsx", ExcelVersion.Version2013)

# Dispose resources
workbook.Dispose()

Python: Convert Text to Numbers and Numbers to Text in Excel
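Conceptually, converting text to numbers means parsing each cell's string value into a numeric value. The following is a minimal pure-Python sketch of that idea, independent of Spire.XLS. The function name parse_number_text is hypothetical, and it assumes "." as the decimal separator and "," as the thousands separator, which may not match your locale.

```python
def parse_number_text(value):
    """Return an int or float parsed from a numeric string, or None if it is not numeric.

    Hypothetical helper: assumes '.' is the decimal separator and ',' is the
    thousands separator; surrounding whitespace is ignored.
    """
    text = value.strip().replace(",", "")
    if not text:
        return None
    try:
        return int(text)
    except ValueError:
        pass
    try:
        return float(text)
    except ValueError:
        return None


# Values as they might appear when numbers are stored as text
cells = [" 42", "1,250", "3.5", "007", "N/A"]
print([parse_number_text(c) for c in cells])  # prints [42, 1250, 3.5, 7, None]
```

Note that "007" becomes 7: once text is converted to a number, any leading zeros are dropped, which is why identifiers such as employee IDs are usually kept as text.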

Convert Numbers to Text in Excel

When working with numerical data in Excel, you might encounter situations where you need to convert numbers to text. This is particularly important when dealing with data that requires specific formatting, such as IDs or phone numbers that must retain leading zeros.

To convert the number in a cell into text, you can set the CellRange.NumberFormat to @. The CellRange object represents a single cell or a range of cells.

The detailed steps to convert numbers to text in Excel are as follows:

  • Create a Workbook object.
  • Load an Excel file using Workbook.LoadFromFile() method.
  • Get a specific worksheet using Workbook.Worksheets[index] property.
  • Get a specific cell or a range of cells using Worksheet.Range property.
  • Convert the numbers in the cell(s) into text by setting CellRange.NumberFormat to @.
  • Save the document to a different Excel file.
  • Python
from spire.xls import *
from spire.xls.common import *

# Create a Workbook object
workbook = Workbook()

# Load an Excel document
workbook.LoadFromFile("C:\\Users\\Administrator\\Desktop\\Employee.xlsx")

# Get a specific worksheet
worksheet = workbook.Worksheets[0]

# Get a cell range
cellRange = worksheet.Range["F2:F9"]

# Convert numbers in the cell range to text
cellRange.NumberFormat = "@"

# Save the workbook to a different Excel file
workbook.SaveToFile("output/NumbersToText.xlsx", ExcelVersion.Version2013)

# Dispose resources
workbook.Dispose()
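Setting the number format to "@" tells Excel to treat the cell content as text, but a value that was already stored as a number has lost its leading zeros. The pure-Python sketch below shows how such a value can be zero-padded when it is turned back into text. The helper number_to_id_text and the fixed ID width are assumptions for illustration only.

```python
def number_to_id_text(number, width):
    """Format a non-negative integer as fixed-width text, restoring leading zeros.

    Hypothetical helper: `width` is the expected number of digits in the ID.
    """
    if number < 0:
        raise ValueError("IDs are expected to be non-negative")
    return str(number).zfill(width)


# Parsing text as a number drops the leading zeros...
print(int("00123"))               # prints 123
# ...so pad the number again when converting it back to text
print(number_to_id_text(123, 5))  # prints 00123
print(f"{123:05d}")               # the same result with a format specifier
```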

Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.

We're glad to announce the release of Spire.Presentation 9.9.2. This version supports setting a global font directory for conversions and adds two properties to obtain the last row and last column of a chart's data source. In addition, two issues have been fixed: one that occurred when converting PPTX to SVG and one that occurred when copying shapes. Check below for more details.

Here is a list of changes made in this release

Category ID Description
New feature SPIREPPT-2567 Supports setting the global font directory used when performing conversions.
Presentation.SetCustomFontsDirctory("myfonts");
New feature SPIREPPT-2594 Adds the "IChart.ChartData.LastRowIndex" and "IChart.ChartData.LastColIndex" properties to obtain the last row and last column of the chart's data source.
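using System.IO;
using System.Text;
using Spire.Presentation;
using Spire.Presentation.Charts;

// Placeholder paths; adjust them to your environment
string inputFile = "input.pptx";
string outputFile = "output.pptx";
string outputFile_T = "output.txt";

Presentation ppt = new Presentation();
ppt.LoadFromFile(inputFile);
StringBuilder stringBuilder = new StringBuilder();
// Treat the first shape on the first slide as a chart
IChart chart = ppt.Slides[0].Shapes[0] as IChart;
if (chart != null)
{
	// Last row and last column of the chart's data source
	int lastRow = chart.ChartData.LastRowIndex;
	int lastCol = chart.ChartData.LastColIndex;
	stringBuilder.AppendLine("lastRow" + lastRow + "\r\n" + "lastColumn" + lastCol);
	// Row and column of the last value in the third series
	int dataRow = chart.Series[2].Values[chart.Series[2].Values.Count - 1].Row;
	int dataColumn = chart.Series[2].Values[chart.Series[2].Values.Count - 1].Column;
	stringBuilder.AppendLine("dataRow" + dataRow + "\r\n" + "dataColumn" + dataColumn);
	// Clear the data-source cells beyond the series data (rows below, then columns to the right)
	chart.ChartData.Clear(dataRow + 1, 0, lastRow + 1, lastCol + 1);
	chart.ChartData.Clear(0, dataColumn + 1, lastRow + 1, lastCol + 1);
}
File.WriteAllText(outputFile_T, stringBuilder.ToString());
ppt.SaveToFile(outputFile, FileFormat.Pptx2013);
ppt.Dispose();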
Bug SPIREPPT-2582 Fixes the issue that the type changed from "graphic" to "image" when copying shapes.
Bug SPIREPPT-2590 Fixes the issue that the content was incorrect when converting PPTX documents to SVG documents.
Click the link to download Spire.Presentation 9.9.2:
More information about new Spire.Presentation releases and hotfixes:

We are delighted to announce the release of Spire.Doc for Java 12.9.4. This version enhances the conversion from Word to PDF. Besides, some known issues are fixed successfully in this version, such as the issue that the created table of contents field was not updated correctly. More details are listed below.

Here is a list of changes made in this release

Category ID Description
Bug SPIREDOC-10740 Optimizes the speed of converting Word documents to PDF documents.
Bug SPIREDOC-10457 Fixes the issue that the text layout was incorrect after converting Word documents to PDF documents.
Bug SPIREDOC-10791 Fixes the issue that the created table of contents field was not updated correctly.
Bug SPIREDOC-10813 Fixes the issue that SimSun font was replaced with Times New Roman font after converting Word documents to PDF documents.
Bug SPIREDOC-10821 Fixes the issue that the program threw "Cannot find any fonts in specified font sources" exception when converting Word documents to PDF documents under the system environment where fonts were not installed.
Bug SPIREDOC-10825 Fixes the issue that the program threw java.lang.NullPointerException when using Map type parameters in MailMergeDataTable class.
Click the link below to download Spire.Doc for Java 12.9.4:
Thursday, 19 September 2024 00:56

Python: Crop Pages in PDF

When dealing with PDF files, you might sometimes need to crop pages in the PDF to remove unnecessary margins, borders, or unwanted content. By doing so, you can make the document conform to specific design requirements or page sizes, ensuring a more aesthetically pleasing or functionally optimized output. This article will introduce how to crop pages in PDF in Python using Spire.PDF for Python.

Install Spire.PDF for Python

This scenario requires Spire.PDF for Python. It can be easily installed on Windows with the following pip command.

pip install Spire.PDF

If you are unsure how to install, please refer to this tutorial: How to Install Spire.PDF for Python on Windows

Crop a PDF Page in Python

Spire.PDF for Python allows you to specify a rectangular area and then use the PdfPageBase.CropBox property to crop the page to that area. The following are the detailed steps.

  • Create a PdfDocument instance.
  • Load a PDF file using PdfDocument.LoadFromFile() method.
  • Get a specified page using PdfDocument.Pages[] property.
  • Crop the page to the specified area using PdfPageBase.CropBox property.
  • Save the result file using PdfDocument.SaveToFile() method.
  • Python
from spire.pdf.common import *
from spire.pdf import *

# Create a PdfDocument object
pdf = PdfDocument()

# Load a PDF file from disk
pdf.LoadFromFile("Sample1.pdf")

# Get the first page
page = pdf.Pages[0]

# Crop the page by the specified area
page.CropBox = RectangleF(0.0, 300.0, 600.0, 260.0)

# Save the result file
pdf.SaveToFile("CropPDF.pdf")
pdf.Close()
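The RectangleF passed to CropBox takes (x, y, width, height). When the goal is simply to trim margins, those four values can be computed from the page size. The pure-Python sketch below only does that arithmetic; the helper name margin_crop_rect is hypothetical, values are assumed to be in points (1/72 inch), and whether y is measured from the top or the bottom edge depends on how the library maps RectangleF onto the PDF crop box, so check the result on a sample page.

```python
def margin_crop_rect(page_width, page_height, left, top, right, bottom):
    """Return (x, y, width, height) for a crop area that trims the given margins.

    Hypothetical helper; all values are in points (1/72 inch).
    """
    width = page_width - left - right
    height = page_height - top - bottom
    if width <= 0 or height <= 0:
        raise ValueError("margins leave no visible area")
    return (float(left), float(top), float(width), float(height))


# Trim 36 pt (half an inch) from every side of a US Letter page (612 x 792 pt)
print(margin_crop_rect(612, 792, 36, 36, 36, 36))  # prints (36.0, 36.0, 540.0, 720.0)
```

The returned tuple could then be unpacked into the constructor, for example RectangleF(*rect), and assigned to page.CropBox in the same way as the fixed rectangle in the example above.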

Crop a PDF Page and Export as an Image in Python

To accomplish this task, you can use the PdfDocument.SaveAsImage(pageIndex: int) method to convert a cropped PDF page to an image stream. The following are the detailed steps.

  • Create a PdfDocument instance.
  • Load a PDF file using PdfDocument.LoadFromFile() method.
  • Get a specified page using PdfDocument.Pages[] property.
  • Crop the page to the specified area using PdfPageBase.CropBox property.
  • Convert the cropped page to an image stream using PdfDocument.SaveAsImage() method.
  • Save the image as a PNG, JPG or BMP file using Stream.Save() method.
  • Python
from spire.pdf.common import *
from spire.pdf import *

# Create a PdfDocument object
pdf = PdfDocument()

# Load a PDF file from disk
pdf.LoadFromFile("Sample1.pdf")

# Get the first page
page = pdf.Pages[0]

# Crop the page by the specified area
page.CropBox = RectangleF(0.0, 300.0, 600.0, 260.0)

# Convert the page to an image
with pdf.SaveAsImage(0) as imageS:

    # Save the image as a PNG file
    imageS.Save("CropPDFSaveAsImage.png")
pdf.Close()

Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.

Wednesday, 18 September 2024 01:10

C#: Crop Pages in PDF

PDF page cropping is particularly useful in scenarios where the original document has excessive margins or borders that are not necessary for the intended use. By cropping pages, you can preserve the designated area for specific use, making the document more efficient for sharing, printing, or digital presentations. In this article, you will learn how to crop pages in PDF in C# using Spire.PDF for .NET.

Install Spire.PDF for .NET

To begin with, you need to add the DLL files included in the Spire.PDF for .NET package as references in your .NET project. The DLL files can be either downloaded from this link or installed via NuGet.

PM> Install-Package Spire.PDF

Crop a PDF Page in C#

Spire.PDF for .NET allows you to specify a rectangular area and then use the PdfPageBase.CropBox property to crop the page to that area. The following are the detailed steps.

  • Create a PdfDocument instance.
  • Load a PDF file using PdfDocument.LoadFromFile() method.
  • Get a specified page using PdfDocument.Pages[] property.
  • Crop the page to the specified area using PdfPageBase.CropBox property.
  • Save the result file using PdfDocument.SaveToFile() method.
  • C#
using System.Drawing;
using Spire.Pdf;

namespace CropPDFPage
{
    class Program
    {
        static void Main(string[] args)
        {
            //Create a PdfDocument object
            PdfDocument pdf = new PdfDocument();

            //Load a PDF file from disk
            pdf.LoadFromFile("Sample1.pdf");

            //Get the first page
            PdfPageBase page = pdf.Pages[0];

            //Crop the page by the specified area
            page.CropBox = new RectangleF(0, 300, 600, 260);

            //Save the result file
            pdf.SaveToFile("CropPDF.pdf");
            pdf.Close();
        }
    }
}

Crop a PDF Page and Export as an Image in C#

To accomplish this task, you can use the PdfDocument.SaveAsImage(int pageIndex, PdfImageType type) method to convert a cropped PDF page to an image. The following are the detailed steps.

  • Create a PdfDocument instance.
  • Load a PDF file using PdfDocument.LoadFromFile() method.
  • Get a specified page using PdfDocument.Pages[] property.
  • Crop the page to the specified area using PdfPageBase.CropBox property.
  • Convert the cropped page to an image using PdfDocument.SaveAsImage() method.
  • Save the image as a PNG, JPG or BMP file using Image.Save(string filename, ImageFormat format) method.
  • C#
using System.Drawing;
using System.Drawing.Imaging;
using Spire.Pdf;
using Spire.Pdf.Graphics;

namespace CropPDFPageToImage
{
    class Program
    {
        static void Main(string[] args)
        {
            //Create a PdfDocument object
            PdfDocument pdf = new PdfDocument();

            //Load a PDF file from disk
            pdf.LoadFromFile("Sample1.pdf");

            //Get the first page
            PdfPageBase page = pdf.Pages[0];

            //Crop the page by the specified area
            page.CropBox = new RectangleF(0, 300, 600, 260);

            //Convert the page to an image
            Image image = pdf.SaveAsImage(0, PdfImageType.Bitmap);

            //Save the image as a PNG file
            image.Save("CropPDFSaveAsImage.png", ImageFormat.Png);

            //Save the image as a JPG file
            //image.Save("ToJPG.jpg", ImageFormat.Jpeg);

            //Save the image as a BMP file
            //image.Save("ToBMP.bmp", ImageFormat.Bmp);

            //Release resources
            image.Dispose();
            pdf.Close();
        }
    }
}

Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.

Saturday, 14 September 2024 02:31

Java: Verify or Extract Digital Signatures in PDF

Verifying digital signatures in PDFs is crucial for ensuring that a document remains unaltered and genuinely comes from the stated signer. This verification process is essential for maintaining the document’s integrity and trustworthiness. Additionally, extracting digital signatures allows you to retrieve signature details, such as the signature image and certificate information, which can be useful for further validation or archival purposes. In this article, we will demonstrate how to verify and extract digital signatures in PDFs in Java using Spire.PDF for Java.

Install Spire.PDF for Java

First of all, you're required to add the Spire.Pdf.jar file as a dependency in your Java program. The JAR file can be downloaded from this link. If you use Maven, you can easily import the JAR file in your application by adding the following code to your project's pom.xml file.

<repositories>
    <repository>
        <id>com.e-iceblue</id>
        <name>e-iceblue</name>
        <url>https://repo.e-iceblue.com/nexus/content/groups/public/</url>
    </repository>
</repositories>
<dependencies>
    <dependency>
        <groupId>e-iceblue</groupId>
        <artifactId>spire.pdf</artifactId>
        <version>10.10.7</version>
    </dependency>
</dependencies>
    

Verify Digital Signatures in PDF in Java

Spire.PDF for Java provides the PdfSignature.verifySignature() method to check the validity of digital signatures in PDF documents. The detailed steps are as follows.

  • Create an object of the PdfDocument class.
  • Load a PDF document using the PdfDocument.loadFromFile() method.
  • Get the form of the PDF document using the PdfDocument.getForm() method.
  • Iterate through all fields in the form and find the signature field.
  • Get the signature using the PdfSignatureFieldWidget.getSignature() method.
  • Verify the validity of the signature using the PdfSignature.verifySignature() method.
  • Java
import com.spire.pdf.PdfDocument;
import com.spire.pdf.fields.PdfField;
import com.spire.pdf.security.PdfSignature;
import com.spire.pdf.widget.PdfFormWidget;
import com.spire.pdf.widget.PdfSignatureFieldWidget;

public class VerifySignature {
    public static void main(String[] args) {
        // Create a PdfDocument object
        PdfDocument pdf = new PdfDocument();
        // Load a PDF document
        pdf.loadFromFile("Signature.pdf");

        // Get the form of the PDF document
        PdfFormWidget formWidget = (PdfFormWidget) pdf.getForm();

        if(formWidget.getFieldsWidget().getCount() > 0)
        {
            // Iterate through all fields in the form
            for(int i = 0; i < formWidget.getFieldsWidget().getCount(); i ++)
            {
                PdfField field = formWidget.getFieldsWidget().get(i);
                // Find the signature field
                if (field instanceof PdfSignatureFieldWidget)
                {
                    PdfSignatureFieldWidget signatureField = (PdfSignatureFieldWidget) field;
                    // Get the signature
                    PdfSignature signature = signatureField.getSignature();
                    // Verify the signature
                    boolean valid = signature.verifySignature();
                    if(valid)
                    {
                        System.out.print("The signature is valid!");
                    }
                    else
                    {
                        System.out.print("The signature is invalid!");
                    }
                }
            }
        }
    }
}

Detect Whether a Signed PDF Has Been Modified in Java

To check whether a signed PDF document has been modified, you can use the PdfSignature.verifyDocModified() method. The detailed steps are as follows.

  • Create an object of the PdfDocument class.
  • Load a PDF document using the PdfDocument.loadFromFile() method.
  • Get the form of the PDF document using the PdfDocument.getForm() method.
  • Iterate through all fields in the form and find the signature field.
  • Get the signature using the PdfSignatureFieldWidget.getSignature() method.
  • Check whether the document has been modified since it was signed using the PdfSignature.verifyDocModified() method.
  • Java
import com.spire.pdf.PdfDocument;
import com.spire.pdf.fields.PdfField;
import com.spire.pdf.security.PdfSignature;
import com.spire.pdf.widget.PdfFormWidget;
import com.spire.pdf.widget.PdfSignatureFieldWidget;

public class CheckIfSignedPdfIsModified {
    public static void main(String[] args) {
        // Create a PdfDocument object
        PdfDocument pdf = new PdfDocument();
        // Load a PDF document
        pdf.loadFromFile("Signature.pdf");

        // Get the form of the PDF document
        PdfFormWidget formWidget = (PdfFormWidget) pdf.getForm();

        if(formWidget.getFieldsWidget().getCount() > 0) {
            // Iterate through all fields in the form
            for (int i = 0; i < formWidget.getFieldsWidget().getCount(); i++) {
                PdfField field = formWidget.getFieldsWidget().get(i);
                // Find the signature field
                if (field instanceof PdfSignatureFieldWidget) {
                    PdfSignatureFieldWidget signatureField = (PdfSignatureFieldWidget) field;
                    // Get the signature
                    PdfSignature signature = signatureField.getSignature();
                    // Check whether the document was modified after signing
                    boolean modified = signature.verifyDocModified();
                    if(modified)
                    {
                        System.out.print("The document has been modified!");
                    }
                    else
                    {
                        System.out.print("The document has not been modified!");
                    }
                }
            }
        }
    }
}
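For background, a PDF signature identifies the bytes it covers with a /ByteRange array [offset1, length1, offset2, length2], which spans the file except for the embedded signature value, and a signed file is often changed by appending an incremental update after that range. The pure-Python sketch below illustrates both ideas on synthetic bytes: hashing the signed ranges and checking whether content follows them. It is a conceptual illustration only, not how Spire.PDF implements verifyDocModified(), and it performs no cryptographic signature or certificate check; all helper names are hypothetical.

```python
import hashlib


def signed_bytes(data, byte_range):
    """Concatenate the two segments described by a PDF /ByteRange array."""
    off1, len1, off2, len2 = byte_range
    return data[off1:off1 + len1] + data[off2:off2 + len2]


def digest_of_signed_range(data, byte_range):
    """SHA-256 digest of the bytes covered by the signature."""
    return hashlib.sha256(signed_bytes(data, byte_range)).hexdigest()


def has_bytes_after_signed_range(data, byte_range):
    """True if content (for example an incremental update) follows the signed range."""
    off2, len2 = byte_range[2], byte_range[3]
    return off2 + len2 < len(data)


# Synthetic file: b"HEADER" + placeholder for the signature value + b"BODY"
original = b"HEADER<sig>BODY"
byte_range = [0, 6, 11, 4]  # covers b"HEADER" and b"BODY", skips b"<sig>"
digest_at_signing = digest_of_signed_range(original, byte_range)

# Appending an update leaves the signed bytes unchanged...
updated = original + b"UPDATE"
print(digest_of_signed_range(updated, byte_range) == digest_at_signing)   # True
# ...but content now follows the signed range
print(has_bytes_after_signed_range(updated, byte_range))                  # True

# Editing bytes inside the signed range changes the digest
tampered = b"HEADER<sig>B0DY"
print(digest_of_signed_range(tampered, byte_range) == digest_at_signing)  # False
```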

Extract Signature Images and Certificate Information from PDF in Java

To extract signature images and certificate information from a PDF, you can use the PdfFormWidget.extractSignatureAsImages() and PdfSignature.getCertificate().toString() methods. The detailed steps are as follows.

  • Create an object of the PdfDocument class.
  • Load a PDF document using the PdfDocument.loadFromFile() method.
  • Get the form of the PDF document using the PdfDocument.getForm() method.
  • Extract signature images using the PdfFormWidget.extractSignatureAsImages() method and then save each image to file.
  • Iterate through all fields in the form and find the signature field.
  • Get the signature using the PdfSignatureFieldWidget.getSignature() method.
  • Get the certificate information of the signature using the PdfSignature.getCertificate().toString() method.
  • Java
import com.spire.pdf.PdfDocument;
import com.spire.pdf.fields.PdfField;
import com.spire.pdf.security.PdfCertificate;
import com.spire.pdf.security.PdfSignature;
import com.spire.pdf.widget.PdfFormWidget;
import com.spire.pdf.widget.PdfSignatureFieldWidget;

import javax.imageio.ImageIO;
import java.awt.*;
import java.awt.image.BufferedImage;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;

public class ExtractSignatureImage {
    public static void main(String[] args) {
        // Create a PdfDocument object
        PdfDocument pdf = new PdfDocument();
        // Load a PDF document
        pdf.loadFromFile("Signature.pdf");

        // Get the form of the PDF document
        PdfFormWidget formWidget = (PdfFormWidget) pdf.getForm();

        // Extract signature images
        Image[] images = formWidget.extractSignatureAsImages();
        // Iterate through the images and save each image to file
        for (int i = 0; i < images.length; i++) {
            try {
                // Convert the Image to BufferedImage
                BufferedImage bufferedImage = (BufferedImage) images[i];
                // Define the output file path
                File outputFile = new File("output\\signature_" + i + ".png");
                // Save the image as a PNG file
                ImageIO.write(bufferedImage, "png", outputFile);
            } catch (IOException e) {
                e.printStackTrace();
            }
        }

        // Create a text file to save the certificate information
        try (BufferedWriter writer = new BufferedWriter(new FileWriter("output\\certificate_info.txt"))) {
            if (formWidget.getFieldsWidget().getCount() > 0) {
                // Iterate through all fields in the form
                for (int i = 0; i < formWidget.getFieldsWidget().getCount(); i++) {
                    PdfField field = formWidget.getFieldsWidget().get(i);
                    // Find the signature field
                    if (field instanceof PdfSignatureFieldWidget) {
                        PdfSignatureFieldWidget signatureField = (PdfSignatureFieldWidget) field;
                        // Get the signature
                        PdfSignature signature = signatureField.getSignature();

                        // Get the certificate info of the signature
                        String certificateInfo = signature.getCertificate() != null ? signature.getCertificate().toString() : "No certificate";

                        // Write the certificate information to the text file
                        writer.write("Certificate Info: \n" + certificateInfo + "\n");
                        writer.write("-----------------------------------\n");
                    }
                }
            } else {
                writer.write("No signature fields found.");
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.

Thursday, 12 September 2024 01:23

Spire.OCR for Java Program Guide Content

Spire.OCR for Java is a professional OCR library for reading text from images in JPG, PNG, GIF, BMP and TIFF formats. Developers can easily add OCR functionality to Java applications (J2SE and J2EE). It supports commonly used image formats and can recognize multiple characters and fonts in images, as well as bold and italic styles, and much more.

Spire.OCR for Java provides a very easy way to extract text from images. With just three lines of Java code, Spire.OCR can read text from various common image formats, such as Bitmap, JPG, PNG, TIFF and GIF.

Spire.OCR for Java offers developers a new model for extracting text from images. In this article, we will demonstrate how to extract text from images in Java using the new model of Spire.OCR for Java.

The detailed steps are as follows.

Step 1: Create a Java Project in IntelliJ IDEA.

Extract Text from Images Using the New Model of Spire.OCR for Java

Step 2: Add Spire.OCR.jar to Your Project.

Option 1: Install Spire.OCR for Java via Maven.

If you're using Maven, you can install Spire.OCR for Java by adding the following code to your project's pom.xml file:

<repositories>
    <repository>
        <id>com.e-iceblue</id>
        <name>e-iceblue</name>
        <url>https://repo.e-iceblue.cn/repository/maven-public/</url>
    </repository>
</repositories>
<dependencies>
    <dependency>
        <groupId>e-iceblue</groupId>
        <artifactId>spire.ocr</artifactId>
        <version>1.9.19</version>
    </dependency>
</dependencies>

Option 2: Manually Import Spire.OCR.jar.

First, download Spire.OCR for Java from the following link and extract it to a specific directory:

https://www.e-iceblue.com/Download/ocr-for-java.html

Next, in IntelliJ IDEA, go to File > Project Structure > Modules > Dependencies. In the Dependencies pane, click the "+" button and select JARs or Directories. Navigate to the directory where Spire.OCR for Java is located, open the lib folder and select the Spire.OCR.jar file, then click OK to add it as the project’s dependency.

Step 3: Download the New Model of Spire.OCR for Java.

Download the model that matches your operating system from one of the following links.

Then extract the package and save it to a specific directory on your computer. In this example, we saved the package to "D:\".

Step 4: Implement Text Extraction from Images Using the New Model of Spire.OCR for Java.

Use the following code to extract text from images with the new OCR model of Spire.OCR for Java:

  • Java
import com.spire.ocr.*;
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;

public class Main {
    public static void main(String[] args) {
        try {
            // Create an instance of the OcrScanner class
            OcrScanner scanner = new OcrScanner();

            // Create an instance of the ConfigureOptions class to set up the scanner configurations
            ConfigureOptions configureOptions = new ConfigureOptions();

            // Set the path to the new model
            configureOptions.setModelPath("D:\\win-x64");

            // Set the language for text recognition. The default is English.
            // Supported languages include English, Chinese, Chinesetraditional, French, German, Japanese, and Korean.
            configureOptions.setLanguage("English");

            // Apply the configuration options to the scanner
            scanner.ConfigureDependencies(configureOptions);

            // Extract text from an image
            scanner.scan("Sample.png");

            // Save the extracted text to a text file
            saveTextToFile(scanner, "output.txt");

        } catch (OcrException e) {
            e.printStackTrace();
        }
    }

    private static void saveTextToFile(OcrScanner scanner, String filePath) {
        try {
            String text = scanner.getText().toString();
            try (BufferedWriter writer = new BufferedWriter(new FileWriter(filePath))) {
                writer.write(text);
            }
        } catch (IOException | OcrException e) {
            e.printStackTrace();
        }
    }
}

Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.
