Java: Extract Text from Images Using the New Model of Spire.OCR for Java
Starting from version 1.9.15, Spire.OCR for Java provides a new model for extracting text from images. In this article, we will demonstrate how to use this new model to extract text from images in Java.
The detailed steps are as follows.
Step 1: Create a Java Project in IntelliJ IDEA.
Step 2: Add Spire.OCR.jar to Your Project.
Option 1: Install Spire.OCR for Java via Maven.
If you're using Maven, you can install Spire.OCR for Java by adding the following code to your project's pom.xml file:
<repositories> <repository> <id>com.e-iceblue</id> <name>e-iceblue</name> <url>https://repo.e-iceblue.cn/repository/maven-public/</url> </repository> </repositories> <dependencies> <dependency> <groupId>e-iceblue</groupId> <artifactId>spire.ocr</artifactId> <version>1.9.15</version> </dependency> </dependencies>
Option 2: Manually Import Spire.OCR.jar.
First, download Spire.OCR for Java from the following link and extract it to a specific directory:
https://www.e-iceblue.com/Download/ocr-for-java.html
Next, in IntelliJ IDEA, go to File > Project Structure > Modules > Dependencies. In the Dependencies pane, click the "+" button and select JARs or Directories. Navigate to the directory where Spire.OCR for Java is located, open the lib folder and select the Spire.OCR.jar file, then click OK to add it as the project’s dependency.
Step 3: Download the New Model and Associated Dependencies of Spire.OCR for Java.
Download the new model and associated dependencies (Model&Lib.zip) from the following link and extract the package to a specific directory, such as D:\.
https://www.e-iceblue.com/resource/ocr_java/Model&Lib.zip
Step 4: Implement Text Extraction from Images Using the New Model of Spire.OCR for Java.
Use the following code to extract text from images with the new OCR model of Spire.OCR for Java:
- Java
import com.spire.ocr.*; import java.io.BufferedWriter; import java.io.FileWriter; import java.io.IOException; public class Main { public static void main(String[] args) { try { // Initialize the OcrScanner instance OcrScanner scanner = new OcrScanner(); // Configure OCR options // Set the path to the new model and the language for text recognition. Supported languages include English, Chinese, Chinesetraditional, French, German, Japanese, and Korean. ConfigureOptions configureOptions = new ConfigureOptions("D:\\Model&Lib\\Model", "English"); // Set the path to the associated dependencies configureOptions.setLibPath("D:\\Model&Lib\\Lib\\win-x64"); scanner.ConfigureDependencies(configureOptions); // Perform text extraction from the image scanner.scan("Sample.png"); // Save the extracted text to a file saveTextToFile(scanner, "output.txt"); } catch (OcrException e) { e.printStackTrace(); } } private static void saveTextToFile(OcrScanner scanner, String filePath) { try { String text = scanner.getText().toString(); try (BufferedWriter writer = new BufferedWriter(new FileWriter(filePath))) { writer.write(text); } } catch (IOException | OcrException e) { e.printStackTrace(); } } }
Apply for a Temporary License
If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.
How to Scan and Recognize Text from Images in Java Projects
OCR (Optical Character Recognition) technology is the primary method to extract text from images. Spire.OCR for Java provides developers with a quick and efficient solution to scan and extract text from images in Java projects. This article will guide you on how to use Spire.OCR for Java to recognize and extract text from images in Java projects.
Obtaining Spire.OCR for Java
To scan and recognize text in images using Spire.OCR for Java, you need to first import the Spire.OCR.jar file along with other relevant dependencies into your Java project.
You can download Spire.OCR for Java from our website. If you are using Maven, you can add the following code to your project's pom.xml file to import the JAR file into your application.
<repositories> <repository> <id>com.e-iceblue</id> <name>e-iceblue</name> <url>https://repo.e-iceblue.cn/repository/maven-public/</url> </repository> </repositories> <dependencies> <dependency> <groupId>e-iceblue</groupId> <artifactId>spire.ocr</artifactId> <version>1.9.0</version> </dependency> </dependencies>
Please download the other dependencies based on your operating system:
Install Dependencies
Step 1: Create a Java project in IntelliJ IDEA.
Step 2: Go to File > Project Structure > Modules > Dependencies in the menu and add Spire.OCR.jar as a project dependency.
Step 3: Download and extract the other dependency files. Copy all the files from the extracted "dependencies" folder to your project directory.
Scanning and Recognizing Text from a Local Image
- Java
import com.spire.ocr.OcrScanner; import java.io.*; public class ScanLocalImage { public static void main(String[] args) throws Exception { // Specify the path to the dependency files String dependencies = "dependencies/"; // Specify the path to the image file to be scanned String imageFile = "data/Sample.png"; // Specify the path to the output file String outputFile = "ScanLocalImage_out.txt"; // Create an OcrScanner object OcrScanner scanner = new OcrScanner(); // Set the dependency file path for the OcrScanner object scanner.setDependencies(dependencies); // Use the OcrScanner object to scan the specified image file scanner.scan(imageFile); // Get the scanned text content String scannedText = scanner.getText().toString(); // Create an output file object File output = new File(outputFile); // If the output file already exists, delete it if (output.exists()) { output.delete(); } // Create a BufferedWriter object to write content to the output file BufferedWriter writer = new BufferedWriter(new FileWriter(outputFile)); // Write the scanned text content to the output file writer.write(scannedText); // Close the BufferedWriter object to release resources writer.close(); } }
Specify the Language File to Scan and Recognize Text from an Image
- Java
import com.spire.ocr.OcrScanner; import java.io.*; public class ScanImageWithLanguageSelection { public static void main(String[] args) throws Exception { // Specify the path to the dependency files String dependencies = "dependencies/"; // Specify the path to the language file String languageFile = "data/japandata"; // Specify the path to the image file to be scanned String imageFile = "data/JapaneseSample.png"; // Specify the path to the output file String outputFile = "ScanImageWithLanguageSelection_out.txt"; // Create an OcrScanner object OcrScanner scanner = new OcrScanner(); // Set the dependency file path for the OcrScanner object scanner.setDependencies(dependencies); // Load the specified language file scanner.loadLanguageFile(languageFile); // Use the OcrScanner object to scan the specified image file scanner.scan(imageFile); // Get the scanned text content String scannedText = scanner.getText().toString(); // Create an output file object File output = new File(outputFile); // If the output file already exists, delete it if (output.exists()) { output.delete(); } // Create a BufferedWriter object to write content to the output file BufferedWriter writer = new BufferedWriter(new FileWriter(outputFile)); // Write the scanned text content to the output file writer.write(scannedText); // Close the BufferedWriter object to release resources writer.close(); } }
Apply for a Temporary License
If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.