Hi everyone,
I’m happy to announce that Simple Text Extractor v1.2 is now available on the stable channel!
This update is a major milestone, turning the application from a simple utility into a robust, industrial-grade OCR tool built for very large documents and strict privacy standards.
Expanded Language Support
We’ve significantly broadened our language coverage. The app now ships with 17+ OCR language dictionaries bundled directly in the Snap:
- Added: Russian, Turkish, Vietnamese, Norwegian, Swedish, Danish, and Greek.
- Retained: French, English, German, Dutch, Italian, Spanish, Portuguese, Chinese (Simplified), Arabic, and Japanese.
Performance for Massive Documents (2300+ Pages)
Processing large legal or administrative archives is now a breeze. We’ve re-engineered the engine specifically for the Linux Snap environment:
- OOM Protection: The new architecture prevents “Out Of Memory” crashes, so the app can process files of more than 2300 pages without bogging down your system.
- Multi-Core Intelligence: Tasks are distributed dynamically across your CPU cores (up to 8 pages at once) for maximum throughput.
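If you’re curious how a memory cap and an 8-page parallelism limit can work together, here is a minimal Python sketch. It is not the app’s actual code. `ocr_pages_bounded` and the `ocr_page` callable are hypothetical names, and the sketch assumes each page can be OCR’d independently. The idea is to keep at most 8 pages in flight at any moment, so memory use depends on the worker count rather than on the length of the document.

```python
import os
from concurrent.futures import FIRST_COMPLETED, Future, ThreadPoolExecutor, wait
from typing import Callable, Dict, Iterable, Iterator, Optional, Tuple

MAX_PARALLEL_PAGES = 8  # cap quoted in the release notes

def ocr_pages_bounded(
    pages: Iterable[int],
    ocr_page: Callable[[int], str],  # hypothetical per-page OCR function
    max_workers: Optional[int] = None,
) -> Iterator[Tuple[int, str]]:
    """Yield (page_number, text) pairs with at most `max_workers`
    pages in flight, so memory does not grow with document length."""
    if max_workers is None:
        max_workers = min(MAX_PARALLEL_PAGES, os.cpu_count() or 1)
    page_iter = iter(pages)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        in_flight: Dict[Future, int] = {}
        # Prime the pool with the first batch of pages only.
        for page in page_iter:
            in_flight[pool.submit(ocr_page, page)] = page
            if len(in_flight) >= max_workers:
                break
        while in_flight:
            done, _ = wait(in_flight, return_when=FIRST_COMPLETED)
            for fut in done:
                page = in_flight.pop(fut)
                yield page, fut.result()
                # Refill: one new page enters for each page that finishes.
                nxt = next(page_iter, None)
                if nxt is not None:
                    in_flight[pool.submit(ocr_page, nxt)] = nxt
```

Results arrive in completion order, not page order, so a caller would index or sort them by page number. The sketch uses threads because it assumes the OCR call spends most of its time in native code or a subprocess. A pure-Python OCR step would need a process pool instead.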
Hardened Security (CWE-377 & CWE-209)
Privacy is our core principle, and the app works 100% offline:
- Atomic Operations: Temporary files are created in restricted locations with 0600 permissions (readable and writable only by your user) and deleted atomically.
- Log Anonymization: Personal usernames and local paths are automatically scrubbed from system logs to protect your confidentiality.
- Exploit Protection: Built-in defenses against “Decompression Bombs” and command injection.
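Python’s standard library is enough to sketch three of these measures. This is an illustration under stated assumptions, not the app’s code. Every function name, the `/home/<name>` path pattern and the pixel ceiling are hypothetical. `tempfile.mkstemp` creates files with 0600 permissions and `tempfile.mkdtemp` creates directories only the owner can enter, which covers the CWE-377 side. The log scrubber and the header-size check illustrate the CWE-209 and decompression-bomb defenses. Command-injection hardening is not shown.

```python
import os
import re
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator, Optional

# --- Temporary files (CWE-377) ---------------------------------------
@contextmanager
def private_workspace() -> Iterator[Path]:
    """Create a scratch directory only the current user can enter (0700)
    and remove it, contents included, when the block exits."""
    path = tempfile.mkdtemp(prefix="ocr-")  # hypothetical prefix
    try:
        yield Path(path)
    finally:
        shutil.rmtree(path, ignore_errors=True)

def write_private_file(directory: Path, data: bytes) -> Path:
    """Write data to a new, uniquely named file created with mode 0600."""
    fd, name = tempfile.mkstemp(dir=directory)  # exclusive create, owner-only
    with os.fdopen(fd, "wb") as handle:
        handle.write(data)
    return Path(name)

# --- Log scrubbing (CWE-209) -----------------------------------------
_HOME_RE = re.compile(r"/home/[^/\s]+")  # assumes Linux-style home paths

def scrub_log_line(line: str, username: Optional[str] = None) -> str:
    """Mask the home-directory segment of paths and any bare username."""
    line = _HOME_RE.sub("/home/<user>", line)
    if username:
        line = re.sub(rf"\b{re.escape(username)}\b", "<user>", line)
    return line

# --- Decompression-bomb guard ----------------------------------------
MAX_PIXELS = 90_000_000  # illustrative ceiling, not the app's real limit

def check_pixel_budget(width: int, height: int, max_pixels: int = MAX_PIXELS) -> None:
    """Reject an image from its header dimensions before decoding it."""
    if width <= 0 or height <= 0:
        raise ValueError("invalid image dimensions")
    if width * height > max_pixels:
        raise ValueError(f"image too large: {width}x{height} px")
```

The pixel check should run on dimensions read from the image header, before any pixel data is decompressed. That ordering is what stops a small file that would expand to gigabytes in memory.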
UI/UX Refinements
- Safety First: A new confirmation dialog appears when you clear the file list, preventing accidental data loss.
- Zero-Freeze Interface: A fully asynchronous bridge keeps the UI responsive even during heavy OCR tasks.
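Here is a way to picture the “asynchronous bridge” idea with `asyncio`. The blocking OCR call is handed off to a worker thread, and the event loop stays free to service the interface. This minimal sketch assumes an asyncio-style event loop. The app’s real UI toolkit and bridge may work differently. `blocking_ocr` and the heartbeat task are hypothetical stand-ins for the OCR engine and UI repaints.

```python
import asyncio
import time
from typing import Callable, Iterable, List, Tuple

def blocking_ocr(page: int) -> str:
    """Hypothetical stand-in for a slow, blocking OCR call."""
    time.sleep(0.2)
    return f"text of page {page}"

async def ui_heartbeat(ticks: List[float], stop: asyncio.Event) -> None:
    """Simulates the UI loop repainting roughly every 20 ms."""
    while not stop.is_set():
        ticks.append(time.monotonic())
        await asyncio.sleep(0.02)

async def run_ocr_without_freezing(
    pages: Iterable[int],
    ocr: Callable[[int], str] = blocking_ocr,
) -> Tuple[List[str], List[float]]:
    loop = asyncio.get_running_loop()
    ticks: List[float] = []
    stop = asyncio.Event()
    heartbeat = asyncio.create_task(ui_heartbeat(ticks, stop))
    results: List[str] = []
    try:
        for page in pages:
            # The blocking call runs in the default thread pool, so the
            # event loop keeps running the heartbeat in the meantime.
            results.append(await loop.run_in_executor(None, ocr, page))
    finally:
        stop.set()
        await heartbeat
    return results, ticks
```

If you run `asyncio.run(run_ocr_without_freezing([1, 2]))`, the heartbeat keeps ticking through the roughly 0.4 seconds of simulated OCR. If `blocking_ocr` were called directly inside the coroutine instead, the heartbeat would stall until each page finished.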
Install it now: sudo snap install simple-text-extractor
Or find it on the Snap Store: https://snapcraft.io/simple-text-extractor
I’d love to hear your feedback, especially if you’re handling large-scale digitization projects!
