OCRopus - Open-Source Layout Analysis and OCR
OCRopus is a state-of-the-art document analysis and OCR system, featuring pluggable layout analysis, pluggable character recognition, statistical natural language modeling, and multi-lingual capabilities. This server allows you to use the system through your web browser.
Please note:
- the system is currently optimized for English text
- the best resolution to use is around 300 dpi
- the processing time increases with the image size and can take up to one minute for a large page
- output formatting is preliminary and will show headings and paragraphs only
- the system does not detect non-text regions of the page, so it will try to transcribe images as well
You can either submit an image through the form interface, or you can
submit it programmatically through HTTP.
Form Interface
Note: We may retain data for debugging purposes or to enhance our services.
Examples
If you do not have an image at hand or want to try some of our images, try one of these (note that results may be cached):
Programmatic Interface
To submit your image programmatically, send a multipart/form-data POST request
to this URL; the image should be a form field named "imagefile".
From the command line, you can do this using:
curl -D header.out -F 'imagefile=@input.png;type=image/png' http://demo.iupr.org/ocropus/ > output.html
You can also do this easily using the HTTP implementation in your favorite
programming language (C#, Python, Java, Perl, etc.).
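As an illustration, here is a minimal Python sketch of the same request using only the standard library. The URL and the "imagefile" field name come from the curl command above. The helper names build_multipart and submit_image, the default image/png content type, and the 120-second timeout are assumptions made for this example, not part of the service.

```python
import os
import urllib.request
import uuid

# URL taken from the curl example above; adjust it if the service has moved.
OCROPUS_URL = "http://demo.iupr.org/ocropus/"


def build_multipart(field_name, filename, data, content_type="image/png"):
    """Encode one file as a multipart/form-data body.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex  # random separator between form parts
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; '
        f'filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n"
        "\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def submit_image(path, url=OCROPUS_URL, timeout=120):
    """POST the image at `path` as the "imagefile" field; return the raw response bytes."""
    with open(path, "rb") as f:
        data = f.read()
    body, content_type = build_multipart("imagefile", os.path.basename(path), data)
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    # The timeout is an assumed value, chosen to allow for slow, large pages.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

Calling submit_image("input.png") and writing the returned bytes to output.html in binary mode should give roughly the same result as the curl command, minus the saved response headers.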