Error Illegal Image Format Compression
https://groups.google.com/d/topic/tesseract-ocr/357ZOt25Cuo

Best all-round compatible OCR lib around?
https://www.autoitscript.com/forum/topic/101754-best-all-round-compatible-ocr-lib-around/
AutoIt General Help and Support. Started by Penny, September 2, 2009. 14 posts in this topic.

Penny, #1 · Posted September 2, 2009
I can't get Tesseract to work on Seven. Is there a good OCR core, other than MODI (which pretty much sucks), that works with all of XP, Vista and Seven?

Penny, #2 · Posted September 3, 2009
Do other good OCR engines even exist?

IndyUK, #3 · Posted September 3, 2009
    Penny wrote: I can't get Tesseract to work on Seven. Is there a good OCR core, other than MODI (which pretty much sucks), that works with all of XP, Vista and Seven?
Exactly what can't you get to work?

Penny, #4 · Posted September 3, 2009
Because I don't know how to convert the JPEG provided by _ScreenCapture_Capture into a TIFF file that can be handled with Tesseract. Is there an easy way to do that?

IndyUK, #5 · Posted September 3, 2009
    Penny wrote: Because I don't know how to convert the JPEG provided by _ScreenCapture_Capture into a TIFF file that can be handled with Tesseract. Is there an easy way to do that?
Of course there is. I know this can be tricky, as I couldn't get my head around it myself. You don't need to convert it to a TIFF file: use the _TesseractScreenFind function to capture the screen area and scan for the text you're looking for, all at the same time. Let me know if you need more help. I'll remote into my work PC and dig out my code.

Penny, #6 · Posted September 3, 2009
Never mind. The thing is, I hated how the script you're talking about works, so I decided to write my own. I made this function to convert the pictures to TIFFs, and it works as far as I can tell.

    Func ConvertImageToTiff($path,$path_out)
    Local $
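The conversion discussed in this thread can also be sketched outside AutoIt. The following is a minimal Python sketch, not the function Penny posted. It assumes ImageMagick is installed and reachable as `magick` (the ImageMagick 7 entry point; older installs expose `convert`, which on Windows collides with a system utility of the same name). The names `build_convert_command` and `convert_to_tiff` and the file names are hypothetical. The `-compress None` option asks ImageMagick to write the output without compression.

```python
import subprocess


def build_convert_command(src, dst, magick_exe="magick"):
    """Build an ImageMagick command line that writes dst without compression."""
    # magick_exe is an assumption: pass "convert" for ImageMagick 6 installs.
    # The output format is chosen by ImageMagick from the dst extension (.tif).
    return [magick_exe, src, "-compress", "None", dst]


def convert_to_tiff(src, dst, magick_exe="magick"):
    """Run ImageMagick; raises CalledProcessError if the conversion fails."""
    subprocess.run(build_convert_command(src, dst, magick_exe), check=True)


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(build_convert_command("capture.jpg", "capture.tif"))
```

Keeping the command construction separate from the call makes it easy to inspect or log the exact command before running it.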
touches TIFF files, with the extension "tif" and uncompressed.

Unfortunately, TesseractGUI is not very straightforward about the reason it rejects files from processing (it just says "Error reading tesseract output").

Here's how you can identify the error and fix it:

Run tesseract from the command line to find out more about the rejection cause

    cristi:~ diciu$ export TESSDATA_PREFIX=/Applications/TesseractGUI.app/Contents/Resources/
    cristi:~ diciu$ /Applications/TesseractGUI.app/Contents/Resources/tesseract ~/Desktop/tiffs/page1.tif /tmp/ocrtest.txt
    Tesseract Open Source OCR Engine
    read_tif_image:Error:Illegal image format:Compression
    /Applications/TesseractGUI.app/Contents/Resources/tesseract:Error:Read of file failed:/Users/diciu/Desktop/tiffs/page1.tif
    Signal_exit 31 ABORT. LocCode: 3 AbortCode: 3

The problem with this particular TIFF file is the compression.

Fixing the problem

Step 1/ Download ImageMagick

Step 2/ Identify the TIFF file we want to use:

    cd /Users/diciu/Downloads/ImageMagick-6.5.8/bin
    export DYLD_FALLBACK_LIBRARY_PATH=/Users/diciu/Downloads/ImageMagick-6.5.8/lib
    export MAGICK_HOME=/Users/diciu/Downloads/ImageMagick-6.5.8/
    cristi:bin diciu$ tiffutil -info ~/Desktop/tiffs/page1.tif
    Directory at 0x837f8
      Subfile Type: (0 = 0x0)
      Image Width: 1200
      Image Length: 2088
      Resolution: 200, 200
      Resolution Unit: pixels/inch
      Bits/Sample: 8
      Compression Scheme: Lempel-Ziv & Welch encoding
      Photometric Interpretation: palette color (RGB from colormap)
      Predictor: none
      Samples/Pixel: 1
      Rows/Strip: 10
      Number of Strips: 209
      Planar Configuration: Not planar
      Color Map: (present)

Note the compression scheme (LZW).

Step 3/ Uncompress the TIFF file

    tiffutil -none ~/Desktop/tiffs/page1.tif -out ~/Desktop/tiffs/page1_uncompressed.tif

Now use page1_uncompressed.tif with tesseract.

Posted by Cristian Draghici at 2:06 PM (http://blog.loudhush.ro/2010/03/processing-tif-images-for-tessertactgui.html)
Labels: mac os x, ocr, tesseract

1 comment:
Layinka said... Thanks, I hope it works for Windows too; if it does, you will have made my day. (11:32 PM)
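On systems without tiffutil (a Mac OS X tool), the check in Step 2 can be approximated by reading the Compression tag directly from the TIFF header. What follows is a minimal Python sketch, assuming a classic (non-BigTIFF) file and inspecting only the first image directory. The names `tiff_compression` and `COMPRESSION_NAMES` are hypothetical, and the table lists only a few codes from the TIFF 6.0 specification.

```python
import struct
import sys

COMPRESSION_TAG = 259  # TIFF tag number of the Compression field
# A few compression codes from the TIFF 6.0 specification (not exhaustive).
COMPRESSION_NAMES = {1: "none", 5: "LZW", 32773: "PackBits"}


def tiff_compression(data):
    """Return the Compression tag value from the first IFD of TIFF bytes."""
    if data[:2] == b"II":
        endian = "<"  # little-endian ("Intel") byte order
    elif data[:2] == b"MM":
        endian = ">"  # big-endian ("Motorola") byte order
    else:
        raise ValueError("not a TIFF file: bad byte-order mark")
    magic, ifd_offset = struct.unpack(endian + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF file: bad magic number")
    (entry_count,) = struct.unpack(endian + "H", data[ifd_offset:ifd_offset + 2])
    for i in range(entry_count):
        start = ifd_offset + 2 + 12 * i  # each directory entry is 12 bytes
        tag, _field_type, _count = struct.unpack(
            endian + "HHI", data[start:start + 8])
        if tag == COMPRESSION_TAG:
            # Compression is one SHORT, stored left-justified in the value field.
            (value,) = struct.unpack(endian + "H", data[start + 8:start + 10])
            return value
    return 1  # per the specification, a missing tag means no compression


if __name__ == "__main__":
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            code = tiff_compression(f.read())
        print(path, COMPRESSION_NAMES.get(code, "code %d" % code))
```

A result other than 1 (none) would suggest that the file needs the uncompress step from Step 3 before this Tesseract build can read it.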
srobertson/pytesser
https://github.com/srobertson/pytesser

Repository files: tessdata, AUTHORS, ChangeLog, LICENSE, NOTICE, README, dlltest.exe, errors.py, fnord.tif, fonts_test.png, out.txt.txt, phototest.tif, pytesser.py, temp.bmp, tessdll.dll, tessdll.lib, tesseract.exe, util.py

README

Introduction:
============
PyTesser is an Optical Character Recognition module for Python. It takes as input an image or image file and outputs a string. PyTesser uses the Tesseract OCR engine (an Open Source project at Google), converting images to an accepted format and calling the Tesseract executable as an external script. A Windows executable is provided along with the Python scripts. The scripts should work in Linux as well.

PyTesser: http://code.google.com/p/pytesser/
Tesseract: http://code.google.com/p/tesseract-ocr/

Dependencies:
=============
PIL is required to work with images in memory. PyTesser has been tested with Python 2.4 in Windows XP.
http://www.pythonware.com/products/pil/

Installation:
==============
PyTesser has no installation functionality in this release. Extract pytesser.zip into directory with other scripts. Necessary files are listed in File Dependencies below.

Usage:
================================
>>> from pytesser import *
>>> im = Image.open('phototest.tif')
>>> text = image_to_string(im)
>>> print text
This is a lot of 12 point text to test the ocr code and see if it works on all types of file format.
The quick brown dog jumped over the lazy fox. The quick brown dog jumped over the lazy fox. The quick brown dog jumped over the lazy fox. The quick brown dog jumpe