Using OCR



Optical character recognition (OCR) extracts text from images with varying degrees of accuracy. Text that is embedded in a PDF can usually be extracted with 100% accuracy: the O in HELLO is extracted as a letter, whereas OCR might misread the character as a zero (0). For that reason, a PDF that was not created by scanning often produces the best extraction results.
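As an illustration of the O versus 0 confusion described above, here is a minimal Python sketch that flags words in OCR output which mix letters and digits and contain an O or a 0, so they can be reviewed by hand. The function name flag_suspect_tokens and the review rule are assumptions made for this example; they are not part of any DocVac feature.

```python
import re


def flag_suspect_tokens(text):
    """Return tokens that mix letters and digits and contain 'O' or '0'.

    Hypothetical review heuristic: such tokens are likely places where
    OCR may have swapped the letter O and the digit zero.
    """
    suspects = []
    for token in re.findall(r"[A-Za-z0-9]+", text):
        has_letter = any(c.isalpha() for c in token)
        has_digit = any(c.isdigit() for c in token)
        if has_letter and has_digit and ("O" in token or "0" in token):
            suspects.append(token)
    return suspects


if __name__ == "__main__":
    # "HELL0" ends in a zero, so it is flagged; the other words are not.
    print(flag_suspect_tokens("HELL0 WORLD 2018"))
```

A check like this only points at likely trouble spots. It cannot tell which character was intended, so flagged words still need a manual look.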

When creating an image by taking a picture with a phone, black and white mode can help, because black characters then show up clearly on a white background. Taking the photo with the text squarely in view, rather than at an angle, also usually improves the result.
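If a photo was already taken in color or grayscale, a similar black-on-white effect can be approximated afterwards. The Python sketch below assumes the image is already available as rows of grayscale values from 0 (black) to 255 (white). The function name to_black_and_white and the default threshold of 128 are assumptions chosen for illustration, not a recommended setting.

```python
def to_black_and_white(gray_rows, threshold=128):
    """Map each grayscale pixel (0-255) to pure black (0) or pure white (255).

    Pixels at or above the threshold become white; darker pixels become black.
    """
    return [
        [255 if pixel >= threshold else 0 for pixel in row]
        for row in gray_rows
    ]


if __name__ == "__main__":
    # A tiny 2x2 "image": dark pixels turn black, light pixels turn white.
    sample = [[10, 200], [128, 127]]
    print(to_black_and_white(sample))
```

A single fixed threshold works best on evenly lit pages. Photos with shadows may need a different threshold value.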


Last modified: 4/2/2018