Raw Text Extraction



Raw text extraction, where you look for a word such as 'COMPANY', can be accomplished in two ways.  More than 95% of the time, the best way is to search the raw text (which you can view in MyDoc - pages): set up one or more searches in MySearch - Doc level search setup, then view the conclusions in Pdc.

You can also accomplish the same thing with a wildcard string search, such as [*]COMPANY[*], against the XML data (which you can view in MyDocs - pages - XML) by using a page level search in MySearch.  Because this approach is computationally intensive and generally inferior, we limit such searches.  XML string searches are still useful when you need wildcards to work around character substitution introduced by OCR.  For example, C[O0]MPANY[*] returns all XML entries that start with COMPANY, including ones where the OCR engine read the letter O as a zero (C0MPANY).
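
To experiment with patterns like these outside DocVac, the sketch below translates them into Python regular expressions. It is an illustration only, not DocVac's implementation: it assumes [*] stands for any run of characters, that a bracketed group such as [O0] matches any one of the listed characters, and that a pattern must match the whole entry. The function names wildcard_to_regex and matching_entries are hypothetical.

```python
import re


def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Translate a bracket-style wildcard pattern into a compiled regex.

    Assumed syntax (not confirmed against DocVac):
      [*]    any run of characters, including none
      [abc]  any single one of the listed characters
      other  matched literally
    """
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("[*]", i):
            parts.append(".*")
            i += 3
        elif pattern[i] == "[":
            end = pattern.index("]", i)  # raises ValueError if unclosed
            parts.append("[" + re.escape(pattern[i + 1:end]) + "]")
            i = end + 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts), re.DOTALL)


def matching_entries(pattern: str, entries: list[str]) -> list[str]:
    """Return the entries that the pattern matches in full."""
    regex = wildcard_to_regex(pattern)
    return [entry for entry in entries if regex.fullmatch(entry)]


if __name__ == "__main__":
    # Made-up sample entries for illustration.
    sample = ["COMPANY NAME", "C0MPANY LLC", "THE COMPANY", "COMPARE"]
    print(matching_entries("C[O0]MPANY[*]", sample))
    print(matching_entries("[*]COMPANY[*]", sample))
```

With the sample entries, C[O0]MPANY[*] picks up both the correctly read COMPANY NAME and the OCR-damaged C0MPANY LLC, while [*]COMPANY[*] finds COMPANY anywhere in an entry but misses the damaged spelling.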


Last modified: 2/20/2021