MySearch - DocVacGold / Enterprise

Document extraction creates both plain text content for each page and XML content that contains the coordinates of each data element. For example, the XML might record that the word COMPANY is located 1" up and 1" to the right of the bottom-left corner of a page in the document. In some cases, the plain text is used to determine which XML gets extracted.
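To make the two outputs concrete, here is a minimal sketch of one page's positional XML and the plain text derived from it. The XML layout, the attribute names (x, y) and the names PositionedWord, parse_page and plain_text are all assumptions made for illustration; the actual DocVac schema is not shown in this article.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List

# Hypothetical XML layout: one <word> per data element, with x/y in inches
# measured from the bottom-left corner of the page. Not the real DocVac schema.
PAGE_XML = """
<page number="1">
  <word x="1.0" y="1.0">COMPANY</word>
  <word x="2.1" y="1.0">NAME</word>
</page>
"""


@dataclass
class PositionedWord:
    text: str
    x_in: float  # inches to the right of the bottom-left corner
    y_in: float  # inches up from the bottom-left corner


def parse_page(xml_text: str) -> List[PositionedWord]:
    """Read every <word> element and keep its text and coordinates."""
    root = ET.fromstring(xml_text.strip())
    return [
        PositionedWord(w.text or "", float(w.get("x")), float(w.get("y")))
        for w in root.iter("word")
    ]


def plain_text(words: List[PositionedWord]) -> str:
    """Flatten the positioned words into plain text, discarding coordinates."""
    return " ".join(w.text for w in words)


words = parse_page(PAGE_XML)
print(words[0])           # the word COMPANY, 1" right and 1" up
print(plain_text(words))  # the same page as plain text
```

The plain text drops the coordinates, which is why it suits document level searches, while the positioned words keep the spatial information that page level searches rely on.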

Document level searches mostly work on the plain text extracted from the document. They identify items without looking for specific data types and without regard to the spatial or directional relationships between data elements. Identified items are stored in PDocClass (or pdc). A plain text search for a specific word is one example; classifying the document by document type or by company/author is another; classifying the value of different pages is a third.
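The two simplest document level searches, a word search and a document type classification, can be sketched as follows. This is only an illustration that assumes simple keyword matching: the function names, the keyword rules and the scoring are hypothetical and do not describe how DocVac implements these searches or how results are written to PDocClass.

```python
import re
from typing import Dict, List, Optional

# Hypothetical keyword rules; a real configuration would be site-specific.
DOC_TYPE_KEYWORDS: Dict[str, List[str]] = {
    "Invoice": ["invoice", "amount due"],
    "Birth Certificate": ["date of birth", "certificate of birth"],
}


def find_word(pages: List[str], word: str) -> List[int]:
    """Return the 1-based numbers of pages whose plain text contains the word."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    return [n for n, text in enumerate(pages, start=1) if pattern.search(text)]


def classify_document(
    pages: List[str], rules: Dict[str, List[str]]
) -> Optional[str]:
    """Pick the document type whose keywords occur most often, if any occur."""
    text = " ".join(pages).lower()
    scores = {
        doc_type: sum(text.count(keyword) for keyword in keywords)
        for doc_type, keywords in rules.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


pages = ["INVOICE No. 17 for COMPANY X", "Amount due: 100.00"]
print(find_word(pages, "company"))
print(classify_document(pages, DOC_TYPE_KEYWORDS))
```

Note that neither function looks at where a word sits on the page; only the text itself matters, which is the defining property of a document level search.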

Page level searches use XML containing spatial data to identify items of interest and store them in PDocRowClass (or pdrc). RowClassXmlSimple is the most commonly used procedure. For example, a search might locate the text DATE OF BIRTH: and then look to the right (east) for a valid date/time value, e.g. 12/31/2010. Procedures such as OpmcReplaceStart can be used to clean up data elements before determining whether they are valid search candidates.
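The DATE OF BIRTH: example can be sketched as a label search followed by a scan to the east. Everything in this sketch is an assumption made for illustration: the Element type, the 0.1" line tolerance, the MM/DD/YYYY date format and the clean helper are hypothetical, and the code does not reproduce RowClassXmlSimple or OpmcReplaceStart.

```python
from datetime import datetime
from typing import List, NamedTuple, Optional


class Element(NamedTuple):
    text: str
    x_in: float  # inches to the right of the bottom-left corner
    y_in: float  # inches up from the bottom-left corner


def clean(text: str) -> str:
    """Hypothetical cleanup step: trim stray punctuation and spaces."""
    return text.strip(" .,;:")


def parse_date(text: str) -> Optional[datetime]:
    """Return a datetime if the text is a valid MM/DD/YYYY date, else None."""
    try:
        return datetime.strptime(text, "%m/%d/%Y")
    except ValueError:
        return None


def find_east_date(
    elements: List[Element], label: str, y_tol: float = 0.1
) -> Optional[datetime]:
    """Find the label, then return the nearest valid date to its right."""
    for anchor in elements:
        if anchor.text != label:
            continue
        # Candidates lie east of the label and on roughly the same line.
        east = sorted(
            (
                e for e in elements
                if e.x_in > anchor.x_in and abs(e.y_in - anchor.y_in) <= y_tol
            ),
            key=lambda e: e.x_in,
        )
        for candidate in east:
            value = parse_date(clean(candidate.text))
            if value is not None:
                return value
    return None


elements = [
    Element("DATE OF BIRTH:", 1.0, 9.0),
    Element(": 12/31/2010", 2.5, 9.02),  # needs cleanup before it parses
    Element("01/01/1999", 2.5, 5.0),     # a date, but on a different line
]
print(find_east_date(elements, "DATE OF BIRTH:"))
```

The second element only becomes a valid date after cleanup, which illustrates why a cleanup pass runs before candidates are validated. The third element is a valid date but is ignored because it is not on the same line as the label.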

Last modified: 6/26/2021