Upload Mode


There are four modes of uploading a document:

1. Anonymous - a user comes to documentvacuum.com and uploads a document without logging in, and that document is sent to the email address they enter on the upload page. Since little is known about either the document or the user, this is the most anonymous mode of operation.  For string type results, the longest strings containing dictionary-recognizable words are returned, rather than words that the user might be interested in, along with any string lists found in the document, e.g. dog breeds {"BASSET HOUND", "BEAGLE"}.

2. AnonymousAuth - short for AnonymousAuthenticated - a user logs into documentvacuum (and hence is authenticated) and uploads an anonymous document.  Here, anonymous refers to the document rather than the user.  We take advantage of knowing who the user is to search the document for known items of interest, for example to find [*]COMPANY[*] in one or more of the pdrx extracted from the document.  If the user is interested in something other than a double wildcard search on "company", they can edit this in MySearch under the String data type.  Apart from this, output is similar to Anonymous and looks for triplets of data type/search string/directional relationship.  If tableExtraction is set to true, the results are optimized for extracting financial statements or other tables containing numerical data.

3. Reference mode.  A document uploaded as a reference document creates entries similar to AnonymousAuth; in addition, if CreateMappingSelected is checked, it creates entries in PdrxstnMappingSelected, which underpins MySearch.  A user can then go in and edit these entries so that future searches produce the desired search results.  Enterprise users can score the output of reference docs as a means to ensure that a given search produces the desired results.

4. Normal mode.  The results of uploading one or more reference documents and editing the entries in MySearch are used to extract the information in a normal document that is of particular interest to the user.  For example, if the only item of interest is a search on a field "Home Phone", everything else can be disabled in MySearch; when a document is then processed in normal mode, the only search attempted is for one or more labels similar to "Home Phone" and, if one is found, the actual phone number that goes with the label.  If sequentialRowExtraction is set to true, the results are optimized for a document containing rows of numerical data.
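
The double wildcard search mentioned under AnonymousAuth, [*]COMPANY[*], can be pictured with the short Python sketch below. It is only an illustration, not the documentvacuum implementation: it assumes that "[*]" stands for any run of characters (including none) and that matching ignores case, and the function names wildcard_to_regex and find_matches are hypothetical.

```python
import re


def wildcard_to_regex(pattern):
    """Translate a pattern such as "[*]COMPANY[*]" into a compiled regex.

    Assumptions (not confirmed by the documentvacuum documentation):
    "[*]" matches any run of characters, possibly empty, and the
    comparison is case-insensitive.
    """
    literal_parts = pattern.split("[*]")
    # Escape the literal text so that only "[*]" acts as a wildcard.
    body = ".*".join(re.escape(part) for part in literal_parts)
    return re.compile(body, re.IGNORECASE | re.DOTALL)


def find_matches(pattern, candidates):
    """Return the candidate strings that match the wildcard pattern in full."""
    regex = wildcard_to_regex(pattern)
    return [text for text in candidates if regex.fullmatch(text)]


# Hypothetical label strings extracted from a document.
labels = ["Company Name:", "Home Phone", "Parent company"]
matches = find_matches("[*]COMPANY[*]", labels)
```

Under these assumptions a leading and trailing wildcard behaves like a case-insensitive "contains" test, so both "Company Name:" and "Parent company" are kept while "Home Phone" is dropped. Removing one of the wildcards (for example COMPANY[*]) would restrict matches to strings that start with the search term.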
