Web Services - Data Definitions - PDocStatus

6/18/2018


PDocStatus contains a variety of status indicators used in DocVacEnterprise and DocVacGold. In general, an uploaded file is stored in the cloud, and entries are made to create a new PDocId and prepare for data extraction. The PDoc then goes through four extraction steps: an initial pull of plain text and XML entries from the file, processing of the information at a document level, a second pull of XML entries from the file, and finally processing of the information at a page level.

An uploaded file generally moves from status=1 when initially uploaded to status=21 when part 4 is complete, with the status increasing along the way. However, other paths are possible. A document can experience IO (input/output) errors, particularly if the image file is very large. These can be fatal (the final status of the document indicates an IO problem) or temporary (the status moves from IO error to a non-error status when the code retries the IO operation that failed). A second kind of non-linear path that ultimately ends in success occurs when an attempt to extract embedded text from a PDF document produces nothing useful, and a subsequent attempt to treat the PDF document as an image file and extract the text with OCR succeeds.

The most common status codes for a DocVacGold user are:

PDocStatusId=1 - upload successful

PDocStatusId=2 - part 1 image processing underway for a document treated as an image and using OCR

PDocStatusId=3 - part 1 image processing underway for a PDF document treated as having embedded text

PDocStatusId=6 - part 1 image processing successfully completed

PDocStatusId=7 - part 2 image processing underway for a document treated as an image and using OCR

PDocStatusId=8 - part 2 image processing underway for a PDF document treated as having embedded text

PDocStatusId=11 - part 2 image processing successfully completed

PDocStatusId=14 - part 3 image processing underway

PDocStatusId=16 - part 3 image processing successfully completed

PDocStatusId=18 - part 4 processing underway for a document treated as an image and using OCR

PDocStatusId=19 - part 4 processing underway for a PDF document treated as having embedded text

PDocStatusId=21 - part 4 image processing successfully completed

 

PDocStatusId=27 - PDF contents can't be extracted

PDocStatusId=29 - No attempt to extract file contents

PDocStatusId=30 - Image file can't be converted to tif

PDocStatusId=31 - Text can't be extracted from tif

PDocStatusId=32 - IO error

PDocStatusId=33 - Max 20 pages for image
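
For client code that receives a PDocStatusId, the codes listed above can be held in a small lookup table. The following is a minimal Python sketch, not part of the DocVac web services. The names PDOC_STATUS, PROBLEM_IDS and classify are hypothetical, and the grouping into "complete", "io_error", "failed" and "in_progress" is an assumption based on the descriptions in this article. In particular, status 32 is kept separate because an IO error may be temporary and the status may change after a retry.

```python
# Descriptions are copied from the status list in this article.
# All names in this sketch are hypothetical, not part of the DocVac API.
PDOC_STATUS = {
    1: "upload successful",
    2: "part 1 image processing underway (image, OCR)",
    3: "part 1 image processing underway (PDF, embedded text)",
    6: "part 1 image processing successfully completed",
    7: "part 2 image processing underway (image, OCR)",
    8: "part 2 image processing underway (PDF, embedded text)",
    11: "part 2 image processing successfully completed",
    14: "part 3 image processing underway",
    16: "part 3 image processing successfully completed",
    18: "part 4 processing underway (image, OCR)",
    19: "part 4 processing underway (PDF, embedded text)",
    21: "part 4 image processing successfully completed",
    27: "PDF contents can't be extracted",
    29: "No attempt to extract file contents",
    30: "Image file can't be converted to tif",
    31: "Text can't be extracted from tif",
    32: "IO error",
    33: "Max 20 pages for image",
}

COMPLETE_ID = 21   # part 4 finished
IO_ERROR_ID = 32   # may be fatal or temporary, per the text above
# Assumption: the remaining codes in the second group are treated as problems.
PROBLEM_IDS = {27, 29, 30, 31, 33}


def classify(status_id: int) -> str:
    """Return a coarse category for a PDocStatusId (assumed grouping)."""
    if status_id == COMPLETE_ID:
        return "complete"
    if status_id == IO_ERROR_ID:
        return "io_error"
    if status_id in PROBLEM_IDS:
        return "failed"
    if status_id in PDOC_STATUS:
        return "in_progress"
    # Codes not listed in this article are not interpreted here.
    return "unknown"


def describe(status_id: int) -> str:
    """Return the description from the article, or a placeholder."""
    return PDOC_STATUS.get(status_id, "status not listed in this article")
```

A caller might treat "io_error" as a reason to check the status again later rather than as a final result, since the article notes that a failed IO operation can be retried.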

