Protean – Delivering Data extraction / Document scraping as a service

A common problem operators face today as they start the transition to a digital information management system is a mismatch between the system’s requirements and the deliverables from contractors and suppliers. The format of the documents, which may not be optimal, is one issue, but a bigger problem is often the delivery of the supporting tables and registers that these modern systems need to function properly.

Changing contracts and delivery protocols so that suppliers and contractors deliver in the required formats is usually not a problem. Ensuring delivery of the required tables and databases is another matter. Very often these deliverables represent no added value for the supplier, and often quite the opposite. Far too often the added requirements are not adequately presented in the contract and are usually not compensated for, so when the time comes for handover, the supplier delivers their standard documentation. The problem often doesn’t surface until the handover process starts, at which point it is often too late.

At best, the result of this mismatch is extra work for the operator’s information management staff, who must manually input data into the supposedly automated system, defeating much of the efficiency gain. At worst, the documentation and information have to be rejected and returned to the supplier for updating, which often serves little purpose because the problem is bigger than merely revising documents that were technically correct to begin with.

A different problem some operators face arises when the document and data requirements of new installations no longer match the information requirements of the company. Perhaps systems have been updated after the project started, or perhaps there was an oversight in the contract. Whatever the reason, the final handover of documentation is missing required information, requiring expensive rework before import into company systems.

Protean offers a simpler solution: suppliers deliver their documents in their preferred format, the one that best describes the equipment being delivered, while the information is still made available in the format required by the operator’s information systems. Our cloud-based service follows these steps to deliver the information you require:

  1. Documentation is uploaded to a cloud network through a simple interface.
  2. Files are automatically separated: non-OCR files are prepared for OCR conversion, while full OCR files go straight to data extraction.
  3. Non-OCR files are processed by our high-powered OCR conversion systems, which are capable of converting all types of documents and drawings to full OCR quality. The system can read and convert nearly 200 languages, including Chinese and Arabic.
  4. All OCR files are then processed by our data extraction system, which is customized to your requirements. Based on your Engineering Numbering System and your requirements, our system uses a mixture of pattern recognition and location data to extract all relevant information from documents. Examples include:
  • All tags to build tag registers for each delivery.
  • Cross reference registers such as tag-doc or doc-doc references.
  • Intelligent engineering data and datasheet information.
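
To make the pattern-recognition part of step 4 more concrete, here is a minimal sketch of how tag registers and cross-reference registers could be built from OCR text. It is an illustration only, not Protean's actual implementation: the tag format (`20-PA-101A`), the document number format (`DOC-PR-00123`), and the `build_registers` function are all hypothetical, and the sketch ignores the location data a production system would also use.

```python
import re
from collections import defaultdict

# Hypothetical numbering system (assumption, not from any real standard):
# tag = 2-digit area, 1-4 letter function code, 3-4 digit sequence, optional suffix.
TAG_PATTERN = re.compile(r"\b\d{2}-[A-Z]{1,4}-\d{3,4}[A-Z]?\b")
# Hypothetical document number: "DOC-", 2-letter discipline code, 5-digit sequence.
DOC_PATTERN = re.compile(r"\bDOC-[A-Z]{2}-\d{5}\b")


def build_registers(documents):
    """Build registers from a dict mapping document number -> OCR text.

    Returns (tag_register, tag_doc, doc_doc):
      tag_register: sorted list of every tag found
      tag_doc:      tag -> set of documents mentioning it
      doc_doc:      document -> set of other documents it references
    """
    tag_doc = defaultdict(set)
    doc_doc = defaultdict(set)
    for doc_no, text in documents.items():
        for tag in TAG_PATTERN.findall(text):
            tag_doc[tag].add(doc_no)
        for ref in DOC_PATTERN.findall(text):
            if ref != doc_no:  # skip a document's reference to itself
                doc_doc[doc_no].add(ref)
    return sorted(tag_doc), dict(tag_doc), dict(doc_doc)


# Made-up sample text standing in for OCR output.
sample = {
    "DOC-PR-00123": "Pump 20-PA-101A feeds vessel 20-VE-205. See DOC-ME-00456.",
    "DOC-ME-00456": "Datasheet for 20-PA-101A, document DOC-ME-00456.",
}
tags, tag_doc, doc_doc = build_registers(sample)
```

In this sketch the regular expressions stand in for the customer's Engineering Numbering System, so adapting to a different numbering system would mean swapping the two patterns rather than changing the register logic.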

The end result is full OCR files and registers according to your requirements, delivered to a network drive or directly to your information systems.

Once developed and deployed, this service offers a simple solution to an often complex problem.