The second root cause of the complexity lies in the document structures themselves. Documents can be written in a potentially infinite number of forms while still containing the same business-critical atomic information. Reliably accessing that data without having to read through hundreds of pages of text is the goal and the challenge of the 21st century.
To this end, the goal is to develop applications that:

  • Allow for reliable digitization of paper-based document contents,
  • Allow for near-automatic data extraction from digitized, unstructured content.

ETI is a highly versatile platform for data extraction from unstructured and semi-structured documents. The ETI Platform extracts and structures data from unstructured text using a set of methods that includes human-in-the-loop machine learning (RPSL), semantic technologies, natural language processing, regular expressions, proprietary machine-learning table extraction, and several other methods. The user defines the data to be extracted from a set of documents based on their specific interests. The structure of the extracted data is called the Extraction Taxonomy. It can include values that are explicitly stated in a document as well as values that must be derived by reading a specific section of text.
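
To make the idea of a user-defined taxonomy more concrete, here is a minimal sketch that uses only regular expressions, one of the methods listed above. It is not the ETI Platform's API: the names `TaxonomyField`, `extract`, `TAXONOMY`, and the sample contract text are all hypothetical and chosen for illustration. It shows one explicitly stated value pair (a date and a jurisdiction) and one derived value (a yes/no flag inferred from the wording of a clause).

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class TaxonomyField:
    """One entry in a hypothetical extraction taxonomy (illustrative only)."""
    name: str
    pattern: str  # regular expression with exactly one capture group
    derive: Optional[Callable[[str], object]] = None  # turns matched text into a derived value


def extract(text: str, taxonomy: list[TaxonomyField]) -> dict[str, object]:
    """Apply every taxonomy field to the text and return one structured record."""
    record: dict[str, object] = {}
    for field in taxonomy:
        match = re.search(field.pattern, text, flags=re.IGNORECASE)
        if match is None:
            record[field.name] = None  # value not found in this document
            continue
        raw = match.group(1).strip()
        record[field.name] = field.derive(raw) if field.derive else raw
    return record


# Assumed example taxonomy: two explicit values and one derived value.
TAXONOMY = [
    TaxonomyField("effective_date", r"effective as of\s+([A-Za-z]+ \d{1,2}, \d{4})"),
    TaxonomyField("governing_law", r"governed by the laws of\s+([A-Za-z ]+?)[.,]"),
    TaxonomyField(
        "auto_renews",
        r"(shall automatically renew|shall not renew)",
        derive=lambda clause: clause.lower() == "shall automatically renew",
    ),
]

SAMPLE = (
    "This Agreement is effective as of March 1, 2016. "
    "It shall automatically renew for successive one-year terms. "
    "This Agreement is governed by the laws of New York."
)

result = extract(SAMPLE, TAXONOMY)
print(result)
```

A production system would combine rules like these with the machine learning and semantic methods mentioned above, since hand-written patterns alone rarely cover the variety of forms in which the same information can appear.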

