The goal is to develop an application that:
- allows reliable digitization of the contents of paper-based documents;
- allows near-automatic data extraction from digitized, unstructured content.
ETI is a versatile platform for extracting data from unstructured and semi-structured documents. It extracts and structures data from unstructured text using a combination of methods, including human-in-the-loop machine learning (RPSL), semantic technologies, natural language processing, regular expressions, and proprietary machine-learning table extraction, among others.

The user defines which data should be extracted from a set of documents according to their specific interests; this definition of the extracted data structure is called the Extraction Taxonomy. The taxonomy can include both values that are stated explicitly in a document and values that must be derived by reading and interpreting a specific section of text.
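To make the idea of an Extraction Taxonomy more concrete, the following is a minimal sketch, not ETI's actual API (which is not described here). The names `Field` and `extract` are hypothetical. Explicitly stated values are captured with a regular expression (the first capture group is taken as the value). Derived values are produced by a caller-supplied rule that reads the text. In this example a simple keyword check stands in for the NLP or machine-learning methods a real platform would use.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Field:
    """One entry in a hypothetical extraction taxonomy (illustrative only)."""
    name: str
    # Regex for a value stated explicitly in the text; group 1 is the value.
    pattern: Optional[str] = None
    # Rule for a value that must be derived by reading the text.
    derive: Optional[Callable[[str], object]] = None


def extract(text: str, taxonomy: List[Field]) -> Dict[str, object]:
    """Apply each taxonomy field to the text and return one structured record."""
    record: Dict[str, object] = {}
    for field in taxonomy:
        if field.pattern is not None:
            match = re.search(field.pattern, text)
            record[field.name] = match.group(1) if match else None
        elif field.derive is not None:
            record[field.name] = field.derive(text)
        else:
            record[field.name] = None  # field defined but no method attached
    return record


# Hypothetical taxonomy for invoices: two explicit values and one derived value.
taxonomy = [
    Field("invoice_number", pattern=r"Invoice\s+No\.?\s*:\s*(\S+)"),
    Field("total", pattern=r"Total\s*:\s*\$?([\d,]+\.\d{2})"),
    # Keyword rule standing in for a real NLP/ML derivation step.
    Field("late_fee_applies", derive=lambda t: "late fee" in t.lower()),
]

sample = (
    "Invoice No: INV-042\n"
    "Total: $1,250.00\n"
    "Payment terms: a late fee of 2% applies after 30 days."
)
result = extract(sample, taxonomy)
print(result)
```

Keeping the taxonomy as plain data, separate from the extraction logic, mirrors the idea that the user decides *what* to extract while the platform decides *how*. New fields can be added without changing `extract`.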