Papers detailing past projects are available.
Current projects include:
- Using ontologies and conceptual models to extract and structure information from data-rich, unstructured documents. The related papers are available here.
- Using heuristic approaches to find record boundaries in web documents. More information can be found at this web site.
- The extraction of time references from raw text. More information can be found at this web site.
- The development of algorithms to validate an English lexicon by automatically ensuring that a given noun, verb, adjective, or adverb in the lexicon has all of its related entries present.
Future areas of interest include the extraction of personal names, business entities, financial references, social security numbers, telephone numbers, addresses, ISBNs, cardinality references, and Roman numerals from raw text.
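To make the idea of pulling such items out of raw text concrete, here is a minimal sketch for three of them: social security numbers, telephone numbers, and Roman numerals. It is an illustration, not the group's method. The names `EXTRACTORS`, `is_roman`, and `extract` are hypothetical, and the patterns assume US-style formats (`123-45-6789`, `(801) 555-1234`, `801-555-1234`) and Roman numerals from 1 to 3999 written in uppercase.

```python
import re

# Strict shape of a Roman numeral from 1 to 3999 (standard subtractive notation).
ROMAN_SHAPE = re.compile(
    r"M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})"
)


def is_roman(token):
    """Return True if token is a well-formed Roman numeral (rejects e.g. "IIII")."""
    return ROMAN_SHAPE.fullmatch(token) is not None


# Hypothetical extractors: a candidate pattern plus an optional validator.
# The formats below are assumptions made for this sketch only.
EXTRACTORS = {
    # Assumed SSN format: 3 digits, 2 digits, 4 digits, separated by hyphens.
    "ssn": (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), None),
    # Assumed phone formats: "(801) 555-1234" or "801-555-1234".
    "phone": (re.compile(r"(?:\(\d{3}\) ?|\b\d{3}-)\d{3}-\d{4}\b"), None),
    # Candidate: any whole word made only of Roman-numeral letters.
    "roman": (re.compile(r"\b[IVXLCDM]+\b"), is_roman),
}


def extract(text):
    """Return (kind, matched_text, start_offset) tuples in order of appearance."""
    found = []
    for kind, (pattern, validate) in EXTRACTORS.items():
        for match in pattern.finditer(text):
            if validate is None or validate(match.group()):
                found.append((kind, match.group(), match.start()))
    return sorted(found, key=lambda item: item[2])


if __name__ == "__main__":
    sample = "Call (801) 555-1234 about chapter XIV; SSN 123-45-6789."
    for kind, value, offset in extract(sample):
        print(kind, value, offset)
```

Pattern matching of this kind only finds candidates. The English pronoun "I", for example, is also a well-formed Roman numeral, so a practical extractor would need surrounding context to decide which reading is intended.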
If you would like to contribute to these research efforts, please contact one of the group members. Any interest, comments, or contributions are welcome.
Comments are welcome. Updated Fri Dec 4 20:50:19 MST 1998.