Conclusions
Given an ontology and a Web page with multiple records, it is possible to extract and structure the data automatically.
Recall and precision results are encouraging:
- Car Ads: ~94% recall and ~99% precision
- Job Ads: ~84% recall and ~98% precision
- Obituaries: ~90% recall and ~95% precision (except on names: ~73% precision)
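
For readers unfamiliar with the two measures, the sketch below shows how recall and precision are usually computed from extraction counts. It assumes the standard information-retrieval definitions, which the slides do not spell out. The function name `recall_precision` and the counts in the usage lines are hypothetical illustrations, not figures from the experiments reported above.

```python
def recall_precision(correct, extracted, expected):
    """Return (recall, precision) for one extraction run.

    correct   -- number of extracted facts that are right
    extracted -- total number of facts the system extracted
    expected  -- total number of facts actually present in the pages
    (Assumed standard definitions; parameter names are illustrative.)
    """
    if correct > extracted or correct > expected:
        raise ValueError("correct cannot exceed extracted or expected")
    # Recall: share of the facts present that the system found.
    recall = correct / expected if expected else 0.0
    # Precision: share of the facts extracted that are right.
    precision = correct / extracted if extracted else 0.0
    return recall, precision


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
r, p = recall_precision(correct=90, extracted=100, expected=120)
print(f"recall={r:.0%} precision={p:.0%}")  # recall=75% precision=90%
```

High precision with lower recall, as in the Job Ads row, means that what the system extracts is almost always right but that some facts on the page are missed.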
Future Work
- Find and categorize pages of interest.
- Strengthen heuristics for separation, extraction, and construction.
- Add richer conversions and additional constraints to data frames.