Conclusions
Given an ontology and a Web page with multiple records, it is possible to extract and structure the data automatically.
Record Separation Results: 100%
Recall and Precision Results
- Car Ads: ~94% recall and ~99% precision
- Job Ads: ~84% recall and ~98% precision
- Obituaries: ~90% recall and ~95% precision (except names: ~73% precision)
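As a reminder of what these two figures measure, here is a minimal sketch assuming the standard definitions: recall is the fraction of facts that should have been extracted and were, and precision is the fraction of extracted facts that are correct. The function name `recall_precision` and the sample car-ad facts are hypothetical illustrations, not data or code from the experiments.

```python
def recall_precision(extracted, expected):
    """Return (recall, precision) for two collections of (field, value) facts."""
    extracted, expected = set(extracted), set(expected)
    correct = len(extracted & expected)  # facts that were extracted and are right
    recall = correct / len(expected) if expected else 0.0
    precision = correct / len(extracted) if extracted else 0.0
    return recall, precision


# Hypothetical car ad: four facts are present, and the extractor gets the price wrong.
expected = {("make", "Ford"), ("model", "Taurus"), ("year", "1995"), ("price", "4500")}
extracted = {("make", "Ford"), ("model", "Taurus"), ("year", "1995"), ("price", "450")}

recall, precision = recall_precision(extracted, expected)
print(f"recall={recall:.2f} precision={precision:.2f}")
```

Under this reading, high precision with lower recall means the extractor rarely reports a wrong value but misses some values that are present on the page.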
Future Work
- Find and categorize pages of interest.
- Relax restrictions for record separation.
- Strengthen heuristics for extraction.
- Add richer conversions and additional constraints to data frames.