Table of Contents
Extracting and Structuring Web Data
GOAL: Query the Web like we query a database
PROBLEM: The Web is not structured like a database.
Making the Web Look Like a Database
Automatic Wrapper Generation
Application Ontology: Object-Relationship Model Instance
Application Ontology: Data Frames
Ontology Parser
Record Extractor
Record Extractor: High Fan-Out Heuristic
Record Extractor: Record-Separator Heuristics
Record Extractor: Consensus Heuristic
Record Extractor: Results
Constant/Keyword Recognizer
Heuristics
Keyword Proximity
Subsumed/Overlapping Constants
Functional Relationships
Nonfunctional Relationships
First Occurrence without Constraint Violation
Database-Instance Generator
Recall & Precision
Results: Car Ads
Car Ads: Comments
Results: Computer Job Ads
Obituaries (A More Demanding Application)
Obituary Ontology
Data Frames: Lexicons & Specializations
Keyword Heuristics: Singleton Items
Keyword Heuristics: Multiple Items
Results: Obituaries
Results: Obituaries
Conclusions
Author: David W. Embley
Home Page: http://osm7.cs.byu.edu/CS751R/CS751R.html