Building Wrappers
Part of the problem is record separation
The record-identification task in wrapper construction is nontrivial
Previous Work
- manually [AM97, GHR97, HGMC+97]
- semi-automatically [Ade98, AK97a, AK97b, DEW97, KWD97, Sod97]
Our Work
- automatic, under the following assumptions
  - the Web document
    - has multiple records
    - is in HTML
    - contains at least one record-separator tag
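
To make the last assumption concrete, here is a minimal Python sketch of what splitting a document on a record-separator tag could look like. It is not the automatic method itself: the separator tag is supplied by hand, whereas the point of the work is to discover it. The function name `split_records`, the choice of `hr` as separator, and the sample markup are all hypothetical and serve only as illustration.

```python
import re
from typing import List


def split_records(html: str, separator_tag: str) -> List[str]:
    """Split an HTML fragment into records at each occurrence of a tag.

    Assumption: `separator_tag` is already known (e.g. "hr"); finding
    it automatically is the hard part and is not shown here.
    """
    # Match the opening form of the tag, with or without attributes,
    # case-insensitively: <hr>, <HR>, <hr size="1">, <hr/>.
    pattern = re.compile(
        r"<\s*" + re.escape(separator_tag) + r"\b[^>]*>",
        re.IGNORECASE,
    )
    chunks = pattern.split(html)
    # Drop empty pieces left before the first or after the last separator.
    return [chunk.strip() for chunk in chunks if chunk.strip()]


# Hypothetical sample: two records separated by <hr> tags.
sample = "<hr><b>Alice</b> 1901-1980<hr><b>Bob</b> 1910-1990<hr>"
records = split_records(sample, "hr")
for record in records:
    print(record)
```

In a full page, the text before the first separator and after the last one (headers, footers) would also come back as chunks, so a real wrapper would need to limit the search to the part of the document that holds the records.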