Conclusions
We described a heuristic approach to discover record boundaries in unstructured Web documents.
Our main contribution is a set of individual heuristics, together with a way to combine them into a single method for discovering record boundaries.
Under normal assumptions, the process runs in O(n) time, where n is the size of the document.
In the experiments we conducted, this approach consistently attained an accuracy of 100%.
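
To make the idea of combining individual heuristics concrete, the following is a minimal sketch only. It assumes each heuristic assigns a confidence in [0, 1] to candidate separator tags, and it merges those confidences with an independent-evidence (noisy-OR) rule. The combination rule, the heuristic names, the function names, and the scores are all illustrative assumptions, not the method or data reported above.

```python
from functools import reduce


def combine_scores(scores):
    """Merge confidences in [0, 1] with a noisy-OR rule: a + b - a*b.

    Assumption: this rule is chosen for illustration; it is not
    necessarily the combination used by the method described above.
    """
    return reduce(lambda acc, s: acc + s - acc * s, scores, 0.0)


def pick_separator(heuristic_scores):
    """Return the best candidate tag and all combined confidences.

    heuristic_scores maps a (hypothetical) heuristic name to a dict of
    candidate tag -> confidence. A tag a heuristic does not mention is
    treated as confidence 0.0 for that heuristic.
    """
    candidates = set()
    for scores in heuristic_scores.values():
        candidates.update(scores)

    combined = {
        tag: combine_scores(h.get(tag, 0.0) for h in heuristic_scores.values())
        for tag in candidates
    }
    # Iterate over sorted tags so ties are broken deterministically.
    best = max(sorted(combined), key=combined.get)
    return best, combined


if __name__ == "__main__":
    # Hypothetical heuristics and made-up confidences, for illustration only.
    example = {
        "tag_frequency": {"hr": 0.8, "p": 0.5},
        "text_spacing": {"hr": 0.5, "br": 0.4},
    }
    best, combined = pick_separator(example)
    print(best, combined)
```

In this sketch a candidate supported by several heuristics ends up with a higher combined confidence than one supported by a single heuristic, which is the intuition behind combining heuristics rather than relying on any one of them.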