Ziv Bar-Yossef & Sridhar Rajagopalan (2002):
Template detection via data mining and its applications.
In: Proceedings of the 11th International Conference on World Wide Web (WWW'02).
ACM,
New York, NY, USA,
pp. 580–591,
doi:10.1145/511446.511522.
Radek Burget & Ivana Rudolfova (2009):
Web Page Element Classification Based on Visual Features.
In: Proceedings of the 1st Asian Conference on Intelligent Information and Database Systems (ACIIDS'09).
IEEE Computer Society,
Washington, DC, USA,
pp. 67–72,
doi:10.1109/ACIIDS.2009.71.
Soumen Chakrabarti (2001):
Integrating the Document Object Model with hyperlinks for enhanced topic distillation and information extraction.
In: Proceedings of the 10th International Conference on World Wide Web (WWW'01).
ACM,
New York, NY, USA,
pp. 211–220,
doi:10.1145/371920.372054.
Adriano Ferraresi, Eros Zanchetta, Marco Baroni & Silvia Bernardini (2008):
Introducing and evaluating ukWaC, a very large web-derived corpus of English.
In: Proceedings of the 4th Web as Corpus Workshop (WAC-4),
pp. 47–54.
David Gibson, Kunal Punera & Andrew Tomkins (2005):
The volume and evolution of web page templates.
In: Allan Ellis & Tatsuya Hagino: Proceedings of the 14th International Conference on World Wide Web (WWW'05).
ACM,
pp. 830–839,
doi:10.1145/1062745.1062763.
Vidya Kadam & Prakash R. Devale (2012):
A Methodology for Template Extraction from Heterogeneous Web Pages.
Indian Journal of Computer Science and Engineering (IJCSE) 3(3).
Christian Kohlschütter (2009):
A densitometric analysis of web template content.
In: Juan Quemada, Gonzalo León, Yoëlle S. Maarek & Wolfgang Nejdl: Proceedings of the 18th International Conference on World Wide Web (WWW'09).
ACM,
pp. 1165–1166,
doi:10.1145/1526709.1526909.
Christian Kohlschütter, Peter Fankhauser & Wolfgang Nejdl (2010):
Boilerplate detection using shallow text features.
In: Brian D. Davison, Torsten Suel, Nick Craswell & Bing Liu: Proceedings of the 3rd International Conference on Web Search and Web Data Mining (WSDM'10).
ACM,
pp. 441–450,
doi:10.1145/1718487.1718542.
Christian Kohlschütter & Wolfgang Nejdl (2008):
A densitometric approach to web page segmentation.
In: James G. Shanahan, Sihem Amer-Yahia, Ioana Manolescu, Yi Zhang, David A. Evans, Aleksander Kolcz, Key-Sun Choi & Abdur Chowdhury: Proceedings of the 17th ACM Conference on Information and Knowledge Management (CIKM'08).
ACM,
pp. 1173–1182,
doi:10.1145/1458082.1458237.
Dat Quoc Nguyen, Dai Quoc Nguyen, Son Bao Pham & The Duy Bui (2009):
A Fast Template-Based Approach to Automatically Identify Primary Text Content of a Web Page.
In: Proceedings of the 2009 International Conference on Knowledge and Systems Engineering (KSE 2009).
IEEE Computer Society,
pp. 232–236,
doi:10.1109/KSE.2009.39.
Davi de Castro Reis, Paulo Braz Golgher, Altigran Soares Silva & Alberto Henrique Frade Laender (2004):
Automatic web news extraction using tree edit distance.
In: Proceedings of the 13th International Conference on World Wide Web (WWW'04).
ACM,
New York, NY, USA,
pp. 502–511,
doi:10.1145/988672.988740.
Karane Vieira, Altigran S. da Silva, Nick Pinto, Edleno S. de Moura, João M. B. Cavalcanti & Juliana Freire (2006):
A fast and robust method for web page template detection and removal.
In: Proceedings of the 15th ACM International Conference on Information and Knowledge Management (CIKM'06).
ACM,
New York, NY, USA,
pp. 258–267,
doi:10.1145/1183614.1183654.
Tim Weninger, William Henry Hsu & Jiawei Han (2010):
CETR: Content Extraction via Tag Ratios.
In: Michael Rappa, Paul Jones, Juliana Freire & Soumen Chakrabarti: Proceedings of the 19th International Conference on World Wide Web (WWW'10).
ACM,
pp. 971–980,
doi:10.1145/1772690.1772789.
Lan Yi, Bing Liu & Xiaoli Li (2003):
Eliminating noisy information in Web pages for data mining.
In: Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'03).
ACM,
New York, NY, USA,
pp. 296–305,
doi:10.1145/956750.956785.