References

  1. Ziv Bar-Yossef & Sridhar Rajagopalan (2002): Template detection via data mining and its applications. In: Proceedings of the 11th International Conference on World Wide Web (WWW'02). ACM, New York, NY, USA, pp. 580–591, doi:10.1145/511446.511522.
  2. Radek Burget & Ivana Rudolfova (2009): Web Page Element Classification Based on Visual Features. In: Proceedings of the 1st Asian Conference on Intelligent Information and Database Systems (ACIIDS'09). IEEE Computer Society, Washington, DC, USA, pp. 67–72, doi:10.1109/ACIIDS.2009.71.
  3. Soumen Chakrabarti (2001): Integrating the Document Object Model with hyperlinks for enhanced topic distillation and information extraction. In: Proceedings of the 10th International Conference on World Wide Web (WWW'01). ACM, New York, NY, USA, pp. 211–220, doi:10.1145/371920.372054.
  4. Adriano Ferraresi, Eros Zanchetta, Marco Baroni & Silvia Bernardini (2008): Introducing and evaluating ukWaC, a very large web-derived corpus of English. In: Proceedings of the 4th Web as Corpus Workshop (WAC-4), pp. 47–54.
  5. David Gibson, Kunal Punera & Andrew Tomkins (2005): The volume and evolution of web page templates. In: Allan Ellis & Tatsuya Hagino: Proceedings of the 14th International Conference on World Wide Web (WWW'05). ACM, pp. 830–839, doi:10.1145/1062745.1062763.
  6. Vidya Kadam & Prakash R. Devale (2012): A Methodology for Template Extraction from Heterogeneous Web Pages. Indian Journal of Computer Science and Engineering (IJCSE) 3(3).
  7. Christian Kohlschütter (2009): A densitometric analysis of web template content. In: Juan Quemada, Gonzalo León, Yoëlle S. Maarek & Wolfgang Nejdl: Proceedings of the 18th International Conference on World Wide Web (WWW'09). ACM, pp. 1165–1166, doi:10.1145/1526709.1526909.
  8. Christian Kohlschütter, Peter Fankhauser & Wolfgang Nejdl (2010): Boilerplate detection using shallow text features. In: Brian D. Davison, Torsten Suel, Nick Craswell & Bing Liu: Proceedings of the 3rd International Conference on Web Search and Web Data Mining (WSDM'10). ACM, pp. 441–450, doi:10.1145/1718487.1718542.
  9. Christian Kohlschütter & Wolfgang Nejdl (2008): A densitometric approach to web page segmentation. In: James G. Shanahan, Sihem Amer-Yahia, Ioana Manolescu, Yi Zhang, David A. Evans, Aleksander Kolcz, Key-Sun Choi & Abdur Chowdhury: Proceedings of the 17th ACM Conference on Information and Knowledge Management (CIKM'08). ACM, pp. 1173–1182, doi:10.1145/1458082.1458237.
  10. Dat Quoc Nguyen, Dai Quoc Nguyen, Son Bao Pham & The Duy Bui (2009): A Fast Template-Based Approach to Automatically Identify Primary Text Content of a Web Page. In: Proceedings of the 2009 International Conference on Knowledge and Systems Engineering, KSE 2009. IEEE Computer Society, pp. 232–236, doi:10.1109/KSE.2009.39.
  11. Davi de Castro Reis, Paulo Braz Golgher, Altigran Soares Silva & Alberto Henrique Frade Laender (2004): Automatic web news extraction using tree edit distance. In: Proceedings of the 13th International Conference on World Wide Web (WWW'04). ACM, New York, NY, USA, pp. 502–511, doi:10.1145/988672.988740.
  12. Tom Rowlands, Paul Thomas & Stephen Wan (2009): Web indexing on a diet: Template removal with the sandwich algorithm. In: Proceedings of the 14th Australasian Document Computing Symposium. Available at http://es.csiro.au/adcs2009/proceedings/poster-presentation/06-rowlands.pdf.
  13. Karane Vieira, Altigran S. da Silva, Nick Pinto, Edleno S. de Moura, João M. B. Cavalcanti & Juliana Freire (2006): A fast and robust method for web page template detection and removal. In: Proceedings of the 15th ACM International Conference on Information and Knowledge Management (CIKM'06). ACM, New York, NY, USA, pp. 258–267, doi:10.1145/1183614.1183654.
  14. Tim Weninger, William Henry Hsu & Jiawei Han (2010): CETR: Content Extraction via Tag Ratios. In: Michael Rappa, Paul Jones, Juliana Freire & Soumen Chakrabarti: Proceedings of the 19th International Conference on World Wide Web (WWW'10). ACM, pp. 971–980, doi:10.1145/1772690.1772789.
  15. Lan Yi, Bing Liu & Xiaoli Li (2003): Eliminating noisy information in Web pages for data mining. In: Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'03). ACM, New York, NY, USA, pp. 296–305, doi:10.1145/956750.956785.
