
Information retrieval system for automatic categorization of Wikipedia articles / Nesma Abdelhakim Refaei Ali ; Supervised by Elsayed E. Hemayed, Riham Mansour

Material type: Text
Language: English
Publication details: Cairo : Nesma Abd Elhakim Refaei Ali, 2016
Description: 79 p. : facsimiles ; 30 cm
Other title:
  • نظام استرجاع معلومات للتصنيف التلقائي لمقالات ويكيبيديا (Arabic: "Information retrieval system for the automatic categorization of Wikipedia articles") [Added title page title]
Available additional physical forms:
  • Issued also as CD
Dissertation note: Thesis (M.Sc.) - Cairo University - Faculty of Engineering - Department of Computer Engineering
Summary: Wikipedia maintains a categorization system that assigns each article a set of categories to make navigation between related pages easier. So far, categorization has been done manually, which makes it a confusing, tiring, and time-consuming task. In this thesis, we propose a system for automatically categorizing newly created Wikipedia articles. The proposed system uses an information retrieval approach to retrieve relevant Wikipedia articles using the input article's body, headings, and hyperlinks to other Wikipedia articles. It then ranks the categories associated with these relevant articles by their relevance scores. As an additional signal, we use the co-occurrence between candidate categories, which helps in ranking them. Finally, the top k ranked categories are returned as topics for the input article. Our system achieved relative improvements over basic text-only search of 17.7% in F-measure and 20.2% in Mean Total Reciprocal Rank. It also increased accuracy over a state-of-the-art technique by at least 10.2% on that technique's datasets. Finally, it was evaluated on a benchmark dataset from the LSHTC competition and achieved a gain of 8.1% in accuracy over the competition's K-NN baseline.
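The ranking pipeline described in the summary can be illustrated with a minimal sketch. It is not the thesis implementation: the function name `rank_categories`, the `cooc_weight` parameter, the additive scoring formula, and the sample data are all assumptions made for illustration, and the retrieval step is assumed to have already produced a list of (relevance score, categories) pairs.

```python
from collections import defaultdict
from itertools import combinations


def rank_categories(retrieved, k=3, cooc_weight=0.5):
    """Return the top-k candidate categories (hypothetical scoring scheme).

    retrieved: list of (relevance_score, [category, ...]) pairs, one per
    article returned by a search step that is assumed to exist elsewhere.
    """
    base = defaultdict(float)  # summed relevance of articles carrying a category
    cooc = defaultdict(int)    # times two categories share a retrieved article
    for score, cats in retrieved:
        unique = sorted(set(cats))
        for cat in unique:
            base[cat] += score
        for a, b in combinations(unique, 2):
            cooc[(a, b)] += 1
            cooc[(b, a)] += 1

    final = {}
    for cat in base:
        # Assumed co-occurrence signal: a category is boosted when it
        # appears alongside other strongly scored candidates.
        boost = sum(cooc.get((cat, other), 0) * base[other]
                    for other in base if other != cat)
        final[cat] = base[cat] + cooc_weight * boost

    # Sort by descending score; break ties alphabetically for determinism.
    ranked = sorted(final.items(), key=lambda kv: (-kv[1], kv[0]))
    return [cat for cat, _ in ranked[:k]]


# Made-up search results for a hypothetical new article.
retrieved = [
    (0.9, ["Information retrieval", "Machine learning"]),
    (0.7, ["Information retrieval", "Search engines"]),
    (0.4, ["Machine learning", "Statistics"]),
]
print(rank_categories(retrieved, k=2))
```

With this sample data the two highest-scoring categories are "Information retrieval" and "Machine learning". Setting `cooc_weight=0` reduces the sketch to ranking by summed relevance alone.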
Holdings

Item type: Thesis
Current library: University Theses Hall - First Floor
Home library: New Central Library - Cairo University
Call number: Cai01.13.06.M.Sc.2016.Ne.I
Copy number:
Status: Not for loan
Barcode: 01010110070879000

Item type: CD-ROM
Current library: University Theses Storage - Basement
Home library: New Central Library - Cairo University
Call number: Cai01.13.06.M.Sc.2016.Ne.I
Copy number: 70879.CD
Status: Not for loan
Barcode: 01020110070879000

