Sentiment Analysis Approach for Arabic Text on Social Networks / Osama Sabry Attia Alseidy ; Supervisors: Prof. Dr. Ahmed Gad Allah, Prof. Dr. AbdelMoneim Helmy.

By:

Osama Sabry Attia Alseidy [preparation.]

Contributor(s):

Material type: Text

Language: English

Summary language: English, Arabic

Producer: 2023

Description: 180 Leaves : illustrations ; 30 cm. + CD

Content type:

text

Media type:

Unmediated

Carrier type:

volume

Other title:

تحليل المشاعر فى نصوص اللغة العربية على شبكات التواصل الإجتماعى [Added title page title]

Subject(s):

DDC classification:

005.45

Available additional physical forms:

Issued also as CD.

Dissertation note: Thesis (M.Sc.) - Cairo University, 2023.

Holdings
Item type	Current library	Home library	Call number	Status	Barcode
Thesis	Theses Hall - First Floor	New Central Library - Cairo University	Cai01.18.07.M.Sc.2023.Os.S	Not for loan	01010110089763000

Bibliography: pages 173-180.

Arabic sentiment analysis remains underrepresented in current research and requires more
intensive study. Our study presents an enhanced approach to understanding sentiment trends,
particularly public opinion on products, cultures, and ideologies. The analysis draws on data
collected from various social media platforms, predominantly Twitter.
A dataset of 39,460 tweets, categorized into positive and negative sentiments, was used for
model training. Three further datasets of 82,644 tweets were used for testing. Logistic
regression yielded the highest accuracy on the first and second test datasets, while the
support vector machine performed best on the third.
Our research process comprised distinct phases. In the initial phase, we established our
objectives and identified the inputs, outputs, and data types required for the study. A thorough
data cleaning process addressed missing values, duplicate data, noise, and structural errors,
and removed irrelevant outputs, resulting in a final, clean dataset. The subsequent phase used
the clean dataset for training and validation, followed by model construction. The model was
evaluated using metrics including accuracy, classification error, precision, sensitivity, and
specificity. In the final phase, we assessed our classifiers on the other datasets. Logistic
regression demonstrated the highest accuracy on the first two sets, whereas the support vector
machine was the most accurate on the third.
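The evaluation metrics named above can be illustrated with a minimal Python sketch. It assumes binary labels (1 = positive, 0 = negative) and uses the standard confusion-matrix definitions; the function name, the class name, and the sample labels are hypothetical and are not taken from the thesis.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    accuracy: float
    error_rate: float
    precision: float
    sensitivity: float  # also called recall or true-positive rate
    specificity: float  # true-negative rate


def binary_metrics(y_true, y_pred, positive=1):
    """Compute common binary-classification metrics from two label lists."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)

    def ratio(num, den):
        # Return 0.0 rather than dividing by zero when a class is absent.
        return num / den if den else 0.0

    accuracy = ratio(tp + tn, len(pairs))
    return BinaryMetrics(
        accuracy=accuracy,
        error_rate=1.0 - accuracy,
        precision=ratio(tp, tp + fp),
        sensitivity=ratio(tp, tp + fn),
        specificity=ratio(tn, tn + fp),
    )


# Hypothetical labels for eight tweets: 1 = positive, 0 = negative.
m = binary_metrics([1, 1, 1, 0, 0, 0, 0, 1], [1, 1, 0, 0, 0, 1, 0, 1])
print(m)
```

In this made-up example there are three true positives, three true negatives, one false positive, and one false negative, so all four ratios come out to 0.75 and the error rate to 0.25.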
Despite these results, we noted a general need for improved accuracy. We therefore repeated
the experiments, this time including emojis in the analysis. Logistic regression provided the
highest accuracy in all three retests. On evaluation, we identified the inconsistent use of
Modern Standard Arabic (MSA) and dialectal Arabic across the training and test datasets as a
significant factor affecting accuracy. Hence, we recommend using a consistent language variety
(either MSA or dialectal) across both training and testing datasets for improved model
performance.
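To make the idea of including emojis concrete, the following Python sketch tokenizes a tweet either with or without emoji tokens. It is an illustration only: the record does not describe how the thesis handled emojis. The regular-expression ranges, the names, and the sample tweet are all assumptions, and the emoji ranges cover only some common Unicode blocks.

```python
import re

# Assumption: these two ranges cover common emoji blocks only,
# not the complete Unicode emoji set.
EMOJI_CLASS = "[\U0001F300-\U0001FAFF\u2600-\u27BF]"
# Arabic letters plus the short-vowel marks U+064B..U+0652.
ARABIC_WORD = "[\u0621-\u0652]+"

WITH_EMOJIS = re.compile(EMOJI_CLASS + "|" + ARABIC_WORD)
WORDS_ONLY = re.compile(ARABIC_WORD)


def tokenize(text, keep_emojis=True):
    """Split a tweet into Arabic words, optionally keeping each emoji as a token."""
    pattern = WITH_EMOJIS if keep_emojis else WORDS_ONLY
    return pattern.findall(text)


# Hypothetical tweet meaning "The service is excellent", followed by two emojis.
tweet = "الخدمة ممتازة 😍👍"
print(tokenize(tweet, keep_emojis=True))   # two words and two emoji tokens
print(tokenize(tweet, keep_emojis=False))  # the two words only
```

With `keep_emojis=True`, each emoji becomes a feature a classifier can weight like any other token. With `keep_emojis=False`, that signal is discarded.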
In terms of results, logistic regression yielded the highest accuracy on the first and
second datasets, with scores of 63.218 and 57.334, respectively, while the support vector
machine gave the best result on the third dataset, scoring 61.552. With emojis included, the
accuracy improved across all tests, with logistic regression scoring 65.852, 57.647, and
57.761 in the respective tests. Despite these promising results, more research is needed to
improve the overall accuracy further.

Sentiment analysis is a research area related to natural language processing and machine learning. A great many opinions and sentiments on specific topics are available online, which allows many parties, such as customers, companies, and even governments, to explore these opinions.
With the enormous growth of social media on the web (for example, reviews, forum discussions, blogs, Twitter, comments, and posts on social networking sites), individuals and organizations increasingly use the content of these media for decision-making.
Sentiment analysis involves a multi-step process: data retrieval, data extraction and selection, data preprocessing, feature extraction, and sentiment classification. The final subtasks of sentiment classification are polarity classification and intensity classification. Polarity classification aims to classify text as positive, negative, or neutral. Intensity classification seeks to determine the degree of polarity (for example, very positive, positive, fair, negative, very negative).
This research is important because it addresses the Arabic language, which, unlike many other languages, suffers from a lack of attention in artificial intelligence applications concerned with processing textual data.
Arabic poses some specific challenges, such as morphological richness, where words and pronouns vary with tense and with singular, dual, and plural number. Diacritics may also change the meaning of a word, and the meaning of a word may differ between dialects from one place to another.
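The diacritics challenge can be shown with a short Python sketch. The two words below differ only in their short-vowel marks, so removing diacritics, a common normalization step that the thesis is not stated to use, makes them identical. The function name and the chosen word pair are illustrative assumptions.

```python
import re

# Arabic short-vowel and related marks (tashkeel), U+064B through U+0652.
TASHKEEL = re.compile("[\u064B-\u0652]")


def strip_diacritics(word):
    """Remove tashkeel marks, leaving only the base letters."""
    return TASHKEEL.sub("", word)


# Same three base letters (ain, lam, meem) with different vowel marks.
knowledge = "\u0639\u0650\u0644\u0652\u0645"  # 'ilm, "knowledge"
flag = "\u0639\u064E\u0644\u064E\u0645"       # 'alam, "flag"

print(knowledge == flag)                                      # False
print(strip_diacritics(knowledge) == strip_diacritics(flag))  # True
```

Stripping the marks reduces vocabulary size but merges distinct words. Because most tweets are written without diacritics in the first place, the ambiguity is often already present in the raw text.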
Therefore, our study aims to build a machine-learning-based approach to sentiment analysis. The system will analyze Arabic tweets on social networks to identify positive and negative sentiments.
To achieve this goal, we first collect the data and then process it as follows:
• Data collection (data are collected from various sources).
• Data preprocessing (extraction, classification, selection, and feature extraction).
• Sentiment analysis.
• Test results.
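The steps listed above can be sketched end to end with a toy logistic regression classifier in plain Python. This is a minimal illustration under stated assumptions, not the thesis's implementation: the record does not give the features, hyperparameters, or libraries used, so the bag-of-words features, the learning rate, the epoch count, and the four sample tweets are all hypothetical. The support vector machine is omitted for brevity.

```python
import math
from collections import Counter


def featurize(text):
    # Bag-of-words counts over whitespace tokens (a deliberately simple choice).
    return Counter(text.split())


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def score(weights, bias, features):
    return bias + sum(weights.get(tok, 0.0) * n for tok, n in features.items())


def train(samples, epochs=200, lr=0.5):
    """Fit logistic regression by batch gradient descent on (text, label) pairs."""
    data = [(featurize(text), label) for text, label in samples]
    weights, bias = {}, 0.0
    for _ in range(epochs):
        grad_w, grad_b = Counter(), 0.0
        for features, label in data:
            error = sigmoid(score(weights, bias, features)) - label
            grad_b += error
            for tok, n in features.items():
                grad_w[tok] += error * n
        bias -= lr * grad_b / len(data)
        for tok, g in grad_w.items():
            weights[tok] = weights.get(tok, 0.0) - lr * g / len(data)
    return weights, bias


def predict(model, text):
    weights, bias = model
    return 1 if sigmoid(score(weights, bias, featurize(text))) >= 0.5 else 0


# Hypothetical toy data: 1 = positive, 0 = negative.
train_set = [
    ("منتج رائع وجميل", 1),  # "a wonderful, beautiful product"
    ("خدمة ممتازة", 1),      # "excellent service"
    ("منتج سيء جدا", 0),     # "a very bad product"
    ("خدمة رديئة", 0),       # "poor service"
]
model = train(train_set)
predictions = [predict(model, text) for text, _ in train_set]
print(predictions)
```

Words that occur in both classes, such as the words for "product" and "service", end up with weights near zero, while words unique to one class carry the decision. A realistic experiment would evaluate on held-out tweets rather than on the training set.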

Text in English and abstract in Arabic & English.
