Improving VQA models using tree neural networks / Yahia Zakaria Abdelsamee ; Supervised Nevin M. Darwish

By: Yahia Zakaria Abdelsamee

Contributor: Nevin M. Darwish

Material type: Text

Language: English

Publication details: Cairo : Yahia Zakaria Abdelsamee, 2017

Description: 66 p. : charts, facsimiles ; 30 cm

Other title: تحسين نماذج الإجابة على الأسئلة البصرية باستخدام الشبكات الشجرية [Added title page title]


Available additional physical forms:

Issued also as CD

Dissertation note: Thesis (M.Sc.) - Cairo University - Faculty of Engineering - Department of Computer Engineering

Summary: Visual Question Answering (VQA) is a multimodal task that requires both visual and linguistic understanding, and some researchers consider it a Turing test for computer vision. While most research focuses on enhancing the multimodal pooling module, enhancing the visual and linguistic features is also crucial. Long Short-Term Memory (LSTM) networks are a very common choice, although they ignore an important property of natural language: the hierarchical structure of text. Tree networks address this property, but they are much harder to implement and can be slower to train. We propose including a tree network in the language module and show that some configurations combining tree networks with regular LSTMs achieve better results than either one achieves individually. We also propose variations of the tree cells that improve performance and efficiency. Finally, we present the implementation of a static graph structure and a preprocessing step that exploit some tree properties to achieve full batching, good efficiency, and simplicity. Our best model achieves 64.8% accuracy on VQA 1.0 test-standard, which exceeds the baseline by 0.2%.


Holdings
Item type	Current library	Home library	Call number	Copy number	Status	Barcode
Thesis	University Theses Hall - First Floor	New Central Library - Cairo University	Cai01.13.06.M.Sc.2017.Ya.I		Not for loan	01010110074811000
CD-ROM	University Theses Store - Basement	New Central Library - Cairo University	Cai01.13.06.M.Sc.2017.Ya.I	74811.CD	Not for loan	01020110074811000



