Improving VQA models using tree neural networks / Yahia Zakaria Abdelsamee ; Supervised by Nevin M. Darwish
Material type: Text
Language: English
Publication details: Cairo : Yahia Zakaria Abdelsamee, 2017
Description: 66 P. : charts, facsimiles ; 30 cm
Other title:- تحسين نماذج الإجابة على الأسئلة البصرية باستخدام الشبكات الشجرية [Added title page title]
- Issued also as CD
Item type | Current library | Home library | Call number | Copy number | Status | Date due | Barcode |
---|---|---|---|---|---|---|---|
Thesis | University Theses Hall - First Floor | New Central Library - Cairo University | Cai01.13.06.M.Sc.2017.Ya.I | | Not for loan | | 01010110074811000 |
CD-ROM | University Theses Storage - Basement | New Central Library - Cairo University | Cai01.13.06.M.Sc.2017.Ya.I | 74811.CD | Not for loan | | 01020110074811000 |
Thesis (M.Sc.) - Cairo University - Faculty of Engineering - Department of Computer Engineering
Visual Question Answering (VQA) is a multimodal task that requires both visual and linguistic understanding and is considered by some researchers to be a Turing test for computer vision. While most research focuses on enhancing the multimodal pooling module, enhancing the visual and linguistic features is also crucial. Long Short-Term Memory (LSTM) networks are a very common choice, although they ignore an important property of natural language: the hierarchical structure of text. Although tree networks address this property, they are much harder to implement and can be slower to train. We propose including a tree network in the language module, showing that some configurations that combine tree networks and regular LSTMs can achieve better results than either achieves individually. We also propose some variations to the tree cells that enhance performance and achieve higher efficiency. We also present the implementation of a static graph structure and a preprocessing step that exploit some tree properties to achieve full batching, good efficiency, and simplicity. Our best model achieves 64.8% accuracy on VQA 1.0 test-standard, which exceeds the baseline by 0.2%.