Hardware/software co-design implementation for CNN model using memory tiling / Mohamed Nafea Mohamed Nafea Khalifa ; Amin M. Nassar, Omar A. Nasr, Hassan Mostafa.

By:

Mohamed Nafea Mohamed Nafea Khalifa [preparation.]

Contributor(s): Amin M. Nassar, Omar A. Nasr, Hassan Mostafa

Material type: Text

Language: English

Summary language: English, Arabic

Producer: 2022

Description: 100 pages : illustrations, photograph ; 25 cm + CD

Content type: text

Media type: unmediated

Carrier type: volume

Other title:

تنفيذ تصميم نموذج الشبكة العصبية التلافيفية بتقسيم التصميم بين العتاد والبرمجيات باستخدام تبليط الذاكرة

DDC classification:

004.6 21

Available additional physical forms:

Issued also as CD.

Dissertation note: Thesis (M.Sc.) - Cairo University, Faculty of Engineering, Department of Electronics and Communications, 2022.

Summary: Convolutional neural networks (CNNs) have recently been deployed in many applications. The massive number of network parameters and the intensive operations in CNN models make it challenging to reach the desired performance on general-purpose processors. Various hardware accelerators for deep CNNs have therefore been developed to improve throughput, with FPGA-based accelerators the most widely used. In this work, a hardware/software (HW/SW) co-design partitioning methodology is followed, using the Xilinx SDSoC tool, to propose a high-level synthesis (HLS) FPGA-based accelerator for the GoogLeNet CNN model. High-level C++ implementations are developed that use the available resources to maximize performance, and different loop optimization techniques are applied so that the convolutional functions can run in hardware. The proposed accelerator supports several data precisions, including floating point, half-precision floating point, and fixed point. Experimental results show a speedup of 48x for 32-bit floating-point data, with a total on-chip power consumption of 3.8 W. The proposed accelerator consumes 40% fewer FPGA resources than the corresponding RTL accelerator.

Holdings
Item type	Current library	Home library	Call number	Status	Date due	Barcode
Thesis	Theses Hall - First Floor	New Central Library - Cairo University	Cai01.13.08.M.Sc.2022.Mo.H	Not for loan		01010110087874000

Bibliography: Pages 91-95.

Text in English and abstract in Arabic & English.
