000 02575cam a2200337 a 4500
003 EG-GiCUC
005 20250223032307.0
008 190526s2018 ua dh f m 000 0 eng d
040 _aEG-GiCUC
_beng
_cEG-GiCUC
041 0 _aeng
049 _aDeposit
097 _aM.Sc
099 _aCai01.20.04.M.Sc.2018.Ah.I
100 0 _aAhmed Abdelrahim Ali Eldouh
245 1 0 _aInitial data reordering in MapReduce technique for specific data categories /
_cAhmed Abdelrahim Ali Eldouh ; Supervised by Hatem Elkadi, Mohamed Helmy Khafagy
246 1 5 _aإعادة ترتيب البيانات الاولية فى تقنية تصغير الخريطة لفئات بيانات محددة
260 _aCairo :
_bAhmed Abdelrahim Ali Eldouh,
_c2018
300 _a87 Leaves :
_bcharts, facsimiles ;
_c30 cm
502 _aThesis (M.Sc.) - Cairo University - Faculty of Computers and Information - Department of Information Systems
520 _aThe rapid growth of big data sets creates an urgent need to address the difficulty of storing and processing them. MapReduce is a recent programming model introduced by Google's team to handle and store big data sets. Hadoop is open-source software from Apache that includes an implementation of MapReduce. MapReduce requires a shuffling phase to exchange globally the intermediate data generated by the mapping phase, but this shuffling phase adds performance overhead. In this thesis, we survey the literature on shuffling and discuss previous techniques adopted to enhance the performance of MapReduce. We also focus on an approach that improves the performance of MapReduce by reducing the overhead caused by the shuffling phase. Improving data locality leads to eliminating the network overhead in the shuffling phase of MapReduce. We achieve this by pre-partitioning data based on query-based similarity, using the TF-IDF and cosine similarity algorithms, and by grouping related queries using the K-means clustering algorithm. To this end, we supply HDFS with the related data and control where data are stored so that related data files are collocated on the same nodes.
530 _aIssued also as CD
653 4 _aHadoop
653 4 _aMapReduce
653 4 _aShuffling
700 0 _aHatem Elkadi,
_eSupervisor
700 0 _aMohamed Helmy Khafagy,
_eSupervisor
856 _uhttp://172.23.153.220/th.pdf
905 _aAsmaa
_eCataloger
905 _aNazla
_eRevisor
942 _2ddc
_cTH
999 _c72176
_d72176