| 000 | 02575cam a2200337 a 4500 | ||
|---|---|---|---|
| 003 | EG-GiCUC | ||
| 005 | 20250223032307.0 | ||
| 008 | 190526s2018 ua dh f m 000 0 eng d | ||
| 040 | _aEG-GiCUC _beng _cEG-GiCUC | ||
| 041 | 0 | _aeng | |
| 049 | _aDeposite | ||
| 097 | _aM.Sc | ||
| 099 | _aCai01.20.04.M.Sc.2018.Ah.I | ||
| 100 | 0 | _aAhmed Abdelrahim Ali Eldouh | |
| 245 | 1 | 0 | _aInitial data reorderering in mapreduce technique for specific data categories / _cAhmed Abdelrahim Ali Eldouh ; Supervised Hatem Elkadi , Mohamed Helmy Khafagy |
| 246 | 1 | 5 | _aإعادة ترتيب البيانات الاولية فى تقنية تصغير الخريطة لفئات بيانات محددة |
| 260 | _aCairo : _bAhmed Abdelrahim Ali Eldouh , _c2018 | ||
| 300 | _a87 Leaves : _bcharts , facsimiles ; _c30cm | ||
| 502 | _aThesis (M.Sc.) - Cairo University - Faculty of Computers and Information - Department of Information System | ||
| 520 | _aThe rapid growth of big data sets creates an urgent need to address the difficulty of storing and processing them. MapReduce is a recent programming model introduced by Google's team to handle the storage and processing of big data sets. Hadoop is open-source software from Apache that includes an implementation of MapReduce. MapReduce requires a shuffling phase to globally exchange the intermediate data generated by the mapping phase, but this shuffling phase adds performance overhead. In this thesis, we explore the literature on shuffling and discuss previous techniques adopted to enhance the performance of MapReduce. We also focus on an approach that improves the performance of MapReduce by reducing the overhead caused by the shuffling phase. Improving data locality leads to eliminating the network overhead in the MapReduce shuffling phase. We achieve this by pre-partitioning data based on query similarity, computed with the TF-IDF and cosine similarity algorithms, and by grouping related queries with each other using the K-means clustering algorithm. To this end, we supply HDFS with the related data and control where data are stored so that related data files are collocated on the same nodes. | ||
| 530 | _aIssued also as CD | ||
| 653 | 4 | _aHadoop | |
| 653 | 4 | _aMapreduce | |
| 653 | 4 | _aShuffling | |
| 700 | 0 | _aHatem Elkadi , _eSupervisor | |
| 700 | 0 | _aMohamed Helmy Khafagy , _eSupervisor | |
| 856 | _uhttp://172.23.153.220/th.pdf | ||
| 905 | _aAsmaa _eCataloger | ||
| 905 | _aNazla _eRevisor | ||
| 942 | _2ddc _cTH | ||
| 999 | _c72176 _d72176 | ||
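
The abstract in field 520 names three steps for grouping related queries: TF-IDF weighting, cosine similarity, and K-means clustering. The record does not describe the thesis's actual implementation, so the following is only a minimal sketch of that general pipeline under stated assumptions: queries are tokenized on whitespace, IDF is smoothed, K-means uses cosine similarity with farthest-first seeding, and all function names (`tfidf_vectors`, `cosine`, `kmeans_cosine`) and sample queries are hypothetical. The HDFS placement step is not shown.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document.

    Assumption: whitespace tokenization and smoothed IDF; the thesis may differ.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents in which each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        length = max(len(tokens), 1)  # guard against empty documents
        vectors.append({
            term: (count / length) * (math.log((1 + n_docs) / (1 + df[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def kmeans_cosine(vectors, k, iterations=10):
    """Cluster sparse vectors with K-means, using cosine similarity.

    Assumption: deterministic farthest-first seeding instead of random starts.
    Returns a list with one cluster label per input vector.
    """
    centroids = [dict(vectors[0])]
    while len(centroids) < k:
        # Pick the vector least similar to every centroid chosen so far.
        farthest = min(vectors, key=lambda v: max(cosine(v, c) for c in centroids))
        centroids.append(dict(farthest))
    labels = []
    for _ in range(iterations):
        # Assignment step: each vector joins its most similar centroid.
        labels = [max(range(k), key=lambda j: cosine(v, centroids[j])) for v in vectors]
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [v for v, label in zip(vectors, labels) if label == j]
            if members:
                total = Counter()
                for vec in members:
                    for term, weight in vec.items():
                        total[term] += weight
                centroids[j] = {t: w / len(members) for t, w in total.items()}
    return labels


if __name__ == "__main__":
    # Hypothetical queries: two touch "customers", two touch "orders".
    queries = [
        "select name from customers where city",
        "select total from orders where date",
        "select name city from customers",
        "select date total from orders",
    ]
    print(kmeans_cosine(tfidf_vectors(queries), k=2))
```

In a setup like the one the abstract outlines, queries that land in the same cluster would indicate which data files are candidates for collocation on the same nodes.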