Eyman Saleh Ali Abdanabi Abid

An efficient replication technique for improving availability in Hadoop Distributed File System [parallel title in Arabic: Finding an efficient replication technique to improve availability in the distributed file system (Hadoop)] / Eyman Saleh Ali Abdanabi Abid ; supervised by Fatma A. Omara, Mohamed H. Khafagy. - Cairo : Eyman Saleh Ali Abdanabi Abid, 2016. - 87 leaves : charts, facsimiles ; 30 cm

Thesis (M.Sc.) - Cairo University - Faculty of Computers and Information - Department of Computer Science

The Hadoop Distributed File System (HDFS) is a core component of Apache Hadoop. In recent years, HDFS has become the most popular file system for Big Data computing due to its availability and fault tolerance. HDFS is designed to store, analyze, and transfer massive data sets reliably, stream them at high bandwidth to user applications, and provide high-throughput access to application data; it is well suited to applications with large data sets. HDFS is a variant of the Google File System (GFS). It handles fault tolerance by using data replication, where each data block is replicated and stored on multiple DataNodes; therefore, HDFS supports reliability and availability. The existing implementation of HDFS performs replication in a pipelined manner, which takes considerable time. This pipelined replication scheme degrades the performance of the file write operation because of its time overhead. The work in this thesis concerns improving the HDFS replication technique.
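The abstract refers to HDFS block replication, which is controlled by a per-file replication factor. The following is a minimal sketch using the standard Hadoop Java client API, assuming a reachable HDFS cluster configured via fs.defaultFS; the file path used is purely illustrative.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Default replication factor for new files (the cluster default is typically 3).
        conf.set("dfs.replication", "3");

        try (FileSystem fs = FileSystem.get(conf)) {
            Path file = new Path("/user/example/data.txt"); // hypothetical path

            // Create a file with an explicit replication factor of 3; the NameNode
            // assigns DataNodes for the replicas and the client writes the block
            // through the replication pipeline described in the abstract.
            try (FSDataOutputStream out = fs.create(file, (short) 3)) {
                out.writeUTF("sample record");
            }

            // The replication factor can also be changed after the file is written.
            fs.setReplication(file, (short) 2);
        }
    }
}

In the default pipelined scheme, the client streams each block to the first DataNode, which forwards it to the second, and so on; the write completes only after the pipeline finishes, which is the time overhead the thesis targets.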



NameNode - Replication technique - The Hadoop Distributed File System (HDFS)