Emad Saddad Abdelhakiem Hussain

Towards a novel data warehouses architecture / نحو بنية حديثة لمستودعات البيانات / Emad Saddad Abdelhakiem Hussain ; Supervised by Hoda Mokhtar Omar Mokhtar, Osman Hegazy, Ali Hamed Elbastawesy - Cairo : Emad Saddad Abdelhakiem Hussain, 2021 - 74 Leaves : charts ; 30cm

Thesis (Ph.D.) - Cairo University - Faculty of Computers and Artificial Intelligence - Department of Information Systems

A traditional Data Warehouse (DW) is a centralized repository of non-volatile, subject-oriented, non-operational, integrated, and time-variant data drawn from heterogeneous data sources. A DW is developed specifically to support decision making, analysis, data mining, and ad hoc queries. The structure and volume of data stored on computer systems have recently been growing at an accelerated rate. Traditional DWs struggle to cope with such environments because of several problems: an architecture based on relational Database Management Systems (DBMSs), growing data volumes, high disk space usage, slow query response times, and complicated administration. Furthermore, DWs depend on a static number of external data sources that may be incomplete, may not use the same definitions, and are not always available. Therefore, there is an essential need to adjust the traditional DW architecture to meet the modern challenges imposed by massive data volumes and current big data aspects. Further, a new architecture needs to address existing drawbacks in availability, scalability, and query efficiency. This thesis introduces a novel DW architecture, called Lake Data Warehouse Architecture, to resolve the previously mentioned challenges of traditional DWs. Lake Data Warehouse Architecture integrates the existing DW architecture with advanced technologies, such as the Hadoop framework and Apache Spark, in a novel and efficient hybrid solution. Its main advantage is that it combines the existing features of traditional DWs with big data features by joining the traditional DW with the Hadoop and Spark ecosystems. In addition, it is suited to handling massive amounts of data while maintaining reliability, scalability, and availability.



Big Data ; Data Warehouses ; Semi-structured data