
Data integration framework for multi-objective queries /

Ali Eid Ali Zidane Elqutaany

Data integration framework for multi-objective queries / إطار تكامل البيانات للإستعلام متعددة الأهداف Ali Eid Ali Zidane Elqutaany ; Supervised by Osman Hegazi , Ali H. Elbastawissy - Cairo : Ali Eid Ali Zidane Elqutaany , 2019 - 161 Leaves : charts , photographs ; 30cm

Thesis (Ph.D.) - Cairo University - Faculty of Computers and Artificial Intelligence - Department of Information Systems

Nowadays, organizations cannot satisfy their information needs from a single data source, and the presence of multiple data sources across the organization fuels the need for data integration. Users of data integration systems pose their queries in terms of an integrated schema and expect duplicate-free and complete answers. To meet these expectations, data integration cannot be limited to retrieving answers from the sources; it must also detect and resolve the data quality problems that arise from the integration. Three processes (data integration, entity matching and entity resolution) are mandatory for an integration framework to provide duplicate-free and complete answers to users' queries. Existing data integration frameworks perform these processes independently of each other: the data is integrated from the sources, then duplicates are detected regardless of how the data was integrated, and finally the duplicates are resolved regardless of how the other two processes were performed. In this thesis, a new data integration framework is introduced that provides complete and duplicate-free answers to users' queries by performing all of its processes with complete interfacing and interleaving. The interfacing and interleaving between the processes provide significant enhancements in the effectiveness and completeness of the provided answers. The most crucial component in any data integration framework is the mapping of the data sources to the integrated schema. Hence, the first contribution of the proposed framework is a new mapping approach that maps not only the elements of the integrated schema, as existing approaches do, but also other elements required for detecting and resolving duplicates. This approach facilitates future extensibility of the integration system and provides a linkage between the processes of the framework.
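The idea of a mapping that serves integration, matching and resolution together can be illustrated with a minimal Python sketch. It is not the thesis's method: the record does not describe the mapping format, and the thesis interleaves the three processes where this sketch runs them one after another. Every name here (`MAPPINGS`, `match_key`, the `crm` and `billing` sources, exact-key matching, first-non-null resolution) is a hypothetical assumption chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical mappings (assumption): each source maps its fields to the
# integrated schema AND names an extra field used only for duplicate detection.
MAPPINGS = {
    "crm": {
        "schema": {"full_name": "name", "mail": "email"},
        "match_key": "mail",
    },
    "billing": {
        "schema": {"customer": "name", "email_addr": "email", "phone_no": "phone"},
        "match_key": "email_addr",
    },
}

# Toy source data (invented for the example).
SOURCES = {
    "crm": [{"full_name": "Ada Lovelace", "mail": "ADA@example.org"}],
    "billing": [
        {"customer": "A. Lovelace", "email_addr": "ada@example.org", "phone_no": "555-0100"},
        {"customer": "Alan Turing", "email_addr": "alan@example.org", "phone_no": None},
    ],
}


def integrate(sources, mappings):
    """Translate source rows into the integrated schema, carrying the match key along."""
    records = []
    for source_name, rows in sources.items():
        mapping = mappings[source_name]
        for row in rows:
            record = {target: row.get(field) for field, target in mapping["schema"].items()}
            # The mapping, not the matcher, decides which field identifies an entity.
            record["_match_key"] = str(row[mapping["match_key"]]).strip().lower()
            records.append(record)
    return records


def match(records):
    """Entity matching: group records that share a normalized match key."""
    clusters = defaultdict(list)
    for record in records:
        clusters[record["_match_key"]].append(record)
    return list(clusters.values())


def resolve(cluster):
    """Entity resolution: merge a cluster, keeping the first non-null value per attribute."""
    merged = {}
    for record in cluster:
        for attribute, value in record.items():
            if attribute.startswith("_"):
                continue  # skip bookkeeping fields
            if merged.get(attribute) is None:
                merged[attribute] = value
    return merged


def answer_query(sources, mappings):
    """Return one merged record per real-world entity."""
    return [resolve(cluster) for cluster in match(integrate(sources, mappings))]


if __name__ == "__main__":
    for row in answer_query(SOURCES, MAPPINGS):
        print(row)
```

Because the match key is declared in the mapping, a new source can be added by writing one more mapping entry without changing the matching or resolution code, which is one way to read the extensibility and linkage claims in the abstract.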



Data integration ; Entity matching ; Virtual data integration