An Efficient Approach for Storing Biological Sequences/ (Record no. 167861)
| 000 -LEADER | |
|---|---|
| fixed length control field | 06853namaa22004331i 4500 |
| 003 - CONTROL NUMBER IDENTIFIER | |
| control field | OSt |
| 005 - DATE AND TIME OF LATEST TRANSACTION | |
| control field | 20250223033257.0 |
| 008 - FIXED-LENGTH DATA ELEMENTS--GENERAL INFORMATION | |
| fixed length control field | 240914s2023 |||a|||f |m|| 000 0 eng d |
| 040 ## - CATALOGING SOURCE | |
| Original cataloguing agency | EG-GICUC |
| Language of cataloging | eng |
| Transcribing agency | EG-GICUC |
| Modifying agency | EG-GICUC |
| Description conventions | rda |
| 041 0# - LANGUAGE CODE | |
| Language code of text/sound track or separate title | eng |
| Language code of summary or abstract | eng |
| -- | ara |
| 049 ## - Acquisition Source | |
| Acquisition Source | Deposit |
| 082 04 - DEWEY DECIMAL CLASSIFICATION NUMBER | |
| Classification number | 570.285 |
| 092 ## - LOCALLY ASSIGNED DEWEY CALL NUMBER (OCLC) | |
| Classification number | 570.285 |
| Edition number | 21 |
| 097 ## - Degree | |
| Degree | M.Sc |
| 099 ## - LOCAL FREE-TEXT CALL NUMBER (OCLC) | |
| Local Call Number | Cai01.20.03.M.Sc.2023.Sa.E |
| 100 0# - MAIN ENTRY--PERSONAL NAME | |
| Authority record control number or standard number | Sarah Ahmed Mohamed Abd Ellatif Elnady, |
| Preparation | preparation. |
| 245 12 - TITLE STATEMENT | |
| Title | An Efficient Approach for Storing Biological Sequences/ |
| Statement of responsibility, etc. | by Sarah Ahmed Mohamed Abd Ellatif Elnady; Prof. Abeer ElKorany, Prof. Akram Salah, Dr. Sabah Sayed. |
| 246 15 - VARYING FORM OF TITLE | |
| Title proper/short title | منهج فعال لتخزين السلاسل الحيوية [An Efficient Approach for Storing Biological Sequences] |
| 264 #0 - PRODUCTION, PUBLICATION, DISTRIBUTION, MANUFACTURE, AND COPYRIGHT NOTICE | |
| Date of production, publication, distribution, manufacture, or copyright notice | 2023. |
| 300 ## - PHYSICAL DESCRIPTION | |
| Extent | 77 Leaves: |
| Other physical details | illustrations ; |
| Dimensions | 30 cm. + |
| Accompanying material | CD. |
| 336 ## - CONTENT TYPE | |
| Content type term | text |
| Source | rda content |
| 337 ## - MEDIA TYPE | |
| Media type term | Unmediated |
| Source | rdamedia |
| 338 ## - CARRIER TYPE | |
| Carrier type term | volume |
| Source | rdacarrier |
| 502 ## - DISSERTATION NOTE | |
| Dissertation note | Thesis (M.Sc.)-Cairo University, 2023. |
| 504 ## - BIBLIOGRAPHY, ETC. NOTE | |
| Bibliography, etc. note | Bibliography: pages 71-77. |
| 520 ## - SUMMARY, ETC. | |
| Summary, etc. | In the blossoming age of Next-Generation Sequencing (NGS) technologies, genome sequencing has become much easier and more affordable. The large number of enormous genomic sequences obtained demands huge storage space so that they can be kept for analysis. Since storage cost has become an impediment for biologists, there is a constant need for software that compresses genomic sequences efficiently. Most general-purpose compression algorithms do not exploit the redundancies that exist in genomic sequences, which is the reason for the success and popularity of special-purpose DNA compression algorithms. One of the main schemes of special-purpose DNA compression is reference-based compression. Although reference-based compression algorithms can achieve outstanding compression, they face several challenges. In this research, a new reference-based lossless compression framework is proposed for deoxyribonucleic acid (DNA) sequences stored in FASTA format. This framework makes use of redundancies in DNA sequences to achieve efficient compression. It has three main phases: data preparation, action sequence generation, and gzip compression. The first two phases act as a reference-based compression layer above gzip compression. Furthermore, a genetic algorithm, in addition to greedy alignment algorithms, is used to improve the proposed compression framework. Moreover, a reference selection technique is proposed as an initial phase of the framework. This technique uses clustering algorithms to determine the most suitable reference genomes, enabling the whole framework to reach even more efficient compression.<br/>Several experiments were performed to evaluate the proposed framework. The experimental results show that it obtains promising compression ratios, saving up to 99.9% of space and reaching a gain of 83% over existing algorithms for some plant genomes. The proposed framework also performs the compression in acceptable time, saving more than 50% of the time taken by competing algorithms in most experiments. The results also show that references chosen by the proposed reference selection technique provide substantially higher compression gains, up to 85%, than manually or randomly selected references. |
| 520 ## - SUMMARY, ETC. | |
| Summary, etc. | Recently, the number of available biological sequences has risen sharply thanks to new sequencing technologies. These enormous sequences require huge storage space in order to be kept for analysis. Consequently, there is a constant need for new compression algorithms suited to these sequences, to ease their storage and transfer. Although many general-purpose data compression algorithms exist, they do not exploit the underlying structure of genomic sequences. Compression algorithms have therefore been designed specifically for genomic sequences. However, these algorithms also face some challenges. <br/>In this thesis, therefore, an efficient approach is proposed for compressing genomic sequences. The approach relies on a new reference-based compression method for genomic sequences. The aim is for this method to exploit repetitions in genome sequences to improve the compression ratio and compression time, and to try to overcome some of the challenges facing existing algorithms. Moreover, the algorithm uses soft computing techniques such as genetic algorithms to achieve more effective compression.<br/>This thesis also proposes a new method for selecting a suitable reference to use in sequence compression, because choosing a suitable reference is an obstacle for compression algorithms that require one. This method relies mainly on classification algorithms and also uses the efficient approach we proposed. <br/>Finally, all results of the proposed approach are presented and discussed. The most notable are genome compression up to 83% better than some existing algorithms, savings of more than 50% of sequence compression time, and savings of 99.9% of storage space. The proposed reference selection method can also improve the compression of collections by large margins of up to 85%.<br/> |
| 530 ## - ADDITIONAL PHYSICAL FORM AVAILABLE NOTE | |
| Issues CD | Issued also as CD |
| 546 ## - LANGUAGE NOTE | |
| Text Language | Text in English and abstract in Arabic & English. |
| 650 #7 - SUBJECT ADDED ENTRY--TOPICAL TERM | |
| Topical term or geographic name entry element | Bioinformatics |
| Source of heading or term | qrmak |
| 653 #0 - INDEX TERM--UNCONTROLLED | |
| Uncontrolled term | Bioinformatics, |
| -- | DNA sequences |
| -- | reference-based compression |
| -- | greedy alignment |
| 700 0# - ADDED ENTRY--PERSONAL NAME | |
| Personal name | Abeer ElKorany |
| Relator term | thesis advisor. |
| 700 0# - ADDED ENTRY--PERSONAL NAME | |
| Personal name | Akram Salah, |
| Relator term | thesis advisor. |
| 700 0# - ADDED ENTRY--PERSONAL NAME | |
| Personal name | Sabah Sayed |
| Relator term | thesis advisor. |
| 900 ## - Thesis Information | |
| Grant date | 01-01-2023 |
| Supervisory body | Abeer ElKorany |
| -- | Akram Salah |
| -- | Sabah Sayed |
| Discussion body | Abeer ElKorany |
| -- | Fatma Abd El-Sattar Omara |
| -- | Enas Mohamed Fahmy El Houby |
| Universities | Cairo University |
| Faculties | Faculty of Computers and Artificial Intelligence |
| Department | Department of Computer Science |
| 905 ## - Cataloger and Reviser Names | |
| Cataloger Name | Samah |
| Reviser Names | Huda |
| 942 ## - ADDED ENTRY ELEMENTS (KOHA) | |
| Source of classification or shelving scheme | Dewey Decimal Classification |
| Koha item type | Thesis |
| Edition | 21 |
| Suppress in OPAC | No |
| Source of classification or shelving scheme | Home library | Current library | Date acquired | Inventory number | Full call number | Barcode | Date last seen | Effective from | Koha item type |
|---|---|---|---|---|---|---|---|---|---|
| Dewey Decimal Classification | University Theses Hall - First Floor | University Theses Hall - First Floor | 14.09.2024 | 88535 | Cai01.20.03.M.Sc.2023.Sa.E | 01010110088535000 | 14.09.2024 | 14.09.2024 | Thesis |