Are Abstracts Really the Abstracts: Alignment in Academic Research Papers and Summarizing Medical Research Paper With Deep Learning
Date
2023
Abstract
This research project investigates how well abstracts align with the research papers they summarize in the medical domain, using deep learning models for abstractive summarization. Abstractive summarization generates a summary through natural language generation rather than by extracting sentences from the original text. Most existing studies on abstractive summarization focus on news articles or short stories, and few have explored its application to scientific literature, especially medical research papers. We used two state-of-the-art deep learning models, bert2bert and roberta2roberta, both of which follow the encoder-decoder architecture and use pre-trained language models as the encoder and the decoder. We trained the models on a dataset of full-text papers and their abstracts drawn from CORD-19 and other sources, which gave us a large and diverse collection of medical research papers. We evaluated the models with ROUGE and METEOR scores, two commonly used metrics for summary quality. roberta2roberta outperformed bert2bert on both metrics, achieving a ROUGE score of 0.50 and a METEOR score of 0.48. We also compared the generated summaries with the original abstracts and performed a qualitative analysis of their differences and similarities. We found that the abstracts in the papers often suffer from common errors and limitations, such as repetition, inconsistency, omission, or exaggeration.
We identified potential benefits of applying abstractive summarization to scientific literature, such as improving the accessibility and readability of research papers, reducing information overload, enhancing scientific communication, and promoting interdisciplinary research. We also identified challenges, such as ensuring ethical and responsible use of AI, protecting intellectual property rights, preserving data quality and integrity, ensuring data privacy and security, and reducing the environmental footprint. Based on our abstractive summarization of medical research papers, we concluded that abstracts do not always reflect the whole paper. Further research and development are still required to overcome the challenges and limitations of this approach.
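As an illustration of the kind of overlap metric mentioned in the abstract, the following is a minimal sketch of a ROUGE-N style score between a generated summary and a reference abstract. It is not the evaluation code used in the project: the function names `ngrams` and `rouge_n` are hypothetical, tokenization is assumed to be plain lowercase whitespace splitting, and no stemming or stopword handling is applied, so scores from an established ROUGE package may differ.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """Return ROUGE-N style precision, recall, and F1 for two strings.

    Assumption: lowercase whitespace tokenization, no stemming.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)

    # Clipped overlap: each n-gram counts at most as often as it
    # appears in both the candidate and the reference.
    overlap = sum((cand & ref).values())
    cand_total = sum(cand.values())
    ref_total = sum(ref.values())

    precision = overlap / cand_total if cand_total else 0.0
    recall = overlap / ref_total if ref_total else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    generated = "the cat sat on the mat"
    abstract = "the cat lay on the mat"
    print(rouge_n(generated, abstract, n=1))
    print(rouge_n(generated, abstract, n=2))
```

In this sketch, recall measures how much of the reference abstract is covered by the generated summary and precision measures how much of the generated summary appears in the abstract. Which ROUGE variant and which of these values the reported 0.50 refers to is not stated in the abstract.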
Department Name
Electrical and Computer Engineering
Publisher
North South University