Can AI Review Scientific Literature?


Artificial intelligence (AI) has transformed fields from creative writing to data analysis—but can it effectively review scientific literature? In “Can AI Review the Scientific Literature?” Helen Pearson explores the capabilities and limitations of AI-powered research synthesis, assessing whether these tools can streamline one of the most time-consuming tasks in academia: systematic reviews.

The Challenge of Scientific Review

Compiling research into a review paper is an arduous process. Traditional systematic reviews, which synthesize findings from multiple studies while following rigorous methodologies, can take months or even years to complete. Researchers have long sought ways to accelerate this process, and with the rise of large language models (LLMs), AI now promises a potential shortcut.

AI as a Research Assistant

AI-powered science search engines, such as Consensus and Elicit, already assist researchers by retrieving, sorting, and summarizing academic literature. Instead of generating text from scratch (which risks inaccuracies), these tools apply retrieval-augmented generation: their answers are grounded in passages retrieved from selected sources rather than produced from the model's memory alone.
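To make the retrieval-augmented idea concrete, here is a minimal sketch of the pattern: rank a small corpus against a query, then build an answer only from the top-ranked passages, with citations. This is a toy illustration, not the actual pipeline of Consensus or Elicit — the keyword-overlap scorer stands in for a real embedding model, and the quoted output stands in for an LLM's grounded summary.

```python
import re

def tokenize(text):
    """Lowercase and split into word tokens, dropping punctuation."""
    return set(re.findall(r"\w+", text.lower()))

def retrieve(query, corpus, k=2):
    """Rank documents by word overlap with the query; return the top k."""
    q = tokenize(query)
    ranked = sorted(corpus,
                    key=lambda d: len(q & tokenize(d["text"])),
                    reverse=True)
    return ranked[:k]

def grounded_answer(query, corpus):
    """Compose an answer that only quotes retrieved sources, with citations."""
    hits = retrieve(query, corpus)
    return "\n".join(f"[{d['id']}] {d['text']}" for d in hits)

# A tiny illustrative corpus of abstracts (hypothetical citation keys).
corpus = [
    {"id": "smith2021", "text": "Gene X regulates cell division in yeast."},
    {"id": "lee2022", "text": "AI tools can screen abstracts for reviews."},
    {"id": "chen2023", "text": "Cell division errors are linked to disease."},
]

answer = grounded_answer("What regulates cell division?", corpus)
```

Because every line of the answer is quoted verbatim from a retrieved source, the system cannot fabricate a claim that appears in no document — which is the property that makes this approach less prone to hallucination than free-form generation.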

Some AI-driven platforms go further. FutureHouse, a non-profit backed by former Google CEO Eric Schmidt, developed PaperQA2, an AI system that digests full-text scientific papers and generates Wikipedia-style entries on human genes. When tested against human-written Wikipedia articles, PaperQA2 produced fewer reasoning errors, suggesting that AI may, in some cases, provide more reliable syntheses than human authors.

Meanwhile, Paul Glasziou, an expert in evidence-based medicine, has been leveraging AI to speed up systematic reviews. His team set a world record, completing a review in just five days (compared to the usual months-long process), thanks to AI-driven tools that automate literature screening and bias assessment.
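The screening step that tools like these automate can be sketched in a few lines: decide, per abstract, whether it meets the review's inclusion criteria. This is a hedged toy version — Glasziou's team uses trained classifiers, not the illustrative keyword rules assumed here, and the example terms and abstracts are invented for demonstration.

```python
# Toy abstract-screening rules (illustrative assumptions, not a real tool):
# exclude on any stop criterion, include when all required concepts appear,
# otherwise defer to a human reviewer.
INCLUDE_TERMS = {"randomized", "trial"}
EXCLUDE_TERMS = {"animal", "in vitro"}

def screen(abstract):
    """Return 'include', 'exclude', or 'maybe' for one abstract."""
    text = abstract.lower()
    if any(term in text for term in EXCLUDE_TERMS):
        return "exclude"
    if all(term in text for term in INCLUDE_TERMS):
        return "include"
    return "maybe"

abstracts = [
    "A randomized controlled trial of drug A in adults.",
    "An in vitro study of drug A on cell cultures.",
    "A cohort study of drug A outcomes.",
]
decisions = [screen(a) for a in abstracts]
```

Even this crude triage shows why automation saves time: clear-cut papers are dispatched instantly, so human effort concentrates on the ambiguous "maybe" pile.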

The Risks of AI-Generated Reviews

Despite these advances, AI-based reviews face significant limitations and risks:

  1. Lack of Quality Control – AI models struggle to assess study credibility, leading to potential inclusion of low-quality or misleading research.

  2. Hallucinations & Fabrications – LLMs can generate false citations and erroneous claims, undermining reliability.

  3. Reproducibility Issues – AI-generated summaries may vary between runs, making systematic analysis inconsistent.

  4. Paywall Barriers – Many AI tools can only access open-access literature, limiting comprehensive reviews.

Researchers also worry that AI could flood academic publishing with low-quality, AI-generated reviews, diluting the integrity of evidence-based research.

The Future of AI in Research

While AI is far from replacing human experts, it is proving to be a powerful assistive tool—streamlining tasks such as literature retrieval, citation tracking, and preliminary summarization. Future developments may refine AI’s ability to filter high-quality research, reduce bias, and improve reproducibility, but for now, AI-enhanced reviews remain a work in progress.

As Pearson highlights, AI’s impact on scientific publishing is still unfolding. It has the potential to revolutionize research synthesis, but also to introduce new challenges in reliability and transparency. Whether AI ultimately enhances or undermines the scientific process will depend on how carefully researchers integrate these tools into their workflows.

References:

  1. Pearson, H. Can AI review the scientific literature — and figure out what it all means? Nature 635, 276–278 (2024).

All rights reserved Biobites 2025