
It is undeniable that the tech industry has set its sights on healthcare and the gigantic amounts of data it produces. But are tech and the current wave of AI ready to tackle the issues medicine presents? And can the current generation of Large Language Models (LLMs) deal with biased data and the medical misinformation that prevails online?
Let’s delve into an interesting paper, published last year in Nature Medicine. The researchers decided to extend the work developed in papers such as:
- Carlini, N., Jagielski, M., Choquette-Choo, C. A., Paleka, D., Pearce, W., Anderson, H., … & Tramèr, F. (2024, May). Poisoning web-scale training datasets is practical. In 2024 IEEE Symposium on Security and Privacy (SP) (pp. 407-425). IEEE.
- Steinhardt, J., Koh, P. W. W., & Liang, P. S. (2017). Certified defenses for data poisoning attacks. Advances in neural information processing systems, 30.
- Mozaffari-Kermani, M., Sur-Kolay, S., Raghunathan, A., & Jha, N. K. (2014). Systematic poisoning attacks on and defenses for machine learning in healthcare. IEEE Journal of Biomedical and Health Informatics, 19(6), 1893-1905.
And assess the risk that misinformation poses to LLMs in healthcare. As Carlini et al. (2024) demonstrate, LLMs depend on vast amounts of data scraped from the internet, and targeted data-poisoning attacks (that is, deliberately planting misinformation in the training data) are enough to severely alter an LLM’s outputs without touching the model architecture or its weights. The question this article poses is: “how much data poisoning turns a reliable healthcare LLM into a medical misinformation machine?”
The team first studied several datasets commonly used for LLM training.
The final in-depth analysis went as follows: the team took The Pile, a widely used 825 GB English text corpus designed for training large-scale language models, extracted the medical information in the dataset, and classified it. They then devised a protocol to assess how much data poisoning is enough to significantly affect an LLM (Fig. 1).

The degree of poisoning varied, with modifications as small as 0.001% of the total tokens in the training dataset. After poisoning the data, they retrained LLMs on both clean and poisoned datasets and evaluated their responses to medical queries using standard measures of LLM quality.
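As a rough illustration of what such a protocol involves, the sketch below mixes a small fraction of poisoned documents into a clean corpus up to a target token budget. This is an invented toy, not the authors’ pipeline: `poison_corpus` is a hypothetical name, and token counts are approximated by whitespace word counts.

```python
import random

def poison_corpus(clean_docs, poison_docs, poison_fraction, seed=0):
    """Mix poisoned documents into a clean corpus so that roughly
    `poison_fraction` of the total tokens come from poisoned text.
    Tokens are approximated here by whitespace-separated words."""
    rng = random.Random(seed)
    clean_tokens = sum(len(d.split()) for d in clean_docs)
    budget = clean_tokens * poison_fraction  # poisoned tokens to inject
    injected, used = [], 0
    for doc in poison_docs:
        if used >= budget:
            break
        injected.append(doc)
        used += len(doc.split())
    corpus = clean_docs + injected
    rng.shuffle(corpus)  # interleave poison with clean documents
    return corpus, used / (clean_tokens + used)
```

For example, injecting at `poison_fraction=0.00001` reproduces the 0.001%-of-tokens condition studied in the paper.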
Main Points:
Negligible levels of data poisoning can be exceedingly detrimental:
- LLMs are capable of disseminating misinformation within medical responses when the training data contains as little as 0.001% false information.
- Medical LLMs are more sensitive to the integrity of training data than previously thought because even a small amount of poisoning produced a striking amount of falsehoods.
The models continued to do well on standard NLP benchmarks:
- The models maintained high scores on traditional evaluation metrics even though they were poisoned.
- It is reasonable to assume that standard NLP benchmarks do not identify poisoned models and, therefore, maliciously altered LLMs can pass as trustworthy.
The misinformation was persistent:
- Misinformation is highly persistent; once a model learned false medical facts, even fine-tuning with accurate data could not reverse the poisoning fully.
- This raises serious concerns about the long-term retention of misinformation in medical AI, as the false knowledge appeared to be deeply ingrained.
Biomedical knowledge graphs can detect misinformation:
- The study assessed the use of biomedical knowledge graphs (which are structured databases containing verified medical data) as a countermeasure.
- Validating and filtering model outputs against knowledge graphs detected 91.9% of the incorrect content (F1 score = 85.7%).
- This approach has the potential to act as an additional safety measure for medical AI systems.
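A minimal sketch of this kind of output filter, assuming model claims can be reduced to (subject, relation, object) triples and checked against a verified set. The pipe-separated input format and both function names are invented for illustration; a real system would use biomedical named-entity recognition and relation extraction:

```python
def extract_claims(text):
    """Toy claim extractor: pulls (subject, relation, object) triples
    from lines of the form 'subject|relation|object'."""
    triples = []
    for line in text.strip().splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) == 3:
            triples.append(tuple(parts))
    return triples

def filter_output(text, knowledge_graph):
    """Flag any extracted claim that is absent from the verified graph."""
    claims = extract_claims(text)
    verified = [c for c in claims if c in knowledge_graph]
    flagged = [c for c in claims if c not in knowledge_graph]
    return verified, flagged
```

For example, with a graph containing only `("metformin", "treats", "type 2 diabetes")`, a model output asserting that aspirin cures diabetes would be flagged rather than shown to the user.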
Consequences
Patient Safety Concerns
- If an LLM is unknowingly trained on poisoned data, it could mislead doctors, researchers, or patients with incorrect medical advice.
- This poses serious health risks, particularly in high-stakes decisions like drug prescriptions or treatment plans.
Current AI Evaluation Standards are Inadequate
- Many standard NLP benchmarks assess language fluency and coherence, but they do not verify factual accuracy.
- This allows poisoned models to pass quality checks while still spreading misinformation.
Security & Ethical Risks in AI Development
- The study highlights an urgent need for better security measures in LLM training, such as data provenance tracking and transparency in dataset curation.
- Without safeguards, malicious actors could exploit LLMs to spread medical disinformation at scale.
Proposed Mitigation Strategies
Incorporate Biomedical Knowledge Graphs
- Before presenting medical information, LLM outputs should be cross-checked against validated medical sources.
- This approach successfully filtered over 91% of harmful content in the study.
Improve Training Data Provenance & Transparency
- AI developers should ensure that training datasets are accurately sourced, verified, and monitored for poisoning attempts.
- Greater transparency in dataset composition could help detect and prevent manipulation.
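One lightweight form of provenance tracking is a cryptographic manifest of dataset shards: hash each shard at curation time, then re-hash before training to expose anything altered in between. The sketch below is illustrative (both function names are hypothetical), not a scheme from the paper:

```python
import hashlib

def shard_manifest(shards):
    """Record a SHA-256 digest for each named dataset shard (bytes)."""
    return {name: hashlib.sha256(data).hexdigest()
            for name, data in shards.items()}

def detect_tampering(baseline, current):
    """Return the names of shards whose digest changed (or is new)."""
    return sorted(name for name, digest in current.items()
                  if baseline.get(name) != digest)
```

This does not verify that the original data was *true*, only that it has not changed since curation, so it complements rather than replaces fact-checking.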
Develop Robust Fact-Checking Mechanisms
- AI systems should integrate automated fact-checking tools to continuously validate medical information they generate.
- This could include real-time external validation with expert-reviewed medical literature.
Regulatory & Ethical Oversight
The study emphasizes the need for stricter regulations around medical AI, including:
- Auditing training data for integrity.
- Enforcing guidelines to prevent malicious manipulation.
- Mandating disclosure of AI training sources in healthcare applications.
Final Conclusions
- Medical LLMs are highly vulnerable to data-poisoning attacks, even when modifications are extremely subtle.
- Standard evaluation metrics fail to detect poisoned models, meaning false medical knowledge can persist unnoticed.
- Knowledge graphs provide an effective defense, but additional security and transparency measures are urgently needed.
The study calls for better safeguards in AI development, as compromised medical LLMs could pose a serious threat to public health and patient safety.
