Causal Confusion

How LLMs Can Improve Causal Language in Research Communication

LLMs

Tags

LLMs Causal language Web development

Date

March 2025 (HTI master graduation)

Links
Github
Causal language plays a critical role in scientific communication, as it shapes public understanding, informs policy, and impacts healthcare decisions. When causal statements are ambiguous or misleading, they can lead to confusion and misinterpretation of research findings. This study explores how large language models (LLMs) can improve causal language in academic writing. It employs a two-step approach: first, distinguishing non-causal from causal statements, and second, classifying causal sentences as correlational, conditional causal, or direct causal. The models were fine-tuned on a blended dataset of general-purpose (news, web) and scientific (social science, biomedical) human-labeled sentences. The BERT-based classifier achieved a macro F1-score of 0.94 for detecting causal versus non-causal sentences, while SciBERT attained 0.83 in distinguishing correlational, conditional causal, and direct causal statements. To explore how these classifiers can be applied in practice, a tool was developed to analyze scientific papers and texts, offering personalized warnings and highlighting potential inconsistencies in causal reasoning. By providing researchers with a (visual) overview of causal strength and alignment with study design, the tool supports clearer, more precise communication of research findings. This study demonstrates how LLMs can enhance the clarity and precision of causal language in academic writing, offering a scalable approach to improving scientific communication.
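The two-step approach described above can be sketched as a simple cascade. The classifier functions below are hypothetical keyword-based stand-ins for the fine-tuned BERT and SciBERT models; only the control flow reflects the actual method.

```python
# Sketch of the two-step causal-language cascade. The two classifiers are
# mocked with keyword rules; in the real system they are fine-tuned
# BERT (step 1) and SciBERT (step 2) models.

def is_causal(sentence: str) -> bool:
    """Step 1: binary causal vs. non-causal classifier (mocked)."""
    causal_cues = ("causes", "leads to", "results in", "associated with", "may cause")
    return any(cue in sentence.lower() for cue in causal_cues)

def causal_strength(sentence: str) -> str:
    """Step 2: three-way causal strength classifier (mocked)."""
    s = sentence.lower()
    if "associated with" in s or "correlated" in s:
        return "correlational"
    if "may" in s or "might" in s or "could" in s:
        return "conditional causal"
    return "direct causal"

def classify(sentence: str) -> str:
    """Cascade: first causal vs. non-causal, then causal strength."""
    if not is_causal(sentence):
        return "non-causal"
    return causal_strength(sentence)
```

For example, `classify("Smoking causes lung cancer.")` yields `"direct causal"`, while a hedged phrasing like "may cause" is routed to `"conditional causal"`.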

Final tool functionalities

Summary
Motion Pattern 3
Look at the classifications per section, get a summary of the study design, and receive some writing tips
Align claims
Motion Pattern 4
Toggle sections in or out of your view to check whether the claims made in the abstract still match the strength of the claims in the conclusion
Lenient vs Strict classifications
Lenient vs Strict classifications
You can decide whether to see all of the model's classifications or only the ones it is very sure of
Explanations
Model explanations
Let the model explain its decision and ask a follow-up question
Warnings
Motion Pattern 4
Actionable tips based on what the model found
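The lenient vs. strict toggle above amounts to thresholding the model's confidence. A minimal sketch of that filtering step; the softmax conversion, threshold value, and tuple layout here are illustrative assumptions, not the tool's actual settings:

```python
import math

def softmax(logits):
    """Convert raw classifier logits to probabilities."""
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def filter_predictions(predictions, strict=True, threshold=0.9):
    """Keep only classifications the model is confident about.

    predictions: list of (sentence, label, logits) tuples.
    Strict mode drops anything below the confidence threshold;
    lenient mode shows everything.
    """
    kept = []
    for sentence, label, logits in predictions:
        confidence = max(softmax(logits))
        if not strict or confidence >= threshold:
            kept.append((sentence, label, round(confidence, 2)))
    return kept
```

With a confident prediction (one logit clearly dominating) and an uncertain one (logits close together), strict mode keeps only the former while lenient mode shows both.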

Try the demo version here

Context

Stakeholders
Causal language is often misused in scientific papers, leading to confusion about the implications of findings
Explorations
An example of how quickly the media can misinterpret unclear causal language in scientific papers
Prototype
An example of an unclear causal statement

Three levels of causality

Training 1
As humans, we can only clearly distinguish three categories of causal relationships, so these are the labels used for fine-tuning.

Training data

Data 1
The human-labeled training data was compiled from existing datasets
Data 2
Dataset usage

Model selection: is bigger always better?

Model 1
BERT vs. GPT architecture
Model 2
BERT vs. GPT architecture

Evaluation 2 labels

Eval 1
Confusion matrices (2 labels)
Eval 2
Evaluation metrics comparison, BERT best performing model

Evaluation 3 labels

Eval 3
Confusion matrices (3 labels)
Eval 4
Evaluation metrics comparison, SciBERT best performing model

Evaluation final models

Eval 3
Learning curves showing the training of the final models
Eval 4
Misclassifications by the final model that are also very hard to classify as a human, showing the complexity of this task

Integration into a Tool

Best practices
Ten best practices for writing causal language, derived from the literature, are used to provide actionable tips
Best practices
First, the scientific paper in PDF format is processed by a service called GROBID to create a structured XML file. Then all the headers, figures, etc. are recognized and organized by a Python script. The references and the introduction are removed, as they do not need to be classified by the model. Next, every sentence is classified as causal or non-causal, and all causal sentences are then classified as correlational, conditional causal, or direct causal. Finally, the methods section is passed to a Llama model to produce a summary of the paper and its study design.
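The preprocessing steps above can be sketched roughly as follows. The TEI snippet, section names, and parsing logic are simplified illustrations; GROBID's real output is a much richer TEI XML document, and the downstream classifier calls are omitted:

```python
import re
import xml.etree.ElementTree as ET

# Toy stand-in for a GROBID TEI document (real output is much richer).
TEI = """<TEI>
  <div><head>Introduction</head><p>Background text.</p></div>
  <div><head>Results</head><p>Treatment X leads to recovery. Samples were stored cold.</p></div>
  <div><head>References</head><p>[1] Some citation.</p></div>
</TEI>"""

# Sections that do not need to be classified by the model.
SKIP_SECTIONS = {"introduction", "references"}

def extract_sentences(tei_xml: str):
    """Parse the TEI, drop introduction/references, split into sentences."""
    root = ET.fromstring(tei_xml)
    sentences = []
    for div in root.iter("div"):
        head = div.find("head")
        if head is not None and head.text.lower() in SKIP_SECTIONS:
            continue
        for p in div.iter("p"):
            # Naive split on sentence-ending punctuation followed by whitespace.
            sentences.extend(
                s.strip() for s in re.split(r"(?<=[.!?])\s+", p.text) if s.strip()
            )
    return sentences
```

The resulting sentence list would then be fed through the two-step classifier, and the methods section handed to the summarization model.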
Tool UI
Tool interface screenshot

The tool was tested by five researchers, who reviewed their own papers for causal language usage