Deploying a PICO Extractor in Five Steps

By AiNEWS2025, 2025-09-19, in Machine Learning


The rise of large language models has made many Natural Language Processing (NLP) tasks appear effortless. Tools like ChatGPT sometimes generate strikingly good responses, leading even seasoned professionals to wonder whether some jobs might be handed over to algorithms sooner rather than later. Yet, as impressive as these models are, they still stumble on tasks that require precise, domain-specific extraction.

Motivation: Why Build a PICO Extractor?

The idea arose during a conversation with a student graduating in International Healthcare Management, who set out to analyze future trends in Parkinson’s treatment and to estimate the potential costs insurers would face if current trials turned into successful products. The first step was classic and laborious: isolating the PICO elements (Population, Intervention, Comparator, and Outcome) from descriptions of ongoing trials published on clinicaltrials.gov. The PICO framework is widely used in evidence-based medicine to structure clinical trial data. Since she was neither a coder nor an NLP specialist, she did this entirely by hand in spreadsheets. It became clear to me that, even in the LLM era, there is real demand for straightforward, reliable tools for biomedical information extraction.

Step 1: Understanding the Data and Setting Goals

As in every data project, the first order of business is to set clear goals and identify who will use the results. Here, the objective was to extract PICO elements for downstream predictive analyses or meta-research. The audience: anyone interested in systematically analyzing clinical trial data, be it researchers, clinicians, or data scientists. With this scope in mind, I started with exports from clinicaltrials.gov in JSON format. Initial field extraction and data cleaning yielded some structured information (Table 1), especially for interventions, but other key fields remained too verbose for automated downstream analysis. This is where NLP shines: it can distill crucial details from unstructured text such as eligibility criteria or descriptions of tested drugs. Named Entity Recognition (NER) automatically detects and classifies key entities, for example identifying the population group described in an eligibility section or pinpointing outcome measures within a study summary. The project therefore moved naturally from basic preprocessing to domain-adapted NER models.
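To make the field-extraction step concrete, here is a minimal sketch that flattens one study record into the fields a PICO extractor would consume. It assumes the nested layout of the clinicaltrials.gov API v2 JSON (`protocolSection` with modules such as `eligibilityModule` and `armsInterventionsModule`); the sample record, its placeholder ID `NCT00000000`, and the helper names `get_path` and `flatten_study` are hypothetical, so check the key paths against your own export before relying on them.

```python
import json

# Hypothetical, heavily trimmed record shaped like a clinicaltrials.gov API v2 study.
# Real exports contain many more modules and fields.
SAMPLE = """
{
  "protocolSection": {
    "identificationModule": {"nctId": "NCT00000000", "briefTitle": "Example Alzheimer's trial"},
    "conditionsModule": {"conditions": ["Alzheimer Disease"]},
    "eligibilityModule": {"eligibilityCriteria": "Inclusion Criteria: adults aged 50-85 with probable Alzheimer's disease"},
    "armsInterventionsModule": {"interventions": [
      {"type": "DRUG", "name": "Drug X"},
      {"type": "DRUG", "name": "Placebo"}
    ]},
    "outcomesModule": {"primaryOutcomes": [{"measure": "Change in ADAS-Cog score"}]}
  }
}
"""


def get_path(obj, *keys, default=None):
    """Walk nested dicts safely; return `default` if any key along the path is missing."""
    for key in keys:
        if not isinstance(obj, dict) or key not in obj:
            return default
        obj = obj[key]
    return obj


def flatten_study(study: dict) -> dict:
    """Collect the fields used for PICO extraction into one flat row (assumed v2 key paths)."""
    ps = study.get("protocolSection", {})
    interventions = get_path(ps, "armsInterventionsModule", "interventions", default=[])
    outcomes = get_path(ps, "outcomesModule", "primaryOutcomes", default=[])
    return {
        "nct_id": get_path(ps, "identificationModule", "nctId"),
        "conditions": get_path(ps, "conditionsModule", "conditions", default=[]),
        "eligibility": get_path(ps, "eligibilityModule", "eligibilityCriteria", default=""),
        "interventions": [item.get("name", "") for item in interventions],
        "primary_outcomes": [item.get("measure", "") for item in outcomes],
    }


row = flatten_study(json.loads(SAMPLE))
print(row["nct_id"], row["interventions"])  # NCT00000000 ['Drug X', 'Placebo']
```

Structured fields such as intervention names come out cleanly this way, while free-text fields such as `eligibility` remain long strings, which is exactly where the NER models take over.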

Table 1: Key elements of two Alzheimer’s studies, extracted from data downloaded from clinicaltrials.gov (image by author).

Step 2: Benchmarking Existing Models

My next step was a survey of off-the-shelf NER models, especially those trained on biomedical literature and available via Hugging Face, the central hub for transformer models. Of 19 candidates, only BioELECTRA-PICO (110 million parameters) [1] could extract PICO elements out of the box; the others were trained for general NER but not specifically for PICO recognition. Testing BioELECTRA on my own “gold-standard” set of 20 manually annotated trials showed acceptable but far from ideal performance, with a particular weakness on the “Comparator” element. This is likely because comparators are rarely described in trial summaries, so I fell back on a practical rule-based approach: searching the intervention text directly for standard comparator keywords such as “placebo” or “usual care.”
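The rule-based comparator search can be sketched as a case-insensitive, whole-word keyword match over the intervention text. Only “placebo” and “usual care” come from the text above; the other entries in `COMPARATOR_TERMS` and the function name `find_comparators` are illustrative assumptions, not the author’s actual list.

```python
import re

# "placebo" and "usual care" are from the article; the rest are illustrative additions.
COMPARATOR_TERMS = [
    "placebo",
    "usual care",
    "standard of care",
    "sham",
    "no intervention",
]

# Whole-word, case-insensitive alternation. Longer phrases are listed first so that,
# at a given position, a longer phrase wins over a shorter term starting there.
_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(t) for t in sorted(COMPARATOR_TERMS, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)


def find_comparators(intervention_text: str) -> list:
    """Return distinct comparator keywords (lower-cased) in order of first appearance."""
    found = []
    for match in _PATTERN.finditer(intervention_text):
        term = match.group(1).lower()
        if term not in found:
            found.append(term)
    return found


print(find_comparators("Drug: Donepezil; Drug: Placebo tablets; Other: usual care"))
# ['placebo', 'usual care']
```

A keyword list like this is easy to audit and extend, which is the main appeal of the heuristic when the statistical model has too little comparator signal to learn from.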

Step 3: Fine-Tuning with Domain-Specific Data

To further improve performance, I moved to fine-tuning, made possible by annotated PICO datasets from BIDS-Xu-Lab, including Alzheimer’s-specific samples [2]. To balance high accuracy against efficiency and scalability, I selected three models for experimentation. BioBERT-v1.1, with 110 million parameters [3], served as the primary model due to its strong track record in biomedical NLP tasks. I also included two smaller, derived models to optimize for speed and memory usage: CompactBioBERT, at 65 million parameters, is a distilled version of BioBERT-v1.1; and BioMobileBERT, at just 25 million parameters, is a further compressed variant that underwent an additional round of continual learning after compression [4]. I fine-tuned all three models on Google Colab GPUs, which made training efficient: each model was ready for testing in under two hours.
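One step that any BERT-style token-classification fine-tuning needs is aligning word-level BIO labels with subword tokens. The sketch below shows the common convention of labeling only the first subword of each word and masking the rest with -100, the default `ignore_index` of PyTorch’s cross-entropy loss. The tag names in `LABELS` are hypothetical and may differ from those in the BIDS-Xu-Lab data, and the token split in the example is invented for illustration.

```python
# Hypothetical BIO tag set for the four PICO elements.
LABELS = ["O",
          "B-POP", "I-POP",
          "B-INT", "I-INT",
          "B-COMP", "I-COMP",
          "B-OUT", "I-OUT"]
LABEL2ID = {label: i for i, label in enumerate(LABELS)}
IGNORE = -100  # targets equal to -100 are skipped by PyTorch's CrossEntropyLoss by default


def align_labels(word_ids, word_labels):
    """Project one BIO label per word onto subword tokens.

    `word_ids` has one entry per token: None for special tokens ([CLS], [SEP],
    padding), otherwise the index of the source word. Only the first subword
    of each word keeps its label; continuation subwords and special tokens
    get IGNORE so they do not contribute to the loss.
    """
    aligned = []
    previous = None
    for wid in word_ids:
        if wid is None or wid == previous:
            aligned.append(IGNORE)
        else:
            aligned.append(LABEL2ID[word_labels[wid]])
        previous = wid
    return aligned


# Invented split: [CLS] patients receiving don ##ep ##ezil [SEP]
words = ["patients", "receiving", "donepezil"]
tags = ["B-POP", "O", "B-INT"]
print(align_labels([None, 0, 1, 2, 2, 2, None], tags))
# [-100, 1, 0, 3, -100, -100, -100]
```

With Hugging Face `transformers`, the `word_ids` list would come from calling a fast tokenizer with `is_split_into_words=True` and then `encoding.word_ids()`; the actual subword splits depend on each model’s vocabulary.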

Step 4: Evaluation and Insights

The results, summarized in Table 2, reveal clear trends. All variants performed strongly on Population extraction, with BioMobileBERT leading at F1 = 0.91. Outcome extraction was near ceiling across all models. Interventions, however, proved more challenging. Recall was quite high (0.83–0.87), but precision lagged (0.54–0.61) because the models frequently tagged extra medication mentions in the free text. Trial descriptions often mention drugs or “intervention-like” terms when describing the background, not only when describing the planned main intervention.
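The gap between recall and precision is easiest to see with a span-level metric. This is a minimal sketch using exact matching over `(doc_id, label, text)` tuples; the function name `span_prf` and the toy spans are hypothetical, and the article’s Table 2 may use a different matching rule (it also reports partially correct documents).

```python
def span_prf(gold, predicted):
    """Exact-match precision, recall and F1 over sets of (doc_id, label, text) spans."""
    gold, predicted = set(gold), set(predicted)
    tp = len(gold & predicted)  # spans found in both sets
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    # When tp > 0, precision + recall > 0, so the division is safe.
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1


# Toy case: the true intervention is found, plus one background drug mention.
gold = {("t1", "INT", "donepezil")}
pred = {("t1", "INT", "donepezil"), ("t1", "INT", "memantine")}
print(span_prf(gold, pred))  # (0.5, 1.0, 0.6666666666666666)
```

Every extra background mention adds a false positive without touching recall, which reproduces the high-recall, low-precision pattern. As pure arithmetic, a precision of 0.54 combined with a recall of 0.83 would give F1 = 2PR/(P + R) ≈ 0.65; these are just the lower ends of the reported ranges and need not belong to the same model.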

A closer inspection highlights the complexity of biomedical NER. Interventions occasionally appeared as short, fragmented strings like “use of whole,” “week,” “top,” or “tissues with,” which are of little value to a researcher trying to make sense of a compiled list of studies. Similarly, the extracted populations included rather sobering examples such as “percent of” or “states with,” pointing to the need for additional cleanup and pipeline optimization. At the same time, the models could extract impressively detailed population descriptors, like “qualifying adults with a diagnosis of cognitively unimpaired, or probable Alzheimer’s disease, frontotemporal dementia, or dementia with Lewy bodies”. Such long strings can be correct, but because each trial’s participant description is so specific, they are too verbose for practical summarization without some form of abstraction or standardization.

This underscores a classic challenge in biomedical NLP: context matters, and domain-specific text often resists purely generic extraction methods. For Comparator elements, a rule-based approach (matching explicit comparator keywords) worked best, reminding us that blending statistical learning with pragmatic heuristics is often the most viable strategy in real-world applications.

Many of these “mischief” extractions stem from the broader context sections of trial descriptions. Possible improvements include a post-processing filter that discards short or ambiguous snippets, a domain-specific controlled vocabulary (so that only recognized intervention terms are kept), or concept linking to known ontologies. These steps could help the pipeline produce cleaner, more standardized outputs.
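The first two improvements could look roughly like the filter below. The thresholds, the `DANGLING` word list, the function name `keep_span`, and the example vocabulary are all illustrative assumptions rather than tuned values; substring matching against a vocabulary is deliberately crude, and concept linking to an ontology would be the more robust option.

```python
# Illustrative function words that signal a span was cut mid-phrase.
DANGLING = {"of", "with", "in", "for", "and", "or", "to", "the", "a", "an"}


def keep_span(span, vocabulary=None, min_chars=4):
    """Return True if an extracted span looks usable.

    Drops empty or very short spans (fewer than `min_chars` characters) and spans
    ending in a function word, such as "percent of" or "tissues with". If a
    `vocabulary` is given, the span must also contain at least one vocabulary
    term (case-insensitive substring match).
    """
    tokens = span.lower().split()
    if not tokens or len(span.strip()) < min_chars:
        return False
    if tokens[-1] in DANGLING:
        return False
    if vocabulary is not None:
        text = span.lower()
        return any(term.lower() in text for term in vocabulary)
    return True


spans = ["percent of", "states with", "top", "use of whole", "week", "Donepezil 10 mg"]
vocab = {"donepezil", "placebo"}  # hypothetical controlled vocabulary
print([s for s in spans if keep_span(s)])                    # ['use of whole', 'week', 'Donepezil 10 mg']
print([s for s in spans if keep_span(s, vocabulary=vocab)])  # ['Donepezil 10 mg']
```

The generic rules alone catch the dangling fragments but still let through “use of whole” and “week”; the controlled vocabulary removes those at the cost of dropping any valid intervention that is not yet on the list.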

Table 2: F1 scores for PICO element extraction, percentage of documents with all PICO elements at least partially correct, and processing time (image by author).

A word on performance: for any end-user tool, speed matters as much as accuracy. BioMobileBERT’s compact size translated into faster inference, making it my preferred model, especially since it performed best on the Population, Comparator, and Outcome elements.
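Speed comparisons like this are easy to skew with one-off costs such as model loading, so a small harness helps keep them fair. The sketch below times any callable that takes one text and reports median documents per second after a warm-up call; the name `docs_per_second` and the repeat count are assumptions, not part of the author’s repository.

```python
import time
from statistics import median


def docs_per_second(extract, docs, repeats=3):
    """Median documents-per-second of `extract` (a callable taking one text) over `docs`."""
    if not docs:
        return 0.0
    extract(docs[0])  # warm-up: the first call may include lazy loading or caching
    rates = []
    for _ in range(repeats):
        start = time.perf_counter()
        for doc in docs:
            extract(doc)
        elapsed = time.perf_counter() - start
        rates.append(len(docs) / elapsed if elapsed > 0 else float("inf"))
    return median(rates)


# Stand-in extractor for demonstration; swap in a real model wrapper to compare models.
print(docs_per_second(lambda text: text.split(), ["adults with probable Alzheimer's disease"] * 200))
```

In practice, `extract` could wrap a Hugging Face `pipeline("token-classification", model=..., aggregation_strategy="simple")` built from each fine-tuned checkpoint, with the same document list and hardware used for every model.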

Step 5: Making the Tool Usable—Deployment

Technical solutions are only as valuable as they are accessible. I wrapped the final pipeline in a Streamlit app that lets users upload clinicaltrials.gov datasets, switch between models, extract PICO elements, and download the results. Quick summary plots give an at-a-glance view of top interventions and outcomes (see Figure 1). I deliberately kept the underperforming BioELECTRA model in the app so that users can compare processing times and appreciate the efficiency gains of a smaller architecture. Although the tool came too late to spare my student hours of manual data extraction, I hope it will benefit others facing similar tasks.
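Behind a summary plot and a download button there are usually two small, framework-independent helpers: one that counts the most frequent extracted terms and one that serializes results to CSV. The sketch below is an assumption about how such helpers might look (`top_terms` and `to_csv_bytes` are hypothetical names), using only the Python standard library.

```python
import csv
import io
from collections import Counter


def top_terms(rows, field, n=10):
    """Count lower-cased, stripped terms in list-valued `field` across rows; return the n most common."""
    counts = Counter(
        term.strip().lower()
        for row in rows
        for term in row.get(field, [])
        if term.strip()
    )
    return counts.most_common(n)


def to_csv_bytes(rows, fields):
    """Serialize rows to UTF-8 CSV, joining list values with '; ' so each trial stays on one line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()
    for row in rows:
        writer.writerow({k: "; ".join(v) if isinstance(v, list) else v for k, v in row.items()})
    return buffer.getvalue().encode("utf-8")


rows = [
    {"nct_id": "T1", "interventions": ["Donepezil", "Placebo"]},
    {"nct_id": "T2", "interventions": ["placebo "]},
]
print(top_terms(rows, "interventions", n=1))  # [('placebo', 2)]
print(to_csv_bytes(rows, ["nct_id", "interventions"]).decode("utf-8"))
```

In a Streamlit app, the counts could feed a bar chart and the CSV bytes could be passed to `st.download_button(label, data=..., file_name="pico.csv", mime="text/csv")`; lower-casing before counting is a simple first step toward the standardization discussed earlier.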

To make deployment straightforward, I’ve containerized the app with Docker, so followers and collaborators can get up and running quickly. I’ve also invested substantial effort in the GitHub repository [5], including thorough documentation to encourage further contributions or adaptation to new domains.

Lessons Learned

This project shows the full journey of developing a real-world extraction pipeline: from setting clear objectives and benchmarking existing models to fine-tuning them on specialized data and deploying a user-friendly application. Although models and data were readily available for fine-tuning, turning them into a truly useful tool proved more challenging than expected. Dealing with intricate, multi-word biomedical entities, which were often only partially recognized, highlighted the limits of one-size-fits-all solutions. The lack of abstraction in the extracted text also became an obstacle for anyone aiming to identify global trends. Going forward, more focused approaches and pipeline optimizations will be needed, rather than reliance on a simple prêt-à-porter solution.

Figure 1. Sample output from the Streamlit app running BioMobileBERT and BioELECTRA for PICO extraction (image by author).

If you’re interested in extending this work or adapting the approach for other biomedical tasks, I invite you to explore the repository [5] and contribute. Fork the project, and happy coding!

References

  • [1] S. Alrowili and V. Shanker, “BioM-Transformers: Building Large Biomedical Language Models with BERT, ALBERT and ELECTRA,” in Proceedings of the 20th Workshop on Biomedical Language Processing, D. Demner-Fushman, K. B. Cohen, S. Ananiadou, and J. Tsujii, Eds., Online: Association for Computational Linguistics, June 2021, pp. 221–227. doi: 10.18653/v1/2021.bionlp-1.24.
  • [2] BIDS-Xu-Lab/section_specific_annotation_of_PICO. (Aug. 23, 2025). Jupyter Notebook. Clinical NLP Lab. Accessed: Sept. 13, 2025. [Online]. Available: https://github.com/BIDS-Xu-Lab/section_specific_annotation_of_PICO
  • [3] J. Lee et al., “BioBERT: a pre-trained biomedical language representation model for biomedical text mining,” Bioinformatics, vol. 36, no. 4, pp. 1234–1240, Feb. 2020, doi: 10.1093/bioinformatics/btz682.
  • [4] O. Rohanian, M. Nouriborji, S. Kouchaki, and D. A. Clifton, “On the effectiveness of compact biomedical transformers,” Bioinformatics, vol. 39, no. 3, p. btad103, Mar. 2023, doi: 10.1093/bioinformatics/btad103.
  • [5] ElenJ, ElenJ/biomed-extractor. (Sept. 13, 2025). Jupyter Notebook. Accessed: Sept. 13, 2025. [Online]. Available: https://github.com/ElenJ/biomed-extractor
