Can machine learning/artificial intelligence improve the clinical success rates of oncology drugs?
- Dowdy Jackson
- May 5

Dowdy Jackson, Ph.D.
The challenges of oncology drug discovery
Oncology drug discovery is a challenging endeavor. The discovery and the clinical development of an oncology drug can take over a decade and cost millions of dollars (Figure 1).
Figure 1. Oncology drug development timelines and costs

Once a drug enters clinical development, the efficacy and safety of the drug are evaluated in cancer patients. The efficacy of a drug is evaluated using several metrics such as objective response rate (ORR), overall survival (OS), progression-free survival (PFS) and duration of response (DOR). Overall survival is considered the gold standard because it is a direct measure of how long a patient’s life has been extended due to the treatment. The other end points, such as ORR, PFS and DOR, are secondary end points and are also used to assess the efficacy of the treatment (1).
The goal in oncology drug discovery is to eliminate the tumor or significantly reduce the growth and metastasis of the tumor while minimizing the side effects/toxicities the patient experiences. Oncology drug developers want to deliver the most efficacious dose of their drug at a frequency that keeps the systemic drug levels (exposures) high enough to maintain the anti-tumor effect for a longer period of time, yet low enough to minimize the drug’s side effects. The difference between the minimal efficacious drug levels and the drug levels that result in adverse toxicities is called the therapeutic window (Figure 2).
Figure 2. Therapeutic window

Drug developers want to maximize the therapeutic window because it gives them the ability to start at an efficacious dose and increase the dose, if needed, before reaching the systemic exposures that result in adverse toxicities. This allows for maximum suppression of tumor growth for a longer period of time.
One of the most significant challenges for oncology drug developers is addressing the low clinical success rates. The overall clinical success rate for oncology drugs has been reported to be between 3.4% and 5.3% (2, 3). This is an aggregate success rate that includes all drug modalities, such as small molecule inhibitors, biologics (e.g. antibodies, bispecific/multi-specific antibodies, antibody drug conjugates (ADCs)), cell-based therapies (e.g. CAR-T), vaccines, etc. The clinical success rates vary depending on the type of drug modality being developed (Figure 3).
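The therapeutic window described above can be expressed as a small calculation. The following is a minimal sketch, not a pharmacokinetic model: the function names and the exposure values (in arbitrary units) are hypothetical and chosen only for illustration.

```python
def therapeutic_window(min_efficacious_exposure, min_toxic_exposure):
    """Return (width, ratio) of the window between efficacy and toxicity.

    Both arguments are systemic exposures in the same (arbitrary) units.
    """
    if min_efficacious_exposure <= 0:
        raise ValueError("minimal efficacious exposure must be positive")
    if min_toxic_exposure < min_efficacious_exposure:
        raise ValueError("toxic exposure is below the efficacious exposure")
    width = min_toxic_exposure - min_efficacious_exposure
    ratio = min_toxic_exposure / min_efficacious_exposure
    return width, ratio


def in_window(exposure, min_efficacious_exposure, min_toxic_exposure):
    """True if an exposure is efficacious but still below the toxic level."""
    return min_efficacious_exposure <= exposure < min_toxic_exposure


# Hypothetical drug: efficacious from 10 units, toxic from 40 units.
width, ratio = therapeutic_window(10.0, 40.0)
print(width, ratio)                   # 30.0 4.0
print(in_window(25.0, 10.0, 40.0))    # True: room to escalate the dose
print(in_window(45.0, 10.0, 40.0))    # False: exposure is in the toxic range
```

A wider window (larger width or ratio) leaves more room to increase the dose before toxic exposures are reached.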
Figure 3. Oncology drug modalities and probability of clinical success rates

Figure 3. (A) Various drug modalities used to treat cancer patients. (B) Clinical success rates for the various modalities.
The modality with the lowest clinical success rate is small molecule inhibitors, while the modalities with the highest success rates are chimeric antigen receptor T-cells (CAR-Ts) followed by monoclonal antibodies and ADCs. Several factors can impact the success of a drug, such as tumors developing resistance to the treatment, difficulty maintaining efficacious exposures in the tumor for a prolonged period of time, debilitating toxicities, etc. Patients who have been heavily pre-treated with other therapies tend not to respond as well to new therapies as patients who have not been previously treated (i.e. treatment-naïve patients).
PADCEV (enfortumab vedotin) is an ADC used to treat bladder cancer patients. In treatment-naïve patients, PADCEV in combination with pembrolizumab, an antibody that inhibits the immune checkpoint protein PD-1 and enhances an immune response to the tumor, reported an OS of 32 months, while patients who received chemotherapy had an OS of 16 months. Patients previously treated with an immunotherapy and cisplatin or carboplatin had an OS of 13 months, while patients who received chemotherapy had an OS of 9 months (4). These data suggest that patients whose tumors continue to grow after multiple rounds of various therapies (i.e. drug-resistant tumors) are a difficult population to treat.
Since the clinical success rate for oncology drug development is low, where are drugs failing during clinical development? Most oncology drugs fail in either phase 1 or phase 2 of clinical development. Once a drug transitions from phase 2 to phase 3, the likelihood of success increases (3, 5). The likelihood of an oncology drug transitioning from phase 1 to phase 2 is 48.8 - 62.8%, from phase 2 to phase 3 is 24.6%, and from phase 3 to approval is 40.1 - 47.7% (Figure 4).
Figure 4. Probability of successful phase transition
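As a rough illustration of how the phase transition probabilities combine, the sketch below multiplies the figures quoted above. It assumes the three transitions are independent and cover the whole path from phase 1 to approval. That is a simplification, and the result need not match the 3.4% - 5.3% aggregate rate cited earlier, which comes from sources that may define the steps differently.

```python
from math import prod

# Phase transition probabilities quoted in the text, as (low, high) ends
# of each reported range. Phase 2 -> 3 is a single figure.
TRANSITIONS = {
    "phase 1 -> phase 2": (0.488, 0.628),
    "phase 2 -> phase 3": (0.246, 0.246),
    "phase 3 -> approval": (0.401, 0.477),
}


def cumulative_success(transitions):
    """Multiply the per-phase probabilities (assumes independent phases)."""
    low = prod(lo for lo, _ in transitions.values())
    high = prod(hi for _, hi in transitions.values())
    return low, high


low, high = cumulative_success(TRANSITIONS)
print(f"Estimated phase 1 to approval: {low:.1%} to {high:.1%}")
# prints "Estimated phase 1 to approval: 4.8% to 7.4%"
```

The calculation shows why the phase 2 to phase 3 transition dominates the overall outcome: it is the smallest factor in the product.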

The increased interest in machine learning and artificial intelligence raises the question: how can we use this technology in oncology drug discovery to improve the selection of the best preclinical lead candidates and improve the clinical success rates for oncology drugs?
The promise of machine learning for oncology drug discovery
Machine learning and artificial intelligence have quickly become buzzwords used by companies to highlight how their therapeutics will differ from those of competitors who are using traditional drug discovery methods. Traditional oncology drug discovery efforts have a high clinical failure rate. The ultimate goal of using AI is to improve the clinical success rates of oncology drugs (Figure 5).
Figure 5. Traditional drug discovery vs. predictive AI machine learning

Machine learning is often viewed as a “black box” where information is fed into a computer, the computer performs a series of computations and a result is provided (Figure 6). Various examples of what is happening in the black box are provided in the supplemental information section.
Figure 6. Machine learning is perceived to be a black box

So what is machine learning? Machine learning is a branch of artificial intelligence that uses various methods/algorithms to analyze vast amounts of data, enabling better decisions about which drugs to develop and which drugs are effective. Machine learning algorithms can recognize patterns in large datasets that would be difficult, if not impossible, for humans to recognize.
Within machine learning there are neural networks and within neural networks there is deep learning (Figure 7).
Figure 7. The relationship between artificial intelligence, machine learning, neural networks and deep learning

Machine learning can be further divided into methods and algorithms (Figure 8).
The machine learning methods can be divided into four categories. They are supervised, unsupervised, semi-supervised and reinforced.
Figure 8. Machine learning methods and algorithms

The implementation of machine learning in oncology drug discovery
The claim made by several companies developing AI for drug discovery is that their technology can shorten preclinical development timelines, identify novel lead candidates and provide cost savings compared to companies using traditional drug discovery methods.
However, the largest expense, longest timelines and greatest risk for oncology drug developers occur during clinical development, not preclinical development.
Status of AI developed oncology drugs
The majority of AI-designed small molecule inhibitors are in early clinical development. The most advanced AI-developed small molecule inhibitor is REC-2282, which targets HDAC and is being used to treat Neurofibromatosis type 2 patients (Table 1). Several AI-designed biologics have not progressed past preclinical development and those in clinical development are in early clinical development (Table 2).
Table 1. AI designed small molecule inhibitors in clinical development
Drug name | Target | Indication | Phase of development | Company |
ISM3091 | USP1 | BRCA-mutant cancer | Phase 1 | Insilico Medicine/Exelixis |
ISM8207 | QPCTL | Immuno-Oncology | Phase 1 | Insilico Medicine/Fosun Pharma |
ISM5043 | KAT6A | ER+ HER2- Breast Cancer | Phase 1 | Insilico Medicine/Menarini |
EXS-21546 | A2A Receptor | High adenosine signature cancers | Phase 1b/2 (Terminated) | Exscientia |
GTAEXS617 | CDK7 | Solid tumors | Phase 1/2 | Exscientia |
LP-300 | Tyrosine kinase inhibitor | NSCLC | Phase 2 | Lantern Pharma |
LP-284 | DNA damaging | Relapsed or refractory (R/R) lymphomas and solid tumors | Phase 1 | Lantern Pharma |
LP-184 | DNA damaging | Advanced solid tumors | Phase 1 | Lantern Pharma |
TEQ101 | MetAP2 | Renal cell cancer | Phase 1 | Cureteq (Paratex) |
BXCL701 | dipeptidyl peptidases (DPP) primarily DPP8/9, and DPP4 | Prostate cancer | Phase 2 | BioXcel Therapeutics |
EXS74539 | Lysine demethylase 1 (LSD1) | AML-SCLC | Phase 1 | Exscientia |
REC-4881 | MEK1/2 | Familial adenomatous polyposis (FAP) | Phase 2 | Recursion |
REC-617 | CDK7 | Advanced solid tumors | Phase 1/2 | Recursion |
REC-1245 | CDK12 | Solid tumors and Lymphomas | Phase 1 | Recursion |
REC-3565 | MALT1 | B-cell cancers | Phase 1 | Recursion |
REC-4539 | LSD1 | SCLC | Phase 1 | Recursion |
REC-2282 | HDAC | Neurofibromatosis type 2 | Phase 2/3 | Recursion |
RLY-2608 | PI3Kα | Breast cancer and solid tumors | Phase 1/2 | Relay Therapeutics |
Pocenbrodib | CBP/p300 inhibitor | Prostate cancer | Phase 1/2 | Pathos AI |
P-500 | protein arginine methyltransferase 5 (PRMT5) | High-grade glioma and uveal melanoma | Phase 1 | Pathos AI |
Table 2. The status of biologics using AI in oncology drug discovery
Drug name (type of biologic) | Company | Phase of development |
ABS-301 (undisclosed antibody) | Absci | Preclinical (Lead selection) |
ABS-501 (anti-HER2) | Absci | Preclinical (Candidate ID) |
Unknown (ADC) | BigHat | Preclinical (IND enabling) |
Unknown (T-Cell Engager) | BigHat | Preclinical (Lead optimization) |
CAR-T | Generate Biomedicines | Preclinical (IND enabling) |
ADC | Generate Biomedicines | Preclinical |
Bispecific antibody | Generate Biomedicines | Preclinical |
EVX-01 (Cancer vaccine liposomal peptide) | Evaxion | Phase 2 (Metastatic melanoma) |
EVX-02 (Cancer vaccine - DNA) | Evaxion | Phase 1/2 (Adjuvant melanoma) |
NECVAX-NEO1 | AGC Biologics and NEC Bio Therapeutics | Phase 1/2 |
Several deals/partnerships between AI companies and pharma/biotech companies have been announced to develop the next generation of biologics (Table 3).
Table 3. AI deals with biotech and pharmaceutical companies focusing on oncology
AI company | Pharma/Biotech company | Date | Type of molecule |
Absci | AstraZeneca | December 2023 | Antibody |
BigHat | AbbVie | December 2023 | Antibody |
BioMap | Sanofi | October 2023 | Antibody |
Absci | Almirall | October 2023 | Antibody |
BigHat | Eli Lilly | April 2025 | Antibody |
BigHat | Lonza/Synaffix | November 2024 | ADC |
The majority of AI-designed oncology drugs are in preclinical or early clinical development, and it could be years before we see what impact AI has had on clinical success rates.
One of the challenges oncology drug developers face is the lack of well documented data from failed clinical trials. Companies don’t usually publish their data from failed clinical trials. These data are important in training the machine learning models (Figure 9).
One of the reasons AI has been successful in developing self-driving cars (e.g. Tesla), programs that can beat grandmasters at chess (e.g. Stockfish) or Go (e.g. AlphaGo), tools that determine 3D protein structures from the amino acid sequence (e.g. AlphaFold), aids for document preparation and review (e.g. Copilot) and AI assistants (e.g. Alexa) is that there is a wealth of data used to train the models on what works (positive outcomes) and what doesn’t work (negative outcomes).
The data from failed clinical trials could be used to train the models to improve patient selection, optimize dose and schedules to widen the therapeutic window, identify biomarkers associated with patient response to treatment, and improve target selection. Efforts should be made to encourage companies to share the data from failed clinical trials because these data could lead to shorter development timelines, reduced drug development costs and improved clinical success rates, which would be a tremendous benefit to cancer patients.
Figure 9. AI models need a diverse set of data for training

Where can we see more near-term benefit from machine learning in oncology drug development? AI could have substantial near-term value in identifying patients with a greater likelihood of responding to a given treatment, de-risking antibody design to eliminate development issues and identifying novel small molecule inhibitors that have appropriate chemistry, manufacturing and controls (CMC) properties (6, 7).
Identifying the right patients to treat using machine learning
Machine learning is very good at pattern recognition. Thin sections of patient tumor samples that have undergone immunohistochemistry staining to identify the cells expressing tumor antigens are imaged and stored in a computer. A trained pathologist reviews the images and provides several data points, such as the tumor histotype, the tumor grade, whether the antigen staining is cell surface or cytoplasmic, whether the staining is high (3+), medium (2+), low (1+) or absent, an H-score, etc. These data, along with the hundreds, if not thousands, of images are used to train the machine learning models to recognize the patterns observed by the pathologist.
AstraZeneca and Daiichi Sankyo are codeveloping an antibody drug conjugate, datopotamab deruxtecan (Dato-DXd), that binds to TROP2 and kills TROP2-expressing cells. Dato-DXd was approved by the FDA on January 17, 2025 to treat HR+/HER2- metastatic breast cancer patients (8). AstraZeneca and Roche are codeveloping machine learning methods to help identify the patients who are most likely to respond to Dato-DXd.
To identify which patients have the best response to Dato-DXd, a machine learning companion diagnostic, TROP2-quantitative continuous scoring (TROP2-QCS), was developed to distinguish cell surface TROP2 expression from cytoplasmic TROP2 expression in patient tumor sections (9). TROP2-QCS is a fully supervised platform: patient tumor sections are stained for TROP2 expression and the images from the immunohistochemical (IHC) staining are scanned into the computer. The computer is trained to distinguish cell surface TROP2 expression from cytoplasmic TROP2 expression. A normalized membrane ratio (NMR) is calculated, where the lower the NMR, the higher the cytoplasmic TROP2 expression. The extent of TROP2-expressing cells in the tumor section is also evaluated: if 75% or more of the cells express TROP2 and have an NMR of 0.56 or less, the tumor is described as being positive by TROP2-QCS (Figure 10).
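For readers unfamiliar with the H-score mentioned above, it is commonly computed as a weighted sum of the percentage of cells at each staining intensity, giving a value from 0 to 300. The sketch below uses that standard formula; the function name and the example percentages are hypothetical.

```python
def h_score(pct_1plus, pct_2plus, pct_3plus):
    """H-score = 1 x (% cells 1+) + 2 x (% cells 2+) + 3 x (% cells 3+).

    Percentages are on a 0-100 scale; unstained cells contribute zero.
    """
    percentages = (pct_1plus, pct_2plus, pct_3plus)
    if any(p < 0 for p in percentages) or sum(percentages) > 100:
        raise ValueError("percentages must be non-negative and sum to <= 100")
    return 1 * pct_1plus + 2 * pct_2plus + 3 * pct_3plus


# Hypothetical tumor section: 20% low, 30% medium, 40% high, 10% unstained.
print(h_score(20, 30, 40))  # 200
```

A model trained on pathologist annotations would be predicting the per-cell intensity classes that feed into a score like this one.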
Figure 10. How TROP2-QCS works
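The positivity rule described above can be sketched in a few lines. This is an illustration only, not the actual TROP2-QCS implementation. The two cutoffs come from the text, while the NMR formula (membrane signal divided by membrane plus cytoplasm signal), the function names and the per-cell values are assumptions made for this example.

```python
NMR_CUTOFF = 0.56            # from the text: NMR of 0.56 or less
CELL_FRACTION_CUTOFF = 0.75  # from the text: 75% or more of the cells


def normalized_membrane_ratio(membrane_signal, cytoplasm_signal):
    """Assumed definition: share of the TROP2 signal located on the membrane."""
    total = membrane_signal + cytoplasm_signal
    if total <= 0:
        raise ValueError("cell has no TROP2 signal")
    return membrane_signal / total


def is_qcs_positive(cell_nmrs):
    """Apply the rule to the NMR values of the TROP2-expressing tumor cells."""
    if not cell_nmrs:
        raise ValueError("no TROP2-expressing cells to score")
    n_low = sum(1 for nmr in cell_nmrs if nmr <= NMR_CUTOFF)
    return n_low / len(cell_nmrs) >= CELL_FRACTION_CUTOFF


# Hypothetical sections with four scored cells each.
print(normalized_membrane_ratio(1.0, 3.0))   # 0.25
print(is_qcs_positive([0.3, 0.4, 0.5, 0.7]))  # True: 3 of 4 cells <= 0.56
print(is_qcs_positive([0.3, 0.6, 0.7, 0.8]))  # False: 1 of 4 cells <= 0.56
```

In practice the per-cell signals would come from the image analysis model rather than being supplied by hand.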

TROP2-QCS was used to evaluate NSCLC patient responses to Dato-DXd in the TROPION-Lung01 Phase III trial (Table 4). The data show that NSCLC patients whose tumors were positive by TROP2-QCS had a median PFS of 6.9 months and an ORR of 32.7%, compared to a PFS of 2.9 months and an ORR of 16.9% for patients whose tumors were negative by TROP2-QCS. These data suggest that the balance of cell surface and cytoplasmic TROP2 expression, as measured by TROP2-QCS, may be a biomarker to identify patients likely to respond to Dato-DXd.
Unfortunately, the BLA for the use of Dato-DXd to treat NSCLC patients was withdrawn due to a lack of improved OS relative to patients treated with docetaxel (10). NSCLC patients treated with Dato-DXd had an OS of 12.9 months compared to 11.8 months for patients treated with docetaxel (Table 5). Nevertheless, this retrospective analysis of NSCLC patients from the TROPION-Lung01 trial shows promise in identifying patients with an improved response to Dato-DXd. The Ventana TROP2 RxDx Device, which is a computational pathology platform developed by Roche, received breakthrough device designation from the FDA on April 29, 2025 (11).
Table 4. Comparison of clinical responses of patients treated with Dato-DXd using TROP2-QCS

Table 5. TROPION-Lung01 Phase III trial summary

Some AI-focused companies, such as Owkin, Aiforia, and PathAI, are developing machine learning based biomarker identification and pathologist support tools to identify patients who will respond to various treatments. These methods will hopefully improve the clinical success rates for oncology drugs. The FDA is implementing guidance to address the increase in the number of AI-enabled medical devices being submitted to the agency for review (12).
Summary
Machine learning holds great promise in aiding the development of novel therapies and in the identification of biomarkers to help identify the right patients to treat, both of which could result in improved clinical success rates. The challenges will be in the implementation of these various machine learning methods and technologies, along with the collection of quality-controlled patient data from both successful and failed clinical trials.
The utility of machine learning in the discovery of new therapeutics for cancer patients, whether they be small molecule inhibitors, biologics or vaccines, also holds great promise. However, these efforts are in the early stages of discovery and development, and it will take years before we see which technologies and methods are successful in increasing the clinical success rates.
There are efforts using machine learning to repurpose FDA approved drugs for use in indications outside of those for which they were approved (13). Eli Lilly’s drug, baricitinib (Olumiant), was approved for rheumatoid arthritis and was repurposed as a treatment for COVID-19 based on machine learning results (14, 15). Baricitinib was granted emergency use authorization for the treatment of COVID-19 patients in November 2020 and received FDA approval for hospitalized adults with COVID-19 in May 2022 (16).
It is reasonable to believe that the repurposing of FDA approved drugs may also identify effective treatments for cancer patients. The increase in the number of companies focusing on the use of machine learning and the guidance from the FDA and other regulatory agencies will hopefully result in improving the lives of cancer patients.
Machine learning – supplemental information
Supervised machine learning uses input data in the form of features and labels. Features are the data or characteristics used to identify the label. The label is the answer we provide to the machine learning algorithm that is associated with the defined features. We want the algorithm to provide us with the answer based on the input data associated with the features. Once the model is trained, a test data set is used to evaluate the accuracy of the model (Figure S1). In the example, the computer is trained to identify a defined set of fruit. The images of the fruit and a label associated with each image are provided, and the model is trained using these data. Once the model has been trained, a test data set of images is given to the model to test its accuracy. The model should accurately identify the fruit in the test images and apply the correct labels.
Figure S1. Supervised Machine Learning

Figure S1.
1. Images of different types of fruit are defined and used to train the machine learning model.
2. A test set of images of fruit is used to evaluate the accuracy of the algorithm in identifying the fruit.
3. The algorithm provides the output where it identifies the data from the test set.
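The train-then-test workflow described above can be sketched with a very simple supervised model. The example below uses a nearest-centroid classifier chosen for brevity; it is not the method used by any company discussed here. The two numeric features (length and width in cm) stand in for image features, and all values are hypothetical.

```python
from collections import defaultdict
from math import dist


def train(features, labels):
    """Learn one centroid (mean feature vector) per label."""
    grouped = defaultdict(list)
    for x, y in zip(features, labels):
        grouped[y].append(x)
    return {label: tuple(sum(col) / len(col) for col in zip(*rows))
            for label, rows in grouped.items()}


def predict(centroids, x):
    """Assign the label whose centroid is closest to the features."""
    return min(centroids, key=lambda label: dist(centroids[label], x))


def accuracy(centroids, features, labels):
    """Fraction of test examples labeled correctly."""
    hits = sum(predict(centroids, x) == y for x, y in zip(features, labels))
    return hits / len(labels)


# Step 1: labeled training data, (length_cm, width_cm) -> fruit.
train_x = [(7.0, 7.0), (8.0, 7.5), (7.5, 8.0), (18.0, 3.5), (20.0, 4.0), (19.0, 3.8)]
train_y = ["apple", "apple", "apple", "banana", "banana", "banana"]
model = train(train_x, train_y)

# Steps 2 and 3: a held-out test set evaluates the trained model.
test_x = [(7.2, 7.1), (19.5, 3.6)]
test_y = ["apple", "banana"]
print([predict(model, x) for x in test_x])  # ['apple', 'banana']
print(accuracy(model, test_x, test_y))      # 1.0
```

Keeping the test set separate from the training set is what makes the reported accuracy meaningful.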
Unsupervised machine learning uses unlabeled data and relies on the model to identify common features and patterns to group the objects and make associations with the features (Figure S2). For example, similar objects will be grouped together.
Figure S2. Unsupervised Machine Learning
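Grouping similar objects without labels can be illustrated with k-means clustering, one common unsupervised algorithm. This is a minimal sketch with a deliberately simple, deterministic initialization (evenly spaced points from the input); the data points are hypothetical.

```python
from math import dist


def kmeans(points, k, iterations=10):
    """Group points into k clusters; returns (labels, centroids)."""
    # Simplified initialization: take evenly spaced points as starting centroids.
    step = len(points) // k
    centroids = [points[i * step] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
    return labels, centroids


# Unlabeled objects described by two hypothetical features.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labels, centroids = kmeans(points, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

The cluster numbers carry no meaning by themselves; a person still has to interpret what each group represents.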

Semi-supervised machine learning uses a combination of labeled and unlabeled data to train the model (Figure S3). The model will correctly identify the labeled objects and a user will provide a label for the unidentified object and will retrain the model based on this new input. The model will subsequently correctly identify the new entry.
Figure S3. Semi-supervised Machine Learning
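One way to picture the semi-supervised loop is self-training: a model built from a few labeled examples labels the unlabeled objects it is confident about, and anything it cannot identify is left for a user to label before retraining. The sketch below assumes a nearest-centroid model and a distance threshold as the confidence rule. Both are illustrative choices, and the data are hypothetical.

```python
from math import dist


def centroids_of(labeled):
    """Mean feature vector per label from (features, label) pairs."""
    groups = {}
    for features, label in labeled:
        groups.setdefault(label, []).append(features)
    return {label: tuple(sum(col) / len(col) for col in zip(*rows))
            for label, rows in groups.items()}


def predict(centroids, features):
    return min(centroids, key=lambda label: dist(centroids[label], features))


def self_train(labeled, unlabeled, max_distance):
    """Pseudo-label unlabeled points that lie close to a known centroid."""
    labeled, remaining = list(labeled), list(unlabeled)
    while remaining:
        centroids = centroids_of(labeled)
        confident = [(x, predict(centroids, x)) for x in remaining
                     if dist(centroids[predict(centroids, x)], x) <= max_distance]
        if not confident:
            break  # nothing left that the model can identify on its own
        labeled.extend(confident)
        accepted = {x for x, _ in confident}
        remaining = [x for x in remaining if x not in accepted]
    return labeled, remaining


LABELED = [((7.0, 7.0), "apple"), ((19.0, 4.0), "banana")]
UNLABELED = [(7.5, 7.5), (8.0, 7.0), (18.0, 3.5), (20.0, 4.2), (13.0, 13.0)]

labeled, unidentified = self_train(LABELED, UNLABELED, max_distance=2.0)
print(unidentified)  # [(13.0, 13.0)]

# The user supplies a label for the unidentified object and the model is retrained.
labeled.append((unidentified[0], "grapefruit"))
model = centroids_of(labeled)
print(predict(model, (12.5, 13.0)))  # grapefruit
```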

Reinforced machine learning, more commonly called reinforcement learning, uses a series of rewards and penalties to train the model. No training set is used for this type of model; instead, the agent learns from the rewards and penalties it gathers from each action as it navigates its environment (Figure S4). This technique has been used to teach robots how to navigate a maze, to train self-driving cars, and to build game-playing programs such as AlphaGo and AlphaZero.
Figure S4. Reinforced Machine Learning
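Learning from rewards and penalties can be sketched with tabular Q-learning, a classic reinforcement learning algorithm. The environment below is a hypothetical five-cell corridor standing in for a maze: the agent starts at the left end, pays a small penalty for every step, and earns a reward for reaching the right end. All parameter values are illustrative assumptions.

```python
import random

N_STATES = 5             # corridor cells 0..4
GOAL = N_STATES - 1      # reaching the last cell ends the episode
ACTIONS = (-1, +1)       # index 0 = move left, index 1 = move right


def step(state, action):
    """Environment: returns (next_state, reward, done)."""
    next_state = min(max(state + action, 0), GOAL)
    reward = 1.0 if next_state == GOAL else -0.01  # reward vs. step penalty
    return next_state, reward, next_state == GOAL


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # value of each action per state
    for _episode in range(episodes):
        state = 0
        for _step in range(100):  # cap on episode length
            if rng.random() < epsilon:
                a = rng.randrange(2)                        # explore
            else:
                a = max((0, 1), key=lambda i: q[state][i])  # exploit
            next_state, reward, done = step(state, ACTIONS[a])
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][a] += alpha * (target - q[state][a])   # Q-learning update
            state = next_state
            if done:
                break
    return q


q_table = train()
policy = ["right" if q[1] > q[0] else "left" for q in q_table[:GOAL]]
print(policy)
```

After training, the greedy policy read from the table should point toward the goal from every cell, even though the agent was never shown a labeled example of the correct move.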

References
1. Delgado A, Guddati AK. Clinical endpoints in oncology - a primer. Am J Cancer Res. 2021;11(4):1121-31. Epub 20210415. PubMed PMID: 33948349; PubMed Central PMCID: PMC8085844.
2. Wong CH, Siah KW, Lo AW. Estimation of clinical trial success rates and related parameters. Biostatistics. 2019;20(2):273-86. doi: 10.1093/biostatistics/kxx069. PubMed PMID: 29394327; PubMed Central PMCID: PMC6409418.
3. David Thomas DC, Amanda Micklus, Sara LaFever, Michael Hay, Shomesh Chaudhuri, Robert Bowden and Andrew W. Lo. Clinical Development Success Rates and Contributing Factors 2011–2020. Biomedtracker [Internet]. 2021 February 2021.
4. Pfizer Aa. PADCEV Clinical Safety. Available from: https://www.padcev.com/trial-results.
5. Thayer AM. Clinical trials by the numbers. C&EN Global Enterprise [Internet]. 2016 2016/07/04; 94(27):[26-7 pp.]. Available from: https://doi.org/10.1021/cen-09427-bus.
6. Zheng J, Wang Y, Liang Q, Cui L, Wang L. The Application of Machine Learning on Antibody Discovery and Optimization. Molecules. 2024;29(24). Epub 20241216. doi: 10.3390/molecules29245923. PubMed PMID: 39770013; PubMed Central PMCID: PMC11679646.
7. Wang Y, Hu Z, Chang J, Yu B. Thinking on the Use of Artificial Intelligence in Drug Discovery. J Med Chem. 2025;68(5):4996-9. Epub 20250224. doi: 10.1021/acs.jmedchem.5c00373. PubMed PMID: 39993334.
8. FDA approves datopotamab deruxtecan-dlnk for unresectable or metastatic, HR-positive, HER2-negative breast cancer [Internet]. 2025. Available from: https://www.fda.gov/drugs/resources-information-approved-drugs/fda-approves-datopotamab-deruxtecan-dlnk-unresectable-or-metastatic-hr-positive-her2-negative-breast
9. Novel computational pathology-based TROP2 biomarker for datopotamab deruxtecan was predictive of clinical outcomes in patients with non-small cell lung cancer in TROPION-Lung01 Phase III trial [Internet]. 2024. Available from: https://www.astrazeneca.com/media-centre/press-releases/2024/novel-computational-pathology-based-trop2-biomarker-for-dato-dxd-was-predictive-of-clinical-outcomes-in-patients-with-nsclc-in-tropion-lung01-phase-iii-trial.html
10. Datopotamab Deruxtecan Application in the EU for Patients with Advanced Nonsquamous Non-Small Cell Lung Cancer Voluntarily Withdrawn [Internet]. 2024. Available from: https://www.daiichisankyo.com/files/news/pressrelease/pdf/202412/20241224_E.pdf
11. Roche granted FDA Breakthrough Device Designation for first AI-driven companion diagnostic for non-small cell lung cancer [Internet]. 2025. Available from: https://diagnostics.roche.com/us/en/news-listing/2025/roche-granted-fda-breakthrough-device-designation-for-first-ai-driven-companion-diagnostic-for-non-small-cell-lung-cancer.html
12. FDA. Artificial Intelligence and Machine Learning in Software as a Medical Device. 2025.
13. Urbina F, Puhl AC, Ekins S. Recent advances in drug repurposing using machine learning. Curr Opin Chem Biol. 2021;65:74-84. Epub 20210716. doi: 10.1016/j.cbpa.2021.06.001. PubMed PMID: 34274565; PubMed Central PMCID: PMC8671152.
14. Smith DP, Oechsle O, Rawling MJ, Savory E, Lacoste AMB, Richardson PJ. Expert-Augmented Computational Drug Repurposing Identified Baricitinib as a Treatment for COVID-19. Front Pharmacol. 2021;12:709856. Epub 20210728. doi: 10.3389/fphar.2021.709856. PubMed PMID: 34393789; PubMed Central PMCID: PMC8356560.
15. Richardson PJ, Robinson BWS, Smith DP, Stebbing J. The AI-Assisted Identification and Clinical Efficacy of Baricitinib in the Treatment of COVID-19. Vaccines (Basel). 2022;10(6). Epub 20220615. doi: 10.3390/vaccines10060951. PubMed PMID: 35746559; PubMed Central PMCID: PMC9231077.
16. Frequently Asked Questions on Olumiant (Baricitinib) for the Treatment of COVID-19 [Internet]. 2022. Available from: https://www.fda.gov/media/143825/download