
Generative AI in Health and the Crucial Challenge of Our Era




Generative Artificial Intelligence (GenAI) has emerged as a force of major change, much as the internet was when it first appeared in the 1990s. GenAI has the extraordinary capacity to swiftly analyse data, discern patterns, and generate insights, sparking both enthusiasm and alarm across the medical, health and other sectors. However, as we delve deeper into the integration of GenAI into healthcare, it is imperative to tread cautiously, mindful of the intricate nuances and potential pitfalls that accompany this technological advancement.


Where It All Began


The inception of artificial intelligence dates back to the early 1950s, when researchers worldwide explored the creation of intelligent machines. In 1950, Alan Turing's groundbreaking paper "Computing Machinery and Intelligence" introduced the Turing test, a pivotal moment in the birth of artificial intelligence (AI).


In 1951, Marvin Minsky and Dean Edmonds achieved a significant milestone by developing the first artificial neural network (ANN), named SNARC. The machine used 3,000 vacuum tubes to simulate a network of 40 neurons, laying a foundation for the evolution of AI technology. While multiple countries contributed significantly to this pursuit, the United States emerged as the primary birthplace of artificial intelligence.


Amidst diverse global efforts, pivotal breakthroughs in AI research unfolded within American academic and research institutions, notably Dartmouth College and the Massachusetts Institute of Technology (MIT). John McCarthy, an eminent American computer scientist, not only coined the term "artificial intelligence" but also spearheaded the organisation of the landmark Dartmouth Conference in 1956. This pivotal event convened leading minds in AI research, laying the groundwork for subsequent advances in the field.

AI in healthcare has roots dating back to the 1970s, when it was first employed to tackle biomedical challenges. Since then, AI-powered applications have evolved significantly, supporting diagnostics, driving cost reduction, enhancing patient outcomes, and boosting overall operational efficiency.


Today, the competition for AI dominance is fierce. Although the United States and China garner the most attention for their advances, countries across the globe are actively exploring and investing in the technology, yielding groundbreaking discoveries and attracting substantial private investment: from Israel in the Middle East to the UK, France, and Germany in Europe, and on to India, Japan, and Singapore in Asia.

According to Stanford's AI Index Report 2023, global private investment in AI totalled $91.9 billion in 2022. Goldman Sachs projects a robust growth trajectory, forecasting that global AI investment will surge to $110.2 billion in 2023 and escalate further to $158.4 billion by 2025.


Passing and Failing Medical Exams


Recently, OpenEvidence, an AI developed in association with the Mayo Clinic, achieved a 91% score on the US Medical Licensing Examination (USMLE). Other models scored as follows on the same exam:

  • Google's Med-PaLM 2 – 82%

  • Anthropic's Claude 2 – 66%

  • ChatGPT – 58%

  • GPT-4 – 88%


Conversely, a recent study presented at the Royal College of General Practitioners (RCGP) Annual Conference in 2023 revealed that ChatGPT failed to pass the UK's national primary care examinations.


The recent buzz surrounding ChatGPT's purported success in medical exams serves as a reminder of the delicate balance needed between innovation and human expertise in healthcare and medical research. While AI models such as ChatGPT exhibit remarkable capabilities, they are not infallible substitutes for human intelligence and clinical judgment; where life-altering decisions are required, the human touch remains indispensable.


Consider a recent case presented to AMN involving a 74-year-old male admitted to a Sydney hospital with severe abdominal and moderate chest pain. Despite extensive testing, the hospital medical team was unable to pinpoint the root cause of his severe pain. Even if AI had been used to offer potential diagnoses and treatment recommendations, it would not have identified the underlying issue. It was only after the patient left the hospital, test results in hand, and sought opinions from different specialists that a simple intervention emerged: eliminating a key lifestyle factor, daily alcohol consumption, as a preventative measure against future pancreatic attacks and cancer.


This scenario underscores the invaluable role of human intuition and deductive reasoning in navigating the complexities of medical conditions. It also highlights the importance of frameworks that complement rather than control or replace medical expertise, ensuring that patient safety and sound health delivery remain paramount.


As we contemplate the integration of generative AI into medical research and publishing, it is crucial to acknowledge the exponential pace of technological advancement and its implications for institutional readiness. The disparity between the rapid evolution of AI technology and the sluggish adaptation of regulatory frameworks poses significant challenges. Organisations such as the International Committee of Medical Journal Editors have begun to provide guidance on the use of generative AI, yet the landscape remains fragmented, with inconsistencies in approach and disclosure standards.


Coexistence of Human Intelligence and AI 


Consider a scenario where a team of medical researchers is tasked with analysing vast datasets to identify potential correlations between genetic markers and disease susceptibility. Utilising generative AI algorithms, the researchers can rapidly sift through immense amounts of data, identifying subtle patterns and associations that may elude human perception. AI streamlines the data analysis process, accelerating the pace of discovery and enabling researchers to uncover novel insights and approaches. Moreover, the rapid delivery of data and insights facilitates expedited exploration of alternative pathways, a feat previously unattainable without AI-driven advancements.
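
To make this concrete, here is a minimal sketch of the kind of association screen such a team might automate. It runs on synthetic data; the marker names, cohort size, and significance threshold are illustrative assumptions, not a real genomic pipeline.

```python
# Illustrative sketch only: a naive association screen between binary
# genetic markers and disease status on synthetic data. Real genomic
# studies use dedicated tooling and far more careful statistics.
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

rng = np.random.default_rng(42)
n_patients, n_markers = 500, 20

# Synthetic cohort: 20 hypothetical markers (0 = absent, 1 = present),
# with the disease label weakly linked to marker_3 by construction.
markers = pd.DataFrame(
    rng.integers(0, 2, size=(n_patients, n_markers)),
    columns=[f"marker_{i}" for i in range(n_markers)],
)
disease = (0.3 * markers["marker_3"] + rng.random(n_patients) > 0.6).astype(int)

# Screen every marker with a chi-square test of independence.
results = []
for col in markers.columns:
    table = pd.crosstab(markers[col], disease)
    chi2, p_value, _, _ = chi2_contingency(table)
    results.append((col, p_value))

# Bonferroni correction: a blunt guard against false positives
# when running many tests at once.
alpha = 0.05 / n_markers
for col, p_value in sorted(results, key=lambda r: r[1]):
    flag = "candidate" if p_value < alpha else ""
    print(f"{col:>10}  p={p_value:.4g}  {flag}")
```

Note what the sketch does not do: it flags statistical associations, nothing more. Deciding whether a flagged marker is biologically plausible, clinically meaningful, or an artefact of the data remains a human judgment, which is precisely the point that follows.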


However, the process does not end with AI-driven data analysis. Human expertise is indispensable in interpreting AI-generated findings, contextualising them within the broader scope of medical knowledge, and discerning actionable insights. How do we test the validity of the data outcomes and insights provided? What if they are wrong, potentially sending the individual and the team in an expensive and damaging direction? In this scenario, human intelligence complements AI capabilities by providing critical insights derived from clinical experience, theoretical knowledge, peer-to-peer discussion, multi-viewed debate and ethical considerations. By leveraging the strengths of both AI and human intelligence, the research team can achieve synergistic outcomes, advancing medical knowledge and improving patient care.


Potential Pitfalls and Dangers  


Despite its transformative potential, indiscriminate reliance on AI in medical decision-making poses real risks. Consider a hypothetical scenario where an AI-powered diagnostic tool incorrectly identifies a benign tumour as malignant, leading to unnecessary surgical intervention and patient harm. The limitations of AI algorithms, such as algorithmic bias or inadequate training data, can result in erroneous diagnoses and treatment recommendations. Such a scenario presents a myriad of risks extending across insurance, practitioner regulatory bodies, and legal and ethical dimensions. Doctors will not be able to absolve themselves of responsibility by blaming the AI. Consequently, will we see AI-related risks incorporated into patient contracts and hospital admission forms?


Recent research conducted by the CSIRO and the University of Queensland (UQ) reveals that while ChatGPT performs well with simple, question-only prompts, it struggles with evidence-biased prompts, with accuracy falling as low as 28%. This finding challenges the notion that supplying supporting evidence in the prompt prevents hallucinations. Dr Bevan Koopman, the lead author of the study, emphasises the importance of understanding the impact of prompts, especially as large language models (LLMs) gain popularity as a source of health information.


The study involved asking ChatGPT 100 consumer health questions, ranging from question-only inquiries to those with evidence-based prompts. Results show that while ChatGPT accurately answers simple questions 80% of the time, accuracy drops to 63% with evidence-based prompts and further to 28% when uncertain answers are allowed. Prompting with evidence not only fails to correct wrong answers but can also cause correct answers to become incorrect, raising concerns about relying on retrieve-then-generate pipelines to mitigate model hallucinations. The study underscores the limitations of LLMs in providing accurate health information, especially when prompted with evidence, which may introduce noise and lower accuracy.
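
To illustrate what a retrieve-then-generate pipeline looks like in practice, the sketch below contrasts the two prompting styles the study compared. The functions `retrieve_passage` and `call_llm` are hypothetical placeholders standing in for a search component and an LLM API; the point is the shape of the prompts, not a working client.

```python
# Minimal sketch of "question-only" vs. "evidence-biased"
# (retrieve-then-generate) prompting. Both helper functions are
# hypothetical stand-ins, not real APIs.

def retrieve_passage(question: str) -> str:
    """Stand-in for a search step returning a supporting document."""
    return "A retrieved passage, relevant or not, about the question..."

def call_llm(prompt: str) -> str:
    """Stand-in for a call to a large language model."""
    return "Yes / No / Unsure, plus a short explanation..."

def question_only(question: str) -> str:
    # Style 1: the model answers from its own parametric knowledge.
    return call_llm(f"Answer yes or no: {question}")

def evidence_biased(question: str) -> str:
    # Style 2: retrieved evidence is prepended to the prompt. The CSIRO/UQ
    # study found this can inject noise and flip correct answers to wrong.
    evidence = retrieve_passage(question)
    return call_llm(
        f"Context: {evidence}\n"
        f"Based on the context, answer yes or no: {question}"
    )

question = "Can zinc lozenges shorten the duration of a cold?"
print(question_only(question))
print(evidence_biased(question))
```

The design point is that the retrieval step sits entirely outside the model: if it returns an irrelevant or misleading passage, the model is then conditioned on that noise, which is consistent with the accuracy drop the study reports.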


Further to this, the cautionary tale painted by Mustafa Suleyman and Michael Bhaskar in "The Coming Wave" serves as a sobering reminder of society's unpreparedness for the profound changes AI will usher in, and of our reluctance to confront the uncertainties of AI-driven transformation, which they term the "pessimism aversion trap".


Amidst the widespread frenzy surrounding AI, Suleyman cautions that governments and people are blindly overlooking "the containment problem" — the imperative of maintaining control over potent technologies — which he identifies as the fundamental challenge of our age. It is widely acknowledged that misplaced power poses significant dangers, not only affecting patient outcomes and health systems but also presenting broader and detrimental risks to humanity.


Recommendations for Moving Forward 


To navigate the intersection of generative AI and medicine effectively, several key recommendations should be considered:

  • Promote Collaboration: Foster interdisciplinary collaboration between AI experts, medical professionals from all spheres, ethicists, and regulatory bodies to develop AI-driven healthcare solutions that prioritise patient safety, ethical principles, and clinical efficacy.

  • Transparency and Accountability: Establish transparent guidelines and regulatory frameworks for AI-driven healthcare systems, ensuring accountability, explainability, and oversight throughout the development, deployment, and evaluation phases.

  • Continual Evaluation and Improvement: Implement robust mechanisms for ongoing evaluation and refinement of AI algorithms, incorporating feedback from medical professionals and real-world clinical data to enhance accuracy, reliability, and performance.

  • Ethical Considerations: Integrate ethical considerations into the design and implementation of AI-driven healthcare systems, safeguarding patient privacy, autonomy, the doctor-patient relationship, and dignity while mitigating the risks of algorithmic bias, discrimination, and unintended consequences.

  • Interdisciplinary Oversight Committees: Establish interdisciplinary oversight committees comprising experts from diverse fields, including medicine, computer science, ethics, and law, to monitor AI developments, address emerging ethical challenges, and recommend policy interventions as needed.

  • Public Awareness and Education: Educate the public about the capabilities, limitations, and potential risks of AI in healthcare, fostering informed discussion and promoting transparency regarding its implementation and impact on medical practice.

  • Regulation and Governance: Enact comprehensive regulations and governance structures to oversee the development, deployment, and usage of AI in healthcare, preventing its misuse or exploitation by corporations or individuals for profit or control.

  • Empowerment of Medical & Health Professionals: Empower healthcare providers with the knowledge and training necessary to understand, critically evaluate, and effectively utilise AI technologies, ensuring that they retain control over clinical decision-making and patient care.

While the potential of generative AI to revolutionise medicine is undeniable, its integration must be approached with circumspection. We must harness the power of AI as a complement to human intelligence, leveraging its capabilities while preserving the essential elements of collaborative efforts, governance, empathy, intuition, and ethical discernment that define compassionate medical care. By navigating the intersection of AI and medicine with vigilance and foresight, we can unlock new frontiers of innovation while safeguarding the sanctity of patient well-being and healthcare excellence.

