Artificial Intelligence in Ophthalmic Plastic and Reconstructive Surgery
April 2025
Angela McCarthy; Kaveri Thakoor, PhD; Lora R. Dagi Glass, MD
Introduction
Overview
- Artificial intelligence (AI) refers to the simulation of human intelligence by computer systems.
- As with other fields, in oculoplastics, AI has the potential to improve diagnostic accuracy, automate administrative tasks, improve data management, and enable personalized treatments and surgical planning.
Types of Artificial Intelligence
- AI can be categorized based on the way it functions and the type of outputs it produces.
- Discriminative AI
- Learns to distinguish between different categories or groups by identifying the “decision boundary” between them
- Example: Classifying clinical photographs as “thyroid eye disease” or “no thyroid eye disease”
- Generative AI
- Learns underlying patterns in data and generates new, similar data rather than just classifying existing data
- Example: Generating synthetic facial images to assist with reconstructive surgery simulations
- Large Language Models (LLMs) are a type of generative model designed for text processing and creation. These models can automate note-taking and medical documentation, reducing the workload for physicians.
Training AI Models
- Training an AI model involves exposing it to large datasets so it can learn patterns and trends relevant to a specific task. In oculoplastics, models might be trained on medical images and clinical data to assist with screening, diagnosis, treatment planning, and outcome prediction. Model performance depends on the quality, diversity, and quantity of the training data.
- Training Data
- The reliability of an AI model hinges on the source and quality of its training data.
- Examples of data: clinical notes, external photographs, radiographic imaging, laboratory results
- Diversity and Representation
- The performance of an AI model in clinical settings depends on how well the training data represents diverse patient populations (Arora, 2023).
- Dataset imbalances (e.g. overrepresentation of specific demographics or disease severity) can lead to biased predictions and reduced accuracy in underrepresented groups.
- Multimodal Data
- New AI models are increasingly trained on multimodal data, which combines multiple data types (e.g. imaging, clinical notes, and lab results).
- For example, the diagnosis and management of thyroid eye disease (TED) benefits from a combination of clinical history, examination findings, laboratory results, and orbital imaging. Integrating these data sources helps the model learn to contextualize findings.
- Challenges include missing data, inconsistent formats, and privacy regulations, all of which complicate dataset construction.
- Ground Truth / Reference Standard
- Ground truth, or reference standard, refers to (human) expert-assigned labels used to train AI models to make correct classifications (e.g. manual annotation of an image).
- This approach forms the basis of supervised learning, in which the model learns under the explicit guidance of experts and within the framework of current medical understanding. While this ensures alignment with established knowledge, it may limit the discovery of novel or subclinical disease patterns.
- The limited number of oculoplastic specialists and labor-intensive nature of annotation constrain both the size and consistency of available training datasets.
- Unlabeled Data / Unsupervised Learning
- Unsupervised learning allows AI to identify patterns without relying on expert-labeled data, making large scale training more feasible.
- This technique may reveal previously unrecognized trends in disease presentation. However, without expert guidance, models may identify clinically irrelevant patterns or produce misleading conclusions.
- Self-supervised learning falls under the umbrella of unsupervised learning but generates its own supervisory signals from the data itself, often by predicting one part of the data from another (e.g. reconstructing intentionally masked regions of an image). While these tasks help the model learn visual features, they do not guarantee the model will learn clinically meaningful patterns.
- Data Augmentation
- Data augmentation artificially expands training datasets, which is especially helpful when primary data are limited.
- This technique increases training data diversity by introducing variations to existing data through rotations, flipping, noise injection, and other transformations.
- However, excessive or unrealistic augmentation can introduce artifacts that mislead the model or reduce its clinical relevance.
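The transformations listed above can be sketched in a few lines. The following is a minimal illustration using NumPy on a synthetic grayscale image; the `augment` helper, its noise level, and the toy image are all hypothetical and not drawn from any cited model.

```python
import numpy as np

def augment(image: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Apply one randomly chosen transformation to a 2-D grayscale image."""
    choice = rng.integers(3)
    if choice == 0:
        return np.fliplr(image)               # horizontal flip
    if choice == 1:
        return np.rot90(image)                # 90-degree rotation
    noise = rng.normal(0.0, 0.05, image.shape)
    return np.clip(image + noise, 0.0, 1.0)   # mild noise injection

rng = np.random.default_rng(0)
image = rng.random((64, 64))                  # synthetic stand-in image

# Expand one image into several augmented variants
augmented = [augment(image, rng) for _ in range(4)]
print(len(augmented), augmented[0].shape)
```

In practice, transformations are restricted to those that preserve clinical meaning; for example, a vertical flip of a face photo would produce an anatomically implausible training example.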
Model Learning
- AI models learn iteratively by passing through the dataset multiple times. Each complete pass is known as an epoch.
- By comparing predictions to ground truth labels, the model adjusts its internal parameters and learns to recognize complex patterns in data.
- Neural networks are composed of layered mathematical functions. These functions contain weights, which determine the relative importance of input features in making predictions.
- For example, in a model detecting TED from facial photographs, regions of the image showing upper eyelid retraction or proptosis may be assigned greater weight than areas with less clinically relevant features.
- Through a process called backpropagation, the model updates these weights by minimizing prediction errors across successive iterations.
- However, due to the many layers and parameters in a neural network, AI decision-making is often viewed as opaque or difficult to explain.
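The epoch, weight, and backpropagation concepts above can be made concrete with a toy logistic-regression model trained by gradient descent on synthetic data. All data and variable names here are hypothetical; real clinical models have millions of parameters, but the update rule is the same in spirit.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dataset: 200 cases with 3 features; the label depends mostly on
# feature 0, standing in for a clinically relevant input (synthetic data).
X = rng.normal(size=(200, 3))
y = (X[:, 0] + 0.3 * rng.normal(size=200) > 0).astype(float)

w = np.zeros(3)          # weights: relative importance of each feature
b = 0.0
lr = 0.5                 # learning rate

def predict(X):
    return 1.0 / (1.0 + np.exp(-(X @ w + b)))   # sigmoid output in (0, 1)

losses = []
for epoch in range(50):                  # each epoch = one full pass over data
    p = predict(X)
    error = p - y                        # compare prediction to ground truth
    w -= lr * X.T @ error / len(y)       # gradient step (backpropagation)
    b -= lr * error.mean()
    # Cross-entropy loss, tracked to confirm the model is improving
    losses.append(-np.mean(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9)))

print(f"loss fell from {losses[0]:.3f} to {losses[-1]:.3f}")
print(f"learned weights: {np.round(w, 2)}")
```

After training, the weight on feature 0 dominates, mirroring the idea that clinically informative regions or features acquire the greatest influence on the prediction.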
Interpretability
- To improve trust and transparency, researchers develop explainability techniques that help clarify how AI models arrive at their conclusions.
- Methods such as gradient-weighted class activation mapping (Grad-CAM) and saliency maps highlight the regions of an image that most influenced the model’s decision (Hanif, 2021).
- These tools allow clinicians to verify AI outputs, ensuring safer and more reliable use in patient care.
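As a simplified illustration of this idea, an occlusion-sensitivity map (a model-agnostic relative of Grad-CAM and saliency maps) masks each image region in turn and records how much the model's output drops; regions whose removal hurts the score most are the ones the model relied on. The stand-in `score` function below is purely hypothetical.

```python
import numpy as np

def score(image: np.ndarray) -> float:
    """Stand-in 'model': responds to brightness in the top-left quadrant,
    imitating a model that attends to one clinically relevant region."""
    return float(image[:8, :8].mean())

def occlusion_map(image: np.ndarray, patch: int = 4) -> np.ndarray:
    """Mask each patch in turn; a large score drop marks an influential region."""
    base = score(image)
    h, w = image.shape
    heat = np.zeros((h // patch, w // patch))
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            occluded = image.copy()
            occluded[i:i + patch, j:j + patch] = 0.0   # black out one patch
            heat[i // patch, j // patch] = base - score(occluded)
    return heat

image = np.ones((16, 16))          # toy all-bright image
heat = occlusion_map(image)
# Patches in the influential (top-left) region should carry the highest values
print(np.unravel_index(heat.argmax(), heat.shape))
```

A clinician reviewing such a heat map can check whether the model attended to the eyelid or orbit rather than an irrelevant artifact such as a camera reflection.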
Model Testing and Performance Metrics
- Once an AI model is trained, it must be rigorously tested to ensure it performs well across diverse patient populations. This involves testing the model on unseen data to analyze its generalizability.
- Retrospective Testing
- Held-Out Internal Testing: The model is tested on a subset of the original dataset that was excluded from training (e.g., an 80/20 train-test split). This helps confirm the model’s ability to handle new cases from the same source. However, it does not ensure accurate performance on external data.
- External Testing: The model is tested on a separate dataset (e.g. from a different institution or a publicly available dataset). This assesses how well the model generalizes across patient populations, clinical settings, or imaging protocols.
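A minimal sketch of the 80/20 held-out split mentioned above, using shuffled case indices; the case count is synthetic. In practice, splits should be made per patient rather than per image, so that multiple photographs of the same patient never straddle the train and test sets.

```python
import numpy as np

rng = np.random.default_rng(42)

n_cases = 500
indices = rng.permutation(n_cases)   # shuffle case indices before splitting
split = int(0.8 * n_cases)           # 80% train / 20% held-out test

train_idx, test_idx = indices[:split], indices[split:]
print(len(train_idx), len(test_idx))

# No case may appear in both sets, or test performance is overstated
assert set(train_idx).isdisjoint(test_idx)
```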
- Prospective Testing (Gold Standard)
- In prospective validation, the model is tested in real-time on new, incoming patient data.
- This approach best reflects how the AI will perform in clinical practice and is considered the gold standard for validation prior to deployment.
Performance Metrics
AI performance is measured using several complementary metrics.
- Accuracy: Proportion of correct predictions among all cases.
- Sensitivity (Recall): Measures the model’s ability to correctly identify true positive cases. High sensitivity reduces missed diagnoses.
- Specificity: Measures the model’s ability to correctly identify true negatives. High specificity reduces false positives.
- Area Under the Receiver Operating Characteristic Curve (AUROC): A metric that assesses how well the model distinguishes between different categories, such as healthy and diseased states. A higher AUROC (closer to 1.0) indicates better performance.
- Precision-Recall (PR) Curve: Useful when working with imbalanced datasets or rare conditions. Plots precision vs. sensitivity (recall) and often provides better insight than AUROC when the positive class is rare, because precision directly reflects the burden of false positives.
- Subgroup Analysis: Evaluates model performance across different demographic or clinical subgroups (e.g. age, sex, race, disease severity) to ensure the model performs consistently in all populations.
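All of these metrics can be derived from predicted scores and ground truth labels. Below is a small worked example on hypothetical data, with AUROC computed via its Mann-Whitney interpretation (the probability that a randomly chosen diseased case scores higher than a randomly chosen healthy one).

```python
import numpy as np

y_true  = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])    # 4 diseased, 6 healthy
y_score = np.array([0.9, 0.8, 0.7, 0.3, 0.6, 0.4, 0.2, 0.2, 0.1, 0.05])
y_pred  = (y_score >= 0.5).astype(int)                 # threshold at 0.5

tp = np.sum((y_pred == 1) & (y_true == 1))   # true positives
tn = np.sum((y_pred == 0) & (y_true == 0))   # true negatives
fp = np.sum((y_pred == 1) & (y_true == 0))   # false positives
fn = np.sum((y_pred == 0) & (y_true == 1))   # false negatives

accuracy    = (tp + tn) / len(y_true)
sensitivity = tp / (tp + fn)                 # recall: diseased correctly flagged
specificity = tn / (tn + fp)                 # healthy correctly cleared
precision   = tp / (tp + fp)                 # flagged cases truly diseased

# AUROC as the probability a random diseased case outranks a random healthy one
pos = y_score[y_true == 1]
neg = y_score[y_true == 0]
auroc = np.mean([(p > n) + 0.5 * (p == n) for p in pos for n in neg])

print(f"acc={accuracy:.2f} sens={sensitivity:.2f} "
      f"spec={specificity:.2f} prec={precision:.2f} auroc={auroc:.2f}")
```

Note that accuracy alone can mislead on imbalanced data: a model that labels every case "healthy" in this example would score 60% accuracy while missing every diseased patient.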
AI Applications in Oculoplastics
- Screening & Diagnosis
- Detection of Eyelid Conditions: AI models trained on clinical photographs can detect conditions such as blepharoptosis, facilitating early diagnosis and triage for specialist evaluation (Hung, 2022; Abascal, 2023). Chen et al. (2021) developed deep learning models that could accurately quantify eyelid metrics such as margin reflex distance 1, margin reflex distance 2, and levator muscle function using smartphone photos. These tools could aid in population-level screening and support decision-making in primary or telemedicine clinics.
- Thyroid Eye Disease: AI models can analyze orbital CT scans or facial images to detect TED, assess disease activity, and predict treatment response (Chng, 2023).
- Eyelid Tumors: Deep learning models differentiate benign and malignant eyelid lesions, potentially improving diagnostic accuracy and guiding biopsy decisions (Li, 2022; Wang, 2024).
- Surgical Planning
- Periocular Analysis: AI tools are used to extract key periocular features for preoperative planning and analysis of postoperative changes (Qu, 2022; Bahçeci Şimşek, 2021).
- 3D Modeling & Image Synthesis: Generative models can create personalized 3D reconstructions or simulate surgical outcomes, supporting detailed planning and improved preoperative patient counseling (Lim, 2023).
- Clinical Documentation
- LLMs can automate notetaking, structure clinical notes, and summarize patient histories. This could reduce documentation workload, improve efficiency, and allow for more patient-focused interactions (Ma, 2025).
- Academic Applications
- Literature Review & Data Analysis: AI tools can assist with organizing research data, conducting literature reviews, and providing statistical analyses for academic publications. However, LLMs are known to hallucinate, generating citations to papers that don’t exist or misquoting studies.
- Manuscript Writing & Editing: Generative AI tools can assist in manuscript editing and statistical analysis, saving time for researchers and clinicians and improving the quality of scholarly work. However, the author remains responsible for the accuracy and integrity of the work, and caution must be taken to avoid potential copyright or confidentiality issues when putting unpublished or novel content into AI systems.
Ethical Considerations
The use of AI in oculoplastics raises ethical challenges, particularly regarding privacy, algorithmic bias, consent and the impact on medical training. These issues are especially pertinent in oculoplastics due to the use of facial images, which are inherently identifiable and difficult to anonymize.
Privacy
- Unlike de-identified medical images (e.g. CT/MRI), facial images cannot be fully anonymized without losing key diagnostic features, increasing privacy risks under HIPAA and similar regulations.
- Emerging solutions like digital masking and federated learning may help protect patient data, but these techniques are still evolving (Yang, 2022; Sheller, 2020).
- Research has shown that retinal images and iris scans can be used to identify individuals. In the future, AI models may be able to re-identify patients even from “de-identified” medical images, challenging existing notions of privacy protection (Nakayama, 2023; Nguyen, 2023).
Algorithmic Bias
- AI models trained on non-diverse datasets may exhibit racial, ethnic, or gender biases, leading to inequities in diagnosis and treatment (Grzybowski, 2024).
- In facial analysis, biases have been seen in both law enforcement AI systems and in ophthalmology, highlighting the need for representative training data.
Ownership and Consent
- The use of patient facial images raises important questions about ownership and informed consent. When these images are used to train systems that generate new images or data, issues reminiscent of the Henrietta Lacks case arise, where biological materials were used to benefit society without proper consent (Sharma, 2024).
- Thousands of facial images may be used without clear consent protocols, raising ethical concerns over how the patients’ likenesses are handled, stored, and reused.
Black Box Nature of AI
- Many AI models lack transparency, making it difficult to understand how they generate diagnoses or recommendations.
- Explainability techniques (e.g. Grad-CAMs) help visualize how models analyze facial images, but can be inconsistent.
- Liability concerns arise when AI-influenced decisions lead to misdiagnosis or harm. Determining responsibility among developers, clinicians, and institutions remains a challenge.
Risks with Large Language Models
- LLMs hallucinate (generate incorrect or fabricated information), posing risks in clinical documentation and research.
- In AI-powered medical scribing, an LLM may fill in missing details that were never discussed, leading to inaccurate records.
- Responsible use, strong prompting, and human oversight are key to preventing these errors (Ong, 2024).
Trainee Deskilling
- AI tools may lead to cognitive offloading, in which trainees become overly reliant on technology for tasks they should master, such as disease recognition and surgical planning (Gerlich, 2025).
- While AI could help reduce trainee burnout, it is important to balance automation with hands-on learning to preserve clinical expertise.
Conclusion
- AI has the potential to offer new levels of precision, efficiency, and personalized care in oculoplastics. From advanced diagnostic tools powered by discriminative AI to innovative surgical outcome simulations enabled by generative models, AI is already demonstrating its potential to enhance clinical decision-making and improve patient outcomes.
- Ultimately, the integration of AI into oculoplastics will depend not only on technological progress but also on a thoughtful approach to the ethical and practical challenges it presents. As the field moves forward, embracing AI’s benefits while remaining educated and vigilant about its limitations will ensure these tools enhance the human expertise at the heart of patient care.
References
Training AI Models
- Arora A, Alderman JE, Palmer J, Ganapathi S, Laws E, McCradden MD, et al. The value of standards for health datasets in artificial intelligence-based applications. Nat Med. 2023 Nov;29(11):2929-2938. doi: 10.1038/s41591-023-02608-w.
Model Learning
- Hanif AM, Beqiri S, Keane PA, Campbell JP. Applications of interpretability in deep learning models for ophthalmology. Curr Opin Ophthalmol. 2021;32(5):452-458. doi:10.1097/ICU.0000000000000780
AI Applications in Oculoplastics
- Hung JY, Chen KW, Perera C, et al. An outperforming artificial intelligence model to identify referable blepharoptosis for general practitioners. J Pers Med. 2022;12(2):283. Published 2022 Feb 15. doi:10.3390/jpm12020283
- Abascal Azanza C, Barrio-Barrio J, Ramos Cejudo J, Ybarra Arróspide B, Devoto MH. Development and validation of a convolutional neural network to identify blepharoptosis. Sci Rep. 2023;13(1):17585. Published 2023 Oct 16. doi:10.1038/s41598-023-44686-3
- Chen HC, Tzeng SS, Hsiao YC, Chen RF, Hung EC, Lee OK. Smartphone-based artificial intelligence-assisted prediction for eyelid measurements: algorithm development and observational validation study. JMIR Mhealth Uhealth. 2021;9(10):e32444. Published 2021 Oct 8. doi:10.2196/32444
- Chng CL, Zheng K, Kwee AK, et al. Application of artificial intelligence in the assessment of thyroid eye disease (TED) – a scoping review. Front Endocrinol (Lausanne). 2023;14:1300196. Published 2023 Dec 20. doi:10.3389/fendo.2023.1300196
- Li Z, Qiang W, Chen H, et al. Artificial intelligence to detect malignant eyelid tumors from photographic images. NPJ Digit Med. 2022;5(1):23. Published 2022 Mar 2. doi:10.1038/s41746-022-00571-3
- Wang L, Dai X, Liu Z, et al. AI-driven eyelid tumor classification in ocular oncology using proteomic data. NPJ Precis Oncol. 2024;8(1):289. Published 2024 Dec 23. doi:10.1038/s41698-024-00767-8
- Qu Y, Lin B, Li S, et al. Effect of multichannel convolutional neural network-based model on the repair and aesthetic effect of eye plastic surgery patients. Comput Math Methods Med. 2022;2022:5315146. Published 2022 Sep 1. doi:10.1155/2022/5315146
- Bahçeci Şimşek İ, Şirolu C. Analysis of surgical outcome after upper eyelid surgery by computer vision algorithm using face and facial landmark detection. Graefes Arch Clin Exp Ophthalmol. 2021;259(10):3119-3125. doi:10.1007/s00417-021-05219-8
- Lim B, Seth I, Kah S, et al. Using generative artificial intelligence tools in cosmetic surgery: a study on rhinoplasty, facelifts, and blepharoplasty procedures. J Clin Med. 2023;12(20):6524. Published 2023 Oct 14. doi:10.3390/jcm12206524
- Ma SP, Liang AS, Shah SJ, et al. Ambient artificial intelligence scribes: utilization and impact on documentation time. J Am Med Inform Assoc. 2025;32(2):381-385. doi:10.1093/jamia/ocae304
- Cai Y, Zhang X, Cao J, Grzybowski A, Ye J, Lou L. Application of artificial intelligence in oculoplastics. Clin Dermatol. 2024;42(3):259-267. doi:10.1016/j.clindermatol.2023.12.019
Ethical Considerations
- Yang Y, Lyu J, Wang R, et al. A digital mask to safeguard patient privacy. Nat Med. 2022;28(9):1883-1892. doi:10.1038/s41591-022-01966-1
- Sheller MJ, Edwards B, Reina GA, et al. Federated learning in medicine: facilitating multi-institutional collaborations without sharing patient data. Sci Rep. 2020;10(1):12598. Published 2020 Jul 28. doi:10.1038/s41598-020-69250-1
- Nakayama LF, de Matos JCRG, Stewart IU, et al. Retinal scans and data sharing: the privacy and scientific development equilibrium. Mayo Clin Proc Digit Health. 2023;1(2):67-74.
- Nguyen K, Fookes C, Sridharan S, Ross A. Complex-valued iris recognition network. IEEE Trans Pattern Anal Mach Intell. 2023;45(1):182-196. doi:10.1109/TPAMI.2022.3152857
- Grzybowski A, Jin K, Wu H. Challenges of artificial intelligence in medicine and dermatology. Clin Dermatol. 2024;42(3):210-215. doi:10.1016/j.clindermatol.2023.12.013
- Ong JCL, Chang SY, William W, et al. Ethical and regulatory challenges of large language models in medicine. Lancet Digit Health. 2024;6(6):e428-e432. doi:10.1016/S2589-7500(24)00061-X
- Gerlich M. AI tools in society: impacts on cognitive offloading and the future of critical thinking. Societies. 2025;15(1):6. doi:10.3390/soc15010006
- Sharma P. Responsible research in artificial intelligence: lessons from the past. AI Soc. 2024. doi:10.1007/s00146-024-01929-9
- Veritti D, Rubinato L, Sarao V, De Nardin A, Foresti GL, Lanzetta P. Behind the mask: a critical perspective on the ethical, moral, and legal implications of AI in ophthalmology. Graefes Arch Clin Exp Ophthalmol. 2024;262(3):975-982. doi:10.1007/s00417-023-06245-4
Financial Disclosures
Angela McCarthy: None
Kaveri Thakoor, PhD: Research funding – TopCon Healthcare
Lora R. Dagi Glass, MD: Consultant – Ora Clinical; Scientific Advisory Board – Sling Therapeutics