Ran Xu

Department of Computer Science, Emory University

Room N410, Mathematics and Science Center

400 Dowman Dr, Atlanta, GA 30307

My name is Ran Xu. I am a fourth-year Ph.D. student in the Department of Computer Science at Emory University, co-advised by Prof. Carl Yang and Prof. Joyce C. Ho. Before that, I obtained my bachelor’s degree (with Highest Honors) from the Department of Computer Science at Emory University in 2021, where I worked with Prof. Jinho Choi.

My current research focuses on large language models, with a special interest in augmented (e.g., retrieval-augmented, tool-augmented) language models and their biomedical applications. I have also worked on synthetic data generation and LLM alignment.

Feel free to drop me an email (ran.xu at emory dot edu) if you have any questions about my research or want to discuss potential collaborations.

I am looking for internship and full-time industry opportunities starting in Spring 2025. Feel free to reach out if there is a good fit!


Education

Emory University (2021 - Present)
Ph.D. in Computational Science and Informatics
GPA: 3.98/4.00
Research Focus: EHR modeling, Clinical NLP, Large Language Models.
Advisors: Prof. Carl Yang and Prof. Joyce C. Ho

Emory University (2017 - 2021)
B.S. in Computer Science, Double Major in Applied Mathematics
GPA: 3.97/4.00
Research Focus: Natural Language Processing.
Advisor: Prof. Jinho Choi


Industrial Experience

Amazon (May 2024 - Oct 2024)
Applied Scientist Intern, Amazon Search
Topic: LLM Fine-tuning for Self-improving Retrieval-Augmented Generation [preprint].
Mentor: Hui Liu, Manager: Qi He.

Meta Platforms, Inc. (May 2020 - Aug 2020)
Enterprise Engineer Intern
Mentor: Zexi Zhang


News

Sep 20, 2024 Three papers on LLMs for Text Retrieval, LLM Agents for Complex Tabular Reasoning, and LLM Test-time Adaptation were accepted to EMNLP 2024.
May 20, 2024 Started my internship at Amazon Search!
May 16, 2024 Two papers on Synthetic Data Generation and Retrieval-Augmented Clinical Predictions were accepted to ACL 2024.
Nov 28, 2022 Our paper Counterfactual and Factual Reasoning over Hypergraphs for Interpretable Clinical Predictions on EHR received the Best Paper Award (one of two awarded) at Machine Learning for Health (ML4H) 2022.
Nov 19, 2022 One paper on Few-shot Learning for Language Models was accepted to AAAI 2023 as an oral presentation.

Selected Publications

  1. SimRAG: Self-Improving Retrieval-Augmented Generation for Adapting Large Language Models to Specialized Domains
    Ran Xu, Hui Liu, Sreyashi Nag, Zhenwei Dai, Yaochen Xie, Xianfeng Tang, Chen Luo, Yang Li, Joyce C Ho, Carl Yang, and Qi He
    arXiv preprint arXiv:2410.17952, 2024.
  2. BMRetriever: Tuning Large Language Models as Better Biomedical Text Retrievers
    Ran Xu*, Wenqi Shi*, Yue Yu*, Yuchen Zhuang, Yanqiao Zhu, May Dongmei Wang, Joyce C. Ho, Chao Zhang, and Carl Yang
    Proceedings of EMNLP, 2024.
  3. EHRAgent: Code Empowers Large Language Models for Few-shot Complex Tabular Reasoning on Electronic Health Records
    Wenqi Shi*, Ran Xu*, Yuchen Zhuang, Yue Yu, Jieyu Zhang, Hang Wu, Yuanda Zhu, Joyce C. Ho, Carl Yang, and May Dongmei Wang
    Proceedings of EMNLP, 2024.
  4. RAM-EHR: Retrieval Augmentation Meets Clinical Predictions on Electronic Health Records
    Ran Xu, Wenqi Shi, Yue Yu, Yuchen Zhuang, Bowen Jin, May Dongmei Wang, Joyce Ho, and Carl Yang
    Proceedings of ACL, 2024. (Oral)
  5. Neighborhood-regularized Self-Training for Learning with Few Labels
    Ran Xu, Yue Yu, Hejie Cui, Xuan Kan, Yanqiao Zhu, Joyce Ho, Chao Zhang, and Carl Yang
    Proceedings of AAAI, 2023. (Oral)
  6. Counterfactual and Factual Reasoning over Hypergraphs for Interpretable Clinical Predictions on EHR
    Ran Xu, Yue Yu, Chao Zhang, Mohammed K Ali, Joyce C Ho, and Carl Yang
    Proceedings of ML4H, 2022. (Best Paper Award)