Postdoctoral Position Available in the Department of Emergency Medicine — The Data Mining (DM) Lab

The Data Mining (DM) Lab at Yale University, led by Dr. Samah Fodeh, has an opening for a postdoctoral researcher to participate in a series of projects that focus on leveraging healthcare data to improve patient care. The work spans structured and unstructured data in the electronic health record (EHR) as well as MyChart data, with the opportunity to apply machine learning, deep learning, and natural language processing (NLP) in novel areas of healthcare. The position is open to a driven individual with a PhD in data science, computer science, biomedical informatics, or a related field and some experience working with large datasets. Prior experience in healthcare is not required but will be helpful. The ideal candidate will be interested in broad career development in a dynamic environment that allows them to grow as a leader in healthcare data science and innovation.

Under the direction of the Principal Investigator, the successful candidate is expected to lead several efforts, including working with the team to discover, develop, and apply novel machine learning, natural language processing, and deep learning approaches in healthcare. They will also lead or assist in drafting the analytical sections of peer-reviewed publications across projects. Responsibilities will also include participating in the design, implementation, and maintenance of data pipelines, and leading or assisting in building deep learning algorithms in close collaboration with the study team.

Essential Duties:

  • Develop, fine-tune, and evaluate large language models (LLMs) and related biomedical language models (e.g., GPT, Llama, Mistral, Falcon, BioGPT, ClinicalBERT, Med-PaLM) for domain-specific tasks such as entity extraction, summarization, temporal reasoning, and clinical note understanding.
  • Design end-to-end NLP pipelines, including data preprocessing, named entity recognition (NER), relation extraction, and text classification.
  • Implement and optimize training and fine-tuning workflows using:
    • PyTorch, Hugging Face Transformers, PEFT (LoRA, QLoRA), and DeepSpeed
    • Experiment tracking with Weights & Biases, MLflow, or TensorBoard
  • Apply data engineering and processing tools such as Pandas, NumPy, spaCy, scikit-learn, and NLTK.
  • Develop model evaluation frameworks incorporating both intrinsic and extrinsic metrics (e.g., precision/recall/F1, BLEU, ROUGE, BERTScore, and human expert evaluation).
  • Collaborate with clinicians, informaticians, and data engineers to translate model outputs into clinically meaningful insights.
  • Contribute to peer-reviewed publications, grants, and conference presentations.

Required Education and Experience:

PhD degree in computer science, applied/computational mathematics, engineering, bioinformatics, data science, or a related field and two years of demonstrated experience, or an equivalent combination of education and demonstrated experience.

Required Skill/Ability 1:

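Experience with LLM deployment and inference optimization (e.g., vLLM) and with LLM application frameworks such as LangChain or LlamaIndex, including retrieval-augmented generation (RAG), multimodal LLM integration, or augmentation with structured knowledge graphs.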

Required Skill/Ability 2:

Experience with containerization and reproducible environments (e.g., Docker, Conda, Kubernetes), and with version control and CI/CD tools (e.g., GitHub Actions, GitLab CI).

Required Skill/Ability 3:

Sound background in theoretical and applied machine learning/deep learning, with applications to language, signals, or images.

Required Skill/Ability 4:

Demonstrated strong ability to communicate technical ideas and results to non-technical audiences, both in writing and verbally.

Required Skill/Ability 5:

Strong organizational, time management, and leadership skills. Ability and willingness to work in a highly collaborative team environment and matrixed organization.

Please email samah.fodeh@yale.edu if interested. 

Yale is an affirmative action/equal opportunity employer. Yale values diversity among its students, faculty, and staff and welcomes applications from women, members of minority groups, persons with disabilities, and protected veterans.