About the position
Responsibilities
• The successful candidates will participate in ground-breaking research projects that need advanced software solutions requiring expertise in software engineering not commonly found in scientific collaborations.
• The projects will require development of state-of-the-art clinical NLP solutions using the latest deep learning libraries, with models trained on state-of-the-art hardware in secure healthcare computing environments.
• Projects will involve analysis of massive data sets either in the cloud or on premises.
• Projects will require development of novel NLP software pipelines for processing of unstructured clinical notes.
• Some projects may require deep engagement, possibly leading to co-authorship on scientific publications, while others may involve a more casual consulting engagement.
• Projects may require developing software solutions from scratch or refactoring existing solutions to conform to industry standards (quality, efficiency, reusability, robustness, portability, documentation, etc.).
• A high-level goal of DSAI is to translate the work on individual projects into frameworks and template patterns for sustainable scientific infrastructure that benefits future projects.
Requirements
• Strong NLP, LLM, machine learning and deep learning skills.
• Practical experience building NLP models and pipelines in a secure, HIPAA-compliant healthcare environment.
• Expert-level knowledge of multiple modern NLP and LLM libraries and models.
• Hands-on experience adapting and fine-tuning large language models for domain-specific clinical applications, with attention to data efficiency, interpretability, and reproducibility.
• Demonstrated expertise in prompt engineering, evaluation, and benchmarking of large language models, including applying responsible AI principles in clinical or sensitive-data contexts.
• Expert-level knowledge of the Python programming language.
• Familiarity with or willingness to learn C++ or other languages as may be needed.
• Familiarity with software containerization technologies such as Docker and Singularity.
• Familiarity with the Databricks platform.
• Fluency in the Linux operating system and related tools.
• Familiarity with modern software engineering best practices, such as Git source control, peer code review, test-driven development, build automation and continuous integration / continuous delivery.
• Familiarity with cloud development and deployment.
• Demonstrated leadership and self-direction.
• Willingness to teach others both informally and in short course format.
• Willingness to continually learn new tools and techniques as needed.
• Excellent verbal and written communication.
• Master's degree in a quantitative discipline such as computer science, engineering, physics or bioinformatics, with a strong scientific computing and/or mathematics background.
• Three years' experience working in software development on large clinical NLP projects in industry or academia.
• Additional education may substitute for required experience, and additional related experience may substitute for required education beyond a high school diploma/graduation equivalent, to the extent permitted by the JHU equivalency formula.
Nice-to-haves
• PhD in a quantitative discipline.
• Five (5) years' experience as above in clinical NLP.
• Experience in CUDA GPU programming.
• Experience authoring open-source Python packages in PyPI.
• Experience in open-source project governance.
• Experience in open-source community adoption initiatives.