I am a passionate data scientist and engineer with five years of work experience. I’ve been fortunate enough to work on projects that span a variety of disciplines and tap into what I loved most about my studies in Data Science, Electrical Engineering, and Applied Mathematics: finding patterns, analyzing signals, and implementing solutions.
One of my biggest strengths is immersing myself so deeply in a dataset or field that I become my team’s go-to resource for explaining data anomalies and model behavior. I’m most excited when I can come up with a creative algorithm grounded in my knowledge of how the data is generated. Recently, I’ve had a lot of fun exploring the intersection of machine learning and bioinformatics.
I have a strong programming background and am comfortable across the full development stack. Coworkers have described me as the ‘Swiss Army knife’ of the team because I quickly pick up new technical skills and contribute wherever I’m needed most: developing new algorithms, training models, setting up data pipelines, picking up frontend or backend development tasks, designing software and leading a technical team, or even waking up at 5 a.m. to email the official daily COVID death count to the Director of the CDC.
January 2021 - August 2024
Degree: Master's in Data Science
Relevant Coursework: Advanced Applied Machine Learning, Theory of Machine Learning, Foundations of Neural Networks, Database Systems, Data Visualization, Optimization, Statistical Models and Regression
Data Scientist and Software Developer | October 2020 - Present | Laurel, MD
Designed and implemented a locally hosted chatbot application for use with sensitive data. Evaluated, selected, and deployed open-source LLMs. Built the prompt construction, memory management, and post-processing workflows.
Developed a tissue classifier to identify the tissues present in unknown mixed samples, using an ensemble of One-vs-Rest and strategically selected One-vs-One random forests to enable multilabel classification across highly similar, imbalanced classes. Identified and incorporated publicly available genomic datasets to improve generalization and enable more rigorous validation of the model across varied collection procedures, sequencing technologies, and chemistries.
Led a team of 6 responsible for the daily delivery of county-level COVID-19 case and death data posted directly to the Centers for Disease Control and Prevention’s (CDC) official COVID Data Tracker, used by policymakers and healthcare officials nationwide to inform critical decisions. Designed and implemented an automated COVID-19 time series data delivery pipeline, reducing delivery time by 85%. Supported frequent meetings with state health departments to coordinate data collection processes on behalf of the CDC. Guided the CDC in its transition to weekly COVID data reporting by presenting detailed analyses of the decision trade-offs. See “Tracking COVID-19 in the United States With Surveillance of Aggregate Cases and Deaths” for more details on this effort.
Ensured data fidelity and real-time availability for the widely recognized Johns Hopkins COVID Dashboard, working 3-4 on-call shifts biweekly over two years to update web scrapers whenever their sources changed. The project was named a TIME Best Invention of 2020 (“2020’s Go-To Data Source”), and the team was recognized by Fast Company as 2021’s Innovative Team of the Year.
Designed and led an interactive workshop on LangChain as part of a Generative AI Workshop Series, giving 150+ attendees hands-on experience using LangChain to integrate LLMs into their applications. The workshop documentation and code were circulated widely across the laboratory and have served as a reference for many staff getting started with LLMs.
Led a company-wide innovation challenge program focused on developing early-career staff. Developed the 2023 challenge topic and planned a 6-month program of 10+ staff development events for 50 participants from across all sectors of the lab. Managed a $950k program budget. Mentored two team leads one-on-one.
Signal Processing Engineer | September 2019 - October 2020 | Lexington, MA
Performed statistical analysis on large volumes of data to characterize features such as target point spread functions, camera distortion, and the components of various focal planes, providing key insights for new algorithm development.
Developed a novel filtering technique for bad-pixel suppression, cutting false alarms by 94%.
Developed a raw infrared imagery simulator for algorithm testing.
Software Engineering Intern | May 2018 - August 2018 | Boulder, CO
Developed novel algorithms for automatic detection of critical anatomy in laparoscopic video using OpenCV.
Electrical Engineering Intern | May 2017 - August 2017 | Laurel, MD
Designed and implemented a 2-D tracking system for infrared camera data in MATLAB, incorporating Kalman filtering, track association algorithms, image processing, and other computer vision techniques.
Designed a novel method for separating read fragments from a mixture of SARS-CoV-2 virus sequences to enable early detection of emerging strains. Trained a graph attention network (GAT) to produce embedding vectors for a graph of reads from unknown sources, optimized so that reads cluster by their original source sequence. Achieved an Adjusted Rand Index of 0.86 on test mixtures, indicating strong agreement between the predicted clusters and the true source sequences.
Wrote a PyTorch implementation of a Transformer and trained it for English-to-Italian translation. Created custom model and configuration classes compatible with the Hugging Face Transformers library and hosted the trained model on the Hugging Face Hub.