Projects

Select Projects (Verge Genomics)

Machine Learning for Prediction of Gene Targets

  • Led research in supervised machine learning classifiers for the prediction of gene targets for drug discovery efforts
  • Authored a Python code stack that manages complex API queries across the company platform to assemble training and annotation data for user-selected disease indications, compares the performance of a suite of classifiers using nested cross-validation, and analyzes feature weights downstream to inform future data generation efforts
  • Skills: Generalized Linear Models · Support Vector Machine (SVM) · Random Forest · XGBoost · Python (Programming Language)

Statistical Analytic Support for Bench Scientists

  • Conducted study design, power analysis, and results reporting for in vitro and in vivo studies in collaboration with bench scientists across the company; outcomes included rodent biomarker and behavioral endpoints, histological changes, cell survival, cellular morphological differentiation, gene expression, puncta and stress granule formation, and immunofluorescence
  • Skills: Statistical Modeling · R · Biostatistics · Technical Presentations

Co-expression Preservation

  • Formulated metrics for gauging the replicability of gene expression correlations across cohorts and between human, animal, and in vitro models based on the probability distributions of correlation matrices; metrics and associated R code were incorporated into standard company research protocols
  • Skills: Probability Theory · R · Statistical Data Analysis · Genomics

Joint Gene Set Analysis

  • Formulated a novel expansion of Gene Set Analysis (GSA; Efron and Tibshirani (2007), Ann. Appl. Stat. 1(1):107-129) to quantify the consistency of set-wise gene dysregulation across experiments and between human, animal, and in vitro models; Python code for this expansion and for a custom, efficiency-enhanced implementation of the original GSA calculations was incorporated into the company analytic pipeline
  • Skills: Python (Programming Language) · Statistical Analysis · Genomics

Gene Co-expression Cluster Detection

  • Formulated a novel unsupervised machine learning approach for the detection of clusters of co-expressed genes, similar in spirit to Langfelder and Horvath (2008), BMC Bioinformatics 9:559, but based on probability-theoretic reasoning for increased validity of output
  • R code was incorporated into the company research pipeline and adopted for standard analytic protocols
  • Skills: Probability Theory · R · Machine Learning · Cluster Analysis · Transcriptomics
© 2020-2024 Eric Roberts