Srijith Rajamohan, Ph.D.
Blacksburg, VA 24060
Ph.D. in Computational Engineering, The University of Tennessee
Dissertation: “A Streamline Upwind/Petrov-Galerkin FEM based Time-Accurate Solution of 3D Time-Domain Maxwell’s Equations for Dispersive Materials”
MS in Electrical Engineering, The Pennsylvania State University
Thesis: “A Neural Network based classifier on the Cell Broadband Engine”
- HPC - FEM, Eigen, PETSc, MPI, OpenMP, CUDA, OpenACC
- Deep Learning - NLP/NLU, Unsupervised and Weakly-supervised learning
- Machine Learning - Network analysis, Bayesian inference
- Bayesian Learning/Probabilistic Programming - PyMC3
- Deep Learning Frameworks - TensorFlow, PyTorch, Keras, scikit-learn
- Data Science - Pandas, NumPy, SciPy, NLopt, PySpark, Airflow, RQ
- Scientific Visualization - VisIt, ParaView, VTK
- Information Visualization - Plot.ly, Matplotlib, Bokeh, Tableau
- Languages - C, C++, Python, Fortran
- Database/Datalake - MongoDB, SQL, Delta Lake
- Workflow management - Git, Ansible
- MLOps - MLflow, Neptune.ml, Comet.ml
Senior Developer Advocate (Data Science & ML), Databricks, Blacksburg, VA
Computational Scientist, Advanced Research Computing, Blacksburg, VA
Graduate Research Assistant, SimCenter: Center of Excellence in Computational Engineering, The University of Tennessee, Chattanooga, TN
Graduate Student, The Microsystems Design Lab, The Pennsylvania State University, State College, PA
Sr. Developer Advocate (Data Science/Machine Learning)
My role as a Developer Advocate in Data Science allows me to serve as a thought leader in Machine Learning and Data Science, and to educate the community about the state of the art. It also involves internal advocacy and cross-functional work with units such as product management, product marketing, solutions, engineering and documentation. Some of my responsibilities include:
- - Thought leadership articles and presentations on enterprise and open-source ML/Data Science
- - Provide guidance and feedback to product management and product marketing
- - Act as a subject matter expert to the solutions and engineering team
My technical areas of expertise here are Deep Learning for Natural Language Understanding (NLU), Bayesian inference and large-scale processing using PySpark.
Links: = Webpage of project
Lead the DevRel efforts on Machine Learning at Databricks
- - Engage with all stakeholders, i.e. everyone from the executive team to the practitioner community to identify their ML needs and help provide solutions.
- - Guide the ML/DS Product Management team with regard to product features
- - Work with the Product Marketing Managers to help reach the appropriate community of practitioners
- - Understand the needs of the practitioner community
- - Lead the efforts on growth and the messaging architecture of Spark and Koalas
- - Engage with C-suite executives to define the growth strategy for Spark
- - Offer strategic guidance on improvements to the Spark website and documentation, and grow community adoption
Lead the advocacy efforts on OSS MLflow
- - Interface with the MLflow product team at Databricks and drive adoption of OSS MLflow among the practitioner community
Authored a set of three courses on Bayesian Inference titled ‘Introduction to Computational Statistics for Data Scientists’ on Coursera
- - A practical guide to getting started with scalable Bayesian Inference using PyMC3
- - Introduction to Bayesian Statistics
- - Bayesian Inference with MCMC
- - PyMC3 for Bayesian Modeling and Inference
Articles on Data Science and Machine Learning
- - GPU-accelerated Sentiment Analysis Using Pytorch and Huggingface on Databricks
- - Are GPUs really expensive? A benchmark study for inference in NLP
- - An Experimentation Pipeline for Extracting Topics From Text Data Using PySpark
- - MLflow for Bayesian Experiment Tracking
- - Bayesian Modeling of the Temporal Dynamics of COVID-19 using PyMC3
- - Using Bayesian Hierarchical Models to Infer the Disease Parameters of COVID-19
- - Beyond LDA: Dive into BigARTM for Topic Modeling
- - The Modern Chief Data Officer: Transitioning From Defense to Offense
- - Reproduce Anything: Machine Learning meets Data Lakehouse
Machine Learning at Scale (Nov/Aug 2021)
- - Best practices for the full lifecycle of ML projects along with issues such as reproducibility, explainability, trustworthy AI and governance
- - Keynote at the IEEE IDSTA conference
- - Invited talk at the ADBIS workshop
- - Presentation on how to scale Deep learning workloads at Databricks at the Big Data Symposium in South Korea
- - Presentation on how to use OSS MLflow for model management, reproducibility with MLflow Projects and Model Registry, and model inference/serving
- - Bayesian inference to estimate the disease parameters of COVID-19 from real case data with PyMC3
- - Scale HPC and Data Science codes written in Python
- - JIT approach using Numba
- - Offload compute-intensive portions of the Python code using Eigen and Xtensor in C++
This role as a Computational Scientist involved providing Scientific Computing expertise, enabling High-Performance Computing and Visualization solutions and performing research in Machine Learning.
Interactive Network Analysis of Social Graphs
- - Network analysis of social media network of political figures in collaboration with the Political Science dept.
- - Network data obtained from Twitter, ETL workflows performed with PySpark and saved in Spark tables
- - Metabase as a BI dashboard for the SME to perform EDA on these tables
- - Network graphs created using Graphtools
- - Interactive visualization with Sigma.js in the web browser
- - Distributed Deep Learning sentiment analysis pipeline with Hugging Face RoBERTa embeddings
Natural Language Processing for determining Political Affiliation
- - Stance extraction from a weakly-supervised classifier through a word vector-based approach to determine political affiliation
- - Tweets were stored and preprocessed in MongoDB
- - Python RQ for data acquisition
- - PySpark and spaCy used for corpus cleaning
- - Fixed embedding generation frameworks such as Doc2Vec and fastText used for training and generation of sentence embeddings
- - Contextual embeddings such as ELMo for generating sentence embeddings
- - Self-attention for model interpretability
- - Created interactive web-based visualization tool for stance visualization
- - Hyperparameter optimization with Comet.ml
- - Apache Airflow to streamline the workflow
Deep Learning lead for the project ‘Eye Gaze tracking for Surgical Training’
- - Oversaw a team of 5 attempting to learn gaze patterns of resident surgeons using Deep Learning
- - Collaboration between the ISE department at Virginia Tech and the Carilion School of Medicine
Co-PI on Jefferson National Lab funded project ‘Next-generation Visual Analysis Workspace for Multidimensional Nuclear Femtography Data’
- - Joint venture between Virginia Tech and JLab
- - Visual Analytics project for Nuclear Femtography data.
- - Big Data analytics and visualization: Investigating various novel ways to enable understanding of nuclear physics phenomena with two-dimensional and three-dimensional visualizations.
General Dynamics Collaboration with the Discovery Analytics Center (VT)
- - Computational statistics and dimensionality reduction using unsupervised techniques such as PCA, MDS
- - Weighted Multi-Dimensional Scaling (WMDS) for semantic interaction
- - Formulated a highly-accurate Backward MDS algorithm
- - Formulated and implemented the optimization scheme for the solution of Inverse MDS in Python using the NLopt optimization package
- - Parallelized Inverse MDS Python code using Numba
- - Accelerated Inverse MDS C++ code for low latency with Eigen
Generative Methods for Stance Detection and Visualization
- - Investigate Variational Autoencoders for extracting stance
- - Use static and contextual text embeddings with this configuration
- - Use a semi-supervised learning approach with partially labeled data
Reinforcement learning for Eye Tracking in Laparoscopic Surgery
- - Reinforcement learning for learning eye gaze patterns in surgeons during laparoscopic surgery
- - ADNet for training the reinforcement learning algorithm
- - The algorithm attempts to learn expert gaze patterns that can then be used to train and evaluate residents
Cost analysis of On-premise Cloud vs. Public Cloud for Virginia Tech
- - Performed a cost analysis of Virginia Tech’s on-premise cloud and compared it with the offerings of third-party external cloud providers for scientific computing.
- - This was done to understand the tradeoffs of an on-premise cloud for Virginia Tech stakeholders, i.e. the VT research and academic community, since their needs are unique.
- - Created an interactive report that allows end-users to compare the costs based on their usage needs.
ICAT SEAD Grant 2018
- - Co-PI on ICAT SEAD grant to analyze and visualize the health and nutrition policies across countries
- - Interdisciplinary collaboration with the Health and Nutrition, Business and SOVA departments
- - Visualization of geographic food policy data
- - Developing an open-source interactive visual query framework
- - Web-based framework that is deployed with Docker
- - Funding used to hire an undergraduate intern from the CMDA department as part of a broader engagement program for this project.
Scheduling and Visualization Application for Idaho National Lab
- - Built a Django and D3 based scheduling and visualization tool
- - Tool helps the Idaho National Lab manage outage tasks for the Nuclear Power plants
- - Managed two undergrads in this development project
HNFE Project for Visualization of National Food and Beverage Endorsements
- - Collaborated with the HNFE department to visualize data
- - Produced interactive web-based visualizations for exploratory analysis of Food and Beverage endorsements
- - Work was presented to policy makers to understand the impact of celebrity endorsements
- - Produced a web-based analytics framework for visual querying of endorsement data
- - Framework extended using Dask for analyzing Big Data
Interactive 3D Visualization for Nuclear Reactor Pool
- - Multi-user interactive visualization using open-source technology (X3D) in web browsers
- - This involved collaborating with the Nuclear Engineering department to interface with their RAPID toolset to generate a visualization and querying tool.
- - Played a key role in architecting the cloud computing infrastructure for research computing using OpenStack at Virginia Tech
- - Involved in designing and setting up Cloud Computing best practices and architectures
- - Compiled document for Cloud Computing use-cases comparing OpenStack, LXC and Docker containers
ParaView and VisIt for Massive Scientific Visualization
- - Streamlined remote rendering on ParaView using the ARC supercomputers
- - Set up Python-based scripted visualization and used this for profiling the remote rendering capabilities of ParaView
- - Compiled a ‘Best Practices’ document for Massive Scientific Visualization using ParaView on the VT clusters
- - Evaluating novel approaches for scientific visualization using ParaView and VisIt
Collaborative Computing Platforms
- - Set up Visionarium resources ‘Anvil’, ‘Polaris’, ‘Pluto’, ‘Spock’ as collaborative-computing platforms for Data Science users and project collaborators using JupyterHub. Set up policies for group access and compute resources on these machines
- - Set up unified Python environments with the Python Data Science stack (e.g. NumPy, SciPy, Plot.ly, Pandas, Numba)
- - Set up Machine Learning and Deep Learning environments with TensorFlow, Keras and PyTorch to run on GPUs.
- - Set up out-of-core computing with Dask and GPU compute with CUDA
- - Set up Asana project management tools for the Visualization team’s internal coordination and communication
- - Set up GitLab Continuous Integration as part of code-quality improvement in research projects. Authored a blog post on this topic.
- - Set up and led the adoption of the Open Science Framework as a provenance and data management tool for research projects
- - Participated in several NSF proposals
- - Provided Visionarium tours for Virginia Tech users and external visitors
- - Helped set up the Virginia Tech ARC booth and slideshow presentations at SuperComputing 2016, 2017
- - Virginia Tech campus champion for XSEDE
- - Virginia Tech representative for ACI-REF
- - Poster and paper reviewer for XSEDE15, XSEDE16, PEARC17 and PEARC18
- - Session Chair for Workforce Development and Diversity at XSEDE16
- - NSF proposal reviewer
- - Supervised ARC GRAs
Highlighted here is some of the research work I conducted during my Master’s and Ph.D. programs.
Graduate Research Assistant
Experience developing and maintaining a 3D time-domain electromagnetic solver for open- and closed-boundary problems such as radar cross sections, waveguides and frequency-dependent materials. Knowledge of the Finite Element Method for solving electromagnetic problems involving time-dependent phenomena.
- - Acceleration of EM Fortran code by interfacing with CUDA C.
- - Implemented linear system solvers, both serial and parallel (Block GMRES, Block Gauss-Seidel, LU), in C and Fortran.
- - Wrote 2D and 3D implicit unstructured Finite Volume CFD solvers in C++ using the Van Leer and Roe fluxes.
- - Computational design of airfoils and sensitivity analysis.
- - Delaunay triangulation of 2D meshes in C++.
- - Mesh smoothing using Winslow and Linear Elastic smoothing in C++.
- - Mentored two graduate students in Computational Engineering.
- - Pointwise for mesh generation.
I was a member of the Microsystems Design Lab at the Pennsylvania State University where I worked on accelerator technology.
- - Implemented a neural network for skin-tone detection on the IBM Cell and observed a 23x speedup over the serial code (Master’s thesis).
Performed code maintenance and bug fixes on proprietary embedded system firmware at Arris Corporation.
SuperComputing 2019 short talks: Talks@VT series
- - Introduction to Generative Modeling
- - Cloud Cost Comparison: On-premise vs. External Vendor
Presented work on Stance Detection using Deep Learning at the AI4Good workshop held at PEARC19
SuperComputing 2018 short talks: Talks@VT series
- - Deep Learning on GPUs with PyTorch for Text Analysis
- - AutoML: An Overview of Automated Machine Learning
Taught ‘Text Summarization with Word Embeddings using PyTorch’ for CS4984/5984 in Fall 2018
Workshops for the Industrial and Systems Engineering Dept.
- - Introduction to Scientific Python: 150 min hands-on workshop
- - Introduction to Data Visualization with Plot.ly: 150 min hands-on workshop
Taught undergraduate class ‘CS1064: Introduction to Python’ in the Spring 2016 semester
Taught ‘Introduction to OpenACC’ lecture for undergraduate class ‘CMDA 3634: Comp Sci Foundations for CMDA’
Taught workshop titled ‘Introduction to ARC Cloud using OpenStack for Machine Learning’
Taught Networked Learning Initiative Seminar classes 2015, 2016, 2017 and 2018
- - Introduction to Scientific Computing using Python
- - Introduction to Data Visualization
- - Co-taught ‘Introduction to Debugging and Profiling with GNU tools’
- - Introduction to CUDA
- - Introduction to Scientific Visualization using ParaView
- - TensorFlow for Machine Learning
- - Dask for Out-of-Core Computing: Big Data solutions on your laptop
- - Unsupervised Machine Learning using Scikit-learn and TensorFlow
- - Supervised Machine Learning using Scikit-learn and TensorFlow
- - Deep Learning using TensorFlow and Keras
Taught XSEDE workshops in 2015, 2016
- - Introduction to Scientific Computing using Python
- - Python Pandas for Data Analytics
PEARC workshops in 2017, 2018
- - A Data Scientist’s Python Toolbox
- - Workshop titled ‘Introduction to Machine Learning’ taught at PEARC18
Mentored graduate and undergraduate students on various funded projects
- Chaitanya S. Kulkarni, Tianzi Wang, Nathan Lau, Jacob Hartman-Kenzler, Sarah E. Parker, Srijith Rajamohan, Laura E. Barnes, Shawn D. Safford. Applying Deep Learning to Provide Eye-Gaze Guidance for the Peg Transfer Task, Jan 1 2021, 16th Academic Surgical Congress
- Srijith Rajamohan, Robert Settlage. Informing the On/Off-prem Cloud Discussion in Higher Education, PEARC20, ACM, Portland
- Robert Settlage, Srijith Rajamohan. Enabling AI/DL Workloads on HPC Infrastructure through Containers and Open OnDemand. HPCKP20, High-Performance Computing Knowledge Meeting, Barcelona, July 2020
- Rincón-Gallardo Patiño, Sofía, Srijith Rajamohan, Kathleen Meaney, Eloise Coupey, Elena Serrano, Valisa E. Hedrick, Fabio da Silva Gomes, Nicholas Polys, and Vivica Kraak. Development of a Responsible Policy Index to Improve Statutory and Self-Regulatory Policies that Protect Children’s Diet and Health in the America’s Region. International Journal of Environmental Research and Public Health 17, no. 2 (2020): 495.
- Robert Settlage, Srijith Rajamohan, Kevin Lahmers, Alan Chalker, Eric Franz, Steve Gallo, David Hudak. Portals for Interactive Steering of HPC Workflows. Nov 2019, Third Workshop on Interactive High-Performance Computing, SC19
- Srijith Rajamohan, Alana Romanella, Amit Ramesh. A Weakly-Supervised Attention-based Visualization Tool for Assessing Political Affiliation. Aug 2019, arXiv:1908.02282 [cs.CL], https://arxiv.org/abs/1908.02282
- Zhou, M., Rajamohan, S., Hedrick, V., Rincón-Gallardo Patiño, S., Abidi, F., Polys, N., & Kraak, V. (2019). Mapping the Celebrity Endorsement of Branded Food and Beverage Products and Marketing Campaigns in the United States, 1990–2017. International Journal of Environmental Research and Public Health 16, no. 19 (2019): 3743.
- Valerio Mascolino, Alireza Haghighat, Nicholas Polys, Nathan J. Roskoff, and Srijith Rajamohan. 2019. A Collaborative Virtual Reality System (VRS) with X3D Visualization for RAPID, The 24th International Conference on 3D Web Technology (Web3D ’19), ACM, New York, NY, USA, 1-8.
- Srijith Rajamohan and Faiz Abidi, Web-based Visualization and Querying of Food and Beverage Endorsements by Celebrities, PEARC19, ACM, Chicago
- Rajamohan, S., Romanella, A., Ramesh, A., A Human-in-the-Loop Deep Learning Based Document Tagging for Stance Detection, CHCI 2019: Algorithms that make you think, Blacksburg.
- Rajamohan, S. and Anderson, W.K. A Modified Streamline Upwind/Petrov-Galerkin Stabilization Matrix for Time-Domain FEM, ACES 2018, Denver.
- Rajamohan, S. and Anderson, W.K. Using an Approximate Streamline Upwind/Petrov-Galerkin Stabilization Matrix for the Solution of Maxwell’s Equations in Dispersive Materials, ACES 2018, Denver.
- Abidi, F., Polys, N., Rajamohan, S., Arsenault, L., Mohammed, A. (2018, April). Remote high performance visualization of big data for immersive science. In Proceedings of the High Performance Computing Symposium (p. 5). Society for Computer Simulation International.
- Zhou M, Kraak VI, Rajamohan S, Abidi F, Polys N. Mapping the Celebrity Marketing of Branded Food and Beverage Products in the United States: Policy Implications and Research Needs. 15th World Congress on Public Health. April 3-7, 2017. Melbourne, Victoria, Australia
- Nicholas Polys, Ayat Mohammed, Jagathshree Iyer, Peter Radics, Faiz Abidi, Lance Arsenault, and Srijith Rajamohan. Immersive Analytics: Crossing the Gulfs with High-Performance Visualization. IEEE VR 2016 Workshop on Immersive Analytics
- Rajamohan, S. and Anderson, W.K. HPC for Legacy EM Code, a Mixed Language Approach using CUDA. Applied Computational Electromagnetic Society 2012, Volume: GPU for CEM.
- Porting Algorithms to the IBM Cell Processor - an FFT case study. Penn State Research Symposium 2009.