CV
Sugata Ghosh, PhD
sugata.works@gmail.com | https://www.linkedin.com/in/sugataghosh/
Experience
- Ford Motor Private Limited
2024-Present
Reliability Data Scientist
Department: Global Data Insight and Analytics
- Indian Institute of Science Education and Research Kolkata
2018-2024
Research Fellow and Teaching Assistant
Research Focus: Stochastic ordering
Teaching Assistantship: Served as Teaching Assistant for the courses Statistics I, Probability I, and Analysis I. Conducted tutorial sessions, prepared question papers, and assisted with grading.
Education
- Indian Institute of Science Education and Research Kolkata
2018-2024
Doctor of Philosophy in Statistics
- Indian Institute of Technology Kanpur
2015-2017
Master of Science in Statistics
- University of Calcutta
2012-2015
Bachelor of Science in Statistics
Skills
- Languages: Python, SQL, R, MATLAB
- Tools: LaTeX, Jupyter Notebook
- Statistical Software: Minitab
Publications and Presentations
- Refereed Journal Publications
Ghosh, S., Nanda, A. K. (2023) Conditional precedence orders for stochastic comparison of random variables. Statistics and Probability Letters. https://doi.org/10.1016/j.spl.2022.109702
Ghosh, S., Dutta, S., Genton, M. G. (2017) A note on inconsistent families of discrete multivariate distributions. Journal of Statistical Distributions and Applications. https://doi.org/10.1186/s40488-017-0061-8
- Preprints
Ghosh, S., Nanda, A. K. (2021) Departure-based Asymptotic Stochastic Order for Random Processes. https://arxiv.org/abs/2103.01727
Ghosh, S., Nanda, A. K. (2021) Asymptotic Stochastic Comparison of Random Processes. https://arxiv.org/abs/2103.01720
- Academic Magazine Articles
Banerjee, P., Ghosh, S. (2016) A brief review on missing data. Prakarsho.*
Ghosh, S. (2014) A generalization of the Kelly gambling system. Prakarsho.
Dutta, T., Ghosh, S. (2014) An attempt to generate random numbers. Prakarsho.
- Presentations
Departure-based Asymptotic Stochastic Order for Random Processes
International Workshop on Reliability Theory and Survival Analysis (IWRTSA) 2022, IISER Kolkata
On Some Inconsistent Multivariate Distributions
Open House'16, IIT Kanpur
*Prakarsho: Departmental magazine published by the Department of Statistics, St. Xavier’s College, Kolkata.
Scholastic Achievements
- Scholarship and Research Fellowship
Research Fellowship from University Grants Commission, MHRD, Government of India
National Scholarship from Department of Higher Education, MHRD, Government of India
- Test Performances
AIR-94 in Mathematical Science paper in CSIR-UGC NET-JRF (Dec 2016)
AIR-31 in Mathematical Statistics paper in IIT-JAM (2015)
Seminars, Workshops, and Summer/Winter Schools
Winter School on Deep Learning: From Perceptrons to Diffusion Models
Organized by Electronics and Communication Sciences Unit, ISI Kolkata
International Workshop on Reliability Theory and Survival Analysis (IWRTSA) 2022
Organized by Department of Mathematics and Statistics, IISER Kolkata
Indo-French Center for Applied Mathematics (IFCAM) Winter School 2018
On Stochastic Methods for Uncertainty Quantification and Sensitivity Analysis of Complex Models
Organized by IISER Kolkata
National Seminar on Application of Statistics and Statistical Computing
Organized by Xaverian Statistical Association under Department of Statistics, St. Xavier’s College, Kolkata
Fellowship Programs
TMLC Fellowship Program
Conducted by The Machine Learning Company (2022-2023)
Contributed to the Conversational AI DeepPavlov project.
Data Science and Machine Learning Projects
Author Identification with Natural Language Processing
– Predicted the author of a new text, given a dataset of texts with corresponding authors.
– Trained an LSTM model on GloVe embeddings and obtained a validation log loss of \(0.581\).
– GitHub repository: https://github.com/sugatagh/Spooky-Author-Identification
E-commerce Text Classification
– Classified products into four given categories based on their descriptions available on an e-commerce platform.
– Combined a TF-IDF vectorizer and a Word2Vec embedder with several classifiers. The hyperparameter-tuned model with the highest validation accuracy (TF-IDF + Linear SVM) obtained a test accuracy of \(0.949\).
– GitHub repository: https://github.com/sugatagh/E-commerce-Text-Classification
Anomaly Detection in Credit Card Transactions
– Identified credit card transactions as authentic or fraudulent based on time, amount, and other attributes.
– Fitted a multivariate normal distribution and achieved test \(F_2\)-score of \(0.816\) with threshold value \(\approx 3.87 \times 10^{-19}\).
– GitHub repository: https://github.com/sugatagh/Anomaly-Detection-in-Credit-Card-Transactions
Higgs Boson Event Detection
Conducted by The Machine Learning Company
– Predicted whether or not an event produced in a particle accelerator indicates the discovery of a new particle.
– Trained a deep neural network, achieving test AMS (approximate median significance) score of \(1.200\) and test accuracy of \(0.824\), using GridSearchCV for hyperparameter optimization.
– GitHub repository: https://github.com/sugatagh/Higgs-Boson-Event-Detection
Patient Survival Prediction
Conducted by The Machine Learning Company
– Performed analytics and predicted the survival of a patient based on various relevant medical information.
– Trained a deep neural network, achieving test accuracy of \(0.923\), using Keras Tuner for hyperparameter tuning.
– GitHub repository: https://github.com/sugatagh/Patient-Survival-Prediction-using-Deep-Learning
Electron Energy Flux Prediction
Conducted by The Machine Learning Company
– Predicted the total electron energy flux based on various relevant features, in the context of modeling electron particle precipitation from the magnetosphere to the ionosphere.
– Trained a deep neural network, achieving test \(R^2\)-score of \(0.699\), using Keras Tuner for hyperparameter tuning.
– GitHub repository: https://github.com/sugatagh/Electron-Energy-Flux-Prediction-using-Deep-Learning
Site Energy Usage Intensity Prediction
Conducted by The Machine Learning Company
– Estimated energy usage intensity of a building in a given year, based on building characteristics and weather data.
– Trained a random forest regressor, obtaining a test \(R^2\)-score of \(0.703\), using Optuna for hyperparameter optimization.
– GitHub repository: https://github.com/sugatagh/Site-Energy-Usage-Intensity-Prediction
Road Traffic Accident Severity Classification
Conducted by The Machine Learning Company
– Built prediction models to classify severity of road traffic accidents (slight injury, serious injury or fatal injury).
– Obtained test weighted \(F_1\)-score of \(0.795\) with XGBoost classifier, using GridSearchCV for hyperparameter tuning.
– GitHub repository: https://github.com/sugatagh/Road-Traffic-Accident-Severity-Classification
Natural Language Processing with Disaster Tweets
Jointly with Shyambhu Mukherjee
– Predicted whether a tweet indicates a disaster or not, using bag-of-words, TF-IDF, and Word2Vec models.
– Obtained average cross-validation \(F_1\)-score of \(0.783\) with Word2Vec embedder and support vector machine classifier with a radial basis function kernel.
– GitHub repository: https://github.com/sugatagh/Natural-Language-Processing-with-Disaster-Tweets
Credit Card Fraud Detection
Jointly with Shyambhu Mukherjee
– Classified credit card transactions as authentic or fraudulent, based on relevant data such as time and amount.
– Obtained test \(F_2\)-score of \(0.880\) with random forest algorithm after oversampling the minority class (fraudulent transactions) in the training set via synthetic minority over-sampling technique (SMOTE).
– GitHub repository: https://github.com/sugatagh/Credit-Card-Fraud-Detection
Online Internships
Machine Learning Internship Program
Conducted by Uniconverge Technologies and The IoT Academy (2022)
– Detected duplication of points of interest in a dataset of over \(1.5\) million place entries.
– Trained several algorithms and obtained test accuracy of \(0.770\) with hyperparameter-tuned XGBoost classifier.
– GitHub repository: https://github.com/sugatagh/Foursquare-Location-Matching
Academic Course Projects
A Time Series Analysis of Monthly Airline Revenue Passenger Mile (RPM)
Supervisor: Dr. Amit Mitra (IIT Kanpur)
– Analyzed RPM data for 1996-2014 and built a predictive model for forecasting future revenue values.
A Study on Performances in the Olympic Games
Supervisor: Dr. Sharmishtha Mitra (IIT Kanpur)
– Built a regression model to predict the overall performance of the countries in the Summer Olympic Games.
Students’ Future Plans and the Reasons Behind
Supervisor: Dr. Shalabh (IIT Kanpur)
– Examined the variation in career choices of the students at IIT Kanpur and how the reasons for such choices vary.
A Statistical Analysis of the Variation in Preference to Movie Genres among Spectators
Supervisors: Dr. Durba Bhattacharya and Prof. Soumya Banerjee (St. Xavier’s College, Kolkata)
– Studied how hobbies influence an individual’s preferred movie genre, and checked for bias due to gender and age group.
– Analyzed how the factors preferred as reasons for a movie’s success differ across age groups and genders.
Certifications
Generative AI for Everyone
Authorized by DeepLearning.AI, offered by Coursera (2023)
https://www.coursera.org/account/accomplishments/certificate/EV8T2EF4VUKN
Machine Learning Specialization
Authorized by Stanford University, offered by Coursera (2022)
https://www.coursera.org/account/accomplishments/specialization/certificate/U2MZV5HWRG5L
Data Analyst in SQL Track
Offered by DataCamp (2022)
https://www.datacamp.com/statement-of-accomplishment/track/689ba9d0ab9984f55aac593e6caacd1f9d197194
IBM Data Science Specialization
Authorized by IBM, offered by Coursera (2022)
https://www.coursera.org/account/accomplishments/specialization/certificate/9V355HMT2FB6
Applied Data Science with Python
Offered by Electronics and ICT Academy, IIT Roorkee (2021)
https://eict.iitr.ac.in/wp-content/uploads/L214613B669.jpg
Academic Courses
Statistics
Regression Analysis, Statistical Inference, Time-Series Analysis, Statistical Simulation and Data Analysis, Probabilistic Theory of Pattern Recognition, Multivariate Analysis, Analysis of Variance, Robust Statistical Methods, Nonparametric Inference, Non-linear Regression, Large Sample Theory, Sampling Theory.
Mathematics
Real Analysis, Linear Algebra, Multivariable Calculus, Numerical Analysis, Complex Variables, Ergodic Theory, Introduction to Graph Theory, Measure Theory.
Probability and Applications
Probability Theory, Applied Stochastic Process.
Others
Computer Programming and Data Structures, Research Methodology.