Overview
Projects
E-commerce Text Classification
– Classified products into four given categories based on their descriptions available on an e-commerce platform.
– Employed a TF-IDF vectorizer and a Word2Vec embedder with several classifiers. The hyperparameter-tuned model with the highest validation accuracy (TF-IDF + linear SVM) obtained a test accuracy of \(0.949\).
– Project report | GitHub repository
Anomaly Detection in Credit Card Transactions
– Identified credit card transactions as authentic or fraudulent based on time, amount, and other attributes.
– Fitted a multivariate normal distribution and achieved test \(F_2\)-score of \(0.816\) with threshold value \(\approx 3.87 \times 10^{-19}\).
– Project report | GitHub repository
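As a rough illustration of the density-thresholding idea in this project (not the project's actual code), the sketch below fits a multivariate normal distribution to synthetic two-feature data with NumPy and flags points whose density falls below a threshold. The data, the helper names `fit_gaussian` and `log_density`, and the threshold value are all assumptions made for this example. Working with log-densities avoids underflow when the threshold is very small.

```python
import numpy as np

def fit_gaussian(X):
    """Estimate the mean vector and covariance matrix from the rows of X."""
    mu = X.mean(axis=0)
    sigma = np.cov(X, rowvar=False)
    return mu, sigma

def log_density(X, mu, sigma):
    """Log of the multivariate normal density at each row of X."""
    d = mu.shape[0]
    diff = X - mu
    # Squared Mahalanobis distance, solving a linear system instead of inverting sigma.
    maha = np.sum(diff * np.linalg.solve(sigma, diff.T).T, axis=1)
    _, logdet = np.linalg.slogdet(sigma)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + maha)

# Synthetic "authentic" data (assumption: two standard normal features).
rng = np.random.default_rng(0)
normal = rng.normal(loc=0.0, scale=1.0, size=(1000, 2))
mu, sigma = fit_gaussian(normal)

# Hypothetical density threshold; in practice it would be tuned on a
# validation set, for example to maximize the F2-score.
log_threshold = np.log(1e-6)

queries = np.array([[0.1, -0.2], [8.0, 8.0]])
is_anomaly = log_density(queries, mu, sigma) < log_threshold
print(is_anomaly)
```

The first query lies near the bulk of the data and the second far outside it, so only the second should be flagged.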
Higgs Boson Event Detection
Conducted by The Machine Learning Company.
– Predicted whether or not an event produced in a particle accelerator indicates the discovery of a new particle.
– Trained a deep neural network, achieving test AMS (approximate median significance) score of \(1.200\) and test accuracy of \(0.824\), using GridSearchCV for hyperparameter optimization.
– Project report | GitHub repository
Electron Energy Flux Prediction
Conducted by The Machine Learning Company.
– Predicted the total electron energy flux based on various relevant features, in the context of modeling electron particle precipitation from the magnetosphere to the ionosphere.
– Trained a deep neural network, achieving test \(R^2\)-score of \(0.699\), using Keras Tuner for hyperparameter tuning.
– Project report | GitHub repository
Site Energy Usage Intensity Prediction
Conducted by The Machine Learning Company.
– Estimated energy usage intensity of a building in a given year, based on building characteristics and weather data.
– Trained a random forest regressor, achieving a test \(R^2\)-score of \(0.703\), using Optuna for hyperparameter optimization.
– Project report | GitHub repository
Patient Survival Prediction
Conducted by The Machine Learning Company.
– Performed analytics and predicted the survival of a patient based on various relevant medical information.
– Trained a deep neural network, achieving test accuracy of \(0.923\), using Keras Tuner for hyperparameter tuning.
– Project report | GitHub repository
Road Traffic Accident Severity Classification
Conducted by The Machine Learning Company.
– Built prediction models to classify severity of road traffic accidents (slight injury, serious injury or fatal injury).
– Obtained test weighted \(F_1\)-score of \(0.795\) with XGBoost classifier, using GridSearchCV for hyperparameter tuning.
– Project report | GitHub repository
Natural Language Processing with Disaster Tweets
Jointly with Shyambhu Mukherjee.
– Predicted whether a tweet indicates a disaster or not, using bag-of-words, TF-IDF, and Word2Vec models.
– Obtained average cross-validation \(F_1\)-score of \(0.783\) with Word2Vec embedder and support vector machine classifier with a radial basis function kernel.
– Project report | GitHub repository
Credit Card Fraud Detection
Jointly with Shyambhu Mukherjee.
– Classified credit card transactions as authentic or fraudulent, based on relevant data such as time and amount.
– Obtained a test \(F_2\)-score of \(0.880\) with the random forest algorithm after oversampling the minority class (fraudulent transactions) in the training set using the synthetic minority over-sampling technique (SMOTE).
– Project report | GitHub repository
Notes
Data Preparation with Pipeline and ColumnTransformer
– Created classes (with `fit` and `transform` methods) for each preprocessing and feature engineering step.
– Chained these classes into `Pipeline` objects (for example, one pipeline for float-type columns, another for object-type columns).
– Combined the pipelines and the corresponding lists of columns into a `ColumnTransformer`.
– Used the `ColumnTransformer` to apply each pipeline to its designated set of columns.
– Note
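The steps above can be sketched with scikit-learn as follows. This is a minimal illustration rather than the note's own code: the `LogTransformer` class, the toy DataFrame, and the column names are hypothetical, and `sparse_threshold=0` is set only so that the result is always a dense array.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

class LogTransformer(TransformerMixin, BaseEstimator):
    """Hypothetical custom step: applies log(1 + x) to non-negative numeric data."""

    def fit(self, X, y=None):
        return self  # stateless: nothing to learn from the data

    def transform(self, X):
        return np.log1p(X)

# Toy data with missing values (assumed column names).
df = pd.DataFrame({
    "area": [120.0, 80.0, np.nan, 200.0],
    "age": [10.0, 35.0, 20.0, np.nan],
    "type": pd.Series(["office", "home", "home", np.nan], dtype=object),
})
float_cols = ["area", "age"]
object_cols = ["type"]

# One pipeline per column type.
float_pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("log", LogTransformer()),
    ("scale", StandardScaler()),
])
object_pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("encode", OneHotEncoder(handle_unknown="ignore")),
])

# Route each pipeline to its own list of columns.
preprocessor = ColumnTransformer(
    [
        ("float", float_pipeline, float_cols),
        ("object", object_pipeline, object_cols),
    ],
    sparse_threshold=0,  # always return a dense array
)

X = preprocessor.fit_transform(df)
print(X.shape)
```

Keeping every step inside one `ColumnTransformer` means the same fitted object can later be applied to validation or test data with `preprocessor.transform(...)`, so imputation and scaling statistics come from the training data only.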
Implementing Neural Network with Callbacks
– Created and compiled a simple neural network.
– Incorporated `EarlyStopping`, `LearningRateScheduler`, and `ModelCheckpoint` callbacks in the training of the neural network.
– Note
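A minimal sketch of this setup, assuming TensorFlow's bundled Keras, is shown below. The synthetic data, the network size, the `schedule` function, and the checkpoint file name are assumptions made for illustration and are not taken from the note.

```python
import math
import os
import tempfile

import numpy as np
from tensorflow import keras

# Synthetic binary classification data (for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4)).astype("float32")
y = (X[:, 0] + X[:, 1] > 0).astype("float32")

# Create and compile a simple neural network.
model = keras.Sequential([
    keras.Input(shape=(4,)),
    keras.layers.Dense(8, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

def schedule(epoch, lr):
    """Keep the initial learning rate for 3 epochs, then decay it exponentially."""
    return lr if epoch < 3 else lr * math.exp(-0.1)

checkpoint_path = os.path.join(tempfile.mkdtemp(), "best_model.keras")
callbacks = [
    # Stop when validation loss has not improved for 3 epochs.
    keras.callbacks.EarlyStopping(monitor="val_loss", patience=3,
                                  restore_best_weights=True),
    # Update the learning rate at the start of each epoch.
    keras.callbacks.LearningRateScheduler(schedule),
    # Save the model whenever validation loss reaches a new best.
    keras.callbacks.ModelCheckpoint(checkpoint_path, monitor="val_loss",
                                    save_best_only=True),
]

history = model.fit(X, y, validation_split=0.2, epochs=10, batch_size=32,
                    callbacks=callbacks, verbose=0)
print(len(history.history["val_loss"]), "epochs run")
```

All three callbacks here monitor `val_loss`, so a validation split (or explicit validation data) has to be passed to `fit`.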
A Simple Implementation of Support Vector Machine (SVM)
– Loaded the MNIST dataset of handwritten digits from `keras.datasets`.
– Presented visualization of the digits from the input pixel values using `imshow`.
– Reshaped the data and used Support Vector Machine to predict the digits.
– Note
Term Frequency-Inverse Document Frequency (TF-IDF)
– Described the construction of the TF-IDF measure.
– Explained how it can be implemented to vectorize text data.
– Note
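For a concrete picture, the sketch below computes one common textbook variant of the measure in plain Python: term frequency is the count of a term divided by the document length, and inverse document frequency is \(\log(N / \mathrm{df})\), where \(N\) is the number of documents and \(\mathrm{df}\) is the number of documents containing the term. The note may define the measure differently, and library implementations such as scikit-learn's `TfidfVectorizer` use a smoothed, normalized variant by default, so their values will differ. The function name `tf_idf` and the toy corpus are hypothetical.

```python
import math
from collections import Counter

def tf_idf(corpus):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    docs = [doc.lower().split() for doc in corpus]
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

corpus = ["the cat sat", "the dog sat", "the dog barked loudly"]
weights = tf_idf(corpus)
for doc, w in zip(corpus, weights):
    print(doc, "->", {term: round(value, 3) for term, value in w.items()})
```

A term that appears in every document (here "the") gets weight zero, while a term confined to one document (here "cat") gets the largest weight within that document.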
Implementing Logistic Regression from Scratch
– Implemented a logistic regression model from scratch, without any high-level machine learning library, to understand how it performs binary classification.
– Note
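One way such a from-scratch implementation might look, using only the Python standard library, is sketched below. It trains by batch gradient descent on the mean log loss. The function names, learning rate, number of epochs, and toy data are assumptions for illustration and may differ from the note.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=0.1, epochs=1000):
    """Fit weights and bias by batch gradient descent on the mean log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log loss w.r.t. the linear score is (p - y).
            error = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += error * xi[j]
            grad_b += error
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(X, w, b, threshold=0.5):
    """Predict class 1 when the estimated probability reaches the threshold."""
    return [
        int(sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= threshold)
        for xi in X
    ]

# Toy, linearly separable data with one feature centered at zero.
X = [[-2.5], [-1.5], [-0.5], [0.5], [1.5], [2.5]]
y = [0, 0, 0, 1, 1, 1]
w, b = fit_logistic(X, y)
preds = predict(X, w, b)
print(preds)
```

On separable data like this the weight keeps growing with more epochs, since the log loss has no finite minimizer; a regularization term or a fixed number of epochs keeps it bounded.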