Recent Projects
E-commerce Text Classification
Product categorization is a type of economic taxonomy: a system of categories into which a collection of products falls. This project addresses the categorization of products offered on e-commerce platforms based on their descriptions. The categories are Electronics, Household, Books, and Clothing & Accessories. Such categorization is intended to enhance the user experience and improve results from external search engines.
Anomaly Detection in Credit Card Transactions
An anomaly, or outlier, is a rare observation that deviates significantly from the majority of the data and does not conform to a well-defined notion of normal behaviour. Anomaly detection can be very useful in identifying fraudulent credit card transactions, which are rare compared to authentic transactions. Moreover, the methods used to commit fraud keep evolving as older ones get flagged by existing fraud detection systems. In this project, we develop a basic anomaly detection system that flags transactions whose feature values deviate significantly from those of authentic transactions.
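The idea of flagging transactions whose feature values deviate from those of authentic transactions can be illustrated with a minimal sketch. It assumes a simple per-feature z-score rule with a threshold of 3 standard deviations; the function names, the threshold, and the toy data are all hypothetical, and the project itself may use a different detection method.

```python
from statistics import mean, stdev

def fit_normal_profile(authentic_rows):
    """Compute the per-feature mean and standard deviation of authentic transactions."""
    columns = list(zip(*authentic_rows))
    return [(mean(col), stdev(col)) for col in columns]

def is_anomalous(row, profile, threshold=3.0):
    """Flag a transaction if any feature lies more than `threshold`
    standard deviations away from the authentic mean (assumed rule)."""
    for value, (mu, sigma) in zip(row, profile):
        if sigma > 0 and abs(value - mu) / sigma > threshold:
            return True
    return False

# Toy data: two made-up features per transaction.
authentic = [[10.0, 1.0], [12.0, 1.2], [11.0, 0.9], [9.0, 1.1], [13.0, 1.0]]
profile = fit_normal_profile(authentic)

typical_flag = is_anomalous([11.5, 1.05], profile)  # close to the authentic profile
extreme_flag = is_anomalous([250.0, 1.0], profile)  # first feature is far off
print(typical_flag, extreme_flag)
```

Fitting the profile on authentic transactions only means the rule needs no examples of fraud, which suits a setting where fraudulent cases are rare and their patterns change over time.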
Higgs Boson Event Detection
In particle physics, an event refers to the results just after a fundamental interaction takes place between subatomic particles, occurring in a very short time span in a well-localized region of space. A background event can be explained by existing theories. A signal event, on the other hand, indicates a process that cannot be described by previous observations and may lead to the discovery of a new particle. In this project, we aim to predict whether a given event is background or signal.
Electron Energy Flux Prediction
In McGranaghan et al. (2021), the authors considered the problem of modeling electron particle precipitation from the magnetosphere to the ionosphere. They attempted to address the problem through a new particle precipitation database, using machine learning tools to extract useful information from it. The new database contains 51 satellite years of Defense Meteorological Satellite Program (DMSP) observations. Based on this, we aim to predict the continuous target variable electron total energy flux.
Site Energy Usage Intensity Prediction
Energy usage intensity (EUI) refers to the amount of energy used per square foot annually. It is calculated by dividing the total energy consumed by the building in a year by the total gross floor area. Like miles per gallon for cars, EUI is the prime indicator of the energy performance of a building. In this project, we aim to predict the continuous target variable site EUI, given the characteristics of the building and the weather data for its location.
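The EUI calculation described above can be written as a short function. This is a minimal sketch: the function name is hypothetical, and the energy unit (kBtu) is an assumption, since the text only specifies that the floor area is in square feet.

```python
def site_eui(total_annual_energy_kbtu, gross_floor_area_sqft):
    """Return energy usage intensity: annual energy divided by gross floor area.

    The kBtu unit is assumed for illustration; any energy unit works as long
    as it is used consistently.
    """
    if gross_floor_area_sqft <= 0:
        raise ValueError("gross floor area must be positive")
    return total_annual_energy_kbtu / gross_floor_area_sqft

# A made-up building: 1,200,000 kBtu per year over 20,000 square feet.
example_eui = site_eui(1_200_000, 20_000)
print(example_eui)  # 60.0 kBtu per square foot per year
```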
Patient Survival Prediction
A quick understanding of a patient's overall health is of paramount importance in emergency healthcare situations. This understanding is often hindered; in particular, intensive care units often lack a verified medical history for incoming patients. Knowledge of medical records indicates a patient's survival odds to a great extent. In this project, we predict whether a patient will survive based on relevant medical information.
Road Traffic Accidents Severity Classification
The severity of road traffic accidents can be influenced by several factors, including attributes of the vehicles involved, the drivers, the casualties, and the surrounding conditions. The objective of the project is to build a model that classifies the severity of a road traffic accident into one of three categories (slight injury, serious injury, or fatal injury) based on these attributes.
Natural Language Processing with Disaster Tweets
Disaster-related tweets have the potential to alert relevant authorities early on so that they can take action to reduce damage and possibly save lives. In this project, we attempt to predict whether a given tweet indicates a real disaster or not. We consider a number of text normalization steps. For text representation, we experiment with the bag-of-words model (count vectorizer), the TF-IDF vectorizer, and word2vec embeddings. For each approach, we consider several binary classifiers and compare their performance through cross-validation.
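To make the difference between the count-based and TF-IDF representations concrete, here is a minimal sketch using only the Python standard library. The tokenizer, the function names, the example tweets, and the smoothed IDF formula ln((1 + n) / (1 + df)) + 1 are assumptions chosen for illustration; the project's actual vectorizers and normalization steps may differ.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens (a basic normalization)."""
    return re.findall(r"[a-z0-9']+", text.lower())

def bag_of_words(docs):
    """Return the sorted vocabulary and a document-by-term count matrix."""
    vocab = sorted({tok for doc in docs for tok in tokenize(doc)})
    counts = [Counter(tokenize(doc)) for doc in docs]
    matrix = [[c[term] for term in vocab] for c in counts]
    return vocab, matrix

def tf_idf(matrix):
    """Reweight raw counts by a smoothed inverse document frequency (assumed variant)."""
    n_docs = len(matrix)
    n_terms = len(matrix[0]) if matrix else 0
    df = [sum(1 for row in matrix if row[j] > 0) for j in range(n_terms)]
    idf = [math.log((1 + n_docs) / (1 + d)) + 1 for d in df]
    return [[row[j] * idf[j] for j in range(n_terms)] for row in matrix]

# Made-up example tweets.
tweets = [
    "Forest fire near the town",
    "I love the fire emoji",
    "Flood warning issued for the town",
]
vocab, counts = bag_of_words(tweets)
weights = tf_idf(counts)

# "the" occurs in every tweet, "forest" in only one, so TF-IDF ranks "forest" higher.
print(weights[0][vocab.index("the")], weights[0][vocab.index("forest")])
```

Both words have a raw count of 1 in the first tweet, so the bag-of-words model treats them equally, while TF-IDF downweights the word that appears in every document.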
Credit Card Fraud Detection
A number of attributes of a credit card transaction can help determine whether it is fraudulent. In this work, we build several classification models to predict whether a credit card transaction is authentic or fraudulent based on its time, amount, and a set of PCA-transformed features. In addition to the unaltered training set, we use several resampled training sets, in which both classes are represented equally, to counter class imbalance.
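One way to obtain a training set with both classes represented equally is random undersampling of the majority class. The sketch below is an assumption about how such a set could be built, not a description of the resampling schemes actually used in the project; the function name and the toy data are hypothetical.

```python
import random

def undersample(rows, labels, seed=0):
    """Randomly drop rows from larger classes until every class
    has as many rows as the smallest class."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    n_keep = min(len(group) for group in by_class.values())
    balanced = []
    for label, group in by_class.items():
        for row in rng.sample(group, n_keep):
            balanced.append((row, label))
    rng.shuffle(balanced)  # avoid leaving the classes in contiguous blocks
    new_rows = [row for row, _ in balanced]
    new_labels = [label for _, label in balanced]
    return new_rows, new_labels

# Toy training set: 8 authentic (0) and 2 fraudulent (1) transactions.
rows = [[float(i)] for i in range(10)]
labels = [0] * 8 + [1] * 2
balanced_rows, balanced_labels = undersample(rows, labels)
print(balanced_labels.count(0), balanced_labels.count(1))
```

Resampling of this kind is normally applied to the training set only, so that evaluation still reflects the original class distribution.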