In this project, I built machine learning models to identify potential donors for CharityML (a fictitious charity organization) using data collected from the U.S. census. To find the best approach, I performed EDA and feature engineering, then built a training and prediction pipeline to evaluate and optimize the performance of several machine learning models.
The modified census dataset consists of approximately 32,000 data points, each with 13 features. It is a modified version of the dataset published in the paper “Scaling Up the Accuracy of Naive-Bayes Classifiers: a Decision-Tree Hybrid” by Ron Kohavi. The paper is available online, and the original dataset is hosted on the UCI Machine Learning Repository.
The models I used:
- SGD Classifier
- AdaBoost
- Logistic Regression
You can see the code (IPython notebook) here.
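
As a rough illustration of the training and prediction pipeline described above, here is a minimal sketch using scikit-learn. It is not the notebook's actual code: the synthetic data from `make_classification` stands in for the preprocessed census features, the helper name `train_predict` is hypothetical, and the choice of accuracy and F1 as metrics is an assumption.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split


def train_predict(model, X_train, y_train, X_test, y_test):
    """Fit a model on the training set and score it on the test set.

    Hypothetical helper; the metrics chosen here are an assumption.
    """
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    return {
        "accuracy": accuracy_score(y_test, predictions),
        "f1": f1_score(y_test, predictions),
    }


# Assumption: synthetic binary-classification data with 13 features
# stands in for the preprocessed census dataset.
X, y = make_classification(
    n_samples=2000, n_features=13, n_informative=6, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# The three model families listed above, with mostly default settings.
models = {
    "SGD Classifier": SGDClassifier(random_state=0),
    "AdaBoost": AdaBoostClassifier(random_state=0),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}

# Run every model through the same pipeline so the scores are comparable.
results = {
    name: train_predict(model, X_train, y_train, X_test, y_test)
    for name, model in models.items()
}

for name, scores in results.items():
    print(f"{name}: accuracy={scores['accuracy']:.3f}, f1={scores['f1']:.3f}")
```

Running every model through one shared function keeps the comparison fair: each model sees the same split and is scored with the same metrics, so differences in the results reflect the models rather than the evaluation setup.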