This project is the result of a Kaggle competition organized by Data Science Academy in January 2019.
The goal of the competition was to create a Machine Learning model to predict the occurrence of diabetes.
Data source: National Institute of Diabetes and Digestive and Kidney Diseases
Competition page: kaggle.com/c/competicao-dsa-machine-learnin..
The goal is to predict, based on diagnostic measures, whether a patient has diabetes.
The task was to create a Machine Learning model that estimates the probability that a patient has diabetes.
I used Python to perform an Exploratory Data Analysis (EDA), using visual and quantitative methods to understand and summarize the dataset without making assumptions about its contents. I then cleaned the data and built several Machine Learning models to estimate the probability of diabetes. Logistic Regression produced the best results.
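As a rough illustration of that workflow (not the competition code itself), the sketch below cleans missing values, fits a Logistic Regression model, and outputs a probability of diabetes per patient. It assumes scikit-learn and NumPy are available, and it runs on synthetic data: the two features (`glucose`, `bmi`), the sample size, and the coefficients are made up for the example and are not taken from the competition dataset.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the competition data (NOT the real dataset):
# two hypothetical diagnostic measures and a binary diabetes label.
rng = np.random.default_rng(0)
n = 400
glucose = rng.normal(120, 30, n)
bmi = rng.normal(32, 6, n)
X = np.column_stack([glucose, bmi])

# The label depends on both measures plus random noise (made-up coefficients).
logit = 0.04 * (glucose - 120) + 0.10 * (bmi - 32)
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

# Knock out about 5% of the measurements so the cleaning step has work to do.
X[rng.random(X.shape) < 0.05] = np.nan

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Data cleaning (median imputation), scaling, and the classifier in one pipeline.
model = make_pipeline(
    SimpleImputer(strategy="median"),
    StandardScaler(),
    LogisticRegression(),
)
model.fit(X_train, y_train)

# Probability of the positive class (diabetes) for each held-out patient.
proba = model.predict_proba(X_test)[:, 1]

# Share of correct predictions at the default 0.5 threshold.
accuracy = model.score(X_test, y_test)
print(f"Accuracy on held-out synthetic data: {accuracy:.2%}")
```

Keeping the imputer and scaler inside the pipeline means they are fitted on the training split only, so no information from the held-out rows leaks into the model.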
In this competition, I reached an accuracy of 76.27% and placed 41st on the leaderboard.
I wrote a blog post with details about the solution.
The code is also available on GitHub.