Predict the occurrence of diabetes
The goal of this project was to create a Machine Learning model to predict the occurrence of diabetes.
This project is the result of a Kaggle competition promoted by Data Science Academy in January of 2019.
Data source: National Institute of Diabetes and Digestive and Kidney Diseases
Competition page: kaggle.com/c/competicao-dsa-machine-learnin..
Problem
The goal is to predict, based on diagnostic measures, whether a patient has diabetes.
Task
Create a Machine Learning model to estimate the probability of the occurrence of diabetes.
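As a reference point, a logistic regression model (the model type that performed best, see Solution) turns a patient's vector of diagnostic measures x into a probability with the logistic function. The symbols w (weights), b (bias) and n (number of training patients) are notation introduced here for illustration, not taken from the original solution.

```latex
\hat{p}(\mathbf{x}) = P(y = 1 \mid \mathbf{x})
  = \sigma(\mathbf{w}^\top \mathbf{x} + b)
  = \frac{1}{1 + e^{-(\mathbf{w}^\top \mathbf{x} + b)}}

\mathcal{L}(\mathbf{w}, b) = -\frac{1}{n} \sum_{i=1}^{n}
  \Bigl[ y_i \log \hat{p}(\mathbf{x}_i) + (1 - y_i) \log\bigl(1 - \hat{p}(\mathbf{x}_i)\bigr) \Bigr]

\frac{\partial \mathcal{L}}{\partial \mathbf{w}} = \frac{1}{n} \sum_{i=1}^{n} \bigl(\hat{p}(\mathbf{x}_i) - y_i\bigr)\, \mathbf{x}_i,
\qquad
\frac{\partial \mathcal{L}}{\partial b} = \frac{1}{n} \sum_{i=1}^{n} \bigl(\hat{p}(\mathbf{x}_i) - y_i\bigr)
```

Here y = 1 means the patient has diabetes; minimizing the mean log-loss L fits w and b, and the two gradients are what a gradient-descent fit follows.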
Solution
I used Python to perform an Exploratory Data Analysis (EDA), using visual and quantitative methods to understand and summarize the dataset without making assumptions about its contents. I then cleaned the data and built several Machine Learning models to estimate the probability of diabetes. The Logistic Regression model gave the best results.
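A minimal sketch of the cleaning and modeling steps, written in plain Python so it runs without third-party packages. It is not the author's code. The column names and the rule that a 0 means "not measured" are assumptions based on the public Pima Indians Diabetes data from the same institute; median imputation is one common cleaning choice among several; and the hand-written gradient-descent fit stands in for whatever library was actually used.

```python
import math
from statistics import mean, median, pstdev

# Assumption: columns follow the public Pima Indians Diabetes data, where a 0
# in these columns means "not measured" rather than a real reading.
ZERO_AS_MISSING = ["Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI"]


def impute_zeros(rows, columns=ZERO_AS_MISSING):
    """Return copies of rows with 0 replaced by the column's non-zero median."""
    cleaned = [dict(r) for r in rows]
    for col in columns:
        observed = [r[col] for r in rows if r[col] != 0]
        if not observed:
            continue  # nothing to estimate a fill value from
        fill = median(observed)
        for r in cleaned:
            if r[col] == 0:
                r[col] = fill
    return cleaned


def fit_scaler(X):
    """Per-column (mean, population std); a constant column gets std 1.0."""
    return [(mean(c), pstdev(c) or 1.0) for c in zip(*X)]


def scale(X, stats):
    """Standardize each row of X with statistics returned by fit_scaler."""
    return [[(v - m) / s for v, (m, s) in zip(row, stats)] for row in X]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic_regression(X, y, lr=0.1, epochs=2000):
    """Batch gradient descent on the mean log-loss; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Prediction error for one patient: p_hat - y.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, X):
    """Estimated probability of diabetes (class 1) for each row of X."""
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]
```

The fill values from `impute_zeros` and the statistics from `fit_scaler` should be computed on the training split only and then reused on validation and test rows, so that no information leaks from the evaluation data into the model.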
Results
In this competition, I reached an accuracy of 76.27% and finished in 41st place on the leaderboard.
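Accuracy is the share of patients whose predicted class matches the true label. A minimal sketch of how it is computed from predicted probabilities, assuming a 0.5 cut-off turns each probability into a class; the threshold actually used for scoring in the competition is not stated here, so `threshold` is a hypothetical parameter.

```python
def accuracy(y_true, probabilities, threshold=0.5):
    """Fraction of rows whose thresholded probability matches the true label."""
    if len(y_true) != len(probabilities) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    # Probabilities at or above the threshold count as "has diabetes" (1).
    predicted = [1 if p >= threshold else 0 for p in probabilities]
    hits = sum(1 for t, pr in zip(y_true, predicted) if t == pr)
    return hits / len(y_true)


print(accuracy([0, 1, 1, 0], [0.2, 0.7, 0.4, 0.1]))  # prints 0.75
```

In the usage line, three of the four thresholded predictions match their labels, so the accuracy is 3/4.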
Solution details
I've written a blog post with details about the solution.
The code is also available on GitHub.