Predict the occurrence of diabetes

This project was developed for a Kaggle competition hosted by Data Science Academy in January 2019.

The goal of the competition was to create a Machine Learning model to predict the occurrence of diabetes.

Data source: National Institute of Diabetes and Digestive and Kidney Diseases

Competition page: kaggle.com/c/competicao-dsa-machine-learnin..

Problem

The goal is to predict, based on diagnostic measures, whether a patient has diabetes.

Task

Create a Machine Learning model to estimate the probability that a patient has diabetes.

Solution

I used Python to perform an Exploratory Data Analysis (EDA), using visual and quantitative methods to understand and summarize the dataset without making assumptions about its contents. I then cleaned the data and built several Machine Learning models to estimate the probability of diabetes. The Logistic Regression model gave the best results.
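The actual code is in the GitHub repository; what follows is only a minimal, library-free sketch of the steps named above, under stated assumptions. It assumes a hypothetical cleaning rule (a 0 in certain diagnostic columns marks a missing reading and is replaced by that column's median) and a standard logistic regression fitted by batch gradient descent on standardized features. The function names (`impute_zeros`, `fit_scaler`, `transform`, `fit_logistic_regression`, `predict_proba`), the hyperparameters `lr` and `epochs`, and the toy data are all illustrative, not taken from the original solution.

```python
import math
from statistics import median


def impute_zeros(rows, columns):
    """Return a copy of rows where 0 in each listed column is replaced by that
    column's median of non-zero values (ASSUMPTION: 0 encodes a missing reading)."""
    cleaned = [list(row) for row in rows]
    for j in columns:
        fill = median(row[j] for row in cleaned if row[j] != 0)
        for row in cleaned:
            if row[j] == 0:
                row[j] = fill
    return cleaned


def fit_scaler(X):
    """Per-column mean and standard deviation (a std of 0 is replaced by 1)."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    stds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in X) / n) or 1.0
            for j in range(d)]
    return means, stds


def transform(X, means, stds):
    """Standardize each column using statistics computed on the training data."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]


def sigmoid(z):
    # Numerically stable logistic function: avoids overflow in math.exp
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic_regression(X, y, lr=0.1, epochs=2000):
    """Fit weights w and bias b by batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Derivative of the log-loss w.r.t. the logit: predicted prob - label
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(X, w, b):
    """Estimated probability of diabetes (label 1) for each row of X."""
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]


# Toy usage with made-up numbers (not the competition data)
X_train = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
y_train = [0, 0, 0, 1, 1, 1]
means, stds = fit_scaler(X_train)
w, b = fit_logistic_regression(transform(X_train, means, stds), y_train)
print(predict_proba(transform([[1.5], [5.5]], means, stds), w, b))
```

In practice a library model such as scikit-learn's `LogisticRegression` would normally be used; the hand-written version only makes the mechanics visible. The model outputs σ(w·x + b), a value between 0 and 1, which is why logistic regression fits a task framed as estimating a probability rather than just a yes/no label.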

Results

In this competition, I reached an accuracy of 76.27% and placed 41st on the leaderboard.
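Accuracy is the share of patients whose predicted class matches the true label. Because the model outputs probabilities, they must first be converted to class labels. The sketch below assumes a 0.5 cut-off, which is a hypothetical choice since the write-up does not state the threshold used; the function name `accuracy` is also illustrative.

```python
def accuracy(y_true, probabilities, threshold=0.5):
    """Fraction of rows whose thresholded prediction equals the true label."""
    if len(y_true) != len(probabilities) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    # A probability >= threshold is predicted as diabetes (1), otherwise 0
    predictions = [1 if p >= threshold else 0 for p in probabilities]
    correct = sum(1 for t, p in zip(y_true, predictions) if t == p)
    return correct / len(y_true)


# Toy example: 3 of 4 predictions are correct
print(accuracy([1, 0, 1, 1], [0.8, 0.3, 0.4, 0.9]))  # 0.75
```

Under this definition, an accuracy of 76.27% means roughly 76 correct predictions for every 100 patients scored.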


Solution details

I've written a blog post with details about the solution.

The code is also available on GitHub.
