# Predict the occurrence of diabetes

## The goal of this project was to create a Machine Learning model to predict the occurrence of diabetes.

This project is the result of a Kaggle competition promoted by Data Science Academy in January 2019.

Data source: National Institute of Diabetes and Digestive and Kidney Diseases

Competition page: kaggle.com/c/competicao-dsa-machine-learnin..

### Problem

The goal is to predict, based on diagnostic measures, whether a patient has diabetes.

### Task

Create a Machine Learning model to estimate the probability of the occurrence of diabetes.

### Solution

I used Python to perform an **Exploratory Data Analysis (EDA)**, applying visual and quantitative methods to understand and summarize the dataset without making any assumptions about its contents. I then cleaned the data and built several **Machine Learning** models to compute the probability of the occurrence of diabetes. The Logistic Regression model produced the best results.
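To make the idea of estimating a probability with Logistic Regression concrete, here is a minimal from-scratch sketch using only the Python standard library. It is not the code from this project: the data is synthetic, the two feature names (`glucose`, `bmi`) are hypothetical stand-ins for diagnostic measures, and the learning rate and epoch count are arbitrary choices.

```python
import math
import random


def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1), in a numerically stable way."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(row, weights, bias):
    """Estimated probability of the positive class for one row of features."""
    return sigmoid(sum(w * v for w, v in zip(weights, row)) + bias)


def fit_logistic_regression(X, y, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n = len(X)
    n_features = len(X[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for row, label in zip(X, y):
            # Gradient of the log loss w.r.t. the score is (probability - label).
            error = predict_proba(row, weights, bias) - label
            for j, v in enumerate(row):
                grad_w[j] += error * v
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


# Synthetic, already-standardized toy data (NOT the competition dataset).
# The feature names are hypothetical and used only for illustration.
random.seed(0)
X, y = [], []
for _ in range(200):
    glucose = random.gauss(0, 1)
    bmi = random.gauss(0, 1)
    true_probability = sigmoid(2.0 * glucose + 1.0 * bmi)
    X.append([glucose, bmi])
    y.append(1 if random.random() < true_probability else 0)

weights, bias = fit_logistic_regression(X, y)
high_risk = predict_proba([2.0, 1.0], weights, bias)
low_risk = predict_proba([-2.0, -1.0], weights, bias)
print(f"high-risk example: {high_risk:.3f}, low-risk example: {low_risk:.3f}")
```

In practice a library implementation such as scikit-learn's `LogisticRegression` would be the usual choice; the hand-written version above only illustrates how the model turns a weighted sum of features into a probability.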

### Results

In this competition, I reached an accuracy of **76.27%** and placed **41st on the leaderboard**.
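For reference, accuracy is the share of predictions that match the true labels. The sketch below shows one common way to derive it from predicted probabilities. The 0.5 threshold, the helper name `accuracy_from_probabilities`, and the sample values are all assumptions made for illustration; they are not taken from the competition data or its scoring code.

```python
def accuracy_from_probabilities(probabilities, labels, threshold=0.5):
    """Turn probabilities into 0/1 predictions and return the fraction that are correct."""
    predictions = [1 if p >= threshold else 0 for p in probabilities]
    correct = sum(1 for pred, label in zip(predictions, labels) if pred == label)
    return correct / len(labels)


# Made-up example values, not results from this project.
probabilities = [0.91, 0.20, 0.65, 0.40, 0.08]
labels = [1, 0, 0, 1, 0]

score = accuracy_from_probabilities(probabilities, labels)
print(f"accuracy: {score:.2%}")
```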

### Solution details

I've written a blog post with details about the solution.

The code is also available on GitHub.