EDA on the Zomato Bengaluru Dataset

Introduction to EDA

Intuition

Imagine your wolf pack decides to watch a movie you haven’t heard of. There is no doubt that this leaves you puzzled, with a lot of questions that need answering before you can make a decision. Being a good chieftain, the first question you would ask is: what is the cast and crew of the movie? As a regular practice, you would also watch the trailer on YouTube. Furthermore, you’d find out what ratings and reviews the movie has received from the audience.

Whatever investigative measures you take before finally buying popcorn for your clan at the theater are nothing but what data scientists, in their lingo, call ‘Exploratory Data Analysis’.

Definition of EDA

Exploratory Data Analysis refers to the critical process of performing initial investigations on data so as to discover patterns, spot anomalies, test hypotheses, and check assumptions with the help of summary statistics and graphical representations.

In simple language, all we are trying to do is understand the dataset.

Performing EDA on the Zomato Dataset (Bengaluru)

Loading the Data Set

Captureasasasasa.JPG
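
Since the screenshot itself is not reproduced here, the loading step might look roughly like the sketch below. The filename `zomato.csv` and the underscore column names are assumptions, and the three rows are made-up toy data read from an in-memory string so the snippet runs without the real download.

```python
import io

import pandas as pd

# Toy stand-in for the real file: three made-up rows with a few of the
# columns described in this post. With the actual download you would call
# pd.read_csv("zomato.csv") instead (the filename is an assumption).
sample_csv = io.StringIO(
    "name,online_order,book_table,rate,votes\n"
    "Cafe A,Yes,No,4.1/5,775\n"
    "Diner B,No,No,3.8/5,120\n"
    "Bistro C,Yes,Yes,,0\n"
)

df = pd.read_csv(sample_csv)

print(df.shape)   # (number of rows, number of columns)
print(df.head())  # first few rows, to eyeball the values
```

Looking at `shape` and `head()` right after loading is a cheap way to confirm that the file was parsed the way you expected.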

Looking through the reference sheet to understand what each feature name stands for

We have 17 columns, including:
url (object): website of the restaurant
address (object): actual address of the restaurant
online-order (object): categorical data, yes/no; whether the restaurant takes online orders
book-table (object): categorical data, yes/no; whether a table can be reserved or seating is first come, first served
rate (object): rating of the restaurant on a scale of 5; in the dataset it is given as tr

How to find the data type of each feature

sasaasasasa.JPG
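
A minimal sketch of checking the data types, using a small made-up frame in place of the real dataset (the column names are assumptions):

```python
import pandas as pd

# Made-up rows standing in for the real dataset.
df = pd.DataFrame({
    "name": ["Cafe A", "Diner B"],
    "online_order": ["Yes", "No"],
    "rate": ["4.1/5", "3.8/5"],
    "votes": [775, 120],
})

# One dtype per column. Text columns are reported as object
# (or as a string dtype in newer pandas versions).
print(df.dtypes)

# dtypes plus non-null counts and memory usage in a single report.
df.info()
```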

Making some inferences from the data types

The object data type in pandas typically holds strings. The rate column should hold floats, so we need to change its data type, and we also need to remove the "/5" suffix, because every rating is out of 5.

jejejejejCapture.JPG
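
One way to do this conversion is sketched below on made-up values. It assumes the column is called `rate` and that every entry looks like "4.1/5", possibly with a stray space before the slash.

```python
import pandas as pd

# Made-up ratings in the "x/5" text format.
df = pd.DataFrame({"rate": ["4.1/5", "3.8 /5", "4.5/5"]})

# Drop the "/5" suffix, trim leftover spaces, then parse the text as numbers.
cleaned = df["rate"].str.replace("/5", "", regex=False).str.strip()
df["rate"] = pd.to_numeric(cleaned)

print(df["rate"].dtype)
print(df["rate"].tolist())
```

`pd.to_numeric` raises an error on text it cannot parse, which is useful here: unexpected values in the column surface immediately instead of silently slipping through.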

Our machine learning model can only understand numbers, so in the categorical columns we will replace every no with 0 and every yes with 1.

Capture.JPG
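
A minimal sketch of this encoding, assuming the two columns are named `online_order` and `book_table` and contain exactly the strings "Yes" and "No" (the rows are made up):

```python
import pandas as pd

# Made-up rows for the two yes/no columns.
df = pd.DataFrame({
    "online_order": ["Yes", "No", "Yes"],
    "book_table": ["No", "No", "Yes"],
})

yes_no = {"Yes": 1, "No": 0}

for col in ["online_order", "book_table"]:
    # map() looks each value up in the dict; anything that is not
    # "Yes" or "No" would become NaN rather than raise an error.
    df[col] = df[col].map(yes_no)

print(df)
```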

Checking for missing values

AAACapture.JPG
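
The missing-value check can be sketched as follows, on a small made-up frame with two gaps in an assumed `rate` column:

```python
import numpy as np
import pandas as pd

# Made-up rows; two of the four ratings are missing.
df = pd.DataFrame({
    "name": ["Cafe A", "Diner B", "Bistro C", "Grill D"],
    "rate": ["4.1/5", np.nan, "3.9/5", np.nan],
    "votes": [775, 120, 0, 35],
})

# Number of missing values per column.
missing_counts = df.isnull().sum()

# The same information as a percentage of all rows.
missing_share = df.isnull().mean() * 100

print(missing_counts)
print(missing_share.round(1))
```

The percentage view helps when deciding whether a column is worth keeping: a handful of gaps can be dropped or filled, while a mostly empty column may not be usable at all.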

Before converting the data type, we need to handle the NaN values

Capasasasture.JPG
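
The post does not say which strategy the screenshot used, so the sketch below shows two common options on made-up data: dropping rows without a rating, or filling the gaps with the median. The column name `rate` and the helper `to_rating` are hypothetical names introduced for illustration.

```python
import numpy as np
import pandas as pd


def to_rating(series):
    """Hypothetical helper: turn text like "4.5/5" into the float 4.5."""
    cleaned = series.str.replace("/5", "", regex=False).str.strip()
    return pd.to_numeric(cleaned)


# Made-up rows; two of the four ratings are missing.
df = pd.DataFrame({
    "name": ["Cafe A", "Diner B", "Bistro C", "Grill D"],
    "rate": ["4.5/5", np.nan, "3.5/5", np.nan],
})

# Option 1: drop the rows that have no rating, then convert.
dropped = df.dropna(subset=["rate"]).copy()
dropped["rate"] = to_rating(dropped["rate"])

# Option 2: keep every row and fill the gaps with the median rating.
filled = df.copy()
filled["rate"] = to_rating(filled["rate"])
filled["rate"] = filled["rate"].fillna(filled["rate"].median())

print(dropped)
print(filled)
```

Dropping is the simpler choice when only a few rows are affected. Filling keeps the rows but treats an unrated restaurant as if it had a typical rating, which may or may not be appropriate for the analysis.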

To plot categorical data, we use a pie chart or a countplot

Caaaaaaapture.JPG
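
A rough sketch of both chart types, assuming seaborn and matplotlib are installed and using a made-up `online_order` column:

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up values for one categorical column.
df = pd.DataFrame({"online_order": ["Yes", "No", "Yes", "Yes", "No", "Yes"]})

fig, (ax_count, ax_pie) = plt.subplots(1, 2, figsize=(10, 4))

# Count plot: one bar per category, bar height = number of rows.
sns.countplot(data=df, x="online_order", ax=ax_count)
ax_count.set_title("Restaurants by online ordering")

# Pie chart: the same counts shown as shares of the whole.
counts = df["online_order"].value_counts()
ax_pie.pie(counts.values, labels=counts.index.tolist(), autopct="%1.0f%%")
ax_pie.set_title("Share of restaurants")

fig.tight_layout()
```

In a notebook you would drop the `matplotlib.use("Agg")` line and let the figure render inline. The count plot is usually easier to read than the pie chart once there are more than a few categories.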

To plot numeric against categorical data, we use a bar plot

Cqqqqqapture.JPG
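
As an illustration, the sketch below compares an assumed numeric `rate` column across the two values of an assumed `online_order` column. The data is made up, and seaborn's `barplot` is one possible way to draw it.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up rows: one categorical column and one numeric column.
df = pd.DataFrame({
    "online_order": ["Yes", "No", "Yes", "No", "Yes"],
    "rate": [4.0, 3.5, 4.5, 3.0, 4.1],
})

fig, ax = plt.subplots(figsize=(5, 4))

# By default each bar shows the mean of the numeric column per category,
# and the thin line on top is a bootstrapped confidence interval.
sns.barplot(data=df, x="online_order", y="rate", ax=ax)
ax.set_ylabel("Mean rating (out of 5)")

# The same numbers as a table, to cross-check the bar heights.
mean_rate = df.groupby("online_order")["rate"].mean()
print(mean_rate)
```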

For numeric data, we use a distplot

Captuaaaaaaaaaaare.JPG