EDA on the Algerian Forest Fires Data Set

Introduction to the data set

This data set has the following columns, all of which are currently stored as strings:

  1. day (self-explanatory)
  2. month (self-explanatory)
  3. year (self-explanatory)
  4. Temperature, range 22 to 42
  5. Relative humidity, given as RH
  6. Wind speed, given as Ws
  7. Fine Fuel Moisture Code, given as FFMC, range 28.6 to 92.5
  8. Duff Moisture Code, given as DMC, range 1.1 to 65.9
  9. Initial Spread Index, given as ISI, range 0 to 18.5
  10. Buildup Index, given as BUI, range 1.1 to 68
  11. Rain, range 0 to 16.8
  12. Fire Weather Index, given as FWI, range 0 to 31.1
  13. Classes, with the values "not fire" and "fire"

Screenshot 2022-10-04 224039.jpg

Screenshot 2022-10-04 224217.jpg

Coincidentally, the null values appear only in the following columns:

Screenshot 2022-10-04 224503.jpg

Before converting the string columns to their proper dtypes, we first need to handle the NaN values. Since the NaN values are confined to just two columns, we can simply drop the affected rows.

Screenshot 2022-10-04 224724.jpg
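Here is a minimal pandas sketch of this step. It assumes the data is already loaded into a DataFrame named df. The toy values below are hypothetical stand-ins for the real rows.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: every value is still a string, and one row holds NaNs.
df = pd.DataFrame({
    "day": ["01", "02", "03", "04"],
    "Temperature": ["29", "29", np.nan, "25"],
    "Classes": ["not fire", "not fire", np.nan, "fire"],
})

print(df.isnull().sum())  # NaN count per column, before dropping

# Drop every row containing at least one NaN, then renumber the index 0..n-1.
df = df.dropna().reset_index(drop=True)
print(df.shape)
```

reset_index(drop=True) discards the old row labels. Without it, the index would keep a gap where the dropped rows used to be.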

Now we convert each column to its appropriate data type.

Screenshot 2022-10-04 224844.jpg
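Below is a sketch of the conversion. It assumes the numeric columns hold clean numeric strings. The toy frame and the stray whitespace in its headers are illustrative assumptions, not the exact raw file. DataFrame.astype with a dict converts each listed column to its own dtype.

```python
import pandas as pd

# Hypothetical toy frame after the NaN rows were dropped; values are still strings
# and (by assumption) some header names carry stray whitespace.
df = pd.DataFrame({
    "day": ["01", "02", "03"],
    " RH": ["57", "61", "82"],
    "Rain ": ["0", "1.3", "13.1"],
    "Classes": ["not fire", "fire ", "not fire"],
})

# Remove leading/trailing spaces from the column names.
df.columns = df.columns.str.strip()

# Map each column to its target dtype; Classes stays a string for now
# because it is encoded in a later step.
df = df.astype({"day": int, "RH": int, "Rain": float})
print(df.dtypes)
```

Passing a dict to astype keeps the column-to-dtype mapping in one place. A string that is not a valid number raises an error here instead of silently becoming NaN.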

Next, we check again whether any NaN values are left.

Screenshot 2022-10-04 230109.jpg
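The check can be sketched as counting NaN values per column. The toy frame is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with a single missing label.
df = pd.DataFrame({
    "Temperature": [29, 26, 25],
    "Classes": ["not fire", np.nan, "fire"],
})

null_counts = df.isnull().sum()       # NaN count per column
print(null_counts[null_counts > 0])   # show only the columns that still have NaNs
```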

One NaN value remains, and it is in the categorical feature (Classes).

Machine learning algorithms cannot work with strings directly, so in the categorical feature we convert every "not fire" value to 0 and every "fire" value to 1. With only two categories, this single 0/1 column carries the same information as one-hot encoding with one dummy column dropped. One-hot encoding is only appropriate when the categories are independent of each other, i.e. when the feature is nominal rather than ordinal.

Screenshot 2022-10-04 230519.jpg
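Here is a sketch of the mapping. It assumes the labels are exactly "not fire" and "fire" once surrounding whitespace is removed. The toy column is hypothetical. Series.map turns any unmapped label into NaN and leaves existing NaNs as NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical toy column; raw labels may carry trailing whitespace.
df = pd.DataFrame({"Classes": ["not fire   ", "fire   ", "fire", np.nan]})

# Strip whitespace, then map the two labels to 0 and 1.
df["Classes"] = df["Classes"].str.strip().map({"not fire": 0, "fire": 1})
print(df["Classes"].tolist())  # [0.0, 1.0, 1.0, nan]
```

Because a NaN is still present, the column comes out as float. It can be cast to int once the NaN has been filled.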

Now we handle the NaN value in the categorical feature. For categorical data we replace NaN values with the mode, and for numerical data with the mean.

Screenshot 2022-10-04 230813.jpg

Screenshot 2022-10-04 232503.jpg
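The mode/mean rule can be sketched as follows. The column lists and toy values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with one gap in each kind of column.
df = pd.DataFrame({
    "Temperature": [29.0, np.nan, 25.0, 30.0],
    "Classes": [0, 1, 1, np.nan],
})

cat_cols = ["Classes"]       # hypothetical list of categorical columns
num_cols = ["Temperature"]   # hypothetical list of numerical columns

for col in cat_cols:
    # mode() returns a Series (ties give several values); take the first.
    df[col] = df[col].fillna(df[col].mode()[0])

for col in num_cols:
    # mean() skips NaN by default.
    df[col] = df[col].fillna(df[col].mean())

print(df)
```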

Separating the columns into categorical and numerical features

Screenshot 2022-10-04 232619.jpg
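Here is one way to sketch the split. It assumes Classes has already been encoded as 0/1, so its dtype no longer marks it as categorical. For that reason the categorical columns are listed by hand. The toy frame is hypothetical.

```python
import pandas as pd

# Hypothetical toy frame where Classes is already 0/1.
df = pd.DataFrame({
    "Temperature": [29, 26, 25],
    "Rain": [0.0, 1.3, 0.0],
    "Classes": [0, 1, 1],
})

# Assumption: categorical columns are listed explicitly.
cat_cols = ["Classes"]
# Every other numeric column is treated as a numerical feature.
num_cols = [c for c in df.select_dtypes(include="number").columns if c not in cat_cols]
print(cat_cols, num_cols)  # ['Classes'] ['Temperature', 'Rain']
```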

Univariate Analysis on Numerical Features

Screenshot 2022-10-04 232852.jpg
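The screenshot is not reproduced in text here. As a plot-free sketch of univariate analysis, each numerical column can be summarized with describe() plus its skewness. The toy values are hypothetical.

```python
import pandas as pd

# Hypothetical toy values for two numerical columns.
df = pd.DataFrame({
    "Temperature": [29, 26, 25, 31, 35, 33],
    "Rain": [0.0, 1.3, 13.1, 0.0, 0.0, 0.2],
})
num_cols = ["Temperature", "Rain"]

# One row per column: count, mean, std, min, quartiles, max.
summary = df[num_cols].describe().T
# Skewness > 0 means a long right tail, < 0 a long left tail.
summary["skew"] = df[num_cols].skew()
print(summary)
```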

Univariate Analysis on Categorical Feature

Screenshot 2022-10-04 232958.jpg

Screenshot 2022-10-04 233121.jpg

Screenshot 2022-10-04 233225.jpg
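For the categorical feature, the count and percentage of each class can be sketched as follows. The toy values are hypothetical.

```python
import pandas as pd

# Hypothetical toy labels after encoding (0 = not fire, 1 = fire).
df = pd.DataFrame({"Classes": [0, 1, 1, 0, 1]})

counts = df["Classes"].value_counts()                        # absolute counts
shares = df["Classes"].value_counts(normalize=True) * 100    # percentages
print(pd.DataFrame({"count": counts, "percent": shares.round(1)}))
```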

Outlier Detection

Screenshot 2022-10-04 233326.jpg
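One common outlier rule is the 1.5 x IQR rule that boxplots use to draw their whiskers. Whether the screenshot uses exactly this rule is an assumption. iqr_outliers is a hypothetical helper, and the toy values are illustrative.

```python
import pandas as pd

# Hypothetical toy column with one extreme value.
df = pd.DataFrame({"Rain": [0.0, 0.0, 0.1, 0.2, 0.5, 0.3, 16.8]})

def iqr_outliers(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Hypothetical helper: True for values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return (s < q1 - k * iqr) | (s > q3 + k * iqr)

mask = iqr_outliers(df["Rain"])
print(df.loc[mask, "Rain"].tolist())  # [16.8]
```

With k = 1.5 this matches the default whisker length of a standard boxplot. A larger k flags only more extreme values.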