EDA on ALGERIAN FOREST Data Set

Introduction to the data set

This data set has the following features as input columns:

1. date (self-explanatory), currently given as a string
2. day (self-explanatory), currently given as a string
3. month (self-explanatory), currently given as a string
4. Temperature, range (22-42), currently given as a string
5. Relative humidity, given as RH, currently given as a string
6. Wind speed, given as Ws, currently given as a string
7. Fine Fuel Moisture Code, given as FFMC, range (28.6-92.5), currently given as a string
8. Duff Moisture Code, given as DMC, range (1.1-65.9), currently given as a string
9. Initial Spread Index, given as ISI, range (0-18.5), currently given as a string
10. Buildup Index, given as BUI, range (1.1-68), currently given as a string
11. Rain, range (0-16.8), currently given as a string
12. Fire Weather Index, given as FWI, range (0-31.1), currently given as a string
13. Classes, not fire and fire, currently given as a string

Screenshot 2022-10-04 224039.jpg

Screenshot 2022-10-04 224217.jpg

Coincidentally, the null values occur only in the columns shown here:

Screenshot 2022-10-04 224503.jpg
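
A check like this can be sketched in pandas as follows. This is a minimal illustration on a toy frame, not the notebook's actual code: the column names and values are assumptions made up for the example.

```python
import pandas as pd

# Toy frame standing in for the raw data; names and values are illustrative only.
df = pd.DataFrame({
    "day": ["1", "2", "3"],
    "Temperature": ["29", None, "26"],
    "Classes": ["not fire", "fire", None],
})

# Count the missing values in each column.
null_counts = df.isnull().sum()
print(null_counts)

# Keep only the names of the columns that actually contain nulls.
cols_with_nulls = null_counts[null_counts > 0].index.tolist()
print(cols_with_nulls)
```

Listing the affected columns first makes it easier to judge whether dropping rows is acceptable or whether imputation is needed.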

Before converting the string columns to their respective dtypes, we first need to handle the NaN values. Since only two columns are affected, we can drop the affected rows entirely.

Screenshot 2022-10-04 224724.jpg
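
Dropping the incomplete rows might look like the sketch below. The toy frame and its column names are assumptions for illustration; on the real data the same `dropna` call would be applied to the loaded frame.

```python
import pandas as pd

# Toy frame with two incomplete rows; names and values are illustrative only.
df = pd.DataFrame({
    "day": ["1", "2", "3", "4"],
    "Temperature": ["29", None, "26", "31"],
    "Classes": ["not fire", "fire", None, "fire"],
})

# Drop every row that has at least one missing value, then renumber the index.
cleaned = df.dropna().reset_index(drop=True)
print(len(df), "rows before,", len(cleaned), "rows after")
```

Resetting the index is optional, but it avoids gaps in the row labels after the drop.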

Now we convert the columns from strings to their respective data types.

Screenshot 2022-10-04 224844.jpg
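
One way to do this conversion is `DataFrame.astype` with a per-column mapping. The sketch below assumes hypothetical column names and a split into integer and float columns; the real split depends on the data.

```python
import pandas as pd

# Toy frame where every value is still a string; names and values are illustrative.
df = pd.DataFrame({
    "day": ["1", "2"],
    "Temperature": ["29", "26"],
    "FFMC": ["65.7", "64.4"],
    "Classes": ["not fire", "fire"],
})

# Assumed grouping: whole-number columns become int64, index columns become float64.
int_cols = ["day", "Temperature"]
float_cols = ["FFMC"]

dtype_map = {col: "int64" for col in int_cols}
dtype_map.update({col: "float64" for col in float_cols})

# Columns not named in the mapping (here Classes) keep their current dtype.
converted = df.astype(dtype_map)
print(converted.dtypes)
```

`astype` raises an error if a value cannot be parsed as a number, which is why the NaN rows are handled before this step.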

Again, we make sure that no NaN values are left.

Screenshot 2022-10-04 230109.jpg

We have one NaN value, which is in the categorical feature.
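
To find the row that holds a leftover NaN, a boolean mask over the rows can help. This is a sketch on assumed toy data, with `Classes` standing in for the categorical column.

```python
import numpy as np
import pandas as pd

# Toy frame with a single NaN left in the categorical column (illustrative values).
df = pd.DataFrame({
    "Temperature": [29, 26, 31],
    "Classes": ["not fire", np.nan, "fire"],
})

# Per-column count of the remaining missing values.
remaining = df.isnull().sum()
print(remaining)

# Select the rows in which any column is missing.
rows_with_nan = df[df.isnull().any(axis=1)]
print(rows_with_nan)
```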

Machine learning algorithms cannot handle strings, so in the categorical column we convert every value of not fire to 0 and every value of fire to 1. With only two categories, this is basically one-hot encoding. To apply one-hot encoding, the categories should be independent of each other, that is, the feature should be nominal.

Screenshot 2022-10-04 230519.jpg
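
The 0/1 mapping described above can be sketched with `Series.map`. The label spellings and the whitespace stripping are assumptions for this example; the raw labels should be inspected (for instance with `unique()`) before mapping.

```python
import pandas as pd

# Toy labels, some with stray trailing spaces (an assumption about the raw data).
df = pd.DataFrame({"Classes": ["not fire", "fire", "fire   ", "not fire  "]})

# Normalize the labels so that padded variants map to the same category.
labels = df["Classes"].str.strip()

# Map the two categories to integers: not fire -> 0, fire -> 1.
df["Classes"] = labels.map({"not fire": 0, "fire": 1})
print(df["Classes"].tolist())
```

Any label that is missing from the mapping becomes NaN, so a final null check after this step catches unexpected spellings.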