EDA on ALGERIAN FOREST Data Set
Introduction to the data set
This data set has the following features as input columns:
- date (self-explanatory), currently given as a string
- day (self-explanatory), currently given as a string
- month (self-explanatory), currently given as a string
- Temperature, range 22-42, currently given as a string
- Relative humidity, given as rh, currently given as a string
- Wind speed, given as ws, currently given as a string
- Fine Fuel Moisture Code, given as FFMC, range 28.6-92.5, currently given as a string
- Duff Moisture Code, given as DMC, range 1.1-65.9, currently given as a string
- Initial Spread Index, given as ISI, range 0-18.5, currently given as a string
- Buildup Index, range 0-3.1, currently given as a string
- Rain, range 0-16.8, currently given as a string
- Fire Weather Index, given as FWI, range 1.1-6.8, currently given as a string
- Classes, either "not fire" or "fire", currently given as a string
Coincidentally, the null values are only in the columns:
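One way to list the columns that contain null values is sketched below. The DataFrame is a toy stand-in with made-up values and column names, not the real data set.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the forest fire table; names and values are illustrative only.
df = pd.DataFrame({
    "day": ["1", "2", "3", "4"],
    "Temperature": ["29", "30", np.nan, "33"],
    "Classes": ["not fire", "fire", "fire", np.nan],
})

# Count missing values per column, then keep only the columns that have any.
null_counts = df.isnull().sum()
cols_with_nulls = null_counts[null_counts > 0].index.tolist()
print(cols_with_nulls)
```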
Before converting the string columns to their proper dtypes, we first need to handle the NaN values. Since only two columns are affected, we can drop the affected rows entirely.
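Dropping the rows that contain NaN values could look roughly like this. The frame is again a hypothetical miniature of the data, with all values still stored as strings.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the data; one row is missing two values.
df = pd.DataFrame({
    "day": ["1", "2", "3", "4"],
    "Temperature": ["29", np.nan, "26", "33"],
    "RH": ["57", np.nan, "82", "63"],
})

# dropna() removes every row that has at least one NaN;
# reset_index(drop=True) renumbers the remaining rows from 0.
clean = df.dropna().reset_index(drop=True)
print(clean)
```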
Now we convert each column to its appropriate data type
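A minimal sketch of the conversion, assuming the rows with NaN values are already gone. The column names and the choice of integer versus float per column are assumptions made for illustration.

```python
import pandas as pd

# Illustrative frame in which every value is still a string.
df = pd.DataFrame({
    "day": ["1", "2", "3"],
    "Temperature": ["29", "30", "26"],
    "FFMC": ["65.7", "64.4", "47.1"],
    "Classes": ["not fire", "fire", "not fire"],
})

# Map each column to a target dtype; columns not listed keep their dtype.
df = df.astype({"day": int, "Temperature": int, "FFMC": float})
print(df.dtypes)
```

Passing a dictionary to `astype` keeps the conversion in one place and leaves the class labels untouched until they are encoded.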
Again, we make sure that no NaN values are left
We have one NaN value, which is in the categorical feature
Machine learning algorithms cannot handle strings directly, so in the categorical column we convert every "not fire" value to 0 and every "fire" value to 1. This is basically one-hot encoding with only two categories, which reduces to a single binary column. To apply one-hot encoding, the categories should be independent, that is, the feature should be nominal with no inherent order.
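The 0/1 encoding might be sketched as follows. The stray whitespace in the labels is an assumption about how the raw strings could look, which is why they are stripped before mapping.

```python
import pandas as pd

# Illustrative labels; the trailing spaces are an assumed quirk of the raw data.
df = pd.DataFrame({"Classes": ["not fire", "fire ", "fire", "not fire   "]})

# Strip whitespace so the labels match the mapping keys exactly,
# then map each label to its binary code.
labels = df["Classes"].str.strip()
df["Classes"] = labels.map({"not fire": 0, "fire": 1})
print(df["Classes"].tolist())
```

Any label that is not a key of the dictionary becomes NaN after `map`, so a null check afterwards also catches unexpected spellings.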
Now we handle the categorical NaN value. For categorical data we replace NaN values with the mode, and for numerical data with the mean.
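A small sketch of both imputation rules on made-up values:

```python
import numpy as np
import pandas as pd

# Made-up values with one gap in a numerical and one in a categorical column.
df = pd.DataFrame({
    "Temperature": [29.0, np.nan, 26.0, 35.0],
    "Classes": ["fire", "fire", np.nan, "not fire"],
})

# mode() returns a Series because ties are possible; take the first entry.
df["Classes"] = df["Classes"].fillna(df["Classes"].mode()[0])

# mean() skips NaN by default, so the gap does not distort the average.
df["Temperature"] = df["Temperature"].fillna(df["Temperature"].mean())
print(df)
```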
Separating columns as categorical and numerical
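One possible way to split the columns is by dtype, as sketched here on an illustrative frame. This assumes the class column is still string-typed; a column already encoded as 0/1 would count as numerical and would have to be listed by hand.

```python
import pandas as pd

# Illustrative frame with two numerical columns and one string column.
df = pd.DataFrame({
    "Temperature": [29, 30, 26],
    "FFMC": [65.7, 64.4, 47.1],
    "Classes": ["not fire", "fire", "not fire"],
})

# select_dtypes filters columns by dtype family.
numerical_cols = df.select_dtypes(include="number").columns.tolist()
categorical_cols = df.select_dtypes(exclude="number").columns.tolist()
print(numerical_cols, categorical_cols)
```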
Univariate Analysis on Numerical Features
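As a plot-free starting point, summary statistics and skewness per numerical column can be collected as below. The values are invented, and skewness is an extra statistic added here for illustration.

```python
import pandas as pd

# Invented values for two numerical features.
df = pd.DataFrame({
    "Temperature": [29, 30, 26, 35, 33],
    "Rain": [0.0, 0.0, 13.1, 0.0, 0.4],
})

# describe() gives count, mean, std, and quartiles; transpose for one row per feature.
summary = df.describe().T

# skew() returns one value per column and aligns on the column names.
summary["skew"] = df.skew()
print(summary)
```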
Univariate Analysis on Categorical Feature
Outlier Detection
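A common rule of thumb for flagging outliers is the 1.5 x IQR rule, sketched below on invented rain values. Choosing this rule is an assumption; the notebook may use a different method.

```python
import pandas as pd

# Invented rain measurements with one extreme value.
rain = pd.Series([0.0, 0.0, 0.1, 0.3, 0.0, 0.2, 16.8, 0.1], name="Rain")

# Interquartile range and the usual 1.5 * IQR fences.
q1, q3 = rain.quantile(0.25), rain.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Values outside the fences are flagged as outliers.
outliers = rain[(rain < lower) | (rain > upper)]
print(outliers.tolist())
```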