Introduction
Many Python data visualization libraries (for example seaborn and the pandas plotting API) are built on top of matplotlib
Types Of Data
Categorical Data: Data that falls into a fixed set of categories, so rows can be grouped (for example team or season)
Numerical Data: Data measured as numbers on which arithmetic makes sense (for example runs scored or price)
Types Of Analysis
1. Univariate Analysis: analyzing a single column
2. Bivariate Analysis: analyzing two columns
3. Multivariate Analysis: analyzing more than two columns
2-d Line plot
Used for bivariate analysis
Type Of Data
1. Numerical - Numerical (the most common case)
2. Categorical - Numerical
Most popular use case: time series data, i.e. data measured with respect to time
example:
plt.plot()
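A minimal sketch of a line plot, assuming a small made-up dataset. The names `seasons` and `runs` and their values are placeholders for illustration, not real statistics:

```python
import matplotlib.pyplot as plt

# Hypothetical data: runs scored by one player in each season (made-up values).
seasons = [2015, 2016, 2017, 2018, 2019]
runs = [420, 510, 480, 610, 550]

# plt.plot(x, y) marks each (x, y) pair and joins consecutive points with a line.
# It returns a list with one Line2D object per plotted line.
lines = plt.plot(seasons, runs)
```

In a script, call plt.show() afterwards to display the figure. In a notebook the figure is usually rendered automatically.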
Giving title and x-axis name and y-axis name
plt.title('Rohit Sharma Vs Virat Kohli Career Comparison')  # adds a title above the plot
plt.xlabel('Season')
plt.ylabel('Runs Scored')
Adding Color
# colors (hex), line (width and style) and marker (size)
plt.plot(batsman['index'],batsman['V Kohli'],color='#D9F10F')
plt.plot(batsman['index'],batsman['RG Sharma'],color='#FC00D6')
plt.title('Rohit Sharma Vs Virat Kohli Career Comparison')
plt.xlabel('Season')
plt.ylabel('Runs Scored')
Change Line style
plt.plot(batsman['index'],batsman['V Kohli'],color='#D9F10F',linestyle='solid',linewidth=3)
plt.plot(batsman['index'],batsman['RG Sharma'],color='#FC00D6',linestyle='dashdot',linewidth=2)
plt.title('Rohit Sharma Vs Virat Kohli Career Comparison')
plt.xlabel('Season')
plt.ylabel('Runs Scored')
Marker
A 2D plot is drawn by marking each (x, y) coordinate and then joining the points with a line
Changing the marker and line width, and controlling marker size
plt.plot(batsman['index'],batsman['V Kohli'],color='#D9F10F',linestyle='solid',linewidth=3,marker='D',markersize=10)
plt.plot(batsman['index'],batsman['RG Sharma'],color='#FC00D6',linestyle='dashdot',linewidth=2,marker='o')
plt.title('Rohit Sharma Vs Virat Kohli Career Comparison')
plt.xlabel('Season')
plt.ylabel('Runs Scored')
Giving a label to each line
# legend -> location
plt.plot(batsman['index'],batsman['V Kohli'],color='#D9F10F',linestyle='solid',linewidth=3,marker='D',markersize=10,label='Virat')
plt.plot(batsman['index'],batsman['RG Sharma'],color='#FC00D6',linestyle='dashdot',linewidth=2,marker='o',label='Rohit')
plt.title('Rohit Sharma Vs Virat Kohli Career Comparison')
plt.xlabel('Season')
plt.ylabel('Runs Scored')
plt.legend(loc='upper right')  # draws the legend from the labels; loc changes its position
Changing the axis limits
# limiting axes (useful when an outlier such as 4500000 would flatten the rest of the line)
price = [48000,54000,57000,49000,47000,45000,4500000]
year = [2015,2016,2017,2018,2019,2020,2021]
plt.plot(year,price)
plt.ylim(0,75000)
plt.xlim(2017,2019)
Switching On Grids
plt.grid()  # grids get activated
plt.show()
Scatter Plots
Bivariate Analysis
numerical vs numerical
Used for finding the correlation between two quantities
plt.scatter()
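A rough sketch of a scatter plot together with a correlation check. The arrays `hours` and `score` are made-up values chosen only for illustration, and using np.corrcoef for the coefficient is an assumption, not something the notes prescribe:

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical data: hours studied vs. exam score (made-up values).
hours = np.array([1, 2, 3, 4, 5, 6, 7, 8])
score = np.array([35, 42, 50, 55, 63, 70, 74, 85])

# Each (hours, score) pair becomes one marker; no line joins the points.
points = plt.scatter(hours, score)
plt.xlabel('Hours studied')
plt.ylabel('Exam score')

# Pearson correlation coefficient: values near +1 suggest a strong
# positive linear relationship, values near 0 suggest none.
r = np.corrcoef(hours, score)[0, 1]
print(f"correlation = {r:.2f}")
```

If the markers rise together from left to right, the two quantities are positively correlated, which the coefficient should confirm.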
Showing a third variable with scatter (marker size)
```python-repl
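# size: scale each marker by the party size, so a third variable is visible
import matplotlib.pyplot as plt
import seaborn as sns
tips = sns.load_dataset('tips')
# plt.scatter is slower than plt.plot on large data, but supports per-point sizes
plt.scatter(tips['total_bill'],tips['tip'],s=tips['size']*20)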
```
Bar Chart
x-axis: categorical column, y-axis: an aggregate value (for example count, sum or mean)
Use horizontal bars when there are many categories (more than about 5)
# horizontal bar chart
plt.barh(colors,children,color='black')
## does not work well for multiple (grouped) bar charts
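A small sketch comparing vertical and horizontal bars. The names `colors` and `children` follow the snippet above, but their values here are assumed for illustration:

```python
import matplotlib.pyplot as plt

# Hypothetical data: number of children who picked each favourite colour.
colors = ['red', 'blue', 'green', 'yellow', 'pink', 'orange', 'purple']
children = [10, 20, 40, 10, 30, 15, 25]

# Two axes side by side: vertical bars on the left, horizontal on the right.
fig, (ax_v, ax_h) = plt.subplots(1, 2, figsize=(10, 4))

# Vertical bars: categories on the x-axis, bar height is the count.
vertical = ax_v.bar(colors, children)

# Horizontal bars: categories on the y-axis, so long or numerous
# category labels do not overlap each other.
horizontal = ax_h.barh(colors, children)
```

With seven categories the horizontal version tends to stay readable, because each label gets its own row.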
Histogram
Used for univariate analysis
Works on a numerical column
Shows the frequency count of values falling in each bin (range)
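A minimal sketch of a histogram with plt.hist. The list `scores` and the chosen bin edges are made-up values for illustration:

```python
import matplotlib.pyplot as plt

# Hypothetical data: scores from 12 innings (made-up values).
scores = [5, 12, 18, 23, 27, 31, 35, 42, 48, 51, 67, 95]

# bins lists the edges of the ranges: 0-25, 25-50, 50-75, 75-100.
# plt.hist returns the count per bin, the bin edges, and the drawn bars.
counts, edges, patches = plt.hist(scores, bins=[0, 25, 50, 75, 100])
plt.xlabel('Score')
plt.ylabel('Frequency')

# The counts per bin here are 4, 5, 2 and 1.
print(counts)
```

Passing an integer such as bins=10 instead of a list lets matplotlib choose equally wide bins over the data range.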