Matplotlib

Introduction

  1. Many other Python data visualization libraries (for example seaborn and the pandas plotting API) are built on top of matplotlib

  2. Types Of Data

    1. Categorical Data: values that fall into a fixed set of categories or groups (e.g. team, city)

    2. Numerical Data: values that are measured or counted as numbers (e.g. price, runs scored)

  3. Types Of Analysis

    1. Univariate Analysis: analyzing a single column

    2. Bivariate Analysis: analyzing two columns

    3. Multivariate Analysis: analyzing more than two columns

2-D Line Plot

  1. Used for bivariate analysis

  2. Types of data

    1. Numerical vs numerical (the most common case)

    2. Categorical vs numerical

  3. Most popular use case: time series data, i.e. data measured with respect to time

    example:

     # assumes: import matplotlib.pyplot as plt
     plt.plot(x, y)  # x: time values, y: measurements (same length)
    
  4. Adding a title and axis labels

     plt.title('Rohit Sharma Vs Virat Kohli Career Comparison')
     # adds a title above the whole plot
     plt.xlabel('Season')
     plt.ylabel('Runs Scored')
    
  5. Adding color

     # `batsman` is assumed to be a DataFrame with one row per season; color accepts hex codes
     plt.plot(batsman['index'],batsman['V Kohli'],color='#D9F10F')
     plt.plot(batsman['index'],batsman['RG Sharma'],color='#FC00D6')
    
     plt.title('Rohit Sharma Vs Virat Kohli Career Comparison')
     plt.xlabel('Season')
     plt.ylabel('Runs Scored')
    
  6. Changing line style and width

     plt.plot(batsman['index'],batsman['V Kohli'],color='#D9F10F',linestyle='solid',linewidth=3)
     plt.plot(batsman['index'],batsman['RG Sharma'],color='#FC00D6',linestyle='dashdot',linewidth=2)
    
     plt.title('Rohit Sharma Vs Virat Kohli Career Comparison')
     plt.xlabel('Season')
     plt.ylabel('Runs Scored')
    
  7. Marker

    A 2-D line plot is drawn by marking each (x, y) point and then joining consecutive points with a line. A marker makes those points visible.

  8. Changing the marker, line width, and marker size

     plt.plot(batsman['index'],batsman['V Kohli'],color='#D9F10F',linestyle='solid',linewidth=3,marker='D',markersize=10)
     plt.plot(batsman['index'],batsman['RG Sharma'],color='#FC00D6',linestyle='dashdot',linewidth=2,marker='o')
    
     plt.title('Rohit Sharma Vs Virat Kohli Career Comparison')
     plt.xlabel('Season')
     plt.ylabel('Runs Scored')
    
  9. Giving a label to each line

     # legend -> location
     plt.plot(batsman['index'],batsman['V Kohli'],color='#D9F10F',linestyle='solid',linewidth=3,marker='D',markersize=10,label='Virat')
     plt.plot(batsman['index'],batsman['RG Sharma'],color='#FC00D6',linestyle='dashdot',linewidth=2,marker='o',label='Rohit')
    
     plt.title('Rohit Sharma Vs Virat Kohli Career Comparison')
     plt.xlabel('Season')
     plt.ylabel('Runs Scored')
    
     plt.legend(loc='upper right')  # shows each line's label; loc sets the position of the legend
    
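  10. Limiting the axis range

    # limiting axes: the last price is an outlier that would otherwise flatten the rest of the line
    price = [48000,54000,57000,49000,47000,45000,4500000]
    year = [2015,2016,2017,2018,2019,2020,2021]
    
    plt.plot(year,price)
    plt.ylim(0,75000)    # show only y values between 0 and 75000
    plt.xlim(2017,2019)  # show only the years 2017 to 2019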
    
  11. Switching on the grid

    plt.grid() # turns on grid lines
    plt.show()
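
The steps above can be combined into one self-contained sketch. The seasons and run totals below are placeholder values invented for illustration (they are not real career statistics), and `draw_line_plot` is a hypothetical helper name; the styling arguments are the ones used in the notes.

```python
import matplotlib.pyplot as plt


def draw_line_plot():
    # Placeholder values for illustration only; these are not real statistics.
    seasons = [2015, 2016, 2017, 2018, 2019]
    player_a = [500, 970, 310, 530, 460]
    player_b = [480, 490, 330, 290, 400]

    plt.figure()
    # One plt.plot call per line; label is what the legend will display.
    plt.plot(seasons, player_a, color='#D9F10F', linestyle='solid',
             linewidth=3, marker='D', markersize=10, label='Player A')
    plt.plot(seasons, player_b, color='#FC00D6', linestyle='dashdot',
             linewidth=2, marker='o', label='Player B')

    plt.title('Player A vs Player B (placeholder data)')
    plt.xlabel('Season')
    plt.ylabel('Runs Scored')
    plt.legend(loc='upper right')
    plt.grid()
    return plt.gca()  # the Axes that now holds both lines


ax = draw_line_plot()
```

Calling `plt.show()` afterwards displays the figure in a script; in a notebook the plot usually renders on its own.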
    

Scatter Plots

  1. Used for bivariate analysis

  2. Numerical vs numerical

  3. Used for finding the correlation between two quantities

     plt.scatter()
    
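  4. Showing a third variable (3-D information) using marker size

     ```python
     import seaborn as sns

     # marker size encodes a third column: the party size
     tips = sns.load_dataset('tips')

     # plt.scatter is generally slower than plt.plot(x, y, 'o') on large data,
     # but it allows a different size for every point
     plt.scatter(tips['total_bill'],tips['tip'],s=tips['size']*20)
     ```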
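
As a rough, self-contained sketch of the two ideas above (judging correlation and encoding a third variable through marker size), the example below uses synthetic data instead of the `tips` dataset so that it runs without a download. The variable names, the random data, and the helper `draw_scatter` are all assumptions made for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt


def draw_scatter(seed=0):
    # Synthetic data for illustration only: y depends linearly on x plus noise.
    rng = np.random.default_rng(seed)
    x = rng.uniform(10, 50, size=40)
    y = 0.15 * x + rng.normal(0, 1, size=40)
    group_size = rng.integers(1, 7, size=40)  # hypothetical third variable (1 to 6)

    plt.figure()
    # s sets the marker area per point, so larger groups get larger markers.
    points = plt.scatter(x, y, s=group_size * 20)
    plt.xlabel('x')
    plt.ylabel('y')

    # Pearson correlation coefficient between x and y, always in [-1, 1].
    r = np.corrcoef(x, y)[0, 1]
    return points, r


points, r = draw_scatter()
```

A cloud of points rising from left to right goes with a positive `r`; the scatter plot shows the relationship, while `np.corrcoef` puts a number on it.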

Bar Chart

  1. Categorical column on the x-axis vs an aggregated value (e.g. count, sum, mean) on the y-axis

  2. Horizontal bars: preferred when there are many categories (more than about 5)

     # horizontal bar chart
     plt.barh(colors,children,color='black')
     # not well suited to multiple (grouped) bar charts
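
A minimal sketch comparing a vertical and a horizontal bar chart follows. The `colors` and `children` lists are placeholder values (the notes do not define them), `draw_bar_charts` is a hypothetical helper, and the example uses the Axes methods `ax.bar` / `ax.barh`, which take the same arguments as `plt.bar` / `plt.barh`.

```python
import matplotlib.pyplot as plt


def draw_bar_charts():
    # Placeholder categories and counts, for illustration only.
    colors = ['red', 'blue', 'green', 'yellow', 'pink', 'orange']
    children = [10, 20, 40, 10, 30, 15]

    # Two side-by-side axes: vertical bars on the left, horizontal on the right.
    fig, (ax_v, ax_h) = plt.subplots(1, 2, figsize=(10, 4))
    ax_v.bar(colors, children, color='black')
    ax_v.set_xlabel('Favourite color')
    ax_v.set_ylabel('Number of children')

    # barh swaps the roles: categories go on the y-axis, values on the x-axis.
    ax_h.barh(colors, children, color='black')
    ax_h.set_xlabel('Number of children')
    ax_h.set_ylabel('Favourite color')
    return ax_v, ax_h


ax_v, ax_h = draw_bar_charts()
```

With many categories, the horizontal version leaves room for the category names to be read without rotating the tick labels.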
    

Histogram

  1. Used for univariate analysis

  2. Applies to a numerical column

  3. Shows the frequency count of the values that fall into each bin (range)
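
The notes give no code for this section, so the following is only a sketch using `plt.hist`. The scores and bin edges are placeholder values chosen for illustration, and `draw_histogram` is a hypothetical helper name.

```python
import matplotlib.pyplot as plt


def draw_histogram():
    # Placeholder scores, for illustration only.
    scores = [12, 25, 33, 41, 47, 52, 58, 63, 71, 88]
    bin_edges = [0, 20, 40, 60, 80, 100]

    plt.figure()
    # plt.hist returns the count per bin, the bin edges, and the bar patches.
    counts, edges, _ = plt.hist(scores, bins=bin_edges, edgecolor='black')
    plt.xlabel('Score')
    plt.ylabel('Frequency')
    return counts, edges


counts, edges = draw_histogram()
# counts per bin here: 1, 2, 4, 2, 1
```

Each bar's height is the number of values in that bin, so the counts always add up to the number of rows in the column.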