Plotting Series and DataFrame objects

  • Plotting the data of a Series or DataFrame object can be accomplished by using the matplotlib.pyplot methods and functions.
  • However, pandas Series and DataFrame objects also come equipped with their own .plot() method (since version 0.17.0, kind-specific variants such as .plot.bar() are available as well). Thus, if you have a Series or DataFrame object (say, 's' or 'df'), you can plot it by writing:
      s.plot() or df.plot()  
  • Keep in mind that, in order to be flexible, the plot() method accepts a considerable number of arguments (kind, title, color, alpha, ax, figsize and many more); the most practical way to learn them is to try them out in various plotting scenarios.
  • The examples below demonstrate common scenarios for plotting data in Series and DataFrame objects.
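
As a minimal sketch of the two calls mentioned above: the data below is made up purely for illustration, and the Agg backend is selected only as an assumption so that the snippet also runs without a display. Both calls return a matplotlib Axes object by default, which can be customized further.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no display needed
import pandas as pd

# Illustrative data only; not taken from the tutorial's dataset.
s = pd.Series([1, 3, 2, 5], index=['a', 'b', 'c', 'd'])
df = pd.DataFrame({'x': [1, 2, 3], 'y': [2, 4, 8]})

ax_s = s.plot()    # line plot of the values against the index labels
ax_df = df.plot()  # one line per column, plotted against the index

# The returned Axes can be adjusted with the usual matplotlib methods.
ax_df.set_ylabel('value')
```

In a notebook, the %matplotlib inline magic used in the cells below makes the figures appear directly under the cell, so the matplotlib.use() line is not needed there.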

1. Plot Series object

  • Most often we would like to plot a Series object using the index labels on the x-axis and the data values on the y-axis. This can be easily accomplished with the matplotlib.pyplot plot() function:
In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

s = pd.Series([np.random.randint(1,100) for i in range(1,100)])
p = plt.plot(s.index, s.values)
In [18]:
s = pd.Series(np.sin(np.linspace(0, 2*np.pi, 100)), index=np.linspace(0, 2*np.pi, 100))
plt.xticks(np.linspace(0, 2*np.pi, 5), ('0', 'π/2', 'π', '3π/2', '2π'))
p = plt.plot(s.index, s.values)
In [93]:
s = pd.Series([np.random.randint(1,100) for i in range(1,100)])
n, bins, patches = plt.hist(s.values, color='cyan', alpha=0.5)
  • Alternatively, you can use the Series.plot() method:
In [97]:
s = pd.Series([np.random.randint(1,100) for i in range(1,100)])
p = s.plot(kind='hist', color='r', alpha=0.5)

See the Series.plot() documentation for explanations on the arguments that the method accepts.
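
The sine-wave example above can also be drawn with Series.plot() instead of plt.plot(). The following is a sketch under the assumption that a non-interactive backend (Agg) is acceptable; the title text is illustrative. Because the Series index holds the x values, no separate x argument is needed, and the tick labels are set on the returned Axes.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no display needed
import numpy as np
import pandas as pd

x = np.linspace(0, 2*np.pi, 100)
s = pd.Series(np.sin(x), index=x)

# The numeric index is used for the x-axis, the values for the y-axis.
ax = s.plot(title='sin(x)')

# Label the ticks in multiples of π/2 on the returned Axes.
ax.set_xticks(np.linspace(0, 2*np.pi, 5))
ax.set_xticklabels(['0', 'π/2', 'π', '3π/2', '2π'])
```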

2. Plot DataFrame object

First, run the code below so that df is properly constructed for the examples that follow.

In [22]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

df = pd.read_csv("../../data/sampledata.csv",sep=',',skiprows=3, header=0, index_col=0)

#delete column 'Country' to only have columns with float data 
del df['Country']
df = df/1e6
df
Out[22]:
           2006       2007       2008       2009       2010       2011       2012       2013       2014       2015
Code
ESP   44.397319  45.226803  45.954106  46.362946  46.576897  46.742697  46.773055  46.620045  46.480882  46.418269
FRA   63.621376  64.016229  64.374990  64.707044  65.027512  65.342776  65.659790  65.972097  66.495940  66.808385
GRC   11.020362  11.048473  11.077841  11.107017  11.121341  11.104899  11.045011  10.965211  10.892413  10.823732
IRL    4.273591   4.398942   4.489544   4.535375   4.560155   4.576794   4.586897   4.598294   4.617225   4.640703
ITA   58.143979  58.438310  58.826731  59.095365  59.277417  59.379449  59.539717  60.233948  60.789140  60.802085
MLT    0.405308   0.406724   0.409379   0.412477   0.414508   0.416268   0.419455   0.423374   0.427364   0.431333
PRT   10.522288  10.542964  10.558177  10.568247  10.573100  10.557560  10.514844  10.457295  10.401062  10.348648
CYP    1.048293   1.063040   1.077010   1.090486   1.103685   1.116644   1.129303   1.141652   1.153658   1.165300

Extract and plot one Column data

  • Use the DataFrame .xs() method, passing the column name as the argument, to get the column as a Series object. Remember to also set the 'axis=1' argument to select along the column direction.
In [25]:
s = df.xs('2015', axis=1)
s 
Out[25]:
Code
ESP    46.418269
FRA    66.808385
GRC    10.823732
IRL     4.640703
ITA    60.802085
MLT     0.431333
PRT    10.348648
CYP     1.165300
Name: 2015, dtype: float64
  • Call Series.plot() to plot the Series object
In [26]:
# plotting a barchart
p1 = s.plot(kind='bar', title='Countries population in millions in 2015', yticks=[10, 40, 80])

Extract and plot one Row data

  • Use .xs() again, this time with 'axis=0' (the default, so the argument can be omitted), and pass the row's index label as the argument to get the row as a Series object.
In [27]:
s = df.xs('ESP')
p = s.plot()
In [28]:
p1 = s.plot(kind='bar', ylim=(44,47))
p2 = s.plot(kind='line', colormap='Reds_r', title='Population of Spain')
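
The two extractions above use .xs(); the same Series objects can also be obtained with bracket indexing (for a column) and .loc (for a row), which many readers will find more familiar. The sketch below uses a small illustrative frame built from a few of the values shown earlier, not the full CSV file, and checks that both approaches give identical results.

```python
import pandas as pd

# Illustrative frame built from a few of the values shown above (in millions).
df = pd.DataFrame({'2014': [46.480882, 66.495940, 10.892413],
                   '2015': [46.418269, 66.808385, 10.823732]},
                  index=pd.Index(['ESP', 'FRA', 'GRC'], name='Code'))

col_xs = df.xs('2015', axis=1)  # column via .xs()
col_key = df['2015']            # same column via bracket indexing
row_xs = df.xs('ESP')           # row via .xs()
row_loc = df.loc['ESP']         # same row via label-based .loc

# Each pair holds the same values, index and dtype.
print(col_xs.equals(col_key), row_xs.equals(row_loc))
```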

Plotting DataFrame Column vs. Column

In [52]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

data = {'country': ['Italy','Spain','Greece','France','Portugal'],
        'popu': [61, 46, 11, 65, 10],
        'percent': [0.83,0.63,0.15,0.88,0.14],
        'area': [301,506,132,641,92]}

df = pd.DataFrame(data, index=['ITA', 'ESP', 'GRC', 'FRA', 'PRT'])
df.sort_values('country', axis=0, ascending=True, inplace=True)
df
Out[52]:
     area   country  percent  popu
FRA   641    France     0.88    65
GRC   132    Greece     0.15    11
ITA   301     Italy     0.83    61
PRT    92  Portugal     0.14    10
ESP   506     Spain     0.63    46
In [63]:
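# scatter plot of one column against another
df.plot(x='area', y='popu', kind='scatter')
# bar chart of 'area'; the index labels are used for the x-axis by default
df.plot(y='area', kind='bar')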
Out[63]:
<matplotlib.axes._subplots.AxesSubplot at 0xb1f8780>
In [72]:
# multiple plots 
fig, axes = plt.subplots(nrows=2, ncols=1, figsize=(10, 10))

p1 = df['popu'].plot(kind='bar', ax=axes[0], title='Population', alpha=0.5)
p2 = df['area'].plot(kind='bar', ax=axes[1], title='Area', color='r', alpha=0.5)
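
As an alternative sketch to creating the axes by hand, DataFrame.plot() can lay out one subplot per column itself when called with subplots=True. The frame below repeats the 'popu' and 'area' values from the example above; the Agg backend is an assumption made only so that the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no display needed
import pandas as pd

df = pd.DataFrame({'popu': [65, 11, 61, 10, 46],
                   'area': [641, 132, 301, 92, 506]},
                  index=['FRA', 'GRC', 'ITA', 'PRT', 'ESP'])

# One bar chart per column, stacked vertically in a 2x1 grid.
axes = df.plot(kind='bar', subplots=True, layout=(2, 1),
               figsize=(10, 10), alpha=0.5)

# axes is a NumPy array of Axes objects; flatten it to visit each subplot.
for ax in axes.ravel():
    ax.set_xlabel('Country code')
```

Creating the axes with plt.subplots(), as in the cell above, gives finer control over each subplot, whereas subplots=True is shorter when every column should get the same kind of chart.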

See the DataFrame.plot() documentation for explanations on the arguments that the method accepts.
