Home      |       Contents       |       About

Prev: The 'DataFrame' object       |      Next: Modifying Columns in DataFrame

Data access in DataFrame

  • Accessing data in a DataFrame object can be done in many ways. Below we present the most important of them, that enable accessing data:
    • 1) By column
    • 2) By row (.ix, .iloc, .loc)
    • 3) By scalar values (.at, .iat)

First make sure that you run the code below so that df is properly constructed for the following examples to run smoothly

In [1]:
import pandas as pd

df = pd.read_csv("../../data/sampledata.csv",sep=',',skiprows=3, header=0, index_col=0)
df
Out[1]:
Country 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015
Code
ESP Spain 44397319 45226803 45954106 46362946 46576897 46742697 46773055 46620045 46480882 46418269
FRA France 63621376 64016229 64374990 64707044 65027512 65342776 65659790 65972097 66495940 66808385
GRC Greece 11020362 11048473 11077841 11107017 11121341 11104899 11045011 10965211 10892413 10823732
IRL Ireland 4273591 4398942 4489544 4535375 4560155 4576794 4586897 4598294 4617225 4640703
ITA Italy 58143979 58438310 58826731 59095365 59277417 59379449 59539717 60233948 60789140 60802085
MLT Malta 405308 406724 409379 412477 414508 416268 419455 423374 427364 431333
PRT Portugal 10522288 10542964 10558177 10568247 10573100 10557560 10514844 10457295 10401062 10348648
CYP Cyprus 1048293 1063040 1077010 1090486 1103685 1116644 1129303 1141652 1153658 1165300

1. Accessing data by column

  • Selecting columns can be done in two ways:
    • a) Using the column names as indexes
    • b) Using the position (integer index) of the columns

a) Using column names

  • Just provide the column name as index, the way you would do it in a dictionary
In [2]:
c = df['Country']
print(type(c))
c
<class 'pandas.core.series.Series'>
Out[2]:
Code
ESP       Spain
FRA      France
GRC      Greece
IRL     Ireland
ITA       Italy
MLT       Malta
PRT    Portugal
CYP      Cyprus
Name: Country, dtype: object
  • You may provide the column label as a list, in which case you get a new DataFrame
In [3]:
c = df[['Country']]
print(type(c))
c
<class 'pandas.core.frame.DataFrame'>
Out[3]:
Country
Code
ESP Spain
FRA France
GRC Greece
IRL Ireland
ITA Italy
MLT Malta
PRT Portugal
CYP Cyprus
  • In case a column name is a single non-digit string, using the dot notation will work the same
In [4]:
df.Country
# but not this: df.2007
Out[4]:
Code
ESP       Spain
FRA      France
GRC      Greece
IRL     Ireland
ITA       Italy
MLT       Malta
PRT    Portugal
CYP      Cyprus
Name: Country, dtype: object
  • For more columns privide a list of column names (note the double brackets)
In [5]:
df[['Country', '2007', '2015']]
Out[5]:
Country 2007 2015
Code
ESP Spain 45226803 46418269
FRA France 64016229 66808385
GRC Greece 11048473 10823732
IRL Ireland 4398942 4640703
ITA Italy 58438310 60802085
MLT Malta 406724 431333
PRT Portugal 10542964 10348648
CYP Cyprus 1063040 1165300

b) Using column positions

  • Provide a list of column position. Note the list (not simply an indeger index)
In [6]:
df[[1]]
Out[6]:
2006
Code
ESP 44397319
FRA 63621376
GRC 11020362
IRL 4273591
ITA 58143979
MLT 405308
PRT 10522288
CYP 1048293
  • For more columns privide a list of column positions (note the double brackets)
In [7]:
df[[1, 3, 5]]
Out[7]:
2006 2008 2010
Code
ESP 44397319 45954106 46576897
FRA 63621376 64374990 65027512
GRC 11020362 11077841 11121341
IRL 4273591 4489544 4560155
ITA 58143979 58826731 59277417
MLT 405308 409379 414508
PRT 10522288 10558177 10573100
CYP 1048293 1077010 1103685
  • List comprehensions can also be used to define a list of integer indices
In [9]:
df[[i for i in range(3)]]
Out[9]:
Country 2006 2007
Code
ESP Spain 44397319 45226803
FRA France 63621376 64016229
GRC Greece 11020362 11048473
IRL Ireland 4273591 4398942
ITA Italy 58143979 58438310
MLT Malta 405308 406724
PRT Portugal 10522288 10542964
CYP Cyprus 1048293 1063040

2. Accessing data by row

a) Using the .loc indexer for selection by label

  • Provide a list with row labels
In [10]:
# Note that a row is now constructed as a new DataFrame
gr = df.loc[['GRC']]
print(type(gr))
gr
<class 'pandas.core.frame.DataFrame'>
Out[10]:
Country 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015
Code
GRC Greece 11020362 11048473 11077841 11107017 11121341 11104899 11045011 10965211 10892413 10823732
  • You can provide a single label as an single index (not in a list). In this case the row is now constructed as a new Series object
In [11]:
gr = df.loc['GRC']
print(type(gr))
gr
<class 'pandas.core.series.Series'>
Out[11]:
Country      Greece
2006       11020362
2007       11048473
2008       11077841
2009       11107017
2010       11121341
2011       11104899
2012       11045011
2013       10965211
2014       10892413
2015       10823732
Name: GRC, dtype: object
  • More rows: provide the list of labels
In [12]:
df.loc[['GRC','ESP','PRT']]
Out[12]:
Country 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015
Code
GRC Greece 11020362 11048473 11077841 11107017 11121341 11104899 11045011 10965211 10892413 10823732
ESP Spain 44397319 45226803 45954106 46362946 46576897 46742697 46773055 46620045 46480882 46418269
PRT Portugal 10522288 10542964 10558177 10568247 10573100 10557560 10514844 10457295 10401062 10348648
  • Slicing rows is also possible (note that single brackets are used)
In [13]:
df.loc['ESP':'GRC']
# You also get a new DataFrame
Out[13]:
Country 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015
Code
ESP Spain 44397319 45226803 45954106 46362946 46576897 46742697 46773055 46620045 46480882 46418269
FRA France 63621376 64016229 64374990 64707044 65027512 65342776 65659790 65972097 66495940 66808385
GRC Greece 11020362 11048473 11077841 11107017 11121341 11104899 11045011 10965211 10892413 10823732

b) Using the .iloc indexer for selection by position

  • Provide the list of row positions
In [14]:
gr = df.iloc[[2]]
print(type(gr))
gr
<class 'pandas.core.frame.DataFrame'>
Out[14]:
Country 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015
Code
GRC Greece 11020362 11048473 11077841 11107017 11121341 11104899 11045011 10965211 10892413 10823732
  • You can provide a single label as an single index (not in a list). In this case the row is now constructed as a new Series object
In [15]:
esp = df.iloc[0]
print(type(esp))
esp
<class 'pandas.core.series.Series'>
Out[15]:
Country       Spain
2006       44397319
2007       45226803
2008       45954106
2009       46362946
2010       46576897
2011       46742697
2012       46773055
2013       46620045
2014       46480882
2015       46418269
Name: ESP, dtype: object
  • More rows: provide the list of integer position indexes
In [16]:
df.iloc[[2,4,6]]
Out[16]:
Country 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015
Code
GRC Greece 11020362 11048473 11077841 11107017 11121341 11104899 11045011 10965211 10892413 10823732
ITA Italy 58143979 58438310 58826731 59095365 59277417 59379449 59539717 60233948 60789140 60802085
PRT Portugal 10522288 10542964 10558177 10568247 10573100 10557560 10514844 10457295 10401062 10348648
  • Slicing rows is also possible (note that single brackets are used)
In [17]:
df.iloc[2:5]
Out[17]:
Country 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015
Code
GRC Greece 11020362 11048473 11077841 11107017 11121341 11104899 11045011 10965211 10892413 10823732
IRL Ireland 4273591 4398942 4489544 4535375 4560155 4576794 4586897 4598294 4617225 4640703
ITA Italy 58143979 58438310 58826731 59095365 59277417 59379449 59539717 60233948 60789140 60802085

c) Using the .ix indexer

  • .ix is the most general indexer and supports bot inputs in label and integer format. Before using it however it is advisable to read the pandas documentation on .ix (here)
In [18]:
# .ix with row label
df.ix[['PRT']]
Out[18]:
Country 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015
Code
PRT Portugal 10522288 10542964 10558177 10568247 10573100 10557560 10514844 10457295 10401062 10348648
In [19]:
# .ix with list of row positions
df.ix[[0,2,4]]
Out[19]:
Country 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015
Code
ESP Spain 44397319 45226803 45954106 46362946 46576897 46742697 46773055 46620045 46480882 46418269
GRC Greece 11020362 11048473 11077841 11107017 11121341 11104899 11045011 10965211 10892413 10823732
ITA Italy 58143979 58438310 58826731 59095365 59277417 59379449 59539717 60233948 60789140 60802085
In [20]:
# Slicing with labels
df.ix['ESP':'GRC']
Out[20]:
Country 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015
Code
ESP Spain 44397319 45226803 45954106 46362946 46576897 46742697 46773055 46620045 46480882 46418269
FRA France 63621376 64016229 64374990 64707044 65027512 65342776 65659790 65972097 66495940 66808385
GRC Greece 11020362 11048473 11077841 11107017 11121341 11104899 11045011 10965211 10892413 10823732
In [21]:
# Slicing with positions
df.ix[3:5]
Out[21]:
Country 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015
Code
IRL Ireland 4273591 4398942 4489544 4535375 4560155 4576794 4586897 4598294 4617225 4640703
ITA Italy 58143979 58438310 58826731 59095365 59277417 59379449 59539717 60233948 60789140 60802085

Accessing scalars (single data items)

  • Selecting scalars can be done in two major ways:
    • a) Using the [ ] notation to provide column & row indexes
    • b) Using the .at and .iat indexers

Using the [ ] notation

In [22]:
ita = df['Country']['ITA']
ita
Out[22]:
'Italy'
In [24]:
# Also works without brackets for non-digit string indices
esp = df.Country.ESP
esp
Out[24]:
'Spain'

Using the .at indexer

  • Provide first the row label and then the column name (remember: .at is a non-integer indexer)
In [25]:
d = df.at['FRA','Country'] 
d
Out[25]:
'France'
In [26]:
d = df.at['FRA','2015'] 
d
Out[26]:
66808385

Using the .iat indexer

In [27]:
d = df.iat[0,0] 
d
Out[27]:
'Spain'
In [28]:
d = df.iat[7,9] 
d
Out[28]:
1153658
In [29]:
x = 0
for i in range(len(df)):
    for n in range(len(df.columns)):
        if i==n: 
            print(df.iat[i,n])
Spain
63621376
11048473
4489544
59095365
414508
10557560
1129303
  • Obviously you can use .ix (also: .loc & .iloc) indexers along with column indexing to select single data
In [30]:
m = df.ix['MLT']['Country']
m
Out[30]:
'Malta'
In [31]:
m = df.ix['MLT'].Country
m
Out[31]:
'Malta'
In [32]:
m = df.ix['MLT']['2010']
m
Out[32]:
414508

. Free learning material
. See full copyright and disclaimer notice