
Modifying Columns in DataFrame

Although DataFrames are usually populated by reading already-organized data from external files, you will often need to manage and modify existing columns (and rows) of a DataFrame. Here I present some of the most commonly used operations for managing columns, including how to:

  1. Rename columns
  2. Add columns
  3. Delete columns
  4. Insert/Rearrange columns
  5. Replace column contents

Set up a sample DataFrame

We are going to use the following DataFrame object in our examples:

In [2]:
import numpy as np 
import pandas as pd

data = {'country': ['Italy','Spain','Greece','France','Portugal'],
        'popu': [61, 46, 11, 65, 10],
        'percent': [0.83,0.63,0.15,0.88,0.14]}

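# columns= pins the column order shown in the outputs below
# (without it, recent pandas keeps the dict insertion order: country, popu, percent)
df = pd.DataFrame(data, index=['ITA', 'ESP', 'GRC', 'FRA', 'PRT'],
                  columns=['country', 'percent', 'popu'])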
df
Out[2]:
      country  percent  popu
ITA     Italy     0.83    61
ESP     Spain     0.63    46
GRC    Greece     0.15    11
FRA    France     0.88    65
PRT  Portugal     0.14    10

1. Rename columns

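  • Use the rename() method of the DataFrame to change column names: pass a dict that maps old labels to new labels as the columns argument
  • By default rename() returns a new DataFrame and leaves the original unchanged; see the pandas documentation of DataFrame.rename() for all options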
In [3]:
# Rename 'popu' column to 'population'
dfnew = df.rename(columns={'popu': 'population'})
dfnew
Out[3]:
      country  percent  population
ITA     Italy     0.83          61
ESP     Spain     0.63          46
GRC    Greece     0.15          11
FRA    France     0.88          65
PRT  Portugal     0.14          10
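As a small extension, the sketch below renames several columns in one call and also passes a function instead of a dict. The new label 'share' is a hypothetical name used only for illustration. It also checks that the original DataFrame keeps its labels, since rename() returns a new object by default.

```python
import pandas as pd

# Same sample data as above; columns= pins the column order
df = pd.DataFrame(
    {'country': ['Italy', 'Spain', 'Greece', 'France', 'Portugal'],
     'popu': [61, 46, 11, 65, 10],
     'percent': [0.83, 0.63, 0.15, 0.88, 0.14]},
    index=['ITA', 'ESP', 'GRC', 'FRA', 'PRT'],
    columns=['country', 'percent', 'popu'],
)

# Rename several columns at once with a dict mapping old -> new labels
# ('share' is a hypothetical label chosen for this example)
renamed = df.rename(columns={'popu': 'population', 'percent': 'share'})

# rename() returns a new DataFrame; the original keeps its labels
print(list(renamed.columns))  # ['country', 'share', 'population']
print(list(df.columns))       # ['country', 'percent', 'popu']

# A function can also be passed; it is applied to every column label
upper = df.rename(columns=str.upper)
print(list(upper.columns))    # ['COUNTRY', 'PERCENT', 'POPU']
```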

2. Add columns

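  • You can add a column to a DataFrame by assigning an array-like object (list, ndarray, Series) to a new column label with the [ ] operator. This modifies the DataFrame in place rather than returning a new object. A list or ndarray is matched to the rows by position, so it must have the same length as the DataFrame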
In [4]:
# Add a list as a new column 
dfnew['capital city'] = ['Rome','Madrid','Athens','Paris','Lisbon']
dfnew
Out[4]:
      country  percent  population  capital city
ITA     Italy     0.83          61          Rome
ESP     Spain     0.63          46        Madrid
GRC    Greece     0.15          11        Athens
FRA    France     0.88          65         Paris
PRT  Portugal     0.14          10        Lisbon
In [5]:
# Add a NumPy array as a new column
ar = np.array([39,34,30,33,351])
dfnew['Calling code'] = ar
dfnew
Out[5]:
      country  percent  population  capital city  Calling code
ITA     Italy     0.83          61          Rome            39
ESP     Spain     0.63          46        Madrid            34
GRC    Greece     0.15          11        Athens            30
FRA    France     0.88          65         Paris            33
PRT  Portugal     0.14          10        Lisbon           351
In [6]:
# Add a Series as a new column
# A Series is aligned on the index labels, not by position
ser = pd.Series(['es','it','fr','pt','gr'], index = ['ESP','ITA','FRA','PRT','GRC'])
dfnew['Internet domain'] = ser
dfnew
Out[6]:
      country  percent  population  capital city  Calling code  Internet domain
ITA     Italy     0.83          61          Rome            39               it
ESP     Spain     0.63          46        Madrid            34               es
GRC    Greece     0.15          11        Athens            30               gr
FRA    France     0.88          65         Paris            33               fr
PRT  Portugal     0.14          10        Lisbon           351               pt
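The two ways of matching new data to rows differ in an important way: a list or ndarray is matched by position, while a Series is matched by index label. The sketch below uses a smaller DataFrame and the made-up column names 'code' and 'domain' to show what happens when the data do not fit: a list of the wrong length raises ValueError, and a Series that lacks a label leaves NaN in that row.

```python
import pandas as pd

df = pd.DataFrame({'country': ['Italy', 'Spain', 'Greece']},
                  index=['ITA', 'ESP', 'GRC'])

# A list is matched by position, so its length must equal len(df)
try:
    df['code'] = [39, 34]          # only 2 values for 3 rows
except ValueError as err:
    print('ValueError:', err)      # no column is added

# A Series is matched by index label, not by position
dom = pd.Series(['es', 'it'], index=['ESP', 'ITA'])   # no entry for 'GRC'
df['domain'] = dom
print(df)

# Rows whose label is missing from the Series get NaN
print(df['domain'].isna().tolist())   # [False, False, True]
```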

3. Delete columns

In [7]:
# Delete using del (removes the column from dfnew in place)
del dfnew['Internet domain']
dfnew
Out[7]:
      country  percent  population  capital city  Calling code
ITA     Italy     0.83          61          Rome            39
ESP     Spain     0.63          46        Madrid            34
GRC    Greece     0.15          11        Athens            30
FRA    France     0.88          65         Paris            33
PRT  Portugal     0.14          10        Lisbon           351
In [8]:
# Delete using drop(): axis=1 refers to columns; drop() returns a new DataFrame and leaves dfnew unchanged
dfdrop = dfnew.drop(['Calling code'], axis=1)
dfdrop
Out[8]:
      country  percent  population  capital city
ITA     Italy     0.83          61          Rome
ESP     Spain     0.63          46        Madrid
GRC    Greece     0.15          11        Athens
FRA    France     0.88          65         Paris
PRT  Portugal     0.14          10        Lisbon
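Two other options may be convenient: drop() also accepts the columns= keyword instead of axis=1, and pop() removes a column in place while returning it as a Series. A minimal sketch on a small DataFrame built only for this example:

```python
import pandas as pd

df = pd.DataFrame({'country': ['Italy', 'Spain'],
                   'capital city': ['Rome', 'Madrid'],
                   'Calling code': [39, 34]},
                  index=['ITA', 'ESP'])

# drop() with the columns= keyword is equivalent to passing axis=1
smaller = df.drop(columns=['Calling code'])
print(list(smaller.columns))   # ['country', 'capital city']
print(list(df.columns))        # df is unchanged: all three columns remain

# pop() removes a column in place AND returns it as a Series
codes = df.pop('Calling code')
print(codes.tolist())          # [39, 34]
print(list(df.columns))        # ['country', 'capital city']
```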

4. Insert/Rearrange columns

  • Use the insert() method of the DataFrame to insert a column at a specific position; insert() modifies the DataFrame in place
  • See the pandas documentation of DataFrame.insert() for details
In [9]:
# Column positions are zero-based, so position 1 makes the new column the second one
ser = pd.Series(['es','it','fr','pt','gr'], index = ['ESP','ITA','FRA','PRT','GRC'])
dfnew.insert(1,'Internet domains',ser)
dfnew
Out[9]:
      country  Internet domains  percent  population  capital city  Calling code
ITA     Italy                it     0.83          61          Rome            39
ESP     Spain                es     0.63          46        Madrid            34
GRC    Greece                gr     0.15          11        Athens            30
FRA    France                fr     0.88          65         Paris            33
PRT  Portugal                pt     0.14          10        Lisbon           351
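A sketch of a few insert() details worth knowing, assuming a recent pandas version: it returns None because it works in place, a scalar value is broadcast to all rows, and inserting a label that already exists raises ValueError unless allow_duplicates=True is passed. The labels 'code' and 'continent' are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'country': ['Italy', 'Spain'], 'popu': [61, 46]},
                  index=['ITA', 'ESP'])

# insert() works in place and returns None; position 0 puts the column first
result = df.insert(0, 'code', ['IT', 'ES'])
print(result)              # None
print(list(df.columns))    # ['code', 'country', 'popu']

# A scalar value is broadcast to every row; len(df.columns) appends at the end
df.insert(len(df.columns), 'continent', 'Europe')
print(df['continent'].tolist())   # ['Europe', 'Europe']

# Inserting an existing label raises ValueError unless allow_duplicates=True
try:
    df.insert(1, 'popu', [0, 0])
except ValueError as err:
    print('ValueError:', err)
```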

Rearrange

In [10]:
# Get the DataFrame column names as a list
clist = list(dfnew.columns)

# Rearrange the list as needed
clist_new = clist[-1:] + clist[:-1]   # moves the last column to the front

# Index the DataFrame with the reordered list of column labels
dfnew = dfnew[clist_new]
dfnew
Out[10]:
     Calling code   country  Internet domains  percent  population  capital city
ITA            39     Italy                it     0.83          61          Rome
ESP            34     Spain                es     0.63          46        Madrid
GRC            30    Greece                gr     0.15          11        Athens
FRA            33    France                fr     0.88          65         Paris
PRT           351  Portugal                pt     0.14          10        Lisbon
  • Alternatively, write out the column labels in the order you want and use that list to index the DataFrame
In [11]:
clist = ['country','capital city','Internet domains','population','percent','Calling code']
dfnew = dfnew[clist]
dfnew
Out[11]:
      country  capital city  Internet domains  population  percent  Calling code
ITA     Italy          Rome                it          61     0.83            39
ESP     Spain        Madrid                es          46     0.63            34
GRC    Greece        Athens                gr          11     0.15            30
FRA    France         Paris                fr          65     0.88            33
PRT  Portugal        Lisbon                pt          10     0.14           351
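Slicing the column list by position depends on where a column currently sits. An alternative, sketched below, is to select the column by name; move_to_front is a hypothetical helper written for this example, not a pandas function. reindex(columns=...) is shown as another way to impose an order.

```python
import pandas as pd

def move_to_front(frame, label):
    """Return a new DataFrame with column `label` moved to position 0.

    move_to_front is a hypothetical helper, not part of pandas.
    """
    others = [c for c in frame.columns if c != label]
    return frame[[label] + others]

df = pd.DataFrame({'country': ['Italy', 'Spain'],
                   'popu': [61, 46],
                   'Calling code': [39, 34]},
                  index=['ITA', 'ESP'])

moved = move_to_front(df, 'Calling code')
print(list(moved.columns))       # ['Calling code', 'country', 'popu']

# reindex(columns=...) also reorders; a label not in df would become a NaN column
reordered = df.reindex(columns=['popu', 'country', 'Calling code'])
print(list(reordered.columns))   # ['popu', 'country', 'Calling code']
```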

5. Replace column contents

  • Use the [ ] notation to assign new values to a column.
  • The new value can be either a scalar, which is broadcast to every cell of the column, or an array-like object with the same length as the column
In [18]:
import pandas as pd

# Re-create the original sample DataFrame
data = {'country': ['Italy','Spain','Greece','France','Portugal'],
        'popu': [61, 46, 11, 65, 10],
        'percent': [0.83,0.63,0.15,0.88,0.14]}

df = pd.DataFrame(data, index=['ITA', 'ESP', 'GRC', 'FRA', 'PRT'],
                  columns=['country', 'percent', 'popu'])
df['percent'] = '-'     # a single value is broadcast to every cell of the column
df
Out[18]:
      country  percent  popu
ITA     Italy        -    61
ESP     Spain        -    46
GRC    Greece        -    11
FRA    France        -    65
PRT  Portugal        -    10
In [19]:
df['percent'] = 0.001 * df['popu']   # the Series on the right is aligned with df on the row index
df
Out[19]:
      country  percent  popu
ITA     Italy    0.061    61
ESP     Spain    0.046    46
GRC    Greece    0.011    11
FRA    France    0.065    65
PRT  Portugal    0.010    10
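Besides replacing a whole column, you can replace only some of its cells with .loc[row_mask, column], or build new contents from a lookup dict with Series.map(). A minimal sketch on a small DataFrame made up for this example; note that values missing from the dict passed to map() become NaN.

```python
import pandas as pd

df = pd.DataFrame({'country': ['Italy', 'Spain', 'Greece'],
                   'popu': [61, 46, 11]},
                  index=['ITA', 'ESP', 'GRC'])

# Replace the whole column with an array-like of the same length
df['popu'] = [60, 47, 10]

# Replace only the cells that meet a condition: .loc[row_mask, column]
df.loc[df['popu'] < 20, 'popu'] = 0
print(df['popu'].tolist())   # [60, 47, 0]

# Series.map() builds new contents from a lookup dict; unmatched values become NaN
df['country'] = df['country'].map({'Italy': 'IT', 'Spain': 'ES'})
print(df['country'].tolist())   # ['IT', 'ES', nan]
```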
