Although DataFrames are usually populated by reading already organized data from external files, you will often need to manage and modify the columns (and rows) of an existing DataFrame. Here, I present some of the most commonly used operations for managing columns, including how to rename, add, delete, insert, rearrange, and replace the contents of columns.
We are going to use the following DataFrame object in our examples:
import numpy as np
import pandas as pd
data = {'country': ['Italy','Spain','Greece','France','Portugal'],
'popu': [61, 46, 11, 65, 10],
'percent': [0.83,0.63,0.15,0.88,0.14]}
df = pd.DataFrame(data, index=['ITA', 'ESP', 'GRC', 'FRA', 'PRT'])
df
# Rename 'popu' column to 'population'
dfnew = df.rename(columns={'popu': 'population'})
dfnew
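Note that rename() above was assigned to a new name, dfnew. The short sketch below, which uses a trimmed-down copy of the sample data so that it runs on its own, illustrates why: with default arguments rename() returns a new DataFrame and leaves the original untouched. The names renamed and same are only illustrative.

```python
import pandas as pd

# A trimmed-down copy of the sample data, rebuilt so the snippet is self-contained
df = pd.DataFrame(
    {'country': ['Italy', 'Spain'], 'popu': [61, 46]},
    index=['ITA', 'ESP'],
)

renamed = df.rename(columns={'popu': 'population'})

print(list(df.columns))       # the original still has 'popu'
print(list(renamed.columns))  # the returned copy has 'population'

# With the default settings, labels that do not exist are silently ignored
same = df.rename(columns={'no_such_column': 'x'})
print(list(same.columns))
```

If you would rather get an error for a misspelled column name, rename() accepts errors='raise'.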
# Add a list as a new column
dfnew['capital city'] = ['Rome','Madrid','Athens','Paris','Lisbon']
dfnew
# Add an array as a new column
ar = np.array([39,34,30,33,351])
ar
dfnew['Calling code'] = ar
dfnew
# Add a Series as a new column
# When adding a Series, the data are automatically aligned on the index
ser = pd.Series(['es','it','fr','pt','gr'], index = ['ESP','ITA','FRA','PRT','GRC'])
dfnew['Internet domain'] = ser
dfnew
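Index alignment also determines what happens when the Series and the DataFrame do not share exactly the same labels. Here is a minimal sketch, using a smaller hypothetical DataFrame and a Series called partial, that shows labels missing from the Series becoming NaN and extra labels in the Series being dropped.

```python
import pandas as pd

df = pd.DataFrame(
    {'country': ['Italy', 'Spain', 'Greece']},
    index=['ITA', 'ESP', 'GRC'],
)

# The Series is in a different order, has no entry for 'GRC'
# and carries an extra entry for 'FRA'
partial = pd.Series(['es', 'it', 'fr'], index=['ESP', 'ITA', 'FRA'])
df['domain'] = partial

print(df)
# 'ITA' and 'ESP' receive their matching values regardless of order,
# 'GRC' gets NaN, and the 'FRA' entry is discarded because df has no such row
```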
# Delete using del
del dfnew['Internet domain']
dfnew
# Delete using drop()
dfdrop = dfnew.drop(['Calling code'], axis=1)
dfdrop
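The two deletion approaches differ in one important way: del changes the DataFrame in place, while drop() returns a new DataFrame by default. The sketch below uses a small throwaway DataFrame (the column names a, b, c are placeholders) to show the difference, along with the columns= keyword as an alternative spelling of axis=1.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [5, 6]})

# drop() returns a new object; columns=['b'] is equivalent to (['b'], axis=1)
trimmed = df.drop(columns=['b'])
print(list(trimmed.columns))  # 'b' is gone from the copy
print(list(df.columns))       # the original still has all three columns

# del removes the column from the original object itself
del df['c']
print(list(df.columns))
```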
# Insert a column at a specific position using insert()
# Note that column positions start at 0
ser = pd.Series(['es','it','fr','pt','gr'], index = ['ESP','ITA','FRA','PRT','GRC'])
dfnew.insert(1,'Internet domains',ser)
dfnew
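Two details of insert() are easy to miss: it modifies the DataFrame in place and returns None, and by default it refuses to insert a column whose name already exists. The following sketch demonstrates both on a small placeholder DataFrame; the flag duplicate_rejected is just an illustrative variable.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'c': [5, 6]})

# insert() works in place, so there is nothing useful to assign
result = df.insert(1, 'b', [3, 4])
print(result)            # None
print(list(df.columns))  # ['a', 'b', 'c']

# Inserting a name that already exists raises ValueError by default
duplicate_rejected = False
try:
    df.insert(0, 'b', [0, 0])
except ValueError as exc:
    duplicate_rejected = True
    print('Rejected:', exc)
```

For this reason, avoid writing dfnew = dfnew.insert(...), which would replace your DataFrame with None.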
# Get the DataFrame column names as a list
clist = list(dfnew.columns)
# Rearrange list the way you like
clist_new = clist[-1:] + clist[:-1] # moves the last column to the first position
# Pass the new list to the DataFrame - like a key list in a dict
dfnew = dfnew[clist_new]
dfnew
# Alternatively, write out the full column order explicitly
clist = ['country','capital city','Internet domains','population','percent','Calling code']
dfnew = dfnew[clist]
dfnew
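Typing out every column name becomes tedious for wide DataFrames. One possible shortcut, sketched below with a hypothetical variable front, is to build the new order from the existing column names so that only the column being moved has to be named.

```python
import pandas as pd

df = pd.DataFrame({
    'country': ['Italy', 'Spain'],
    'population': [61, 46],
    'percent': [0.83, 0.63],
    'capital city': ['Rome', 'Madrid'],
})

# Move one named column to the front and keep the others in their current order
front = 'capital city'
new_order = [front] + [c for c in df.columns if c != front]
df = df[new_order]

print(list(df.columns))
```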
# Replace the contents of a column, starting again from the original DataFrame
import numpy as np
import pandas as pd
data = {'country': ['Italy','Spain','Greece','France','Portugal'],
'popu': [61, 46, 11, 65, 10],
'percent': [0.83,0.63,0.15,0.88,0.14]}
df = pd.DataFrame(data, index=['ITA', 'ESP', 'GRC', 'FRA', 'PRT'])
df.percent = '-' # A single value is broadcast to every cell of the column
df
df.percent = 0.001*df.popu # Data in 'percent' and 'popu' columns are automatically aligned
df
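A word of caution about the attribute style used above: df.percent = ... works because 'percent' is already a column. As far as I know, assigning to an attribute name that is not yet a column does not create one; it only sets a plain Python attribute (pandas issues a warning about this when the value is list-like). The sketch below contrasts the two styles; the name density is a made-up example.

```python
import warnings
import pandas as pd

df = pd.DataFrame({'popu': [61, 46]}, index=['ITA', 'ESP'])

# Bracket notation creates the column if it does not exist yet
df['percent'] = 0.001 * df['popu']

# Attribute notation with a new name does NOT create a column
with warnings.catch_warnings():
    warnings.simplefilter('ignore')  # silence the pandas warning for this demo
    df.density = [1, 2]

print('percent' in df.columns)  # True
print('density' in df.columns)  # False
```

Bracket notation is therefore the safer habit for both creating and replacing columns, and it also works for names containing spaces such as 'capital city'.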