The 'DataFrame' object¶

A DataFrame object is a multi-dimensional table-like data structure (similar to a multidimensional array) containing a labeled collection of columns, each of which can be a different value type (numeric, string, boolean, etc.).

Example¶

Suppose that now your country data table includes additionally: (a) the full country name and (b) the percentage of world population that each country population corresponds to.
Thus, for each country you have a row of data (instead of just a column entry as in a Series object).
Suppose also that your country data are still label-indexed by the country two-letter code. The DataFrame object for this data representation would look like the following:
- Data for each country in rows
- Index labels (country code) for each country (row)
- Names for each column

(data for this table come from: Wikipedia: List of countries and dependencies by population)

Series

Constructing a DataFrame object¶

To better understand how to construct a DataFrame object consider the following first:
- The DataFrame has many rows and columns; each column can be thought of as a Series object and all columns (Series objects) share the same label indexes.
- As with the Series, the most convenient way to construct a DataFrame is from a dictionary with 'column_name:columns' as 'key:value' pairs. This dictionary is then passed as argument to the DataFrame constructor.
- Remember also that usually you will need to read data from an external file. Understanding, however, how to construct a DataFrame using a dictionary is useful as well.

import pandas as pd

data = {'country': ['Italy','Spain','Greece','France','Portugal'],
        'popu': [61, 46, 11, 65, 10],
        'percent': [0.83,0.63,0.15,0.88,0.14]}

df = pd.DataFrame(data, index=['ITA', 'ESP', 'GRC', 'FRA', 'PRT'])
print(df)

      country  percent  popu
ITA     Italy     0.83    61
ESP     Spain     0.63    46
GRC    Greece     0.15    11
FRA    France     0.88    65
PRT  Portugal     0.14    10

Observe the way that the 'df' DataFrame is printed: columns do not necessarily appear in the order they were declared (of course, you can later change the column order as you like)

DataFrame from 'nested' dictionary¶

If a nested dictionary is passed to DataFrame constructor, then:
- the outer dict keys will become columns and
- the inner keys will become the row indices

# Data are on unemployment percentage in GRC and ESP

import pandas as pd

un_data = {'GRC':{'2013':25, '2014':26, '2015':24.5},
           'ESP':{'2013':23, '2014':27, '2015':26.5}}

df = pd.DataFrame(un_data)
df

Loading data from external file¶

Import from xls, xlsx¶

Use the read_excel() function to import data from xls/xlsx formatted file.
- In the following example, tha data refer to world countries population (source: "The World Bank")
- The 'sampledata.xls' file used is available here

import pandas as pd

df = pd.read_excel("../../data/sampledata.xls",sheetname="Data-8",\
                   skiprows=3, header=0, index_col=0)
df

We call the read_excel() mathod passing the following arguments:
- a) "../../data/sampledata.xls": the path and name of the data file (../../ denotes moving two levels above the folder where our code file is)
- b) sheetname="Data-8": give the name of the sheet we want to read (note that there are also other sheets in the xls file)
- c) skiprows=3: we skipped 3 rows in the beginning of the file that contain no data (check the xls file to see these rows)
- d) header=0: we defined as column names the first row after skipping (zero-indexed; in this case it would be the same even without setting the 'header' argument)
- e) _indexcol=0: finally, we define as label indexes the first (zero-indexed) column with country codes (if this was not set then new integer-based indexes would have been assigned - try it out in the code)
Read more about read_excel() here

Import from csv¶

In the following example we use the read_csv() function to read the same data from a csv formatted file
A csv file is a file where data are separated with ',' as delimiter (csv = 'comma separated values')
- The 'sampledata.csv' file used is available here

import pandas as pd

df = pd.read_csv("../../data/sampledata.csv",sep=',',skiprows=3, header=0, index_col=0)
df

As you can see we passed the same arguments and additionally sep=','
- Although it is not needed in this case, it is useful to know that you can explicitely state in read_csv() what the delimiter is. Assigning a delimiter value through 'sep' makes it easy to read data from csv files with various types of delimiters (for example ';').
Read more about the read_csv() function here

Copyright¶

. Free learning material
. See full copyright and disclaimer notice

	Country	2006	2007	2008	2009	2010	2011	2012	2013	2014	2015
Code
ESP	Spain	44397319	45226803	45954106	46362946	46576897	46742697	46773055	46620045	46480882	46418269
FRA	France	63621376	64016229	64374990	64707044	65027512	65342776	65659790	65972097	66495940	66808385
GRC	Greece	11020362	11048473	11077841	11107017	11121341	11104899	11045011	10965211	10892413	10823732
IRL	Ireland	4273591	4398942	4489544	4535375	4560155	4576794	4586897	4598294	4617225	4640703
ITA	Italy	58143979	58438310	58826731	59095365	59277417	59379449	59539717	60233948	60789140	60802085
MLT	Malta	405308	406724	409379	412477	414508	416268	419455	423374	427364	431333
PRT	Portugal	10522288	10542964	10558177	10568247	10573100	10557560	10514844	10457295	10401062	10348648
CYP	Cyprus	1048293	1063040	1077010	1090486	1103685	1116644	1129303	1141652	1153658	1165300

	Country	2006	2007	2008	2009	2010	2011	2012	2013	2014	2015
Code
ESP	Spain	44397319	45226803	45954106	46362946	46576897	46742697	46773055	46620045	46480882	46418269
FRA	France	63621376	64016229	64374990	64707044	65027512	65342776	65659790	65972097	66495940	66808385
GRC	Greece	11020362	11048473	11077841	11107017	11121341	11104899	11045011	10965211	10892413	10823732
IRL	Ireland	4273591	4398942	4489544	4535375	4560155	4576794	4586897	4598294	4617225	4640703
ITA	Italy	58143979	58438310	58826731	59095365	59277417	59379449	59539717	60233948	60789140	60802085
MLT	Malta	405308	406724	409379	412477	414508	416268	419455	423374	427364	431333
PRT	Portugal	10522288	10542964	10558177	10568247	10573100	10557560	10514844	10457295	10401062	10348648
CYP	Cyprus	1048293	1063040	1077010	1090486	1103685	1116644	1129303	1141652	1153658	1165300

	ESP	GRC
2013	23.0	25.0
2014	27.0	26.0
2015	26.5	24.5