What is pandas?

  • pandas (always written in lower case) is an open-source high-level and high-performance library for data analysis in the Python ecosystem.
  • Although numpy already offers the ndarray object and many relevant functions for data processing, working with numpy remains low-level: its data representation is often not user-friendly, since values are typically addressed by integer position rather than by meaningful labels.
  • By contrast, pandas provides easy-to-use labeled data structures and high-level data manipulation tools (based on Python dictionaries and numpy arrays), making data analysis and processing much more understandable and efficient for humans.
  • Overall, pandas is built on top of numpy and offers scientists and programmers sophisticated data manipulation features that can significantly increase productivity.
  • The library name comes from panel data, a term for multidimensional data sets in statistics and econometrics.
  • Read more at the pandas homepage
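
The contrast with numpy drawn above can be made concrete with a minimal sketch. The city names and temperature values are invented purely for illustration. The same values are stored once as a bare ndarray, addressed by position, and once as a pandas Series, addressed by label.

```python
import numpy as np
import pandas as pd

# Hypothetical data: temperatures (degrees Celsius) for three cities.
temps_array = np.array([21.5, 18.0, 25.3])

# With a plain ndarray we must remember that position 1 means "Paris".
paris_from_array = temps_array[1]

# A Series attaches a label to each value, so the lookup documents itself.
temps_series = pd.Series([21.5, 18.0, 25.3], index=["Rome", "Paris", "Cairo"])
paris_from_series = temps_series["Paris"]

# The values of the Series can still be obtained as a numpy array.
values = temps_series.to_numpy()
```

Both lookups return the same number; the difference is only in how readable the access is.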

Importing pandas in your code

  • As with the libraries introduced earlier, it is advisable to keep namespaces clearly separated and to import pandas under the conventional alias pd.
In [1]:
import pandas as pd 

Starting with pandas

  • To get started with pandas, you need to get a good understanding of the two major data structures: Series and DataFrame.
  • They are so important that they are sometimes imported directly into the script's namespace:
In [2]:
from pandas import Series 
from pandas import DataFrame 
  • However, all examples that follow stick to the basic 'import pandas as pd'.
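
As a brief preview of the two structures, the sketch below builds one of each through the pd namespace and checks that the directly imported names refer to the same classes. The labels and values are arbitrary placeholders chosen for illustration.

```python
import pandas as pd
from pandas import Series, DataFrame

# A Series is a one-dimensional labeled array.
s = pd.Series([10, 20, 30], index=["a", "b", "c"])

# A DataFrame is a two-dimensional labeled table; each column is a Series.
df = pd.DataFrame({"x": [1, 2, 3], "y": [4.0, 5.0, 6.0]}, index=["a", "b", "c"])

# Both import styles give access to the very same classes.
same_series = Series is pd.Series
same_frame = DataFrame is pd.DataFrame

# Selecting a single column of a DataFrame yields a Series.
column_type = type(df["x"])
```

Using the pd prefix everywhere makes it obvious at a glance where Series and DataFrame come from, which is why it is the style kept in the rest of these notes.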
