Dictionary vs. List¶

Dictionary and lists have similarities and also differences that make each appropriate for specific programming tasks. Let's explore this in more detail

Dict - List comparison¶

Similarities
- Both are indexed: list is integer-based index while dict is key-based indexed (key should be of immutable type)
- Both are mutable: changing their items can happen "in place" (in memory they occupy without constructing new object)
- Both are iterable: they possess a __next__ method that returns items one by one (but remember that for a dict this happens without any particular order)
Differences
- List is a sequence - Dict is a collection: however, len() function operates properly for both of them returning their lenght (number of items they contain; in a dictionary an item is a key:value pair)
- List is ordered - Dict is not: a list is an ordered object (can be sorted). A dict is unordered, ordering is meaningless and items may returned in any order (careful when you are using a dict as a iterable)
- List uses integer-based indexing - Dict can use any immutable type as index: this makes dict based code much more readable and understandable by humans; for example, you get more information about what the code does if you read di['client_name'] compared to li[1]
- Dict is mapping - List is not: a dict always constructs some type of mapping between keys and values. A list is always some sequence of single items.

Dict - List use¶

Use a list:
- ...when you need a flexible sequence for all kind of computations and when integer-based indexing does not impose significant restrictions on your data representation
- ...also when ordering (sorting, etc.) of data is important
Use a dict:
- ...when you need a 'key:value' representation of data, which essentialy is a "look-up" table: you look for a value using a key
- ...also if you have to construct a sparse collection of items than a dense one

For example:

Using a sequence of numbers to represent data that will be ordered somehow is a typical task for using a list

import random
alist = [random.randint(1,100) for i in range(10)]
alist.sort()
alist

[7, 15, 24, 37, 41, 46, 63, 83, 91, 95]

Using a mapping sparse collection of data is a typical task for using a dict. Remember also that keys can be extracted and sorted, thus allowing "sorted" reference to dict pairs.

movies = {1975: 'Holy Grail',
          1979: 'Life of Brian',  
          1983: 'The Meaning  of Life'}
print(movies,'\n')                        # movies are printed without any specific order

sorted_years = sorted(movies.keys())      # sorted_years is now a sorted list of keys
print(sorted_years,'\n')

for year in sorted_year:
    print(year, movies[year])

{1979: 'Life of Brian', 1983: 'The Meaning  of Life', 1975: 'Holy Grail'} 

[1975, 1979, 1983] 

1975 Holy Grail
1979 Life of Brian
1983 The Meaning  of Life

What if I need a dictionary that is already ordered?
Then you needan OrderedDict object: a dictionray that 'remembers' the order of items as they were initially given. You need to import the collections module where the OrderedDict 'lives'.
Documentation can be found here

# A simple OrderedDict example
import collections

country = [('Name','France'),('Capital city','Paris'),('Population',50),('Currency','Euro')]
dunord = dict(country)
dord = collections.OrderedDict(country)

for i in dunord:
    print(i, end=' ')
print('\n')
for i in dord:
    print(i, end=' ')

Name Currency Population Capital city 

Name Capital city Population Currency

Dictionary as a hash table¶

List is flexible but...:
- ...although list is a very flexible structure this flexibility requires that several particular list-relevant processing is happenning 'under the hood'. Thus, the price for flexibility is reduced efficiency.
- Reduced efficiency is not a problem when data size is relatively small. But, it really turns to be a restriction when dealing with big data. This is also the reason for introducing a more efficient object, namely 'ndarray' (learn more about ndarrays in numpy section).
Dict is efficient because...:
- ...a dict is constructed as a hash table. In general, a hash table uses a hash function to implement a mapping between input keys and output values. The advantage is that the time required to perform a task in a dict (for example, access a certain item) is constant as compared to the time required in a list which increases as the size of the list increases.
  - Read more on hash table here
- Technically, the above difference is reported using the big-O notation by stating that dict performance is of O(1) (meaning: standard) while list performance is of O(n) (increases as n list size increases)
You can read more on this issue and also see a graphical representation of dict-list efficiency here

Efficiency of the 'in' operator¶

In the following example, we check the efficiency of item look-up operation in the list and dict objects
The notebook magic command %timeit returns the average time required for an operation
See that he code constructs a 1million items list and dict and looks for the item '999999' in both of them
The times measured are (on average)
- ~ 18.5 msec for the list
- ~ 95.5 nsec for the dict (with 1 msec = 1000 nsec)
Thus, this operation in a dict is faster executed by a factor of ~200.

# timeit example 

alist = [i for i in range(1000000)]
adict = {alist[i] : alist[i] for i in alist}

k = 999999

%timeit k in alist 
%timeit k in adict

100 loops, best of 3: 18.7 ms per loop
10000000 loops, best of 3: 95.6 ns per loop

Copyright¶

. Free learning material
. See full copyright and disclaimer notice