A dive into the language reference documentation

For the first year or two when I started coding, I thought learning a language was all about learning the syntax. So, that’s all I did.

Needless to say, I didn’t turn into a great developer. I was stuck. Then, one fine day, it just clicked. I realised I was doing this wrong. Learning the syntax should be the least of my concerns. What matters is everything else about the language. What exactly is all that? Read on.

This article is divided into three main subparts: The Data Model, the Execution model and the Lexical analysis.

This article is more an insight into how things work in Pythonland — in contrast to how to learn Python. You’ll find many how-to learning sources online.

What I didn’t find online was a single source of common ‘gotchas’ in Python. A source explaining how the language works. This attempts to solve that problem. I think I’ve come up short, there’s so much to it!

Everything here comes from the official documentation. I’ve condensed it — to the important points, reordered stuff and added my examples. All links point to the documentation.

Without further ado, here we go.

Data Model

Objects, values and types

Objects are Python’s abstraction for data.

Every object has its unique fixed identity, a fixed type and a value.

‘Fixed’ means the identity and type of an Object can never change.

The value may change. Objects whose value can change are called mutable while objects whose value can’t change are called immutable.

The mutability is determined by type :

  • Numbers, Strings and Tuples are immutable
  • Lists and Dictionaries are mutable

The identity of objects can be compared via the is operator.

id() returns the identity

type() returns the type

Note: The value of an immutable container object that contains a reference to a mutable object can change when the latter’s value is changed. However, the container is still considered immutable, because the collection of objects it contains cannot be changed. So, immutability is not strictly the same as having an unchangeable value.

This note made my head spin the first two times I read it.

Simple translation: Immutability is not the same as unchangeable value. In the example below, the tuple is immutable, while it’s value keeps changing (as the list changes).

Example:

>>> t = ("a", [1]) # a tuple of string and list
>>> id(t)
4372661064
>>> t
('a', [1])
>>> type(t)
<class 'tuple'>
>>> t[1]
[1]
>>> t[1].append(2)
>>> t
('a', [1, 2])
>>> id(t)
4372661064
>>> type(t)
<class 'tuple'>

The tuple is immutable, even though it contains a mutable object, a list.

Compare this to a string, where changing the existing array changes the object (since strings are immutable).

>>> x = "abc"
>>> id(x)
4371437472
>>> x += "d"
>>> x
'abcd'
>>> id(x)
4373053712

Here, the name , x is bound to another object of type string. This changes its id as well.

The original object, being immutable, stays immutable. The binding is explained in further detail below, which should make things clearer.

Built-in types

Python comes with several built-in types:

None

The type is represented by a single object, hence a single value. The sole object with type = NoneType

>>> type(None)
<class 'NoneType'>

Numbers

This is a collection of abstract base classes used to represent numbers. They can’t be instantiated, and int, float inherit from numbers.Number.

They are created by numeric literals and arithmetic operations. The returned objects are immutable, as we have seen. The following list of examples will make this clear:

>>> a = 3 + 4
>>> type(a)
<class 'int'>
>>> isinstance(a, numbers.Number)
True
>>> isinstance(a, numbers.Integral)
True
>>> isinstance(3.14 + 2j, numbers.Real)
False
>>> isinstance(3.14 + 2j, numbers.Complex)
True

Sequences

These represent finite ordered sets indexed by non negative integers. Just like an array from other languages.

len() returns the length of sequences. When length is n, the index set has elements from 0...n-1 . Then the ith element is selected by seq[i-1].

For a sequence l, you can select elements in between indexes using slicing: l[i:j].

There are two types of sequences: mutable and immutable.

  • Immutable sequences include: strings, tuples and bytes.
  • Mutable sequences include: lists and byte arrays

Sets

These represent unordered, finite sets of unique, immutable objects. They can’t be indexed, but can be iterated over. len() still returns the number of items in the set.

There are two types of sets: mutable and immutable.

  • A mutable set is created by set().
  • An immutable set is created by frozenset().

Mappings

Dictionary

These represent finite sets of objects indexed by nearly arbitrary values. Keys can’t be mutable objects. That includes lists, other dictionaries and other objects that are compared by value, and not by object identity.

This means a frozenset can be a dictionary key too!

Modules

A module object is a basic organisational unit in Python. The namespace is implemented as a dictionary. Attribute references are lookups in this dictionary.

For a module m, the dictionary is read-only, accessed by m.__dict__ .

It’s a regular dictionary so you can add keys to it!

Here’s an example, with the Zen of Python:

We are adding our custom function, figure() to the module this.

>>> import this as t
>>> t.__dict__
{'__name__': 'this', '__doc__': None, '__package__': '',
.....
.....
's': "Gur Mra bs Clguba, ol Gvz Crgref\n\nOrnhgvshy vf orggre guna
vqrn.\nAnzrfcnprf ner bar ubaxvat terng vqrn -- yrg'f qb zber bs gubfr!",
'd': {'A': 'N', 'B': 'O', 'C': 'P', 'D': 'Q', 'E': 'R', 'F': 'S', 
'u': 'h', 'v': 'i', 'w': 'j', 'x': 'k', 'y': 'l', 'z': 'm'},
'c': 97,
'i': 25
}
>>> def figure():
...   print("Can you figure out the Zen of Python?")
... 
>>> t.fig = figure
>>> t.fig()
Can you figure out the Zen of Python?
>>> t.__dict__
{'__name__': 'this', '__doc__': None, '__package__': '',
.....
.....
's': "Gur Mra bs Clguba, ol Gvz Crgref\n\nOrnhgvshy vf orggre guna
vqrn.\nAnzrfcnprf ner bar ubaxvat terng vqrn -- yrg'f qb zber bs gubfr!",
'd': {'A': 'N', 'B': 'O', 'C': 'P', 'D': 'Q', 'E': 'R', 'F': 'S', 
'u': 'h', 'v': 'i', 'w': 'j', 'x': 'k', 'y': 'l', 'z': 'm'},
'c': 97,
'i': 25
'fig': <function figure at 0x109872620>
}
>>> print("".join([t.d.get(c, c) for c in t.s]))
The Zen of Python, by Tim Peters
Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!

Not very useful either, but good to know.

Operator Overloading

Python allows for operator overloading.

Classes have special function names — methods they can implement to use Python’s defined operators. This includes slicing, arithmetic operations and subscripting.

For example, __getitem__() refers to subscripting. Hence, x[i] is equivalent to type(x).__getitem__(x,i).

Hence, to use the operator [] on a class someClass : you need to define __getitem__() in someClass.

>>> class operatorTest(object):
...     vals = [1,2,3,4]
...     def __getitem__(self, i):
...         return self.vals[i]
... 
>>> x = operatorTest()
>>> x[2]
3
>>> x.__getitem__(2)
3
>>> type(x)
<class '__main__.OperatorTest'>
>>> type(x).__getitem__(x,2)
3
>>> OperatorTest.__getitem__(x,2)
3

Confused about why all of them are equivalent? That’s for next part — where we cover class and function definitions.

Likewise, the __str__() function determines the output when the str() method is called on an object of your class.

For comparison operations, the special function names are:

  • object.__lt__(self, other) for < (“less than”)
  • object.__le__(self, other) for (“less than or equal to”)
  • object.__eq__(self, other) for == (“equal to”)
  • object.__ne__(self, other) for != (“not equal to”)
  • object.__gt__(self, other) for > (“greater than”)
  • object.__ge__(self, other) for (“greater than or equal to”)

So for example, x < y is called as x.__lt__(y)

There are also special functions for arithmetic operations, like object.__add__(self, other).

As an example, x+y is called as x.__add__(y)

Another interesting function is __iter__().

You call this method when you need an iterator for a container. It returns a new iterator object that can iterate over all the objects in the container.

For mappings, it should iterate over the keys of the container.

The iterator object itself supports two methods:

  • iterator.__iter__() : Returns the object itself.

This makes iterators and the containers equivalent.

This allows the iterator and containers both to be used in for and in statements.

  • iterator.__next__() : Returns the next item from the container. If there are no further items, raises the StopIteration exception.
class IterableObject(object):    # The iterator object class
     vals = []
     it = 0
     def __init__(self, val):
         self.vals = val
         it = 0
 
     def __iter__(self):
         return self
 
     def __next__(self):
         if self.it < len(self.vals):
             index = self.it
             self.it += 1
             return self.vals[index]
         raise StopIteration
 
 class IterableClass(object):    # The container class
       vals = [1,2,3,4]
 
       def __iter__(self):
         return iterableObject(self.vals)
>>> iter_object_example = IterableObject([1,2,3])
>>> for val in iter_object_example:
...   print(val)
... 
1
2
3
>>> iter_container_example = IterableClass()
>>> for val in iter_container_example:
...  print(val)
... 
1
2
3
4

Cool stuff, right? There’s also a direct equivalent in Javascript.

Context Managers are also implemented via operator overloading.

with open(filename, 'r') as f

open(filename, 'r') is a context manager object which implements

object.__enter__(self) and

object.__exit__(self, exc_type, exc_value, traceback)
All the above three parameters are null when error is None.

class MyContextManager(object):
    def __init__(self, some_stuff):
        self.object_to_manage = some_stuff
    def __enter__(self):
        print("Entering context management")
        return self.object_to_manage # can do some transforms too
    
    def __exit__(self, exc_type, exc_value, traceback):
        if exc_type is None:
            print("Successfully exited")
            # Other stuff to close
>>> with MyContextManager("file") as f:
...     print(f)
... 
Entering context management
file
Successfully exited

This isn’t useful — but gets the point across. Does that make it useful anyway?

vJ7yWCli55L8Sn1i-cnzBNyjXJm8quKKcPRq
Philosoraptor

Execution Model

A block is a piece of code executed as a unit in an execution frame.

Examples of blocks include:

  • Modules, which are top level blocks
  • Function body
  • Class definition
  • But NOT for loops and other control structures

Remember how everything is an object in Python?

Well, you have names bound to these objects. These names are what you think of as variables.

>>> x
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'x' is not defined

Name binding, or assignment occurs in a block.

Examples of name binding — these are intuitive:

  • Parameters to functions are bound to the names defined in the function
  • Import statements bind name of module
  • Class and function definitions bind the name to class / function objects
  • Context managers: with ... as f : f is the name binding to the ... object

Names bound to a block are local to that block . That means global variables are simply names bound to the module.

Variables used in a block without being defined there are free variables.

Scopes define visibility of a name in a block. The scope of a variable includes the block it is defined in, as well as all blocks contained inside the defining block.

Remember how for loops aren’t blocks? That’s why iteration variables defined in the loop are accessible after the loop, unlike in C++ and JavaScript.

>>> for i in range(5):
...   x = 2*i
...   print(x, i)
... 
0 0
2 1
4 2
6 3
8 4
>>> print(x, i)    # outside the loop! x was defined inside.
8 4

When a name is used in a block, it is resolved using the nearest enclosing scope.

Note: If a name binding operation occurs anywhere within a code block, all uses of the name within the block are treated as references to the current block. This can lead to errors when a name is used within a block before it is bound.

For example:

>>> name = "outer_scope"
>>> def foo():
...     name = "inner_function" if name == "outer_scope" \
                else "not_inner_function"
... 
>>> foo()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 2, in foo
UnboundLocalError: local variable 'name' referenced before assignment

This is a wonderful traceback, which should make sense now.

We have the top level block, the module — in which there’s another block, the function. Every binding inside the function has the function as its top level scope!

Hence, when you’re binding the name name to the object "inner_function" : before the binding you’re checking its value. The rule says you can’t reference it before the binding. Exactly the reason for the UnboundLocalError.

3vBeauEpskRngPsPSWn2ww0KsgA3mNCGGzjl
Not this kind of Execution Model? Source

Lexical Analysis

Python lets you use line joinings. To explicitly continue lines, use a backslash.

Comments aren’t allowed after line joinings.

if a < 10 and b < 10 \ # Comment results in SyntaxError
and c < 10: # Comment okay
    return True
else:
    return False

Implicitly, line joining occurs on its own when elements are inside braces. Comments here are allowed.

month_names = ['Januari', 'Februari', 'Maart',      # These are the
               'April',   'Mei',      'Juni',       # Dutch names
               'Juli',    'Augustus', 'September',  # for the months
               'Oktober', 'November', 'December']   # of the year

Indentation

The number of spaces / tabs in the indentation doesn’t matter, as long as it’s increasing for things that should be indented. The first line shouldn’t be indented.

The four spaces rule is a convention defined by PEP 8: Style Guide. It’s good practice to follow it.

# Compute the list of all permutations of l.
def perm(l):
        # Comment indentation is ignored
    if len(l) <= 1:
                  return [l]
    r = []
    for i in range(len(l)):
             s = l[:i] + l[i+1:]     # Indentation level chosen
             p = perm(s)             # Must be same level as above
             for x in p:
              r.append(l[i:i+1] + x) # One space okay
    return r

There are a few reserved identifiers as well.

  • _ for import: functions / variables starting with _ aren’t imported.
  • __*__ for system defined names, defined by implementation : we’ve seen a few of these. ( __str__(), __iter__(), __add__() )

Python also offers Implicit String Literal concatenation

>>> def name():
...   return "Neil" "Kakkar"
...
>>> name()
'Neil Kakkar'

Format Strings

String formatting is a useful tool in Python.

Strings can have { expr } in the string literal where expr is an expression. The expression evaluation is substituted in place.

Conversions can be specified to convert the result before formatting.

!r calls repr(), !s calls str() and !a calls ascii()

>>> name = "Fred"
>>> f"He said his name is {name!r}."
"He said his name is 'Fred'."
>>> f"He said his name is {repr(name)}."  # repr() is equiv. to !r
"He said his name is 'Fred'."
>>> width = 10
>>> precision = 4
>>> value = decimal.Decimal("12.34567")
>>> f"result: {value:{width}.{precision}}"  # nested fields
'result:      12.35'
# This is same as "{decf:10.4f}".format(decf=float(value))
>>> today = datetime(year=2017, month=1, day=27)
>>> f"{today:%B %d, %Y}"  # using date format specifier
'January 27, 2017'
>>> number = 1024
>>> f"{number:#0x}"  # using hexadecimal format specifier
'0x400'

It’s a cleaner syntax to using str.format()

Summary

With this, we’ve covered the major pillars of Python. The object data model, execution model with its scopes and blocks and some bits on strings. Knowing all this puts you ahead of every developer who only knows the syntax. That’s a higher number than you think.

Other stories in this series:

Enjoyed this? Don’t miss a post again — subscribe to my mailing list!