Python Tips

— Alex Reinhart

Why does my script run when I try to test it?

Sometimes you try to run your unit tests and discover that when they import your script to test it, your script runs – possibly trying to read user input, open files, access a database, or other things you don’t want. When you import a script to run tests, you just want to import its functions, without running the rest of the code.

Here’s an example. Imagine you’ve written anagrams.py:

import sys

def find_anagrams(words):
    # find all the anagrams.
    # code goes here

    return anagrams

filename = sys.argv[1]

with open(filename, "r") as f:
    print(find_anagrams(f.readlines()))

And then in test_anagrams.py you want to test the find_anagrams function:

import pytest

import anagrams

def test_find_anagrams():
    words = ["iceman", "cinema", "ducks"]

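    # A set cannot contain plain sets (they are unhashable), so use frozensets.
    expected = {frozenset({"iceman", "cinema"}), frozenset({"ducks"})}
    assert anagrams.find_anagrams(words) == expected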

Or something like that – you might have written your code differently. But then you try to run pytest -v test_anagrams.py and get a weird error message about being unable to open the file -v. What happened?

When you import a file, all the code inside it runs.

So when you do import anagrams, it tries to read sys.argv[1], open the file, and print out anagrams.

That’s frustrating when you’re trying to write tests. Fortunately, there is a standard way to avoid this:

import sys

def find_anagrams(words):
    # find all the anagrams.
    # code goes here

    return anagrams

if __name__ == "__main__":
    filename = sys.argv[1]

    with open(filename, "r") as f:
        print(find_anagrams(f.readlines()))

__name__ is a special variable in Python. It is equal to "__main__" only when the file is run directly as a script (for example, with python anagrams.py). When the file is imported by another module, __name__ is set to the module's name instead (here, "anagrams"). Put any code that should only run in a command-line script, and not when imported for testing, inside an if __name__ == "__main__": block, as shown above.
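To see the two values of __name__ side by side, here is a minimal sketch. It writes a tiny throwaway file (demo_module.py, a hypothetical name used only for illustration) and then loads it both ways: once as an import, and once as if it had been run as a script. The standard library's importlib and runpy modules stand in for `import demo_module` and `python demo_module.py`.

```python
import importlib.util
import runpy
import tempfile
from pathlib import Path

# A tiny stand-in module (hypothetical) that just records the value of __name__.
SOURCE = "seen_name = __name__\n"

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "demo_module.py"
    path.write_text(SOURCE)

    # Case 1: load the file as a module, the way a test file's import would.
    spec = importlib.util.spec_from_file_location("demo_module", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    name_when_imported = module.seen_name

    # Case 2: execute the file as a script, the way `python demo_module.py` would.
    name_when_run = runpy.run_path(str(path), run_name="__main__")["seen_name"]

print(name_when_imported)  # demo_module
print(name_when_run)       # __main__
```

Only the second case would enter an if __name__ == "__main__": block, which is why code placed there is skipped when a test imports the file.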

What is UnicodeDecodeError and why can’t the codec decode a byte?

In code where you read in a text file or other input, you might get an error message like this:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 2402: ordinal not in range(128)

You may observe that this error occurs when the input contains characters outside the normal Latin alphabet, such as accented characters (ëçý…), symbols (like ÷ or ¶), or characters from other scripts (like 日本語 or 简体).

The problem is that Python needs to know the encoding of text it reads in: it needs to know how to convert binary bytes into specific characters. There are many different character encodings. In the above message, Python has defaulted to using ASCII, an old American encoding that does not support many characters outside the normal Latin alphabet, so when it encountered bytes that have no meaning in ASCII, it got confused. On Windows, Python often defaults to CP-1252 instead, which also supports a limited range of characters.
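The effect of the encoding can be reproduced without any files. The sketch below uses the example word "café" (chosen here purely for illustration) and decodes the same bytes three ways: as ASCII, as CP-1252, and as UTF-8.

```python
text = "café"
data = text.encode("utf-8")  # the "é" becomes the two bytes 0xc3 0xa9

# ASCII has no meaning for bytes above 127, so decoding fails.
try:
    data.decode("ascii")
    error_message = None
except UnicodeDecodeError as err:
    error_message = str(err)

# CP-1252 assigns a character to each of those bytes, so decoding
# succeeds but silently produces the wrong text.
wrong_text = data.decode("cp1252")

# Decoding with the encoding the bytes were written in recovers the text.
right_text = data.decode("utf-8")

print(error_message)
print(wrong_text)
print(right_text)
```

The ASCII attempt raises the same kind of error shown above, the CP-1252 attempt yields garbled characters in place of the "é", and only the UTF-8 attempt returns the original word.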

Python functions that open files, such as open(), take an optional argument to specify the encoding. The most common encoding that supports nearly every language and script – and the encoding we try to use for all the files we provide you – is UTF-8. We hence recommend you set the encoding to 'utf-8' when opening files, unless you know the file is in a different encoding:

with open("foo.txt", "r", encoding="utf-8") as f:
    lines = f.readlines()
    ## and so on...

If you’re not sure of the encoding of a file, on macOS and Linux you can use the file command to find out:

$ file foo.txt
foo.txt: ASCII text
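If the file command is not available, one rough check from within Python is to see whether a file's bytes decode cleanly as UTF-8. This is only a sketch: looks_like_utf8 is a hypothetical helper name, not a standard function, and a clean decode does not prove the file was written as UTF-8 (plain ASCII files also pass, since ASCII is a subset of UTF-8).

```python
import tempfile
from pathlib import Path


def looks_like_utf8(path):
    """Return True if the file's bytes decode cleanly as UTF-8 (hypothetical helper)."""
    try:
        Path(path).read_bytes().decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Demonstration on two throwaway files holding the same word in different encodings.
with tempfile.TemporaryDirectory() as tmp:
    utf8_file = Path(tmp) / "utf8.txt"
    latin1_file = Path(tmp) / "latin1.txt"
    utf8_file.write_bytes("café".encode("utf-8"))
    latin1_file.write_bytes("café".encode("latin-1"))

    utf8_result = looks_like_utf8(utf8_file)
    latin1_result = looks_like_utf8(latin1_file)

print(utf8_result)    # True
print(latin1_result)  # False
```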