How to use style guidelines to improve your Python code

Picture by Kevin Ku, Unsplash.

The flexibility of programming languages like Python means that any code you write to tackle a given problem will differ in approach and style to code written by someone else. While not a major issue when you’re the sole developer, this can cause problems when working in a team.

“A foolish consistency is the hobgoblin of little minds”

Code layout, naming conventions, comments, and a whole range of other differences make non-standard code harder to comprehend, harder to debug, and slower to review. Therefore, most data science teams adopt some kind of code style guidelines to ensure that everyone produces code that shares some similarities, making these processes easier for everyone.

PEP 8 Style Guide for Python Code

Probably the most widely used Python style guide is PEP 8. A PEP, or Python Enhancement Proposal, is essentially a design document describing a feature, process, or convention for Python’s core developers, steering council, and other decision makers. The official Style Guide for Python Code, PEP 8, is therefore just one of many PEPs.

I’ll cover the basics of the PEP 8 style guidelines here, but I’d recommend bookmarking the full guide to keep on top of all the little intricacies. It’s also worth configuring your IDE to check your code meets the PEP 8 standards, as this can reduce rejections during code review.

Picture by Joshua Aragon, Unsplash.

1. Imports

Seeing as imports should go at the top of every Python script, we’ll start with these. To make imports easier to read and understand, the guideline is to group them in the following order, with a blank line between each group: standard library imports, related third-party imports, and local application or library-specific imports.

In addition, each package or module you import should go on a separate line, although it’s fine to import several names from the same module on one line using from ... import. Imports should always go at the top of the file, not at the point at which they’re used within the code, even though mid-file imports are common in Jupyter notebooks.

# Standard library imports
import time
import sys

# Third party imports
import pandas as pd
import numpy as np

# Local or specific imports
import local_package
from other_package import customers, products, prices

2. Naming conventions

One of my most talented former colleagues had a particular issue with poorly chosen names for variables, functions, and other things. This was for very good reason. When you’re reading someone else’s code during the code review process, or if you’re later trying to debug an issue, the work is a lot easier when the formatting of the name tells you what you’re looking at (function, method, package, or class), and what the code actually does.

While the naming conventions used for the different types of code are easy to follow, getting engineers to select a name which clearly describes what the code does can be harder. Whether that’s because they don’t understand the code, because they refactored their code and forgot to change the names, because they’re not careful with their work, because their code is not very DRY, or because they just didn’t realise, is unclear.

However, unclear naming introduces unintentional code obfuscation and means others have to work harder to decipher what has been written. It’s often better to quickly interrupt a colleague and ask them whether the variable name you’ve selected is clear, than to have it questioned during code review. For example, if you came across something called binarise_yes_no(), you’d likely know its purpose without the need to read the function.

df = binarise_yes_no(df)

def binarise_yes_no(df):
    """Loops through a Pandas dataframe and binarises Yes or No values to 1 or 0.

    Args:
        df (pandas.DataFrame): Dataframe containing Yes or No values

    Returns:
        df (pandas.DataFrame): Dataframe with Yes or No converted to 1 or 0

    Usage:
        df = binarise_yes_no(df)
    """

    # Only object (string) columns can contain Yes or No values
    for col in df.select_dtypes(include=['object']).columns:
        df[col] = df[col].replace(('Yes', 'No'), (1, 0))

    return df

There are a number of standard naming conventions we’re supposed to use for naming functions, methods, modules, packages, variables, classes, and constants. The idea is that the format of a name should tell you what kind of thing you’re looking at (a class, a constant, or a function, say) without you needing to look up where it was defined.

Type	Guideline	Example
Function	One or more lowercase words separated by underscores.	`delete(), save_file()`
Method	One or more lowercase words separated by underscores.	`deduplicate(), run_model()`
Module	One or more short lowercase words separated by underscores.	`products, some_module`
Package	One or more short lowercase words without underscores.	`customers, productutils`
Variable	One or more lowercase letters or words separated by underscores.	`x, rfm_score`
Class	One or more words, each with a leading capital, and no underscores.	`Inventory, CustomerSegmentation`
Constant	One or more short uppercase words separated by underscores.	`ID, API_KEY`

3. Code indentation and whitespace

Code indentation matters in Python. In fact, incorrectly indented Python can fail to run. However, the choice of whether to indent with tabs or spaces has historically been a subject of hot debate. This is nowhere better illustrated than in a scene from Silicon Valley, in which Richard Hendricks dumps his girlfriend for favouring spaces over tabs.

While Richard’s girlfriend does annoyingly tap the space bar to indent her code, instead of using an IDE which configures the tab key to insert spaces, he did perhaps go a bit too far in dumping her, particularly since she was actually following the PEP 8 style guidelines.

These state that four spaces should be used per indent level. Python 3 doesn’t allow you to mix tabs and spaces for indentation, so if you’re working on legacy code that is indented with tabs, you’ll need to either continue using tabs for consistency, or change the tabs to spaces throughout (but beware that this will make your code diffs a nightmare to decipher). If you’re only reformatting, put your changes in a separate commit.
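
If you do decide to convert a tab-indented file, the idea can be sketched roughly as follows. The helper name tabs_to_spaces is hypothetical, and the sketch assumes each line is indented with tabs only. It only touches leading tabs, so any tabs inside strings are left alone. In practice, your IDE or a code formatter will usually do this for you.

```python
def tabs_to_spaces(source, spaces_per_indent=4):
    """Replace the leading tabs on each line with spaces.

    Hypothetical helper for illustration only.
    """
    converted = []
    for line in source.splitlines():
        stripped = line.lstrip('\t')
        # Each leading tab counts as one indent level
        indent_levels = len(line) - len(stripped)
        converted.append(' ' * spaces_per_indent * indent_levels + stripped)
    return '\n'.join(converted)


legacy_code = 'def greet():\n\tif True:\n\t\treturn "hi"'
clean_code = tabs_to_spaces(legacy_code)
print(clean_code)
```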

Code alignment

When writing assignments (via the equals operator), you should use a single space on either side of the equals sign. Don’t pad with extra spaces to line the equals signs up vertically, even though it can look easier to scan.

# Correct
x = 1
y = 2
long_variable = 3

# Incorrect
x             = 1
y             = 2
long_variable = 3

For functions, it often helps to indent the arguments to make them easier to scan without the need to scroll. This is called vertical alignment. You can do it with one or more arguments, but personally I find it far easier to read with a single argument per line.

# Correct
foo = long_function_name(var_one, var_two,
                         var_three, var_four)

# Correct
foo = long_function_name(var_one,
                         var_two,
                         var_three,
                         var_four)

# Incorrect
foo = long_function_name(var_one, var_two,
    var_three, var_four)

Positioning operators

For many years, the convention was to break a line after a binary operator. However, the current recommendation for new code is to break before the operator, so each operator sits next to the value it applies to, as this is thought to aid readability.

total = (organic
         + paid
         + social
         + direct)

Code whitespace

Another common area for inconsistencies is in the use of internal whitespace. To be fair, the guidelines for this are a bit more long-winded, so you can see why these arise. In most cases, the whitespace goes after a comma, semicolon, or colon, but not before it, and not immediately inside brackets.

# Correct
some_function(stuff[0], {things: 5})

# Incorrect
some_function( stuff[0], { things: 5 } )

# Correct
if x == 4: print(x, y); x, y = y, x

# Incorrect
if x == 4 : print(x , y) ; x , y = y , x

Things are a bit different when slices are involved, because the colon is used like a binary operator and should have the same amount of space on either side. Here are the examples from the PEP 8 guidelines.

# Correct
ham[1:9], ham[1:9:3], ham[:9:3], ham[1::3], ham[1:9:]
ham[lower:upper], ham[lower:upper:], ham[lower::step]
ham[lower+offset : upper+offset]
ham[: upper_fn(x) : step_fn(x)], ham[:: step_fn(x)]
ham[lower + offset : upper + offset]

# Incorrect
ham[lower + offset:upper + offset]
ham[1: 9], ham[1 :9], ham[1:9 :3]
ham[lower : : step]
ham[ : upper]

Code vertical whitespace

Vertical whitespace should be added to your Python code using blank lines. These help separate things and make your code easier to read and scan quickly. Top-level function and class definitions should be separated by two blank lines. Extra blank lines should be used sparingly to indicate logical sections within the code.

def function_one():
    pass


def function_two():
    pass
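
PEP 8 also says that method definitions inside a class should be separated by a single blank line. Here is a small sketch of both rules together, using a made-up Inventory class and summarise function.

```python
class Inventory:
    """Hypothetical class used to show blank lines between methods."""

    def __init__(self):
        self.items = []

    def add_item(self, item):
        self.items.append(item)

    def count_items(self):
        return len(self.items)


def summarise(inventory):
    return f'{inventory.count_items()} item(s) in stock'


inventory = Inventory()
inventory.add_item('widget')
summary = summarise(inventory)
```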

4. Line lengths

Even though our monitors are getting wider and screen resolutions are increasing, the convention is to limit line lengths to 79 characters (or 72 for comments and docstrings). Longer lines are harder to read and make it harder to compare two windows of code side by side on a standard screen. Most IDEs, such as PyCharm, let you set a maximum line length and add a marker line to your code editor, so you can see when lines are too long.

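Longer lines can be handled using brackets, braces, and parentheses, which let Python know that the content is grouped and continues onto the next line. This is the approach PEP 8 prefers. Alternatively, where brackets aren’t practical, longer lines can be broken up using backslashes for line continuation, as in this with statement.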

with open('/path/to/some/file/you/want/to/read') as file_1, \
     open('/path/to/some/file/being/written', 'w') as file_2:
    file_2.write(file_1.read())
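
The bracket-based approach can be sketched as follows. The variable names and values are invented for the example.

```python
organic, paid, social, direct = 120, 80, 45, 30

# Parentheses group the expression, so it can span several lines
total_sessions = (organic
                  + paid
                  + social
                  + direct)

# Adjacent string literals inside brackets are joined automatically
message = ('Total sessions across all channels this month: '
           f'{total_sessions}')

print(message)
```

No backslashes are needed here, because Python keeps reading until it finds the closing parenthesis.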

5. String quotes

Python handles single-quoted strings and double-quoted strings in the same way, so there’s no guideline on which you should use. One of my worst habits (and one I see in most code) is inadvertently mixing single-quotes and double-quotes within a script. You should pick one and stick with it.

# Correct
mappings = remap_values(h_sorted, "index", "h")

# Also correct
mappings = remap_values(h_sorted, 'index', 'h')
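
PEP 8 does add one practical tip: when a string contains one kind of quote character, use the other kind to delimit it, so you avoid backslash escapes. A quick sketch with made-up strings:

```python
# Harder to read: the inner quotes need escaping
escaped = 'It\'s the customer\'s basket'

# Easier to read: switch the outer quotes instead
unescaped = "It's the customer's basket"

# Both spellings produce exactly the same string
print(escaped == unescaped)  # True
```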

6. Code comments

A lot of developers say that you shouldn’t need comments if you’ve written your code properly, as it should be self-explanatory. However, while that’s partly true, I suspect I am not alone in having written code months or years ago that I considered self-explanatory at the time, only now for it to appear to have been written while I was under the influence of hallucinogenic drugs.

Personally, I like well-commented code, especially in Python. Docstrings can be picked up by IDEs and printed using print(function_name.__doc__) to give you the full guidance on using the feature, without the need to check the original source code and try to decipher it yourself. Docstrings are covered separately in PEP 257.
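
As a small sketch of how this works, the hypothetical function below has a docstring that can be read back through its __doc__ attribute. The function name and arguments are assumptions made purely for the example.

```python
def calculate_aov(revenue, orders):
    """Return the average order value.

    Args:
        revenue (float): Total revenue.
        orders (int): Number of orders.

    Returns:
        float: Revenue divided by orders.
    """
    # Comments like this one explain the code but are not part of __doc__
    return revenue / orders


docstring = calculate_aov.__doc__
print(docstring)
```

The built-in help() function shows the same text, so help(calculate_aov) is a handy alternative in an interactive session.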