When we start our journey as programmers, it’s normal to get excited by the new possibilities. We gain the capacity to do many things that would otherwise be impossible, making projects faster and ensuring consistency.
But the problems start when you need to modify a script that you wrote 6 months ago. That’s when you find out that you don’t remember why you were applying some specific filters or calculating a value in an odd way.
As a Reporting Analyst, I am always creating and changing scripts. After applying the tips provided in The Art of Readable Code by Dustin Boswell and Trevor Foucher, I reduced the time needed to apply changes from 5 days to 2 (a 60% reduction).
1 How do you know if your code is readable?
For code to be readable, it needs to:
Have explicit names for variables, functions and function arguments.
Have comments that explain the reasons behind the code. At the end, the reader should know as much as the writer did.
Be understood without reading it twice.
2 Practical Techniques
Once we know what we want to achieve, it helps to know some techniques for getting there. In this article, all examples use R and the datasets::mtcars data.frame, as it is widely used for simple examples.
2.1 Creating explicit names
2.1.1 Naming variables
Defining good variable names is more important than writing a good comment and we should try to give as much context as possible in the variable name. To make this possible:
Name the variable after the value it holds.
Boolean variables can use prefixes like is, has and should, and must avoid negations. For example: is_integer, has_money and should_end.
A loop index can have a descriptive name followed by the suffix i. For example: club_i and table_i.
Add the unit of measurement as a suffix. For example: price_usd, mass_kg and distance_miles.
Never change a variable’s value in different sections; instead, create a new variable whose name makes the change explicit. For example, we can have the variable price and later create the variable price_discount.
Writing good variable names might take some iteration, and you might need to play devil’s advocate in order to find a better name than the initial one.
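The naming guidelines above can be sketched on datasets::mtcars as follows. Every variable name and the price and discount values are hypothetical, chosen only for illustration; the column meanings (am, wt) come from the mtcars documentation.

```r
# Illustrative sketch only: all names below are hypothetical.
car_specs <- datasets::mtcars

# Booleans: prefixed with is/has and stated positively (no negations).
is_automatic <- car_specs$am == 0        # in mtcars, am is 0 for automatic
has_eight_cylinders <- car_specs$cyl == 8

# Units as a suffix: mtcars stores wt in thousands of pounds.
kg_per_lb <- 0.45359237
weight_lb <- car_specs$wt * 1000
weight_kg <- weight_lb * kg_per_lb

# Loop index: a descriptive name followed by the suffix i.
column_means <- numeric(ncol(car_specs))
for (column_i in seq_along(car_specs)) {
  column_means[column_i] <- mean(car_specs[[column_i]])
}

# A new variable instead of overwriting the old one (made-up values).
price_usd <- 100
discount_rate <- 0.2
price_discount_usd <- price_usd * (1 - discount_rate)
```

Because price_usd is never overwritten, both the original and the discounted price remain available later in the script.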
2.1.2 Defining functions
Creating explicit function names can transform a complex process into a simple one.
Start the function with an explicit verb to avoid misunderstandings.
| Word | Alternatives |
| --- | --- |
| send | deliver, dispatch, announce, distribute, route |
| find | search, extract, locate, recover |
| start | launch, create, begin, open |
| make | create, set up, build, generate, compose, add, new |
The function name must describe its output.
A function should do only one thing; otherwise, break it into simpler functions so that each name stays explicit.
Use the following words to define range arguments.
| Word | Use |
| --- | --- |
| min and max | Inclusive limits |
| first and last | Inclusive ranges, where last is the final element included |
| begin and end | Inclusive/exclusive ranges, where begin is included and end is excluded |
Coding Example
    keep_rows_in_percentile_range <- function(DF, var_name, min_prob, max_prob) {

      if (!is.data.frame(DF)) stop("DF should be a data.frame")

      values <- DF[[var_name]]

      if (!is.numeric(values)) stop("var_name should be a numeric column of DF")

      min_value <- quantile(values, na.rm = TRUE, probs = min_prob)
      max_value <- quantile(values, na.rm = TRUE, probs = max_prob)

      value_in_range <- values >= min_value & values <= max_value

      # which() skips NA comparisons, so rows with missing values are dropped
      return(DF[which(value_in_range), ])
    }
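The "one function, one thing" and range-argument guidelines can also be sketched with two smaller helpers. Both function names and the weight limits below are hypothetical, introduced only for illustration; min_value and max_value are inclusive limits, following the min/max convention.

```r
# Illustrative sketch only: both helpers are hypothetical.

# Does one thing: keeps the rows whose value lies between two inclusive limits.
keep_rows_in_value_range <- function(DF, var_name, min_value, max_value) {
  values <- DF[[var_name]]
  value_in_range <- values >= min_value & values <= max_value
  DF[which(value_in_range), ]
}

# Does one thing: the name starts with a verb and describes the output.
compute_column_mean <- function(DF, var_name) {
  mean(DF[[var_name]], na.rm = TRUE)
}

# Composing the two keeps each step readable on its own.
mid_weight_cars <- keep_rows_in_value_range(
  mtcars, "wt", min_value = 2.5, max_value = 3.5
)
mean_mpg_mid_weight <- compute_column_mean(mid_weight_cars, "mpg")
```

A single function that filtered and averaged at once would need a longer, vaguer name, which is usually a sign that it is doing too much.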
2.2 Commenting correctly
The first step towards a well-commented project is a README file that explains how the code works, in enough detail to present the project to a new team member. It is also important to add comments to:
1. Explain how custom functions behave in several situations with minimal examples.
2. Explain the reasons behind decisions related to coding style and business logic, like the choice of methods and constants.
3. Make explicit any pending problems and the initial idea we have for solving them.
4. Avoid commenting bad names; fix them instead.
5. Summarize coding sections with a description that is faster to read than the original code.
Coding Example
Let’s comment our custom function to explain each point.
    # 1. Behavior
    # This function filters the rows of any data.frame as long as var_name
    # is a numeric column. Missing values are ignored when computing the
    # percentiles, and rows with a missing value are dropped.
    # 2. Reasons behind decisions
    # As we are not expecting to make inferences, imputation is not necessary.
    keep_rows_in_percentile_range <- function(DF, var_name, min_prob, max_prob) {

      # 5. Reading the code is faster than reading a comment, so we don't need it
      if (!is.data.frame(DF)) stop("DF should be a data.frame")

      # 2. Reasons behind decisions
      # We are going to use this vector many times and
      # saving it as a variable makes the code much easier to read
      values <- DF[[var_name]]

      # 5. Reading the code is faster than reading a comment, so we don't need it
      if (!is.numeric(values)) stop("var_name should be a numeric column of DF")

      # 2. Reasons behind decisions
      # Even though a single quantile call could return both values in a vector
      # it is much simpler to understand if we save each value in a variable
      min_value <- quantile(values, na.rm = TRUE, probs = min_prob)
      max_value <- quantile(values, na.rm = TRUE, probs = max_prob)

      # 4. The boolean test has an explicit name
      value_in_range <- values >= min_value & values <= max_value

      # which() skips NA comparisons, so rows with missing values are dropped
      return(DF[which(value_in_range), ])
    }
Note
Writing good comments can be challenging, so it helps to do it in 3 steps:
Write down whatever comment is on your mind
Read the comment and see what needs to be improved
Make the needed improvements
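The commented function above does not include a pending-problem comment, so here is a minimal sketch of one next to behavior examples and the reason for a constant. The function convert_miles_to_km and the TODO itself are hypothetical and only illustrate the shape such comments might take.

```r
# Illustrative sketch only: the function and its TODO are hypothetical.

# 1. Behavior (minimal examples)
#   convert_miles_to_km(1)  gives 1.609344
#   convert_miles_to_km(NA) gives NA, so missing distances stay missing
# 2. Reason behind the constant
#   1.609344 is the exact number of kilometers in one international mile
# 3. Pending problem
#   TODO: negative distances are converted without any warning.
#   Initial idea: stop() with an explicit message when distance_miles < 0.
convert_miles_to_km <- function(distance_miles) {
  km_per_mile <- 1.609344
  distance_miles * km_per_mile
}

distance_km <- convert_miles_to_km(c(1, 10, NA))
```

The TODO records both the problem and a first idea for the fix, so whoever picks it up later does not start from zero.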
2.3 Code style
It is important to apply a coding style that makes it easy to scan the code before going into the details of certain parts. Some tips to improve code style are:
Similar code should look similar and be grouped in blocks. This makes spelling mistakes easier to find and prevents repetitive comments.
We can see how this tip was applied in the keep_rows_in_percentile_range function.
Avoid keeping temporary variables in the global environment (.GlobalEnv). Instead, create a function to make the purpose clear, or use pipes (the native |> or magrittr’s %>%).
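As a rough sketch of that last tip, the snippet below computes the same value in two ways without leaving intermediate objects in .GlobalEnv. The function name compute_mean_mpg_by_cyl is hypothetical, and the native pipe assumes R 4.1 or later.

```r
# Illustrative sketch only: compute_mean_mpg_by_cyl is a hypothetical helper.

# Option 1: a function. rows_with_cyl exists only while the function runs,
# so it never reaches the global environment.
compute_mean_mpg_by_cyl <- function(DF, cyl_value) {
  rows_with_cyl <- DF[DF$cyl == cyl_value, ]
  mean(rows_with_cyl$mpg)
}

mean_mpg_4cyl <- compute_mean_mpg_by_cyl(mtcars, cyl_value = 4)

# Option 2: the native pipe (assumes R >= 4.1). No intermediate object is named.
mean_mpg_4cyl_piped <- mtcars |>
  subset(cyl == 4) |>
  with(mean(mpg))
```

Both approaches leave only the final result in the workspace; the function has the extra benefit of a name that states its purpose.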