As I have previously shared, code quality can be summarized along 3 axes: the code is easy to understand, easy to change, and correct.
Today I want to talk about a trait that indicates how easy a codebase is to understand: its signal-to-noise ratio.
What is signal to noise ratio?
Signal-to-noise ratio (SNR or S/N) is a measure used in science and engineering that compares the level of a desired signal to the level of background noise.
https://en.wikipedia.org/wiki/Signal-to-noise_ratio
In software development, this means how much of your code explains your intention, ideas, and knowledge versus how much doesn’t.
Why is signal to noise ratio important?
Well, as mentioned before, this is an indicator of how easy your code is to understand. That means how much time and mental effort is required to understand what the code does and, more importantly, why it does it that way. Understanding these 2 things is a requirement before changing how the code works. There’s no workaround for that.
What is the most influential factor on the signal to noise ratio?
If I were to pick a single attribute of a codebase to change in order to improve its signal-to-noise ratio, it would be the abstraction level. You see, in my experience a poor signal-to-noise ratio comes from either under abstraction (too much detail) or over abstraction (too many layers of artifacts, too much indirection).
Under abstraction and its effect on the signal to noise ratio
How many times have you been tasked with making a tiny change in behavior, only to find yourself facing a 200-line function? (… I just had a PTSD episode.) The problem with a 200-line function is that there’s too much detail to easily figure out the what and the why.
This detail overload doesn’t happen just at the level of huge functions, but also at the level of language constructs. Take a look:
decimal orderTotal = 0;
foreach (var line in orderLines)
{
    orderTotal += line.Total;
}
So as you can see, the idea here is that the order total is the sum of the order lines’ totals. So what code here isn’t relevant to that idea? Think about it for a moment. Done?
decimal orderTotal = 0;
foreach (var line in orderLines)   // <- noise: the iteration mechanics
{
    orderTotal += line.Total;
}
Surprise! I bet a lot of you didn’t see that coming! This is because sometimes we get so used to the language that we take those things for granted. I know I did. It took me a lot of effort learning Smalltalk (and banging my head against the wall every time I tried to do something new) to rewire some parts of my brain. But you can’t deny it. Iterating over the lines is just a detail of how we sum up the lines’ totals. It does not help convey the main idea. It’s noise. How would you fix that? Actually, there are several ways.
decimal sumLinesTotal()
{
    decimal linesTotal = 0;
    foreach (var line in orderLines)
    {
        linesTotal += line.Total;
    }
    return linesTotal;
}
...
decimal orderTotal = sumLinesTotal();
How’s that? Not a big deal, right? But now there’s no doubt about the code’s intention. I know, some of you may think this is dumb. The code itself wasn’t that complex to start with, so why should we create a new function just for this? Well, what do you think would happen to a 200-line function if you started doing this? Not only for loops, but for every place where implementation details (the how) appear. I dare you to try it. Now, if you are using C#, there are other ways to be explicit about this:
decimal orderTotal = orderLines.Sum(orderLine=>orderLine.Total);
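To check that the loop and the one-liner really say the same thing, here is a minimal, self-contained sketch. The `OrderLine` record and the sample values are assumptions made up for illustration (the snippets above only show that a line has a `Total`); `Sum` is the standard LINQ extension method from `System.Linq`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data, invented for this example.
var orderLines = new List<OrderLine>
{
    new OrderLine(10.50m),
    new OrderLine(4.25m),
    new OrderLine(0.25m),
};

// Loop version: the iteration mechanics sit right next to the idea.
decimal loopTotal = 0;
foreach (var line in orderLines)
{
    loopTotal += line.Total;
}

// LINQ version: "the order total is the sum of the line totals".
decimal orderTotal = orderLines.Sum(orderLine => orderLine.Total);

// Both versions compute the same value.
Console.WriteLine(loopTotal == orderTotal);

// Assumed shape of an order line; only Total appears in the original snippets.
record OrderLine(decimal Total);
```

The behavior is identical; the difference is only in how much of the text on screen is about the idea.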
Over abstraction and its effect on the signal to noise ratio
Over abstraction happens when we add unnecessary artifacts to a codebase. This is a prime example of accidental complexity. A very common cause of this is speculative generality: the idea that someday we may need to do something, so we prepare the code to handle such cases even though we don’t have the need right now. But there are more common, more subtle cases.
So let’s say we have a report API to which we make requests:
public EmployeeData GetEmployeeData(Guid id);
public class EmployeeData { Guid Id; ... }

public ManagerData GetManagerData(Guid id);
public class ManagerData { Guid Id; ... }
So our relational mindset tells us that we are duplicating data here (the id) and that we should remove that duplication.
public class ReportData { Guid Id; }

public EmployeeData GetEmployeeData(Guid id);
public class EmployeeData : ReportData { ... }

public ManagerData GetManagerData(Guid id);
public class ManagerData : ReportData { ... }
Great! Duplication removed! But wait! We can go even further! Isn’t everything we’re returning just report data? Let’s make that explicit!
public class ReportData { Guid Id; }

public ReportData GetEmployeeData(Guid id);
public class EmployeeData : ReportData { ... }

public ReportData GetManagerData(Guid id);
public class ManagerData : ReportData { ... }
But now the client code needs to cast the result to the concrete type. Maybe we can make the ReportData object accommodate different sets of data?
public class ReportData { Guid Id; Dictionary<string, object> Data; }

public ReportData GetEmployeeData(Guid id);
public ReportData GetManagerData(Guid id);
So now let’s say you are given a ReportData object. How can you know if you are dealing with an employee’s or a manager’s data? You could query the data dictionary for a particular key that represents a property available only for an employee (or a manager), or worse, you could introduce a key in the dictionary that says which type of data it contains, moving from strongly typed to stringly typed. This is all noise. The signal has been effectively diluted.
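To make the cost concrete, here is a sketch of what client code might end up looking like. Everything beyond `ReportData`, `EmployeeData`, `Id` and `Data` is an assumption of mine: the `"Type"` and `"Salary"` keys, the `Salary` field, and the `Client` helpers are hypothetical names invented for this illustration.

```csharp
using System;
using System.Collections.Generic;

var report = new ReportData
{
    Id = Guid.NewGuid(),
    Data = new Dictionary<string, object>
    {
        ["Type"] = "Employee",   // hypothetical discriminator key
        ["Salary"] = 1000m,      // hypothetical property
    },
};
var employee = new EmployeeData { Id = report.Id, Salary = 1000m };

Console.WriteLine(Client.DescribeStringly(report));
Console.WriteLine(Client.DescribeTyped(employee));

static class Client
{
    // Stringly typed: key lookups, a string comparison and a type check
    // stand between the reader and the idea "show the employee's salary".
    public static string DescribeStringly(ReportData report)
    {
        if (report.Data.TryGetValue("Type", out var type) && (type as string) == "Employee"
            && report.Data.TryGetValue("Salary", out var salary) && salary is decimal amount)
        {
            return $"Employee earning {amount}";
        }
        return "Unknown report";
    }

    // Strongly typed: the signature already says what the data is.
    public static string DescribeTyped(EmployeeData employee) =>
        $"Employee earning {employee.Salary}";
}

class ReportData
{
    public Guid Id;
    public Dictionary<string, object> Data = new Dictionary<string, object>();
}

class EmployeeData
{
    public Guid Id;
    public decimal Salary; // hypothetical; the original elides the members
}
```

Both methods produce the same text, but in the first one a typo in a key or a wrong cast only shows up at runtime, and most of the code is about digging the data out rather than using it.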
Some guidelines to improve your signal to noise ratio
By this point I hope it is clear to you that using the right abstraction level is key to improving your signal-to-noise ratio. So I’ll share with you some of my observations on the abstraction process.
Step 1: remove noise by encapsulating details away into functions
Encapsulation and abstraction are closely related. I’ll talk about that in another post. Suffice it to say that as you encapsulate details away, you’re also raising the abstraction level. The trick to avoid going overboard is to think about what you want to express: the signal. Is that clear enough? A good rule of thumb is to try to keep your functions to 5 lines or fewer.
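As an illustration of this step, here is a small sketch in which the top-level function reads as the signal and the details live in helpers. The `Pricing` class, the `AmountDue` and `DiscountFor` names, and the 10% discount rule are all made up for the example; they are not taken from any real codebase.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var lines = new List<OrderLine> { new OrderLine(60m), new OrderLine(50m) };
Console.WriteLine(Pricing.AmountDue(lines));

static class Pricing
{
    // The signal: the amount due is the total minus the discount.
    public static decimal AmountDue(IReadOnlyList<OrderLine> lines)
    {
        decimal total = SumLinesTotal(lines);
        return total - DiscountFor(total);
    }

    // Detail: how the lines are added up.
    static decimal SumLinesTotal(IEnumerable<OrderLine> lines) =>
        lines.Sum(line => line.Total);

    // Detail: a made-up rule (10% off orders over 100), for illustration only.
    static decimal DiscountFor(decimal total) =>
        total > 100m ? total * 0.10m : 0m;
}

// Assumed shape of an order line.
record OrderLine(decimal Total);
```

Each function stays well under five lines, and a reader who only cares about the what can stop at `AmountDue`.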
Step 2: uncover the objects
You will find that some functions act upon the same set of data. Those are objects hidden in the mist. Move both the data and the functions that act upon it into a class. The name of the class will have an impact on the clarity of your signal, but don’t worry about getting it right the first time; you can rename it (and you will) as your understanding increases.
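Continuing with the order example, this step might look like the sketch below: the list of lines and the function that sums them move into one class. The `Order` class and its `AddLine` method are hypothetical names I chose for the illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var order = new Order();
order.AddLine(new OrderLine(10.50m));
order.AddLine(new OrderLine(4.50m));
Console.WriteLine(order.Total);

// The data (the lines) and the behavior that acts upon it now live together.
class Order
{
    private readonly List<OrderLine> lines = new List<OrderLine>();

    public void AddLine(OrderLine line) => lines.Add(line);

    // The order total is the sum of the line totals.
    public decimal Total => lines.Sum(line => line.Total);
}

// Assumed shape of an order line.
record OrderLine(decimal Total);
```

Client code now asks the order for its total instead of knowing how the lines are stored and summed.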
Step 3: wash, rinse and repeat
Repeat the 2 previous steps over and over. If the idea you want to convey is still not clearly expressed by the code, go to step 4.
Step 4: select a metaphor
To be discussed on the next post. 🙂
A quick comment on comments
As I mentioned at the beginning, you need to understand the what as well as the why of the code. The former can be clearly expressed by the code itself. If that’s not the case, you haven’t reached the right level of abstraction yet. As for the latter, this is the only situation in which I find comments justifiable. Explain the constraints or whatever it is that led you to choose the current solution.
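Here is a small sketch of a "why" comment. The constraint it describes (an invoicing partner that requires two decimals with midpoints rounded away from zero) is entirely hypothetical, as are the `Rounding` class and `RoundForInvoice` name; `Math.Round` and `MidpointRounding.AwayFromZero` are the standard .NET APIs.

```csharp
using System;

Console.WriteLine(Rounding.RoundForInvoice(2.345m));

static class Rounding
{
    // Why: (hypothetical constraint) the invoicing partner requires amounts
    // with two decimals and midpoints rounded away from zero. By default,
    // Math.Round rounds midpoints to the nearest even digit, which would
    // turn 2.345 into 2.34 instead of 2.35.
    public static decimal RoundForInvoice(decimal amount) =>
        Math.Round(amount, 2, MidpointRounding.AwayFromZero);
}
```

The method name and body already say what happens; the comment only records the reason, which the code cannot express on its own.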
Closing thoughts
Man, that was longer than I expected! I hope this gives you some hints on what to look for the next time you are in a code review (yours or someone else’s). As always, if you have any comments, doubts or whatever, leave them below. Good coding!