Can automation developers benefit from low-level programming?
Automation development is usually regarded as a domain of high-level programming.
To what extent can automation developers benefit from learning some low-level programming skills? Is it worth the trouble?
TL;DR
Yes
Desires are infinite
Learning a new skill is awesome.
In my experience working with a broad spectrum of software developers, they tend to be extremely intellectually curious.
I would guess they score high on Openness to Experience, one of the Big Five personality traits.
They look forward to experimenting with new things, they appreciate intellectual ideas and abstract thinking, they are innovative and creative, and they tend to enjoy the arts.
In the tech fields, high openness is a double-edged sword.
On one edge, it allows flexible thinking and the ability to change; it drives the individual to explore new horizons and bring more ideas and technologies to the table.
However, it comes at the cost of excessive fantasizing, favoring newer ideas for the sake of being progressive (just because something is new doesn’t mean that it’s better!), and investing so much time in learning skills that little time is left to put those skills into practice.
Our desires are infinite but our resources are finite!
We may want to learn every technological ecosystem on the planet, but our time is limited: every minute we invest in learning one skill is a minute we won’t spend on anything else, so learning is subject to opportunity costs.
When resources are finite, we have to compromise and carefully select a set of skills that are worth our while, knowing that this comes at the cost of other skills.
In this article I’ll try to elaborate on the benefits I believe automation developers can enjoy by investing some of their learning time in low-level programming.
It’s all relative
Low level vs. high level is a continuum: for a Python developer, C++ would be regarded as low level.
For a C++ developer, C would be regarded as low level, and for a C developer, assembly would be.
The terms describe the level of abstraction a language provides over machine code: the lower a language sits on the spectrum, the more machine-like it is, and the higher it sits, the more human-like it is.
So what’s the difference?
Comparing the entire spectrum of languages would be extremely difficult.
Instead, I’ll take a look at one high level language and compare it with a low(er) level language.
The high-level language I’m going to cover is Python, a general-purpose, multi-paradigm language.
It is dynamically typed and interpreted, and it is praised for its readable, idiomatic syntax.
The lower level language I’m going to cover is C, a procedural language.
It is statically typed and compiled directly to machine code.
Although it has been in use since the 1970s, it is still one of the most popular languages in the world, and probably the most influential one.
Example
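Let’s look at the implementation of a hash table in C and in Python.
Hash tables are an extremely useful data structure because they allow quick lookups: on average, finding a key takes constant time regardless of how many entries the table holds.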
Let’s start with C. There’s no built-in hash table in C, so we have to implement one on our own.
In short, we create an array of 10000 slots, each of which holds a linked list.
The hash function takes the key and returns an associated index from 0 to 9999.
The hashtable_set function uses the hash function to find the associated linked list and inserts the entry into it. The hashtable_get function uses the hash function to find the associated linked list, searches that list for the matching key-value pair, and returns its value.
Writing all of this by hand is tedious and error-prone.
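A minimal sketch of the design just described, assuming string keys, int values, and the djb2 string hash (a common choice picked here for illustration; it is not necessarily the hash the original listing used). The names hashtable_set and hashtable_get come from the text; the entry and hashtable types, hashtable_new and hashtable_free are hypothetical helpers added to make the sketch self-contained.

```c
#include <stdlib.h>
#include <string.h>

#define TABLE_SIZE 10000  /* number of slots (buckets), as in the text */

/* One key-value pair; pairs that hash to the same slot form a linked list. */
typedef struct entry {
    char *key;
    int value;
    struct entry *next;
} entry;

typedef struct {
    entry *slots[TABLE_SIZE];
} hashtable;

/* djb2 string hash, reduced to an index in [0, TABLE_SIZE - 1]. */
static size_t hash(const char *key) {
    unsigned long h = 5381;
    for (const unsigned char *p = (const unsigned char *)key; *p != '\0'; p++)
        h = h * 33 + *p;
    return h % TABLE_SIZE;
}

/* Returns a table whose slots are all empty lists, or NULL if out of memory. */
hashtable *hashtable_new(void) {
    return calloc(1, sizeof(hashtable));
}

/* Insert or update key; returns 0 on success, -1 if allocation fails. */
int hashtable_set(hashtable *t, const char *key, int value) {
    size_t i = hash(key);
    for (entry *e = t->slots[i]; e != NULL; e = e->next) {
        if (strcmp(e->key, key) == 0) {  /* key already present: update it */
            e->value = value;
            return 0;
        }
    }
    entry *e = malloc(sizeof(entry));
    if (e == NULL) return -1;
    e->key = malloc(strlen(key) + 1);    /* the table keeps its own copy */
    if (e->key == NULL) { free(e); return -1; }
    strcpy(e->key, key);
    e->value = value;
    e->next = t->slots[i];               /* push onto the front of the list */
    t->slots[i] = e;
    return 0;
}

/* Look up key; on success store the value in *out and return 1, else 0. */
int hashtable_get(const hashtable *t, const char *key, int *out) {
    for (const entry *e = t->slots[hash(key)]; e != NULL; e = e->next) {
        if (strcmp(e->key, key) == 0) {
            *out = e->value;
            return 1;
        }
    }
    return 0;
}

/* Free every entry, every key copy, and the table itself. */
void hashtable_free(hashtable *t) {
    for (size_t i = 0; i < TABLE_SIZE; i++) {
        entry *e = t->slots[i];
        while (e != NULL) {
            entry *next = e->next;
            free(e->key);
            free(e);
            e = next;
        }
    }
    free(t);
}
```

Unlike this fixed-size sketch, Python’s dict also grows automatically as entries are added, which is one more thing you get for free.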
In Python, the implementation is very easy, using the built-in dict type.
We are down to 2 lines of code instead of 100, and the code is far more readable.
However, the C code is more efficient in both execution time and memory consumption.
Let’s compare the two using the GNU time command on Linux (/usr/bin/time -v):
Python:
User time (seconds): 0.00
System time (seconds): 0.10
Percent of CPU this job got: 73%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.14
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 4752
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 1230
Voluntary context switches: 0
Involuntary context switches: 0
Swaps: 0
File system inputs: 0
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
C:
User time (seconds): 0.00
System time (seconds): 0.00
Percent of CPU this job got: 0%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.02
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 656
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 190
Voluntary context switches: 0
Involuntary context switches: 0
Swaps: 0
File system inputs: 0
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
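Python took 0.14 seconds (140 milliseconds) of wall-clock time, while C took only 0.02 seconds (20 milliseconds).
Python’s peak memory usage (maximum resident set size) reached 4752 KB, while C required only 656 KB.
Python’s extra layers of abstraction are a major impediment to its performance. Keep in mind, though, that for a program this small, much of the measured time and memory likely goes to starting the interpreter itself rather than to the dictionary operations.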
Having looked at the example, let’s name a few of the differences between the two.
Diff 1: Memory management
Python manages memory internally.
CPython, the reference implementation, bases its memory management primarily on reference counting.
Whenever a new object is created, the interpreter allocates the memory required to hold it, and the object’s reference count tracks how many names and containers refer to it.
With each additional variable (or name) that refers to the same object, the reference count is incremented, and with every variable that stops referring to it, the count is decremented.
When the reference count reaches 0, the object’s memory is freed immediately; a separate cyclic garbage collector takes care of objects that reference each other in cycles and therefore never reach 0 on their own.
In C, on the other hand, there’s no such thing.
The programmer allocates new memory when needed, and it is their responsibility to free that memory when it’s no longer in use.
Failing to do so leads to a memory leak.
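A minimal illustration of manual memory management in C. The helper names copy_string, leaky_length and safe_length are hypothetical; the point is that whoever receives memory from malloc is responsible for passing it to free.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a heap-allocated copy of src (or NULL if malloc fails).
   The CALLER owns the copy and must free() it. */
char *copy_string(const char *src) {
    size_t len = strlen(src) + 1;  /* +1 for the terminating '\0' */
    char *dst = malloc(len);
    if (dst == NULL) return NULL;
    memcpy(dst, src, len);
    return dst;
}

/* Leaky: the copy is never freed. The pointer is lost when the function
   returns, but the bytes stay allocated until the process exits. */
size_t leaky_length(const char *src) {
    char *tmp = copy_string(src);
    return tmp ? strlen(tmp) : 0;
}

/* Correct: release the copy once we are done with it. */
size_t safe_length(const char *src) {
    char *tmp = copy_string(src);
    if (tmp == NULL) return 0;
    size_t n = strlen(tmp);
    free(tmp);
    return n;
}
```

A memory checker such as Valgrind would report the bytes allocated by leaky_length as lost, while safe_length leaves nothing behind. In Python, the reference counting described above performs that free() for you.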
Diff 2: Platform dependency
Python source code is compiled on the local machine into bytecode, which the interpreter then executes at run time.
This has 2 major consequences.
First, the target machine must have a Python interpreter, either installed or shipped together with the code.
Second, once an interpreter is available, the same Python code can be executed on any platform it supports.
As long as the code doesn’t rely on operating-system-specific features, you can write it once and run it anywhere.
C, on the other hand, is compiled directly into machine code, so the resulting binary only runs on the operating system and CPU architecture it was compiled for.
Not only that, but the source code itself may only compile for a specific family of operating systems.
For example, the fork function that creates a child process is part of POSIX and is available on Linux and other Unix-like systems, but not on Windows, where processes are created with CreateProcess (or the C runtime’s _spawn family of functions).
In Python, on the other hand, you can create child processes portably using the subprocess module from the standard library.
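To make the platform dependency concrete, here is a minimal POSIX-only sketch; the function name run_child is hypothetical. It compiles on Linux and macOS, but not with a Windows toolchain such as MSVC, because unistd.h and sys/wait.h do not exist there.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child process that exits with `code`; the parent waits for it and
   returns the child's exit status, or -1 on error. POSIX only. */
int run_child(int code) {
    pid_t pid = fork();
    if (pid < 0) return -1;     /* fork failed */
    if (pid == 0) _exit(code);  /* child: terminate immediately */
    int status;
    if (waitpid(pid, &status, 0) < 0) return -1;  /* parent: reap the child */
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The child calls _exit rather than exit so that it does not flush stdio buffers it inherited from the parent. In Python, subprocess.run hides this whole dance behind one call that works on both Unix-like systems and Windows.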
Diff 3: Pointers
In C, pointers pretty much run the whole show.
Using pointers, you can take the address of any variable in your program and change its value through that address.
In Python there are no explicit pointers; instead, arguments are passed by assignment: the function’s parameter becomes a new name bound to the same object the caller passed.
Every time you pass a variable to a function, the object’s reference count is increased by one, and it is decreased again when the function returns.
As a result, a function can change the caller’s object in place if it is mutable (a list or a dict, for example), but rebinding the parameter to a new object does not affect the caller.
In other high-level languages such as Java and C#, arguments are passed by value by default (for objects, the value passed is a reference). C# also lets you declare real pointers, but only inside an unsafe context, while Java exposes no explicit pointers at all.
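A short sketch of the difference between passing by value and passing a pointer in C; swap_by_value and swap_by_pointer are illustrative names.

```c
/* Receives copies of the caller's ints: swapping the copies has no effect
   on the caller's variables. */
void swap_by_value(int a, int b) {
    int tmp = a;
    a = b;
    b = tmp;
}

/* Receives the addresses of the caller's ints and swaps the values stored
   at those addresses, so the caller sees the change. */
void swap_by_pointer(int *a, int *b) {
    int tmp = *a;
    *a = *b;
    *b = tmp;
}
```

Calling swap_by_value(x, y) leaves x and y unchanged, while swap_by_pointer(&x, &y) actually swaps them. In Python, the closest equivalents are returning the swapped pair or mutating a container the caller passed in.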
Diff 4: Programming paradigms
C is a structured, procedural language.
It has no classes and no built-in way to associate data types with behavior.
Python, for its part, supports multiple programming paradigms: the procedural style, but also object-oriented and functional styles.
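Although C has no classes, a common idiom is to pair a struct with functions that take a pointer to it, and optionally to store function pointers in the struct to get a simple form of polymorphism. A minimal sketch; the shape type and its functions are hypothetical.

```c
/* A "class-like" struct: data plus a function pointer acting as a method. */
typedef struct shape {
    const char *name;
    double width, height;
    double (*area)(const struct shape *self);
} shape;

static double rect_area(const shape *self) {
    return self->width * self->height;
}

static double triangle_area(const shape *self) {
    return 0.5 * self->width * self->height;
}

/* "Constructors" wire each shape to the matching area implementation. */
shape make_rect(double w, double h) {
    shape s = {"rectangle", w, h, rect_area};
    return s;
}

shape make_triangle(double base, double h) {
    shape s = {"triangle", base, h, triangle_area};
    return s;
}
```

A call such as r.area(&r) passes the object explicitly, which is exactly what Python does implicitly when it hands self to a method.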
Where does automation programming fit in?
Automation programming typically does not require performance or memory control to the point where being close to the metal is needed.
It shouldn’t be a surprise that the vast majority of automation developers use high level programming languages.
In addition to the choice of language, automation developers typically use a test automation framework, which adds yet another layer of abstraction on top of the programming language.
Automation revolves around business flows rather than machine operations, so high-level languages make more sense: you can get more work done with less effort.
Why do I need to learn low level programming?
When you were a junior automation developer, everything was plain and simple.
There were some easy manual tasks that needed to be automated, and you put your magic fingers to work and implemented the code to automate it.
However, as you progress in your career, you become more heavily involved in many of the organization’s endeavors.
At this level, you start thinking about big-picture stuff.
At this point, you think about running automation at scale, for many different purposes and across many different teams.
This is also the point when you invest more in the future.
In other words, instead of investing in short-sighted gains, you start investing in long-term progress, both for the organization you currently work for and for yourself.
Working with embedded development teams
Some products are 100% cloud native, but not all of them.
In many cases, your product has a physical component, and you might find yourself automating tests for such devices, using protocols to communicate with them.
Understanding the challenges of low-level development can help you provide better automation solutions to your colleagues.
You can write your automation code knowing what kind of problems to anticipate from embedded systems and your automation will be more realistic and concise.
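As an example of the low-level details that show up when talking to devices, here is a sketch of decoding a binary frame. The frame layout (a 1-byte command, a 2-byte big-endian payload length, the payload, and a 1-byte XOR checksum) and the names frame and decode_frame are entirely hypothetical, chosen only for illustration. Reading multi-byte fields with explicit shifts, rather than casting the buffer to a struct, avoids surprises from the host’s byte order and from struct padding.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical frame: [cmd:1][len:2, big-endian][payload:len][xor checksum:1] */
typedef struct {
    uint8_t command;
    uint16_t length;
    const uint8_t *payload;  /* points into the caller's buffer, not a copy */
} frame;

/* Returns 0 on success, -1 if the buffer is truncated or the checksum fails. */
int decode_frame(const uint8_t *buf, size_t n, frame *out) {
    if (n < 4) return -1;  /* smallest frame: 3-byte header + checksum */
    uint16_t len = (uint16_t)((buf[1] << 8) | buf[2]);  /* big-endian on any host */
    if (n < (size_t)len + 4) return -1;  /* payload or checksum missing */
    uint8_t sum = 0;
    for (size_t i = 0; i < (size_t)len + 3; i++)
        sum ^= buf[i];  /* checksum covers the header and the payload */
    if (sum != buf[len + 3]) return -1;
    out->command = buf[0];
    out->length = len;
    out->payload = buf + 3;
    return 0;
}
```

Truncated reads, corrupted bytes and byte-order mismatches are exactly the kind of problems an automation suite talking to real hardware should anticipate and test for.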
It can help you optimize your automation code!
Specifically for Python, you can use C or C++ to write extension modules, which can boost the execution speed of performance-critical code.
Running automation at scale sometimes requires high performance.
Remember: as you get more involved in developing automation infrastructure, you need to take performance and scaling considerations into greater account.
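A full CPython extension module is written against the Python C API (Python.h). A lighter-weight route, swapped in here for brevity, is to compile a plain C function into a shared library and call it from Python through the standard library’s ctypes module. The function count_primes is a hypothetical CPU-bound workload, the kind of tight loop where C typically outperforms pure Python.

```c
/* Count the primes below `limit` by trial division: a deliberately
   loop-heavy function of the kind worth moving out of Python. */
long count_primes(long limit) {
    long count = 0;
    for (long n = 2; n < limit; n++) {
        int is_prime = 1;
        for (long d = 2; d * d <= n; d++) {
            if (n % d == 0) {
                is_prime = 0;
                break;
            }
        }
        count += is_prime;
    }
    return count;
}
```

On Linux, this could be built with cc -O2 -shared -fPIC -o libprimes.so primes.c (file names hypothetical) and loaded in Python with ctypes.CDLL("./libprimes.so"), after declaring count_primes.argtypes = [ctypes.c_long] and count_primes.restype = ctypes.c_long. A real extension module takes more boilerplate but avoids the per-call overhead of ctypes.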
You might actually need it some day!
For many of you this sounds very unlikely, but again, in the more advanced stages of your career you can find yourself doing crazy stuff :)
So let’s imagine you need to test some insane new protocol for a future product in your company, and the only implementation available for it is written in C.
In this case you have to understand the code and even write some additional code on top of it.
Learning another high-level language is relatively easy, but low-level programming is a whole different skill, so it is better to understand the concepts and have some hands-on experience with it before you need it.
It can improve your code quality
It’s a cliché, I know…
But it’s true.
Knowledge of low level programming improves the quality of your high level programming.
- It can help you better understand basic data structures and algorithms, just like we saw in the hash table example.
- It can help you understand pass by reference vs. pass by value first-hand, which is important when dealing with mutable objects.
- A better understanding of networking at its lower levels (TCP/UDP).
- A better understanding of concurrency and multithreading/multiprocessing.
- Low-level strings are typically null-terminated character arrays, while high-level strings are often immutable objects. This difference can help you better understand how to use strings in your high-level language of choice.
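To see what "null-terminated" means in practice, here is a small sketch: a C string is just a mutable char array that ends in '\0', so its length has to be found by scanning, and it can be modified in place, something Python’s immutable str does not allow. my_strlen and to_upper_in_place are illustrative names.

```c
#include <ctype.h>
#include <stddef.h>

/* Walk the array until the terminating '\0'. This is O(n), whereas len()
   on a Python str simply reads a length stored in the object. */
size_t my_strlen(const char *s) {
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    return n;
}

/* Modify the caller's buffer directly: C strings are mutable arrays. */
void to_upper_in_place(char *s) {
    for (; *s != '\0'; s++)
        *s = (char)toupper((unsigned char)*s);
}
```

Passing a string literal to to_upper_in_place would be undefined behavior, since literals may live in read-only memory; it must be given a writable array such as char buf[] = "hello". That is one more pitfall that high-level strings spare you.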
Where do we start?
There are plenty of learning materials on the web.
For me, the best way to learn is through experiment, but you can go with your preferred learning strategy.
Here are some of my favorite YouTube sources for low level programming:
- Jacob Sorber: https://www.youtube.com/c/JacobSorber
- CodeVault: https://www.youtube.com/channel/UC6qj_bPq6tQ6hLwOBpBQ42Q
- Awesome video by Anthony Sottile: https://www.youtube.com/watch?v=HrEzCI3jIHw
- freeCodeCamp: https://www.youtube.com/watch?v=KJgsSFOSQv0
Conclusion
Most of your learning time should be dedicated to automation- and DevOps-specific topics: their methodologies, technologies and ecosystems.
However, some of your learning time can be invested in learning low level programming, and I believe you can greatly benefit from it in the long run, especially at the more advanced stages of your career as an automation developer.