Features of Rust
Now let's have a look at Rust's features without going into too much detail. This is just a light overview of what Rust is all about and how it might be helpful for us data engineers.
Statically-typed and compiled
Rust is a statically typed language. To simplify, this means that the type of every variable and value must be known, either inferred or explicitly specified, at compile time. This mostly allows for better error checking. It also helps eliminate the guesswork when using code you didn't write yourself. Many times I have had Python fail at runtime because I passed a value of an incompatible type. It's not that I didn't check beforehand, I just couldn't check all possible variants. In Rust, the compiler does that for me.
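To make this concrete, here is a minimal sketch of the compiler doing the checking. The function `total_size_mb` and the sample values are hypothetical, made up purely for illustration.

```rust
// Hypothetical helper: the signature states exactly what goes in and what comes out.
fn total_size_mb(file_sizes_mb: &[u64]) -> u64 {
    file_sizes_mb.iter().sum()
}

fn main() {
    let sizes: Vec<u64> = vec![120, 340, 75];
    println!("total: {} MB", total_size_mb(&sizes));

    // Uncommenting the next line stops compilation with a type mismatch,
    // because a string slice is not a slice of u64 values:
    // total_size_mb("120, 340, 75");
}
```

The mistake in the commented-out line never reaches runtime: the program simply doesn't build until the types line up.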
Rust is also compiled, which means the code is transformed into machine code (through the LLVM compiler backend) before it actually runs. This can have an impact on the efficiency and execution time of programs, but that always depends on the implementation.
Because Rust is statically typed and compiled, you can get stronger guarantees about the performance of your program (it's predictable) and a reduced likelihood of runtime errors.
In short, Rust is statically typed and compiled, which helps us avoid shipping mistakes that we would otherwise only have caught by running enough tests beforehand. These guarantees are built into the Rust toolchain.
Being typed slightly increases the verbosity, but not unnecessarily so, unlike some other programming languages. There is, for example, no need to define a class or write public static void to confuse yours truly. It's a give and take.
Safety features (borrowing and ownership)
Rust comes with a series of safety features that enable it to be a great tool for efficiently working with data. Borrowing and ownership refer to models for safe and efficient memory management.
Ownership means that any given value has exactly one owner that is responsible for its lifetime. In other words, the memory occupied by the value is released automatically when its owner goes out of scope (i.e. when the value is no longer needed). This is very helpful in avoiding things like memory leaks, use-after-free bugs and other unpleasant SRE nightmare fuel.
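Here is a small sketch of ownership in action. The function `archive` and the record values are hypothetical names used only for this illustration.

```rust
// Hypothetical function that takes ownership of its argument.
fn archive(records: Vec<String>) -> usize {
    records.len()
    // `records` goes out of scope here, so its memory is freed automatically.
}

fn main() {
    let records = vec![String::from("row-1"), String::from("row-2")];
    let archived = archive(records); // ownership moves into `archive`
    println!("archived {} records", archived);

    // `records` can no longer be used here. Uncommenting the next line
    // is rejected by the compiler because the value has been moved:
    // println!("{}", records.len());
}
```

Nobody had to free the vector by hand, and nobody can accidentally use it after it is gone.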
Borrowing, on the other hand, means that it is possible to pass around references to a value without transferring ownership. Think of this "reference to a value" like an alias or a symlink to the contents of the value. Different parts of the program can use the value without requiring a copy step. This borrowing can be done in a mutable way (we can change the value) or in an immutable way (we can't change the value). Only one mutable borrow of a value can be active at a time, and while it is active no other borrow of that value is allowed.
Now the natural question might be how borrowing affects ownership, especially with a mutable borrow. In this case, the owner keeps ownership but temporarily gives up access to the value: it cannot use the value until the borrow ends.
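As a rough sketch of both kinds of borrow, consider the following. The functions `count_rows` and `add_row` and the sample rows are hypothetical, chosen only to illustrate the idea.

```rust
// Immutable borrow: the function can read the rows but not change them.
fn count_rows(rows: &[String]) -> usize {
    rows.len()
}

// Mutable borrow: the function can change the rows, but still doesn't own them.
fn add_row(rows: &mut Vec<String>, row: &str) {
    rows.push(row.to_string());
}

fn main() {
    let mut rows = vec![String::from("id,name")];

    add_row(&mut rows, "1,Louis"); // mutable borrow ends when the call returns
    let n = count_rows(&rows);     // immutable borrow, no copy of the data is made

    // `rows` is still owned by `main` and usable here.
    println!("{} rows, first one is {:?}", n, rows[0]);
}
```

In both calls the data stays where it is; only a reference travels to the function.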
Let's take a handheld console as an example. You borrow it from your friend Louis. Meanwhile, it is not possible for Louis (or any other friend) to use that console even though he still has ownership over that console. The handheld console has only 1 game in it and you manage to beat your friend's high score. Now you give the console back to Louis and disappoint him by pulverising his high score.
Rust kinda works the same. Let's see this through an example. Just like previously, you can run the code. Can you guess the output before running it?
fn main() {
    // Louis owns the console
    let mut console_high_score = 8999;
    { // I borrow the console for this block
        let y = &mut console_high_score;
        // I'm ruining the high score here by performing an action (mutation)
        *y += 2;
    } // At the end of this block, I return it to Louis
    println!("console_high_score is now {}", console_high_score);
}
For every value in your Rust code, you'll get used to reasoning about these two features. A bit of planning upfront will save a lot of head-scratching afterwards. This won't be easy at first, but the compiler will help, as you'll see in the next chapters.
Don't worry too much about the & or * characters for now.
Concurrency support
The ownership/borrowing model explained above helps prevent data races by enforcing strict rules over shared resources. In the case of the handheld console, another friend cannot borrow the console from Louis while it's in my hands (unlike some stocks and derivatives, for the financially inclined).
This is especially helpful when dealing with concurrent programs. To simplify to the extreme, concurrency is when two or more tasks make progress over overlapping periods of time, possibly on a single processor. For example, serializing, validating and storing a piece of data can all be handled on one single processor.
It is usually necessary to share data between concurrent tasks, and there are many ways Rust helps with that beyond ownership and borrowing. There are things like threads, futures, streams and more, but it might be a bit too soon to explore them.
The point is, Rust has built-in support for safe concurrency without the mess, and this is very good news for data-intensive tasks.
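As a small taste, and without going into the details yet, here is a minimal sketch that uses scoped threads from the standard library to sum a slice in chunks. The function `parallel_sum` and the chunk size are hypothetical choices for illustration, not a recommended design.

```rust
use std::thread;

// Hypothetical helper: sums `values` by handing each chunk to its own thread.
// Note: `chunk_size` must be greater than zero, otherwise `chunks` panics.
fn parallel_sum(values: &[u64], chunk_size: usize) -> u64 {
    thread::scope(|s| {
        // Each thread only gets an immutable borrow of its chunk.
        let handles: Vec<_> = values
            .chunks(chunk_size)
            .map(|chunk| s.spawn(move || chunk.iter().sum::<u64>()))
            .collect();

        // Wait for every thread and add up the partial sums.
        handles
            .into_iter()
            .map(|handle| handle.join().unwrap())
            .sum::<u64>()
    })
}

fn main() {
    let values: Vec<u64> = (1..=100).collect();
    println!("sum = {}", parallel_sum(&values, 25));
}
```

The threads share the data through immutable borrows only, so the compiler can verify that none of them modifies it while the others are reading.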
Let's check out the advantages of Rust for data engineering: