Introduction to Rust
We'll start this guide with a short introduction to Rust and a few code snippets showing how you can start using it.
That way, we keep the examples practical and down to earth. When you hover over a code block, you'll see a play button; click it and the code will run. More on that in a following chapter.
What is Rust?
Rust is a programming language that, according to the official website (January 2023), is "a language empowering everyone to build reliable and efficient software". It was designed by Graydon Hoare, who started it as a personal project in 2006; the first stable release, Rust 1.0, followed in 2015. More on Rust's history in the next chapter. The key points are that it's a multi-purpose programming language with influences from a wide range of programming languages, notably C++.
It is frequently advertised as a memory safe and performant alternative to C/C++.
The following Rust code prints "Hello, World!". Try running it:
fn main() {
    println!("Hello, World!");
}
Rust works like this: the code you write is first compiled to machine code and then executed. This relies on the Rust compiler and LLVM. It's not important to know what these are yet, but you can look them up if you want. The gist is that the compiled machine code is specific to one platform, but the same Rust source can be compiled for many different platforms. As a matter of fact, it can also be compiled to WebAssembly and run in your browser.
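To make the idea of a compilation target a bit more concrete, here is a minimal sketch that prints the platform a binary was built for. It uses the standard library constants `std::env::consts::ARCH` and `std::env::consts::OS`; the helper name `target_description` is made up for this illustration.

```rust
use std::env::consts::{ARCH, OS};

// Hypothetical helper: describe the platform this binary was compiled for.
// ARCH and OS are constants fixed at compile time, not detected at run time.
fn target_description() -> String {
    format!("{}-{}", ARCH, OS)
}

fn main() {
    println!("This binary was built for: {}", target_description());
}
```

Building the same source for another target would print a different value, without changing a line of the program.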
The following is also Rust code. Can you guess what it does before running it? You can edit this one too, for example to rename the functions, but keep it short: we have work to do.
fn mystery_function(input: &str) -> i32 {
    let mysteries = "aeiouAEIOU";
    input.chars().filter(|c| mysteries.contains(*c)).count() as i32
}

fn main() {
    let input_string = "Hello, world!";
    let mystery_count = mystery_function(input_string);
    println!("The number of mysteries in '{}' is {}", input_string, mystery_count);
}
That looks pretty much like Python, right? The only weird looking parts are those &, * and the -> symbols and perhaps the str and i32 too. We'll get to those in time, not to worry.
Why is Rust a good choice for data engineering?
The argument that Rust is a memory safe and performant alternative to C/C++ doesn't matter that much to Data Engineers, since we don't usually deal with C or C++.
What is important is that Rust is a systems programming language which enforces a certain approach to programming that is at the same time efficient and good at eliminating a whole category of errors that usually happen with data workloads.
Here's a deliberately simplified example that doesn't require dwelling on performance, low-level control or even memory usage.
Let's take the following Python code and run it multiple times:
import random

def count_characters(s: str, c: str) -> int:
    return s.count(c)

def get_count():
    if random.random() < 0.5:
        return count_characters("Hello world", "o")
    else:
        return count_characters("Hello world", 23)

result = get_count()
print(result)
If you run the above code multiple times, you'll see that it sometimes works and sometimes fails with a TypeError, because str.count() does not accept an integer. Of course, a linter, a battery of tests or a properly configured type checker might flag this as a warning or even an error, but none of them prevents the code from ending up running somewhere, especially if the problem is not caught in time.
When objects get complex or the scope of the project grows, these types of errors sneak in and are discovered only when it's too late. You can't test everything.
This happens more often than you might think. This simplistic example is meant to convey that preventing such issues from reaching production is an afterthought and comes with no guarantees.
Consider the equivalent code in Rust and try to run it:
extern crate rand;
use rand::Rng;

fn count_characters(s: &str, c: char) -> usize {
    s.chars().filter(|x| *x == c).count()
}

fn main() {
    let mut rng = rand::thread_rng();
    if rng.gen_bool(0.5) {
        count_characters("Hello world", 'o');
    } else {
        // Intentionally wrong: 23 is an integer, not a char.
        count_characters("Hello world", 23);
    }
}
Notice what the compiler says: it reports a type mismatch, because 23 is an integer and the function expects a char. Does it make a bit more sense now? The compiler doesn't allow you to build this application, so needless to say, it won't end up on your production server.
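For comparison, here is a minimal sketch of a version that does compile. It drops the random branch so that it only needs the standard library, and it assumes, purely for illustration, that someone really wanted the character with code 23; in that case the conversion has to be written out explicitly with `char::from`.

```rust
// Count how many times the character `c` appears in `s`.
fn count_characters(s: &str, c: char) -> usize {
    s.chars().filter(|x| *x == c).count()
}

fn main() {
    // A char argument matches the signature, so this compiles and runs.
    let count = count_characters("Hello world", 'o');
    println!("'o' appears {} times", count);

    // Passing the bare integer 23 is rejected at compile time.
    // If the character with code 23 were really intended (an assumption
    // made for this sketch), the conversion must be spelled out:
    let control_char = char::from(23u8);
    let control_count = count_characters("Hello world", control_char);
    println!("character 23 appears {} times", control_count);
}
```

The point is not the conversion itself but that the decision is visible in the code: nothing is converted or guessed silently.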
In essence, interface definitions and specifications are enforced at the lowest level and not as an afterthought in a CI/CD pipeline or in some external YAML file serving as documentation. This of course always requires a bit more work and planning upfront but you'll get a few guarantees in exchange, especially the guarantees that matter in the context of working with data. This way of approaching things is not the reason for Rust's performance and scalability but certainly enables it.
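To sketch what this can look like with data, the example below parses a line of text into a typed record. Everything in it is hypothetical and chosen for illustration: the `Reading` struct, the `parse_reading` function and the "sensor,value" line format. What matters is the signature: it returns a `Result`, so the possibility of malformed input is part of the interface and the caller has to deal with it.

```rust
// Hypothetical record type, used only for illustration.
#[derive(Debug, PartialEq)]
struct Reading {
    sensor: String,
    value: f64,
}

// Parse one "sensor,value" line. The return type makes failure explicit:
// callers receive a Result and must decide what to do with bad input.
fn parse_reading(line: &str) -> Result<Reading, String> {
    let (sensor, raw_value) = line
        .split_once(',')
        .ok_or_else(|| format!("expected 'sensor,value', got '{}'", line))?;
    let value: f64 = raw_value
        .trim()
        .parse()
        .map_err(|e| format!("bad value '{}': {}", raw_value, e))?;
    Ok(Reading {
        sensor: sensor.trim().to_string(),
        value,
    })
}

fn main() {
    for line in ["kitchen,21.5", "garage,not-a-number"] {
        // Both outcomes are handled explicitly.
        match parse_reading(line) {
            Ok(reading) => println!("parsed: {:?}", reading),
            Err(message) => println!("rejected: {}", message),
        }
    }
}
```

A function that accepted the line and returned a bare `Reading` could not express the failure case at all, which is the kind of upfront planning the paragraph above refers to.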
Through one very simple example, we saw what makes Rust such a good fit for data engineering. Over the next chapters we'll discover more examples and get them to run.
If you like it so far, consider subscribing for free to get the new chapters.