Monday, February 23, 2015

Day 1: Porting rollingsum

Camlistore is an awesome project written in Go, a language backed by Google. The only bad thing is the documentation, which is ... well ... invisible. So I started to look into the code to understand how it works. At roughly the same time I started to play with Rust, a new language backed by Mozilla. And I decided to learn Rust by porting a (very small, really tiny) part of Camlistore. Look at the README file in the GitHub repo for more details. This is not a Rust tutorial. It is just my experience with this port, which will hopefully help others.

Here we go ...
The first thing I decided to migrate was rollsum.go, which implements rolling checksums. These are then used to split files into chunks. After creating a project with cargo as described in the Rust Book, I copied and pasted the Go code and started making some simple changes. Briefly:
  • Change function names to snake case
  • Add pub to those functions starting with capital letters
  • Rewrite the signatures using self
After these simple changes, I had to work on the types. Most of the changes were just syntax (uint32 in Go became u32 in Rust, no big deal). For the constants, I had to look in more depth. Constants in Go are a different kind of beast as they are untyped (read about it, it is really interesting). I chose usize for all constants related to array position and u32 for BLOB_BITS and BLOB_SIZE (I am almost certain that I should have chosen usize, what do you think?). The rest was really straightforward. I had to look up that the bitwise complement, which is written as ^x in Go, becomes !x in Rust. Again, no big deal.
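To make the type choices concrete, here is a minimal sketch of typed constants and the complement operator. The values 13 and 1 << 13, and the helper on_split, are assumptions for illustration; they are not copied from the repo.

```rust
// Assumed values, for illustration only. Unlike Go's untyped constants,
// a Rust const always needs an explicit type.
const BLOB_BITS: u32 = 13;
const BLOB_SIZE: u32 = 1 << BLOB_BITS;

/// Hypothetical helper: true when the low BLOB_BITS bits of `sum` are all set.
fn on_split(sum: u32) -> bool {
    let mask = BLOB_SIZE - 1;
    // Go writes the bitwise complement as ^x; Rust writes it as !x.
    (sum & mask) == (!0u32 & mask)
}

fn main() {
    println!("all low bits set:  {}", on_split(0x1fff));
    println!("one low bit clear: {}", on_split(0x1ffe));
}
```

Because the mask is compared against a u32 checksum, u32 constants avoid casts here. With usize, every use next to the checksum would need an explicit `as u32`.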

Good roadblocks

You may say that roadblocks are never good. But I think they are useful if you can learn from them. While porting this particular line in Go:

rs.add(rs.window[rs.wofs], ch)

I wrote this in Rust:

self.add(self.window[self.wofs], ch); 

and I got the following error when I tried to compile:

src/rollsum.rs:36:16: 36:38 error: cannot use `self.window[..]` because it was mutably borrowed
src/rollsum.rs:36       self.add(self.window[self.wofs], ch);
                                 ^~~~~~~~~~~~~~~~~~~~~~
src/rollsum.rs:36:7: 36:11 note: borrow of `*self` occurs here
src/rollsum.rs:36       self.add(self.window[self.wofs], ch);
                        ^~~~
src/rollsum.rs:36:28: 36:37 error: cannot use `self.wofs` because it was mutably borrowed
src/rollsum.rs:36       self.add(self.window[self.wofs], ch);
                                             ^~~~~~~~~
src/rollsum.rs:36:7: 36:11 note: borrow of `*self` occurs here
src/rollsum.rs:36       self.add(self.window[self.wofs], ch);

This helped me understand a little bit more about a very important concept in Rust: borrowing and lifetimes. As the error says, calling self.add borrows self mutably, so the compiler does not let me read self.window[self.wofs] in the argument list while that borrow is active. I ended up doing the following:

let b = self.window[self.wofs];
self.add(b, ch);

I am not sure this is the best or most idiomatic way, but it works.
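For context, here is a small self-contained sketch of that workaround inside a struct. It is a simplified stand-in, not the code in the repo: the window has only four slots, and the checksum is reduced to a single running sum s1.

```rust
// Simplified, hypothetical version of the rolling window; the real port
// keeps more state than this.
const WINDOW_SIZE: usize = 4;

struct RollSum {
    s1: u32,
    window: [u8; WINDOW_SIZE],
    wofs: usize,
}

impl RollSum {
    fn new() -> RollSum {
        RollSum { s1: 0, window: [0; WINDOW_SIZE], wofs: 0 }
    }

    // Takes &mut self, so calling it borrows the whole struct mutably.
    fn add(&mut self, drop: u8, add: u8) {
        // Go arithmetic wraps on overflow; make that explicit in Rust.
        self.s1 = self.s1.wrapping_add(add as u32).wrapping_sub(drop as u32);
    }

    pub fn roll(&mut self, ch: u8) {
        // Copy the byte out first, so the read is finished
        // before the mutable borrow for add() begins.
        let dropped = self.window[self.wofs];
        self.add(dropped, ch);
        self.window[self.wofs] = ch;
        self.wofs = (self.wofs + 1) % WINDOW_SIZE;
    }
}

fn main() {
    let mut rs = RollSum::new();
    for &b in [1u8, 2, 3, 4, 5].iter() {
        rs.roll(b);
    }
    // The window now holds 5, 2, 3, 4.
    println!("s1 = {}", rs.s1);
}
```

Since u8 is a plain value type, the temporary costs nothing; it only changes the order in which the compiler sees the read and the mutable borrow.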

Bad roadblocks

Rust syntax has changed a lot (really a lot) in the last months. So the usual answer to all questions ("let me google that") does not work easily. Most of the time you end up with some example using an old syntax. For example, I was looking for the keyword to define a constant in Rust and I ended up using the word static (see here) just to see the compiler complain. I was actually looking for const (see here). I expect this to get better over time once Rust 1.0 is out.

(By the way, the same was true for usize, but I knew about this already as I was following the discussion quite closely. In case you care, I am actually a big fan of having a separate type for this.)
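As a small illustration of const together with usize, consider the sketch below. The names WINDOW_SIZE and CHAR_OFFSET, their values, and both helper functions are assumptions made for the example.

```rust
// Hypothetical constants; `const` requires an explicit type annotation.
const WINDOW_SIZE: usize = 64;
const CHAR_OFFSET: usize = 31;

/// Advance an index into the window, wrapping around at the end.
fn next_offset(wofs: usize) -> usize {
    (wofs + 1) % WINDOW_SIZE
}

/// Mixing a usize constant into u32 arithmetic needs an explicit cast.
fn initial_sum() -> u32 {
    (WINDOW_SIZE * CHAR_OFFSET) as u32
}

fn main() {
    // Array lengths and indices are usize, so no cast is needed here.
    let window = [0u8; WINDOW_SIZE];
    let wofs = next_offset(WINDOW_SIZE - 1);
    println!("len = {}, wrapped offset = {}", window.len(), wofs);
    println!("byte at offset = {}", window[wofs]);
    println!("initial sum = {}", initial_sum());
}
```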

Useful help on my way

In addition to the Rust docs, looking at a real project was really helpful. In particular, I learned from rust-crypto how to fill an array or vector with random bytes, among other useful things.

Final notes

While this blog is not about comparing Rust and Go, I am sure that people will ask. But please remember that I am new to Rust, so my code is very likely not idiomatic, not safe, not fast, not good and not fun. I will be happy to accept contributions.

I took the benchmark in rollsum_test.go, simplified it to this code, and implemented the same thing in Rust. In this simple benchmark, Go was 2.7x slower. However, counting the splits in a 12 MB file was 30x slower in Rust (see examples in the repo and this for the Go file). I guess Rust's file I/O is still too young to compete.

UPDATE 2015-02-26: Following a suggestion in the comments, I ran this small benchmark again, this time compiling with optimizations enabled. Now Rust is just 1.7 times slower. Thanks Manish for the hint! (By the way, I prefer it when optimizations are enabled by default and you need to turn them off.)


Regarding the code, Rust is a little bit more verbose. In particular, Go accepts add(drop, add uint8) as shorthand for declaring both arguments as uint8, while Rust wants a type on each argument. Also, in Go you do not need to initialise every value in a struct if you want the default value for the type. That might be more convenient, but Rust makes things clear. No big deal for me. What I keep forgetting is the semicolon. And in Rust, the lack of a semicolon actually has a meaning. I will need to get used to it.
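These three points can be seen side by side in a short sketch. The struct Counters and the functions below are hypothetical and only loosely modelled on the port.

```rust
// Deriving Default lets us ask for zero values explicitly.
#[derive(Default, Debug)]
struct Counters {
    s1: u32,
    s2: u32,
}

// Each argument carries its own type; Rust has no `drop, add: u8` shorthand.
fn delta(drop: u8, add: u8) -> u32 {
    // No trailing semicolon: this expression is the function's return value.
    (add as u32).wrapping_sub(drop as u32)
}

fn digest(c: &Counters) -> u32 {
    // Adding a semicolon here would turn the expression into a statement
    // and the function would no longer return a u32.
    (c.s1 << 16) | (c.s2 & 0xffff)
}

fn main() {
    // Every field must be given a value, either by name
    // or by filling in the rest from Default.
    let c = Counters { s1: 1, ..Default::default() };
    println!("{:?} -> digest {}", c, digest(&c));
    println!("delta = {}", delta(2, 7));
}
```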

Rust Version:
 rustc 1.0.0-nightly (b9ba643b7 2015-02-13 21:15:39 +0000)

Running benchmark: cargo bench

Code at the moment of this blog post: 

https://github.com/trygenericdev/learncamlirust/tree/Day01

Go Version: go1.4.1 darwin/amd64

Running benchmark: go test -bench .