Sunday, December 16, 2018

YCSB ( Yahoo's .. Benchmarking )

Abstract :

 Introduction :

I worked with YCSB for a week or so. I wanted to benchmark Redis performance for READ and UPDATE. By default, YCSB implements READ and UPDATE using the Redis commands hgetall and hset. The YCSB workload does not do any pipelined read and update operations, so I had to tweak the Redis client binding in YCSB to achieve that. Coming from a kernel background, I was not familiar with building such Java-based benchmarks.
Getting started was easy: I downloaded the build snapshot as instructed in the README. To work on the code, I cloned it from the GitHub repo and started reading through it. The build command was easy to find in the README, but the build itself was not going to be so easy.

YCSB uses Maven to build the whole project, so I figured I could use the -pl option to build only the Redis binding instead of building all the database bindings.
On one of my machines the build failed with an exception related to the Java heap, so I moved to another machine.

As I started tinkering with the code, indentation was the thing I cared about least, but somehow the Maven build would throw errors for indentation and spacing and refuse to build. I could not figure out a way to avoid that and had to comply. The indentation style was also quite different from what I see in the Linux kernel: it expected 2 spaces for every level of indentation, unlike a 4- or 8-space tab.

YCSB abstracts the DB client-side code quite nicely and provides hooks in the form of abstract methods such as read() and update(), which one can implement on the client side.
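To make the idea of hooks concrete, here is a minimal sketch of the pattern. It is not the real YCSB base class: KeyValueClient and InMemoryClient are hypothetical names, and the method signatures are simplified (the real ones carry more parameters and return a status). The store is a plain HashMap of key to field map, loosely mirroring how a Redis hash is read and updated.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for a YCSB-style client abstraction.
// The real YCSB base class has richer signatures; this only shows the hooks idea.
abstract class KeyValueClient {
  // Return all fields stored under key, or null if the key is absent.
  abstract Map<String, String> read(String key);

  // Set or overwrite the given fields under key.
  abstract void update(String key, Map<String, String> values);
}

// Toy binding backed by a HashMap instead of a real database.
class InMemoryClient extends KeyValueClient {
  private final Map<String, Map<String, String>> store = new HashMap<>();

  @Override
  Map<String, String> read(String key) {
    Map<String, String> fields = store.get(key);
    // Hand back a copy so callers cannot modify the store directly.
    return fields == null ? null : new HashMap<>(fields);
  }

  @Override
  void update(String key, Map<String, String> values) {
    store.computeIfAbsent(key, k -> new HashMap<>()).putAll(values);
  }
}

class BindingDemo {
  public static void main(String[] args) {
    KeyValueClient client = new InMemoryClient();
    Map<String, String> values = new HashMap<>();
    values.put("field0", "hello");
    client.update("user1", values);
    System.out.println(client.read("user1"));
  }
}
```

The benchmark driver only ever talks to the abstract type, so swapping the database means swapping the subclass and nothing else.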

I just had to change some code here to get things done. YCSB uses Jedis (a Java API for Redis) to implement the Redis client. I needed to know the basic Redis commands and how to send them using the API. Then I needed to figure out how to pipeline them (and in the meantime I learned that Redis also has a feature called transactions).

Just a brief note about it: Redis is an in-memory database, and commands directly change the contents on the server. Redis's pipelining feature simply makes command execution asynchronous, or non-blocking, from the client's side. You can send n commands and collect the responses later, without waiting for the previous one to finish.
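The queue-now, collect-later behaviour can be illustrated with a toy model. This is only a sketch of the idea and does not talk to Redis: ToyServer and ToyPipeline are hypothetical names, and the "server" is just a HashMap that counts how many batches it received. With Jedis itself, the rough equivalent is to get a Pipeline from jedis.pipelined(), queue commands on it, and call sync() before reading the responses.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Toy model of client-side pipelining. ToyServer and ToyPipeline are
// hypothetical names and do not correspond to any Redis or Jedis class.
class ToyServer {
  private final Map<String, String> data = new HashMap<>();
  private int roundTrips = 0;

  // Execute a whole batch in one simulated network round trip.
  List<String> executeBatch(List<Function<Map<String, String>, String>> commands) {
    roundTrips++;
    List<String> replies = new ArrayList<>();
    for (Function<Map<String, String>, String> command : commands) {
      replies.add(command.apply(data));
    }
    return replies;
  }

  int roundTrips() {
    return roundTrips;
  }
}

class ToyPipeline {
  private final ToyServer server;
  private final List<Function<Map<String, String>, String>> queued = new ArrayList<>();

  ToyPipeline(ToyServer server) {
    this.server = server;
  }

  // Queue a write; nothing is sent to the server yet.
  void set(String key, String value) {
    queued.add(data -> {
      data.put(key, value);
      return "OK";
    });
  }

  // Queue a read; its reply only becomes available after sync().
  void get(String key) {
    queued.add(data -> data.get(key));
  }

  // Send everything queued so far and return the replies in command order.
  List<String> sync() {
    List<String> replies = server.executeBatch(new ArrayList<>(queued));
    queued.clear();
    return replies;
  }
}

class PipelineDemo {
  public static void main(String[] args) {
    ToyServer server = new ToyServer();
    ToyPipeline pipeline = new ToyPipeline(server);
    pipeline.set("user1", "alice");
    pipeline.set("user2", "bob");
    pipeline.get("user1");
    List<String> replies = pipeline.sync();
    System.out.println(replies + " in " + server.roundTrips() + " round trip(s)");
  }
}
```

The point of the sketch is the round-trip counter: three commands cost one trip instead of three, which is where the benefit shows up in a benchmark.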

Jedis uses the Pipeline class to do that. This is a nice tutorial to get started with. Tutorials generally introduce transactions and pipelining together, which makes it confusing to understand which API is meant for what. But this blog helps bring clarity. Basically, a Transaction makes sure that a set of commands is executed atomically and has nothing to do with pipelining. But because the two are syntactically similar, it becomes confusing.

With all the required changes I could successfully build it, but since I had to run it on a different machine, I just copied the whole build setup, which was a mistake. It gave me errors saying that it could not find some classes in the classpath. I figured out that the build creates a snapshot as a tar.gz and a jar. By simply extracting this tar I could use the binaries without any errors.

It was quite some learning. I enjoyed it.