Static Oneplus 不可控制论

2015/03/10 - by Oneplus • C++associative container

A benchmark on mapping few keys


Recently, I’ve worked on optimizing my transition-based parser and came across such situation:

I need an associated, or key-value, structure to store the cached scored for each transition action at a certain state. In practice, the number of transition actions is relatively small (less than 100), so I became curious about the best associated data structure for small set of keys.

Previous posts on such performance benchmark mainly focus on the comparsion of different hash maps on large scale, like this one. This post differs from theirs by setting the number of keys to a small number. Time consumption on random insertion, ordered insertion, random retrieve and ordered retrieve is evaluated. I didn’t give a try on the delete operation cause there is no such operation in my problems.

Three types of mapping facilities are adopted in this posts. They are:

  • Tree: std::map
  • Ordered List: boost::container::flat_map
  • Hashtable: std::unordered_map, boost::unordered_map, google::sparse_hash_map, google::dense_hash_map

std::map is a several-time loser in previous benchmark because of the O(log n) time complexity. I choose the std::map and hope it can win a round in this special situation. boost::container::flat_map is suggested by this StackOverflow question. It seems the key are ordered and stored in a list in boost::container::flat_map, rather than a tree like the std::map does. Expectingly, this data structure should have some nice memory allocation feature compared with the std::map. The Hashtable group is actually my old friends and I used google::dense_hash_map and std::unordered_map a lot.

Results

All the experiments are conducted on a Xeon(R) CPU E5-2620 @ 2.00GHz server. My GCC is a new-version one, the 4.8.2. The code used in this benchmark can be found at this gist. -O3 optimizing level is configured in compling the code.

The benchmark results are shown below.

Ordered insertion

number of keys

Random insertion

number of keys

Ordered queries

number of keys

Random queries

number of keys

Summary

Generally speaking, the google::dense_hash_map is still the best choice if you are aggressively purchasing the speed performance. Experimental result confirms that the dense_hash_map achieves both the fastest insertion and fastest querying.

Also, we see that on the condition that the number of keys is relatively small, the sorted group (std::map and boost::container::flat_map) performs better than the hash group. One thing we need to note that, after the number of key increase to about 100, the sorted group starts to perform poor compared to the hash group.

By comparing the std::map and boost::container::flat_map, their performances are almost identical. I am a little disappointed by this fact because I always considering the boost::container::flat_map as a silver bullet for the few key mapping problems.

For practice, there are several suggestion:

  • For the mapping problem with less than 100 keys, std::map is still a solution. It’s especially true if ordered keys is in your concern.
  • google::dense_hash_map is still number one choice, if memory and the annoying set_empty_key() are both not the problems.
blog comments powered by Disqus