阿里面试题: 为什么 Map 桶中个数超过 8 才转为红黑树

这是笔者一个好友面试阿里时, 被问及的一个问题, 应该不少人看到这个问题都会一面懵逼. 因为, 大部分的文章都是分析链表是怎么转换成红黑树的, 但是并没有说明为什么当链表长度为 8 的时候才做转换动作. 笔者第一反应也是一样, 只能初略的猜测是因为时间和空间的权衡.

要弄明白这个问题, 我们首先要明白为什么要转换, 这个问题比较简单, 因为 Map 中桶的元素初始化是链表保存的, 其查找性能是 O(n), 而树结构能将查找性能提升到 O(log(n)). 当链表长度很小的时候, 即使遍历, 速度也非常快, 但是当链表长度不断变长, 肯定会对查询性能有一定的影响, 所以才需要转成树. 至于为什么阈值是 8, 我想, 去源码中找寻答案应该是最可靠的途径.

8 这个阈值定义在 HashMap 中, 如下所示, 这段注释只说明了 8 是 bin(bin 就是 bucket, 即 HashMap 中 hashCode 值一样的元素保存的地方) 从链表转成树的阈值, 但是并没有说明为什么是 8:

/**
 * The bin count threshold for using a tree rather than list for a
 * bin.  Bins are converted to trees when adding an element to a
 * bin with at least this many nodes. The value must be greater
 * than 2 and should be at least 8 to mesh with assumptions in
 * tree removal about conversion back to plain bins upon shrinkage.
 */
static final int TREEIFY_THRESHOLD = 8;

我们继续往下看, 在 HashMap 中有一段 Implementation notes , 笔者摘录了几段重要的描述, 第一段如下所示, 大概含义是当 bin 变得很大的时候, 就会被转换成 TreeNodes 中的 bin, 其结构和 TreeMap 相似, 也就是红黑树:

This map usually acts as a binned (bucketed) hash table, but
when bins get too large, they are transformed into bins of TreeNodes,
each structured similarly to those in java.util.TreeMap

继续往下看, TreeNodes 占用空间是普通 Nodes 的两倍 , 所以只有当 bin 包含足够多的节点时才会转成 TreeNodes, 而是否足够多就是由 TREEIFY_THRESHOLD 的值决定的. 当 bin 中节点数变少时, 又会转成普通的 bin. 并且我们查看源码的时候发现, 链表长度达到 8 就转成红黑树, 当长度降到 6 就转成普通 bin.

这样就解析了为什么不是一开始就将其转换为 TreeNodes, 而是需要一定节点数才转为 TreeNodes, 说白了就是 trade-off, 空间和时间的权衡:

Because TreeNodes are about twice the size of regular nodes, we
use them only when bins contain enough nodes to warrant use
(see TREEIFY_THRESHOLD). And when they become too small (due to
removal or resizing) they are converted back to plain bins.  In
usages with well-distributed user hashCodes, tree bins are
rarely used.  Ideally, under random hashCodes, the frequency of
nodes in bins follows a Poisson distribution
(http://en.wikipedia.org/wiki/Poisson_distribution) with a
parameter of about 0.5 on average for the default resizing
threshold of 0.75, although with a large variance because of
resizing granularity. Ignoring variance, the expected
occurrences of list size k are (exp(-0.5)*pow(0.5, k)/factorial(k)).
The first values are:
0:    0.60653066
1:    0.30326533
2:    0.07581633
3:    0.01263606
4:    0.00157952
5:    0.00015795
6:    0.00001316
7:    0.00000094
8:    0.00000006
more: Less than 1 in ten million

这段内容还说到: 当 hashCode 离散性很好的时候, 树型 bin 用到的概率非常小, 因为数据均匀分布在每个 bin 中, 几乎不会有 bin 中链表长度会达到阈值. 但是在随机 hashCode 下, 离散性可能会变差, 然而 JDK 又不能阻止用户实现这种不好的 hash 算法, 因此就可能导致不均匀的数据分布. 不过理想情况下随机 hashCode 算法下所有 bin 中节点的分布频率会遵循泊松分布 , 我们可以看到, 一个 bin 中链表长度达到 8 个元素的概率为 0.00000006, 几乎是不可能事件. 所以, 之所以选择 8, 不是拍拍屁股决定的, 而是根据概率统计决定的. 由此可见, 发展 30 年的 Java 每一项改动和优化都是非常严谨和科学的.

画外音

笔者通过搜索引擎搜索这个问题, 发现很多下面这个答案 (猜测也是相互转发):

红黑树的平均查找长度是 log(n), 如果长度为 8, 平均查找长度为 log(8)=3, 链表的平均查找长度为 n/2, 当长度为 8 时, 平均查找长度为 8/2=4, 这才有转换成树的必要; 链表长度如果是小于等于 6,6/2=3, 而 log(6)=2.6, 虽然速度也很快的, 但是转化为树结构和生成树的时间并不会太短.

笔者认为这个答案不够严谨: 3 相比 4 有转换的必要, 而 2.6 相比 3 就没有转换的必要? 起码我不敢苟同这个观点.

看完给笔者点击下面的 "广告" 就是对原创最大的鼓励

点鸭点鸭

↓↓↓↓

来源: http://www.tuicool.com/articles/BVZbiyn

与本文相关文章

暂无,快来抢沙发吧！