当前位置：

首页
/
IT
/
程序
/
Map 综述（一）：理解HashMap终极篇

Map 综述（一）：理解HashMap终极篇

本文原创作者：
作者博客地址：

一. HashMap 概述

Map 是 Key-Value 对映射的抽象接口，该映射不包括重复的键，即一个键对应一个值。HashMap 是 Java Collection Framework 的重要成员，也是 Map 族 (如下图所示) 中我们最为常用的一种。简单地说， HashMap 是基于哈希表的 Map 接口的实现，以 key-value 的形式存在，即 Entry 对象。在 HashMap 中，其会根据 hash 算法来计算 key-value 的存储位置并进行快速存取。特别地，HashMap 最多只允许一条 Entry 的键为 Null(多条会覆盖)，但允许多条 Entry 的值为 Null。此外，HashMap 是 Map 的一个非同步的实现。

同样地，HashSet 也是 Java Collection Framework 的重要成员，是 Set 接口的常用实现类，但其与 HashMap 有很多相似之处。对于 HashSet 而言，其采用 Hash 算法决定元素在 Set 中的存储位置，这样可以保证元素的快速存取；对于 HashMap 而言，其将 key-value 当成一个整体 (Entry 对象) 来处理，其也采用同样的 Hash 算法去决定 key-value 的存储位置从而保证键值对的快速存取。虽然 HashMap 和 HashSet 实现的接口规范不同，但是它们底层的 Hash 存储机制完全相同。实际上，HashSet 本身就是在 HashMap 的基础上实现的。因此，通过对 HashMap 的数据结构、实现原理、源码实现三个方面了解，我们不但可以进一步掌握其底层的 Hash 存储机制，也有助于对 HashSet 的了解。

必须指出的是，虽然容器号称存储的是 Java 对象，但实际上并不会真正将 Java 对象放入容器中，只是在容器中保留这些对象的引用。也就是说，Java 容器实际上包含的是引用变量，而这些引用变量指向了我们要实际保存的 Java 对象。

二. HashMap 在 JDK 中的定义

HashMap 实现了 Map 接口，并继承 AbstractMap。其中 Map 接口定义了键值隐射规则。和 AbstractCollection 抽象类在 Collection 族的作用类似， AbstractMap 抽象类提供了 Map 接口的骨干实现，以最大限度地减少实现 Map 接口所需的工作。

public class HashMap < K,
V > extends AbstractMap < K,
V > implements Map < K,
V > ,
Cloneable,
Serializable {...
}

三. HashMap 的构造函数

HashMap 一共提供了四个构造函数，其中默认无参的构造函数和参数为指定容器的构造函数为 Java Collection Framework 的规范实现，其余两个构造函数则是 HashMap 特别提供的。

1、HashMap()

该构造函数意在构造一个具有默认初始容量 (16) 和默认加载因子 (0.75) 的空 HashMap，是 Java Collection Framework 规范推荐提供的，其源码如下：

 /**
     * Constructs an empty HashMap with the default initial capacity
     * (16) and the default load factor (0.75).
     */
    public HashMap() {
        //负载因子:用于衡量的是一个散列表的空间的使用程度
        this.loadFactor = DEFAULT_LOAD_FACTOR; 
        //HashMap进行扩容的阈值，它的值等于 HashMap 的容量乘以负载因子
        threshold = (int)(DEFAULT_INITIAL_CAPACITY * DEFAULT_LOAD_FACTOR);
        // HashMap的底层实现仍是数组，只是数组的每一项都是一条链
        table = new Entry[DEFAULT_INITIAL_CAPACITY];
        init();
    }

2、HashMap(int initialCapacity, float loadFactor)

该构造函数意在构造一个指定初始容量和负载因子的空 HashMap，其源码如下：

    /**
     * Constructs an empty HashMap with the specified initial capacity and load factor.
     */
    public HashMap(int initialCapacity, float loadFactor) {
        //初始容量不能小于 0
        if (initialCapacity < 0)
            throw new IllegalArgumentException("Illegal initial capacity: " + initialCapacity);
        //初始容量不能超过 2^30
        if (initialCapacity > MAXIMUM_CAPACITY)
            initialCapacity = MAXIMUM_CAPACITY;
        //负载引子不能小于 0            
        if (loadFactor <= 0 || Float.isNaN(loadFactor))
            throw new IllegalArgumentException("Illegal load factor: " +
                                               loadFactor);
        // HashMap 的容量必须是 2 的幂次方，超过 initialCapacity 的最小 2^n 
        int capacity = 1;
        while (capacity < initialCapacity)
            capacity <<= 1;   
        //负载因子
        this.loadFactor = loadFactor;
        //设置HashMap的容量极限，当HashMap的容量达到该极限时就会进行扩容操作
        threshold = (int)(capacity * loadFactor);
        // HashMap的底层实现仍是数组，只是数组的每一项都是一条链
        table = new Entry[capacity];
        init();
    }

3、HashMap(int initialCapacity)

该构造函数意在构造一个指定初始容量和默认负载因子 (0.75) 的空 HashMap，其源码如下：

 
    // Constructs an empty HashMap with the specified initial capacity and the default load factor (0.75)
    public HashMap(int initialCapacity) {
        this(initialCapacity, DEFAULT_LOAD_FACTOR);  // 直接调用上述构造函数
    }

4、HashMap(Map<? extends K, ? extends V> m)

该构造函数意在构造一个与指定 HashMap 具有相同映射的 HashMap，其初始容量不小于 16 (具体依赖于指定 Map 的大小)，负载因子是 0.75，是 Java Collection Framework 规范推荐提供的，其源码如下：

    // Constructs a new HashMap with the same mappings as the specified Map. 
    // The HashMap is created with default load factor (0.75) and an initial capacity
    // sufficient to hold the mappings in the specified Map.
    public HashMap(Map m) {
        this(Math.max((int) (m.size() / DEFAULT_LOAD_FACTOR) + 1,
                      DEFAULT_INITIAL_CAPACITY), DEFAULT_LOAD_FACTOR);
        putAllForCreate(m);
    }

在这里，我们提到了两个参数：初始容量和加载因子。这两个参数是影响 HashMap 性能的重要参数。其中，容量表示哈希表中桶的数量 (table 数组的大小)，初始容量是创建哈希表时的容量；负载因子是哈希表在其容量自动增加之前可以达到多满的一种尺度，它衡量的是一个散列表的空间的使用程度，负载因子越大表示散列表的装填程度越高，反之愈小。

对于使用链表法的散列表来说，查找一个元素的平均时间是 O(1+a)，a 指定是链的长度，是一个常数。因此，负载因子越大，那么对空间的利用更充分，但查找效率的也就越低；负载因子越小，那么散列表的数据将越稀疏，对空间造成的浪费也就越严重。系统默认负载因子为 0.75，这是时间和空间成本上一种折衷，一般情况下我们是无需修改的。

四. HashMap 的数据结构

1、哈希相关概念

Hash 就是把任意长度的输入 (又叫做预映射， pre-image)，通过散列算法，变换成固定长度的输出 (通常是整型)，该输出就是散列值。这种转换是一种压缩映射，也就是说，散列值的空间通常远小于输入的空间。不同的输入可能会散列成相同的输出，从而不可能从散列值来唯一的确定输入值。简单的说，就是一种将任意长度的消息压缩到某一固定长度的消息摘要函数。

2、哈希的应用–数据结构

我们知道，数组的特点是：寻址容易，插入和删除困难; 而链表的特点是：寻址困难，插入和删除容易。那么我们能不能综合两者的特性，做出一种寻址容易，插入和删除也容易的数据结构呢？答案是肯定的，这就是我们要提起的哈希表。事实上，哈希表有多种不同的实现方法，我们接下来解释的是最常用的一种方法 —— 拉链法，我们可以将其理解为链表的数组，如下图所示：

我们可以从上图看到，左边很明显是个数组，数组的每个成员是一个链表。该数据结构所容纳的所有元素均包含一个指针，用于元素间的链接。我们根据元素的自身特征把元素分配到不同的链表中去，反过来我们也正是通过这些特征找到正确的链表，再从链表中找出这个元素。其中，根据元素特征计算元素数组下标的方法就是散列法。

总的来说，哈希表适合用作快速查找、删除的基本数据结构，通常需要总数据量可以放入内存。在使用哈希表时，有以下几个关键点：

hash 函数的选择：针对不同的对象 (字符串、整数等) 具体的哈希方法；
碰撞处理：常用的有两种方式，一种是 open hashing，即拉链法；另一种就是 closed hashing，即开地址法 (opened addressing)。
更多关于哈希 (Hash) 的介绍，请移步我的博文。

3、HashMap 的数据结构

我们知道在 Java 中最常用的两种结构是数组和链表，几乎所有的数据结构都可以利用这两种来组合实现，HashMap 就是这种应用的一个典型。实际上，HashMap 就是一个链表数组，如下是它数据结构：

从上图中，我们可以形象地看出 HashMap 底层实现还是数组，只是数组的每一项都是一条链。其中参数 initialCapacity 就代表了该数组的长度，也就是桶的个数。在第三节我们已经了解了 HashMap 的默认构造函数的源码：

 /**
     * Constructs an empty HashMap with the default initial capacity
     * (16) and the default load factor (0.75).
     */
    public HashMap() {
        //负载因子:用于衡量的是一个散列表的空间的使用程度
        this.loadFactor = DEFAULT_LOAD_FACTOR; 
        //HashMap进行扩容的阈值，它的值等于 HashMap 的容量乘以负载因子
        threshold = (int)(DEFAULT_INITIAL_CAPACITY * DEFAULT_LOAD_FACTOR);
        // HashMap的底层实现仍是数组，只是数组的每一项都是一条链
        table = new Entry[DEFAULT_INITIAL_CAPACITY];
        init();
    }

从上述源码中我们可以看出，每次新建一个 HashMap 时，都会初始化一个 Entry 类型的 table 数组，其中 Entry 类型的定义如下：

static class Entry implements Map.Entry {
    final K key;     // 键值对的键
    V value;        // 键值对的值
    Entry next;    // 下一个节点
    final int hash;     // hash(key.hashCode())方法的返回值
    /**
     * Creates new entry.
     */
    Entry(int h, K k, V v, Entry n) {     // Entry 的构造函数
        value = v;
        next = n;
        key = k;
        hash = h;
    }
    ......
}

其中 Entry 为 HashMap 的内部类，实现了 Map.Entry 接口，其包含了键 key、值 value、下一个节点 next，以及 hash 值四个属性。事实上，Entry 是构成哈希表的基石，是哈希表所存储的元素的具体形式。

五. HashMap 快速存取的实现

上面简单介绍了哈希相关概念，并分析了 HashMap 的数据结构，下面将探讨 HashMap 是如何实现快速存取的。

1、HashMap 的存储实现

在 HashMap 中，键值对的存储是通过 put(key,vlaue) 方法来实现的，其源码如下：

    /**
     * Associates the specified value with the specified key in this map.
     * If the map previously contained a mapping for the key, the old
     * value is replaced.
     *
     * @param key key with which the specified value is to be associated
     * @param value value to be associated with the specified key
     * @return the previous value associated with key, or null if there was no mapping for key.
     *  Note that a null return can also indicate that the map previously associated null with key.
     */
    public V put(K key, V value) {
        //当key为null时，调用putForNullKey方法，并将该键值对保存到table的第一个位置 
        if (key == null)
            return putForNullKey(value); 
        //根据key的hashCode计算hash值
        int hash = hash(key.hashCode());             //  ------- (1)
        //计算该键值对在数组中的存储位置（哪个桶）
        int i = indexFor(hash, table.length);              // ------- (2)
        //在table的第i个桶上进行迭代，寻找 key 保存的位置
        for (Entry e = table[i]; e != null; e = e.next) {      // ------- (3)
            Object k;
            //判断该条链上是否存在hash值相同且key值相等的映射，若存在，则直接覆盖 value，并返回旧value
            if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
                V oldValue = e.value;
                e.value = value;
                e.recordAccess(this);
                return oldValue;    // 返回旧值
            }
        }
        modCount++; //修改次数增加1，快速失败机制
        //原HashMap中无该映射，将该添加至该链的链头
        addEntry(hash, key, value, i);            
        return null;
    }

通过上述源码我们可以清楚了解到 HashMap 保存数据的过程：

首先，判断 key 是否为 null，若为 null，则直接调用 putForNullKey 方法；若不为空，则先计算 key 的 hash 值，然后根据 hash 值搜索在 table 数组中的索引位置，如果 table 数组在该位置处有元素，则查找是否存在相同的 key，若存在则覆盖原来 key 的 value，否则将该元素保存在链头（最先保存的元素放在链尾）。此外，若 table 在该处没有元素，则直接保存。这个过程看似比较简单，但其实有很多需要回味的地方，下面我们一一来看。

先看源码中的 (3) 处，此处迭代原因就是为了防止存在相同的 key 值。如果发现两个 hash 值（key）相同时，HashMap 的处理方式是用新 value 替换旧 value，这里并没有处理 key，这就解释了 HashMap 中没有两个相同的 key。

1). 对 NULL 键的特别处理：putForNullKey()

我们直接看其源码：

    /**
     * Offloaded version of put for null keys
     */
    private V putForNullKey(V value) {
        // 若key==null，则将其放入table的第一个桶，即 table[0]
        for (Entry e = table[0]; e != null; e = e.next) {   
            if (e.key == null) {   // 若已经存在key为null的键，则替换其值，并返回旧值
                V oldValue = e.value;
                e.value = value;
                e.recordAccess(this);
                return oldValue;
            }
        }
        modCount++;        // 快速失败
        addEntry(0, null, value, 0);       // 否则，将其添加到 table[0] 的桶中
        return null;
    }

通过上述源码我们可以清楚知到，HashMap 中可以保存键为 NULL 的键值对，且该键值对是唯一的。若再次向其中添加键为 NULL 的键值对，将覆盖其原值。此外，如果 HashMap 中存在键为 NULL 的键值对，那么一定在第一个桶中。

2). HashMap 中的哈希策略

在上述的 put(key,vlaue) 方法的源码中，我们标出了 HashMap 中的哈希策略（即 (1)、(2) 两处），hash() 方法用于对 Key 的 hashCode 进行重新计算，而 IndexFor 用于生成这个 Entry 对象的插入位置。当计算出来的 hash 值与 hashMap 的 (length-1) 做了 & 运算后，会得到位于区间 [0，length-1] 的一个值。也就是说，hash 值的均匀分布会直接导致这个值分布的均匀，从而提高 HashMap 的存取效率。

我们首先看 (1) 处的 hash() 方法，该方法为一个纯粹的数学计算，用于进一步计算 key 的 hash 值，源码如下：

    /**
     * Applies a supplemental hash function to a given hashCode, which
     * defends against poor quality hash functions.  This is critical
     * because HashMap uses power-of-two length hash tables, that
     * otherwise encounter collisions for hashCodes that do not differ
     * in lower bits. 
     * 
     * Note: Null keys always map to hash 0, thus index 0.
     */
    static int hash(int h) {
        // This function ensures that hashCodes that differ only by
        // constant multiples at each bit position have a bounded
        // number of collisions (approximately 8 at default load factor).
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

正如 JDK 官方对该方法的描述那样，使用 hash() 方法对一个对象的 hashCode 进行重新计算是为了防止质量低下的 hashCode() 函数实现。由于 hashMap 的支撑数组长度总是 2 的倍数，通过右移可以使低位的数据尽量的不同，从而使分布尽量均匀。更多关于该 hash(int h) 方法的介绍请见，此不赘述。

通过上述 hash()方法计算 hash 值后，怎么才能保证 table 元素分布均与呢？我们会想到取模，但是由于取模的消耗较大，HashMap 是通过调用 (2) 处的 indexFor()方法处理的，其不但简单而且效率很高，对应源码如下所示：

    /**
     * Returns index for hash code h.
     */
    static int indexFor(int h, int length) {
        return h & (length-1);  // 作用等价于取模运算，但这种方式效率更高
    }

我们知道，HashMap 的底层数组长度总是 2 的 n 次方。当 length 为 2 的 n 次方时，h&(length - 1) 就相当于对 length 取模，而且速度比直接取模要快得多，这是 HashMap 在速度上的一个优化。至于 HashMap 的底层数组长度为什么是 2 的 n 次方，下面将给出解释。

总而言之，上述的 hash() 方法和 indexFor() 方法的作用只有一个：均匀分布 table 数据以便充分利用空间。

3). HashMap 中键值对的添加：addEntry()
我们直接看其源码：

     /**
     * Adds a new entry with the specified key, value and hash code to
     * the specified bucket.  It is the responsibility of this
     * method to resize the table if appropriate.
     *
     * Subclass overrides this to alter the behavior of put method.
     * 
     * 永远都是在链表的表头添加新元素
     */
    void addEntry(int hash, K key, V value, int bucketIndex) {
        //获取bucketIndex处的链表
        Entry e = table[bucketIndex];
        //将新创建的 Entry 链入 bucketIndex处的链表的表头 
        table[bucketIndex] = new Entry(hash, key, value, e);
        //若HashMap中元素的个数超过极限值 threshold，则容量扩大两倍
        if (size++ >= threshold)
            resize(2 * table.length);
    }

通过上述源码我们可以清楚知到，HashMap 永远都是在链表的表头添加新元素。此外，若 HashMap 中元素的个数超过极限值 threshold，则其容量将扩大两倍。

4). HashMap 的扩容：resize()

我们直接看其源码：

     /**
     * Rehashes the contents of this map into a new array with a
     * larger capacity.  This method is called automatically when the
     * number of keys in this map reaches its threshold.
     *
     * If current capacity is MAXIMUM_CAPACITY, this method does not
     * resize the map, but sets threshold to Integer.MAX_VALUE.
     * This has the effect of preventing future calls.
     *
     * @param newCapacity the new capacity, MUST be a power of two;
     *        must be greater than current capacity unless current
     *        capacity is MAXIMUM_CAPACITY (in which case value
     *        is irrelevant).
     */
    void resize(int newCapacity) {
        Entry[] oldTable = table;
        int oldCapacity = oldTable.length;
        // 若 oldCapacity 已达到最大值，直接将 threshold 设为 Integer.MAX_VALUE
        if (oldCapacity == MAXIMUM_CAPACITY) {  
            threshold = Integer.MAX_VALUE;
            return;             // 直接返回
        }
        // 否则，创建一个更大的数组
        Entry[] newTable = new Entry[newCapacity];
        //将每条Entry重新哈希到新的数组中
        transfer(newTable);
        table = newTable;
        threshold = (int)(newCapacity * loadFactor);  // 重新设定 threshold
    }

该方法的作用及触发动机如下：

Rehashes the contents of this map into a new array with a larger capacity. This method is called automatically when the number of keys in this map reaches its threshold.

5). HashMap 的重哈希：transfer()

我们直接看其源码：

    /**
     * Transfers all entries from current table to newTable.
     */
    void transfer(Entry[] newTable) {
        // 将原数组 table 赋给数组 src
        Entry[] src = table;
        int newCapacity = newTable.length;
        // 将数组 src 中的每条链重新添加到 newTable 中
        for (int j = 0; j < src.length; j++) {
            Entry e = src[j];
            if (e != null) {
                src[j] = null;   // src 回收
                // 将每条链的每个元素依次添加到 newTable 中相应的桶中
                do {
                    Entry next = e.next;
                    // int hash = hash(key.hashCode());
                    // 计算在newTable中的位置，注意原来在同一条子链上的元素可能被分配到不同的子链
                    int i = indexFor(e.hash, newCapacity);   
                    e.next = newTable[i];
                    newTable[i] = e;
                    e = next;
                } while (e != null);
            }
        }
    }

该方法的作用如下：

Transfers all entries from current table to newTable.

特别需要注意的是，在重哈希的过程中，原属于一个桶中的 Entry 对象可能被分到不同的桶，因为 HashMap 的容量发生了变化，那么 h&(length - 1) 的值也会发生相应的变化。极端地说，如果重哈希后，原属于一个桶中的 Entry 对象仍属于同一桶，那么重哈希也就失去了意义。

2、HashMap 的读取实现

相对于 HashMap 的存储而言，读取就显得比较简单了。因为只需通过 key 的 hash 值定位到 table 数组的某个特定的桶，然后查找并返回该 key 对应的 value 即可，源码如下：

/**
     * Returns the value to which the specified key is mapped,
     * or  null} if this map contains no mapping for the key.
     *
     * 
More formally, if this map contains a mapping from a key
     *  k} to a value  v} such that  (key==null ? k==null :
     * key.equals(k))}, then this method returns  v}; otherwise
     * it returns  null}.  (There can be at most one such mapping.)
     *
     *
 
A return value of  null} does not necessarily
     * indicate that the map contains no mapping for the key; it's also
     * possible that the map explicitly maps the key to  null}.
     * The  #containsKey containsKey} operation may be used to
     * distinguish these two cases.
     *
     *
 
 @see #put(Object, Object)
     */
    public V get(Object key) {
        // 若为null，调用getForNullKey方法返回相对应的value
        if (key == null)
            // 从table的第一个桶中寻找 key 为 null 的映射；若不存在，直接返回null
            return getForNullKey();  
        // 根据该 key 的 hashCode 值计算它的 hash 码 
        int hash = hash(key.hashCode());
        // 找出 table 数组中对应的桶
        for (Entry e = table[indexFor(hash, table.length)];
             e != null;
             e = e.next) {
            Object k;
            //若搜索的key与查找的key相同，则返回相对应的value
            if (e.hash == hash && ((k = e.key) == key || key.equals(k)))
                return e.value;
        }
        return null;
    }

在这里能够根据 key 快速的取到 value，除了和 HashMap 的数据结构密不可分外，还和 Entry 有莫大的关系。在前面就已经提到过，HashMap 在存储过程中并没有将 key，value 分开来存储，而是当做一个整体 key-value 来处理的，这个整体就是 Entry 对象。特别地，在 Entry 对象中国，value 的地位要比 key 低一些，其相当于 key 的附属。

其中，针对键为 NULL 的键值对，HashMap 提供了专门的方法：getForNullKey()，其源码如下：

 /**
     * Offloaded version of get() to look up null keys.  Null keys map
     * to index 0.  This null case is split out into separate methods
     * for the sake of performance in the two most commonly used
     * operations (get and put), but incorporated with conditionals in
     * others.
     */
    private V getForNullKey() {
        // 键为NULL的键值对若存在，则必定在第一个桶中
        for (Entry e = table[0]; e != null; e = e.next) {
            if (e.key == null)
                return e.value;
        }
        // 键为NULL的键值对若不存在，则直接返回 null
        return null;
    }

特别地，调用 HashMap 的 get(Object key) 方法后，若返回值是 NULL，则存在如下两种可能：

该 key 对应的值就是 null;
HashMap 中不存在该 key。

3、HashMap 存取小结

在存储的过程中，系统根据 key 的 hash 值来定位 Entry 在 table 数组中的哪个桶，并且将其放到对应的链表的链头；在取的过程中，同样根据 key 的 hash 值来定位 Entry 在 table 数组中的哪个桶，然后在该桶中查找并返回。

来源: http://www.bubuko.com/infodetail-1985778.html

与本文相关文章

暂无,快来抢沙发吧！