Java 面试必问之 Hashmap 底层实现原理 (JDK1.7)

1. 前言

Hashmap 可以说是 Java 面试必问的, 一般的面试题会问:

Hashmap 有哪些特性?

Hashmap 底层实现原理(get\put\resize)

Hashmap 怎么解决 hash 冲突?

Hashmap 是线程安全的吗?

...

今天就从源码角度一探究竟. 笔者的源码是 OpenJDK1.7

2. 构造方法

首先看构造方法的源码

// 默认初始容量
    static final int DEFAULT_INITIAL_CAPACITY = 16;
    // 默认负载因子
    static final float DEFAULT_LOAD_FACTOR = 0.75f;
    // 数组, 该数据不参与序列化
    transient Entry[] table;
    public HashMap() {
        this.loadFactor = DEFAULT_LOAD_FACTOR;
        // 初始容量 16, 扩容因子 0.75, 扩容临界值 12
        threshold = (int)(DEFAULT_INITIAL_CAPACITY * DEFAULT_LOAD_FACTOR);
        // 基础结构为 Entry 数组
        table = new Entry[DEFAULT_INITIAL_CAPACITY];
        init();
    }

由以上源码可知, Hashmap 的初始容量默认是 16, 底层存储结构是数组(到这里只能看出是数组, 其实还有链表, 下边看源码解释). 基本存储单元是 Entry, 那 Entry 是什么呢? 我们接着看 Entry 相关源码,

static class Entry<K,V> implements Map.Entry<K,V> {
        final K key;
        V value;
        Entry<K,V> next;    // 链表后置节点
        final int hash;
        /**
         * Creates new entry.
         */
        Entry(int h, K k, V v, Entry<K,V> n) {
            value = v;
            next = n;   // 头插法: newEntry.next=e
            key = k;
            hash = h;
        }
        ...
    }

由 Entry 源码可知, Entry 是链表结构. 综上所述, 可以得出:

Hashmap 底层是基于数组和链表实现的

3. Hashmap 中 put()过程

我已经将 put 过程绘制了流程图帮助大家理解

先上 put 源码

public V put(K key, V value) {
        if (key == null)
            return putForNullKey(value);
        // 根据 key 计算 hash
        int hash = hash(key.hashCode());
        // 计算元素在数组中的位置
        int i = indexFor(hash, table.length);
        // 遍历链表, 如果相同覆盖
        for (Entry<K,V> e = table[i]; e != null; e = e.next) {
            Object k;
            if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
                V oldValue = e.value;
                e.value = value;
                e.recordAccess(this);
                return oldValue;
            }
        }
        modCount++;
        // 头插法插入元素
        addEntry(hash, key, value, i);
        return null;
    }

上图中多次提到头插法, 啥是头插法呢? 接下来看 addEntry 方法

void addEntry(int hash, K key, V value, int bucketIndex) {
        // 取出原 bucket 链表
        Entry<K,V> e = table[bucketIndex];
        // 头插法
        table[bucketIndex] = new Entry<>(hash, key, value, e);
        // 判断是否需要扩容
        if (size++>= threshold)
            // 扩容好容量为原来的 2 倍
            resize(2 * table.length);
    }

结合 Entry 类的构造方法, 每次插入新元素的时候, 将 bucket 原链表取出, 新元素的 next 指向原链表, 这就是头插法 . 为了更加清晰的表示 Hashmap 存储结构, 再绘制一张存储结构图.

4. Hashmap 中 get()过程

get()逻辑相对比较简单, 如图所示

我们来对应下 get()源码

public V get(Object key) {
        // 获取 key 为 null 的值
        if (key == null)
            return getForNullKey();
        // 根据 key 获取 hash
        int hash = hash(key.hashCode());
        // 遍历链表, 直到找到元素
        for (Entry<K,V> e = table[indexFor(hash, table.length)];
             e != null;
             e = e.next) {
            Object k;
            if (e.hash == hash && ((k = e.key) == key || key.equals(k)))
                return e.value;
        }
        return null;
    }

5. Hashmap 中 resize()过程

只要是新插入元素, 即执行 addEntry()方法, 在插入完成后, 都会判断是否需要扩容. 从 addEntry()方法可知, 扩容后的容量为原来的 2 倍.

void resize(int newCapacity) {
        Entry[] oldTable = table;
        int oldCapacity = oldTable.length;
        if (oldCapacity == MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return;
        }
        // 新建数组
        Entry[] newTable = new Entry[newCapacity];
        // 数据迁移
        transfer(newTable);
        // table 指向新的数组
        table = newTable;
        // 新的扩容临界值
        threshold = (int)(newCapacity * loadFactor);
    }

这里有个 transfer()方法没讲, 别着急, 扩容时线程安全的问题出现在这个方法中, 接下来讲解数组复制过程.

6. Hashmap 扩容安全问题

大家都知道结果: 多线程扩容有可能会形成环形链表, 这里用图给大家模拟下扩容过程.

首先看下单线程扩容的头插法

然后看下多线程可能会出现的问题

以下是源码, 你仔细品一品

void transfer(Entry[] newTable) {
        Entry[] src = table;
        int newCapacity = newTable.length;
        for (int j = 0; j <src.length; j++) {
            Entry<K,V> e = src[j];
            if (e != null) {
                // 释放旧 Entry 数组的对象引用
                src[j] = null;
                do {
                    Entry<K,V> next = e.next;
                    // 重新根据新的数组长度计算位置(同一个 bucket 上元素 hash 相等, 所以扩容后必然还在一个链表上)
                    int i = indexFor(e.hash, newCapacity);
                    // 头插法 (同一位置上新元素总会被放在链表的头部位置), 将 newTable[i] 的引用赋给了 e.next
                    e.next = newTable[i];
                    // 将元素放在数组上
                    newTable[i] = e;
                    // 访问下一个元素
                    e = next;
                } while (e != null);
            }
        }
    }

7. Hashmap 寻找 bucket 位置

static int indexFor(int h, int length) {
        // 根据 hash 与数组长度 mod 运算
        return h & (length-1);
    }

由源码可知, jdk 根据 key 的 hash 值和数组长度做 mod 运算, 这里用位运算代替 mod.

hash 运算值是一个 int 整形值, 在 java 中 int 占 4 个字节, 32 位, 下边通过图示来说明位运算.

8. AD

如果您觉得还行, 请关注公众号[当我遇上你] , 您的支持是我输出的最大动力.

同时, 欢迎大家一起交流学习.

来源: https://www.cnblogs.com/idea360/p/12419688.html

与本文相关文章

暂无,快来抢沙发吧！