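Where do you find the best Java code in the world? In the JDK, of course. I plan to read through the source of the common collection classes, and I will post the parts I find interesting or puzzling here.
All JDK source and Javadoc below are excerpted or translated from the implementation shipped with java version "1.8.0_131".
java.util.ArrayList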
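The opening is already interesting: the class declares two empty arrays.
Lines 116 - 126
    /**
     * Shared empty array instance used for empty instances.
     */
    private static final Object[] EMPTY_ELEMENTDATA = {};

    /**
     * Shared empty array instance used for default sized empty instances. We
     * distinguish this from EMPTY_ELEMENTDATA to know how much to inflate when
     * first element is added.
     */
    private static final Object[] DEFAULTCAPACITY_EMPTY_ELEMENTDATA = {};
What is the difference? According to the comments, EMPTY_ELEMENTDATA is shared by empty instances, while DEFAULTCAPACITY_EMPTY_ELEMENTDATA is shared by empty instances of default size. Keeping them distinct tells the list how much to inflate the array when the first element is added. That still does not say much on its own, so let's look at how the two are used: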
Lines 143 - 166
    /**
     * Constructs an empty list with the specified initial capacity.
     *
     * @param initialCapacity the initial capacity of the list
     * @throws IllegalArgumentException if the specified initial capacity
     *         is negative
     */
    public ArrayList(int initialCapacity) {
        if (initialCapacity > 0) {
            this.elementData = new Object[initialCapacity];
        } else if (initialCapacity == 0) {
            this.elementData = EMPTY_ELEMENTDATA;
        } else {
            throw new IllegalArgumentException("Illegal Capacity: " + initialCapacity);
        }
    }

    /**
     * Constructs an empty list with an initial capacity of ten.
     */
    public ArrayList() {
        this.elementData = DEFAULTCAPACITY_EMPTY_ELEMENTDATA;
    }
When we create an instance with new ArrayList(), elementData is assigned DEFAULTCAPACITY_EMPTY_ELEMENTDATA. When we create it with new ArrayList(0), elementData is assigned EMPTY_ELEMENTDATA. So far that looks like a distinction without a difference, so let's look at the function that grows the array.
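Lines 222 - 228
    private void ensureCapacityInternal(int minCapacity) {
        if (elementData == DEFAULTCAPACITY_EMPTY_ELEMENTDATA) {
            minCapacity = Math.max(DEFAULT_CAPACITY, minCapacity);
        }
        ensureExplicitCapacity(minCapacity);
    }
This function checks the elementData reference. If it is DEFAULTCAPACITY_EMPTY_ELEMENTDATA, the requested minimum capacity is raised to the larger of DEFAULT_CAPACITY (which is 10) and minCapacity. In effect, the reference held in elementData serves as a flag. A list created with new ArrayList() grows its array to DEFAULT_CAPACITY on the first add, whereas a list created with new ArrayList(0) grows by the general rule
    newCapacity = oldCapacity + (oldCapacity >> 1); (line 255)
Because 1.5 times zero is still zero, grow() falls back to minCapacity in that case, so the first add gives a capacity of just 1.
So why do it this way? The two constructors serve different use cases. We use the no-argument constructor when we cannot estimate how many elements the list will hold, and new ArrayList(0) when we expect it to hold very few. ArrayList therefore tells the two apart and grows them differently.
What surprised me is the use of a reference comparison to tell the two cases apart. I would not have thought of it. Left to myself, I would certainly have added a separate flag field.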
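To see the effect of the sentinel reference in isolation, here is a minimal sketch. SentinelList is a hypothetical, stripped-down model written for this post, not the JDK class: it keeps only the two shared empty arrays, the reference check, and the growth arithmetic, and adds a capacity() accessor (which ArrayList does not expose) so the result can be observed.

```java
import java.util.Arrays;

/** Hypothetical, simplified model of the sentinel trick; not the JDK class. */
final class SentinelList {
    private static final int DEFAULT_CAPACITY = 10;
    private static final Object[] EMPTY_ELEMENTDATA = {};
    private static final Object[] DEFAULTCAPACITY_EMPTY_ELEMENTDATA = {};

    private Object[] elementData;
    private int size;

    SentinelList() {
        elementData = DEFAULTCAPACITY_EMPTY_ELEMENTDATA;
    }

    SentinelList(int initialCapacity) {
        if (initialCapacity < 0) {
            throw new IllegalArgumentException("Illegal Capacity: " + initialCapacity);
        }
        elementData = (initialCapacity == 0) ? EMPTY_ELEMENTDATA : new Object[initialCapacity];
    }

    void add(Object e) {
        int minCapacity = size + 1;
        // Reference comparison: only the no-arg constructor leaves this sentinel in place.
        if (elementData == DEFAULTCAPACITY_EMPTY_ELEMENTDATA) {
            minCapacity = Math.max(DEFAULT_CAPACITY, minCapacity);
        }
        if (minCapacity - elementData.length > 0) {
            int oldCapacity = elementData.length;
            int newCapacity = oldCapacity + (oldCapacity >> 1);
            if (newCapacity - minCapacity < 0) {
                newCapacity = minCapacity;
            }
            elementData = Arrays.copyOf(elementData, newCapacity);
        }
        elementData[size++] = e;
    }

    /** Accessor added for the demo only. */
    int capacity() {
        return elementData.length;
    }

    public static void main(String[] args) {
        SentinelList byDefault = new SentinelList();
        SentinelList byZero = new SentinelList(0);
        byDefault.add("x");
        byZero.add("x");
        System.out.println(byDefault.capacity()); // 10
        System.out.println(byZero.capacity());    // 1
    }
}
```

Both lists start with a zero-length array, yet after one add they differ by a factor of ten, and no extra field was needed to remember which constructor ran.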
Lines 238 - 244
    /**
     * The maximum size of array to allocate.
     * Some VMs reserve some header words in an array.
     * Attempts to allocate larger arrays may result in
     * OutOfMemoryError: Requested array size exceeds VM limit
     */
    private static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;
This constant is clearly the maximum array length, but why Integer.MAX_VALUE - 8? The comment gives the answer: some VMs store header words at the start of an array, so trying to allocate an array larger than Integer.MAX_VALUE - 8 may fail with an OutOfMemoryError.
Lines 246 - 262
    /**
     * Increases the capacity to ensure that it can hold at least the
     * number of elements specified by the minimum capacity argument.
     *
     * @param minCapacity the desired minimum capacity
     */
    private void grow(int minCapacity) {
        // overflow-conscious code
        int oldCapacity = elementData.length;
        int newCapacity = oldCapacity + (oldCapacity >> 1);
        if (newCapacity - minCapacity < 0)
            newCapacity = minCapacity;
        if (newCapacity - MAX_ARRAY_SIZE > 0)
            newCapacity = hugeCapacity(minCapacity);
        // minCapacity is usually close to size, so this is a win:
        elementData = Arrays.copyOf(elementData, newCapacity);
    }
This is the core of the growth logic. The new length is computed as oldCapacity + (oldCapacity >> 1), so the new array is roughly 1.5 times the old length (the shift rounds down for odd lengths), and the whole computation stays in cheap integer arithmetic.
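To make the 1.5x rule concrete, the sketch below replays the capacity arithmetic of grow() without copying any array. GrowthSketch, nextCapacity and sequence are hypothetical names made up for this post. hugeCapacity is not part of the excerpt above, so its body here is an assumption written from memory of JDK 8 and should be checked against the real source.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that replays the arithmetic of grow(); not JDK code. */
final class GrowthSketch {
    static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;

    /** Same arithmetic as grow(), minus the array copy. */
    static int nextCapacity(int oldCapacity, int minCapacity) {
        int newCapacity = oldCapacity + (oldCapacity >> 1);
        if (newCapacity - minCapacity < 0) {
            newCapacity = minCapacity;
        }
        if (newCapacity - MAX_ARRAY_SIZE > 0) {
            newCapacity = hugeCapacity(minCapacity);
        }
        return newCapacity;
    }

    /** Assumption: written from memory of JDK 8, since the excerpt does not show it. */
    static int hugeCapacity(int minCapacity) {
        if (minCapacity < 0) { // the requested capacity itself overflowed
            throw new OutOfMemoryError();
        }
        return (minCapacity > MAX_ARRAY_SIZE) ? Integer.MAX_VALUE : MAX_ARRAY_SIZE;
    }

    /** Capacities seen when a full array is grown by one element, repeatedly. */
    static List<Integer> sequence(int start, int steps) {
        List<Integer> out = new ArrayList<>();
        int cap = start;
        for (int i = 0; i < steps; i++) {
            out.add(cap);
            cap = nextCapacity(cap, cap + 1);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(sequence(10, 6)); // [10, 15, 22, 33, 49, 73]
        System.out.println(sequence(0, 6));  // [0, 1, 2, 3, 4, 6]
    }
}
```

Starting from 10, each step adds half of the current capacity, rounded down. Starting from 0, the first steps are driven by minCapacity, because half of a very small number rounds down to nothing.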
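But that is not what I wanted to point out. It is this line: if (newCapacity - minCapacity < 0). Comparisons of the form a - b < 0 appear all over the ArrayList source, so why not simply write a < b?
Here is the test code I wrote:
    int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;
    int newCap = 65421;
    System.out.println(newCap > MAX_ARRAY_SIZE);     // direct comparison
    System.out.println(newCap - MAX_ARRAY_SIZE > 0); // subtraction form
This prints false false, so under normal conditions the two forms are equivalent. But what if newCap has overflowed? Set newCap = Integer.MAX_VALUE + 65421 and the output becomes false true. Why?
    newCap                    -2147418228   10000000000000001111111110001100
    MAX_ARRAY_SIZE             2147483639   01111111111111111111111111110111
    MAX_ARRAY_SIZE * -1       -2147483639   10000000000000000000000000001001
    newCap - MAX_ARRAY_SIZE         65429   00000000000000001111111110010101
Because of the overflow, the carry ran into the sign bit, so newCap is negative. Now read newCap - MAX_ARRAY_SIZE as newCap + MAX_ARRAY_SIZE * -1. Adding two large negative numbers overflows once more, the sign bit goes back to 0, and the result is positive. This second overflow is exactly what we want: it gives the correct answer, namely that the capacity we were trying to reach exceeds MAX_ARRAY_SIZE.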
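The comparison above can be reproduced with a small runnable class. OverflowCompare and its two method names are hypothetical, chosen for this post; the constant and the arithmetic are the ones discussed in the text.

```java
/** Hypothetical demo class comparing the two ways of writing the check. */
final class OverflowCompare {
    static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;

    /** Direct comparison: gives the wrong answer once newCap has wrapped around. */
    static boolean naive(int newCap) {
        return newCap > MAX_ARRAY_SIZE;
    }

    /** Subtraction form used in ArrayList: the second wrap-around cancels the first. */
    static boolean overflowConscious(int newCap) {
        return newCap - MAX_ARRAY_SIZE > 0;
    }

    public static void main(String[] args) {
        int small = 65421;
        int wrapped = Integer.MAX_VALUE + 65421; // int arithmetic wraps to -2147418228

        System.out.println(naive(small) + " " + overflowConscious(small));     // false false
        System.out.println(naive(wrapped) + " " + overflowConscious(wrapped)); // false true

        System.out.println(wrapped);                           // -2147418228
        System.out.println(Integer.toBinaryString(wrapped));   // 10000000000000001111111110001100
        System.out.println(wrapped - MAX_ARRAY_SIZE);          // 65429
    }
}
```

The trick only works while the true difference between the two values fits in an int, which holds here because capacities are non-negative and grow by at most half of their value per step.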
Lines 649 - 675
    /**
     * The number of times this list has been <i>structurally modified</i>.
     * Structural modifications are those that change the size of the
     * list, or otherwise perturb it in such a fashion that iterations in
     * progress may yield incorrect results.
     *
     * <p>This field is used by the iterator and list iterator implementation
     * returned by the {@code iterator} and {@code listIterator} methods.
     * If the value of this field changes unexpectedly, the iterator (or list
     * iterator) will throw a {@code ConcurrentModificationException} in
     * response to the {@code next}, {@code remove}, {@code previous},
     * {@code set} or {@code add} operations. This provides
     * <i>fail-fast</i> behavior, rather than non-deterministic behavior in
     * the face of concurrent modification during iteration.
     *
     * <p><b>Use of this field by subclasses is optional.</b> If a subclass
     * wishes to provide fail-fast iterators (and list iterators), then it
     * merely has to increment this field in its {@code add(int, E)} and
     * {@code remove(int)} methods (and any other methods that it overrides
     * that result in structural modifications to the list). A single call to
     * {@code add(int, E)} or {@code remove(int)} must add no more than
     * one to this field, or the iterators (and list iterators) will throw
     * bogus {@code ConcurrentModificationExceptions}. If an implementation
     * does not wish to provide fail-fast iterators, this field may be
     * ignored.
     */
    protected transient int modCount = 0;
This field, which ArrayList inherits from AbstractList, is an interesting one: it records structural changes to the list. Every structural modification, such as an add or a remove, increments it. While a List is being iterated, the iterator checks whether the field has changed since iteration began, and throws a ConcurrentModificationException if it has. This design reflects the fail-fast idea, which is a programming philosophy: throw an exception as early as possible rather than press on with a state that may cause trouble later. The idea is an important guide in many kinds of development, especially multithreaded, highly concurrent code. Erlang strongly advocates this approach.
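A short sketch of the fail-fast behaviour in a single thread: removing an element through the list while a for-each loop is iterating over it changes modCount behind the iterator's back. FailFastDemo and its method names are hypothetical, written for this post. The second method shows one safe alternative, removeIf, which is my own choice for the example rather than something the original text discusses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.List;

/** Hypothetical demo of fail-fast iteration on ArrayList. */
final class FailFastDemo {

    /** Returns true if removing through the list during a for-each loop threw. */
    static boolean removeInsideForEachThrows() {
        List<Integer> list = new ArrayList<>(Arrays.asList(1, 2, 3, 4));
        try {
            for (Integer x : list) {
                if (x == 2) {
                    list.remove(x); // remove(Object): structural change, bumps modCount
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true; // the iterator noticed the change on its next call to next()
        }
    }

    /** Removal done by the list itself, with no iterator left in a stale state. */
    static List<Integer> removeWithRemoveIf() {
        List<Integer> list = new ArrayList<>(Arrays.asList(1, 2, 3, 4));
        list.removeIf(x -> x == 2);
        return list;
    }

    public static void main(String[] args) {
        System.out.println(removeInsideForEachThrows()); // true
        System.out.println(removeWithRemoveIf());        // [1, 3, 4]
    }
}
```

The check is best-effort. For example, removing the second-to-last element inside a for-each loop ends the loop without an exception, because hasNext() already returns false, so code should not rely on the exception for correctness.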
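Afterword
Having read through the ArrayList source, my impression is this: rigorous, efficient, and full of techniques I had not seen before. The gap between my own code and that of the experts is still wide. I am not sure how much of this I will retain... One step at a time.
Source: https://juejin.im/post/5a5eeeb56fb9a01cb64ed13c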