面试 13: 基于排序算法的总结

浑浑噩噩, 我们前面已经讲解了冒泡, 插入, 选择, 归并, 快排 5 种排序算法, 其他的由于时间关系, 我们就不一一例举了.

说到排序, 不得不想到我们 JDK 中自带的 Arrays.sort() 和 Collections.sort() 方法. 这两个方法基本算是 Java 中排序王牌了. 在我们常用的方式中, Arrays.sort() 接受一个数组型参数, 而 Collections.sort() 接受一个 List 集合参数. 抛开它们可以接受多种类型的参数, 我们且看看平时排序用的较多的 Arrays.sort() 是如何实现的.

/**
     * Sorts the specified array into ascending numerical order.
     *
     * <p>Implementation note: The sorting algorithm is a Dual-Pivot Quicksort
     * by Vladimir Yaroslavskiy, Jon Bentley, and Joshua Bloch. This algorithm
     * offers O(n log(n)) performance on many data sets that cause other
     * quicksorts to degrade to quadratic performance, and is typically
     * faster than traditional (one-pivot) Quicksort implementations.
     *
     * @param a the array to be sorted
     */
    public static void sort(int[] a) {
        DualPivotQuicksort.sort(a, 0, a.length - 1, null, 0, 0);
    }

可以看到, 在 JDK 里面把排序交给 DualPivotQuicksort 类进行全全处理, 该类使用的是一种叫做双轴快速排序的算法. 从注释中我们可以看出, 该排序方法默认是升序排序的, 并且同样提供 O(nlogn) 的时间复杂度.

我们不妨进到

DualPivotQuicksort.sort()

查看代码.

/**
     * Sorts the specified range of the array using the given
     * workspace array slice if possible for merging
     *
     * @param a the array to be sorted
     * @param left the index of the first element, inclusive, to be sorted
     * @param right the index of the last element, inclusive, to be sorted
     * @param work a workspace array (slice)
     * @param workBase origin of usable space in work array
     * @param workLen usable size of work array
     */
    static void sort(int[] a, int left, int right,
                     int[] work, int workBase, int workLen) {
        // Use Quicksort on small arrays
        if (right - left <QUICKSORT_THRESHOLD) {
            sort(a, left, right, true);
            return;
        }
        /*
         * Index run[i] is the start of i-th run
         * (ascending or descending sequence).
         */
        int[] run = new int[MAX_RUN_COUNT + 1];
        int count = 0; run[0] = left;
        // Check if the array is nearly sorted
        for (int k = left; k < right; run[count] = k) {
            if (a[k] < a[k + 1]) { // ascending
                while (++k <= right && a[k - 1] <= a[k]);
            } else if (a[k]> a[k + 1]) { // descending
                while (++k <= right && a[k - 1]>= a[k]);
                for (int lo = run[count] - 1, hi = k; ++lo <--hi; ) {
                    int t = a[lo]; a[lo] = a[hi]; a[hi] = t;
                }
            } else { // equal
                for (int m = MAX_RUN_LENGTH; ++k <= right && a[k - 1] == a[k]; ) {
                    if (--m == 0) {
                        sort(a, left, right, true);
                        return;
                    }
                }
            }
            /*
             * The array is not highly structured,
             * use Quicksort instead of merge sort.
             */
            if (++count == MAX_RUN_COUNT) {
                sort(a, left, right, true);
                return;
            }
        }
        // Check special cases
        // Implementation note: variable "right" is increased by 1.
        if (run[count] == right++) { // The last run contains one element
            run[++count] = right;
        } else if (count == 1) { // The array is already sorted
            return;
        }
        // Determine alternation base for merge
        byte odd = 0;
        for (int n = 1; (n <<= 1) < count; odd ^= 1);
        // Use or create temporary array b for merging
        int[] b;                 // temp array; alternates with a
        int ao, bo;              // array offsets from 'left'
        int blen = right - left; // space needed for b
        if (work == null || workLen < blen || workBase + blen> work.length) {
            work = new int[blen];
            workBase = 0;
        }
        if (odd == 0) {
            System.arraycopy(a, left, work, workBase, blen);
            b = a;
            bo = 0;
            a = work;
            ao = workBase - left;
        } else {
            b = work;
            ao = 0;
            bo = workBase - left;
        }
        // Merging
        for (int last; count> 1; count = last) {
            for (int k = (last = 0) + 2; k <= count; k += 2) {
                int hi = run[k], mi = run[k - 1];
                for (int i = run[k - 2], p = i, q = mi; i <hi; ++i) {
                    if (q>= hi || p <mi && a[p + ao] <= a[q + ao]) {
                        b[i + bo] = a[p++ + ao];
                    } else {
                        b[i + bo] = a[q++ + ao];
                    }
                }
                run[++last] = hi;
            }
            if ((count & 1) != 0) {
                for (int i = right, lo = run[count - 1]; --i>= lo;
                    b[i + bo] = a[i + ao]
                );
                run[++last] = right;
            }
            int[] t = a; a = b; b = t;
            int o = ao; ao = bo; bo = o;
        }
    }

注意上面的代码, 可以看到当我们传入的数组长度小于常量 QUICKSORT_THRESHOLD 时, 我们直接采用双轴快速排序. 从定义中我们可以看到这个常量值为 286.

/**
     * If the length of an array to be sorted is less than this
     * constant, Quicksort is used in preference to merge sort.
     */
    private static final int QUICKSORT_THRESHOLD = 286;

对于双轴快速排序, 受于篇幅关系, 这里就不细讲了, 感兴趣的可以 Google 一下.

在 JDK 的 sort() 方法中, 实际上还做了一次判断. 当传入的数组长度小于常量

INSERTION_SORT_THRESHOLD

值时, 将直接采用插入排序.

/**
     * Sorts the specified range of the array by Dual-Pivot Quicksort.
     *
     * @param a the array to be sorted
     * @param left the index of the first element, inclusive, to be sorted
     * @param right the index of the last element, inclusive, to be sorted
     * @param leftmost indicates if this part is the leftmost in the range
     */
    private static void sort(int[] a, int left, int right, boolean leftmost) {
        int length = right - left + 1;
        // Use insertion sort on tiny arrays
        if (length <INSERTION_SORT_THRESHOLD) {
            if (leftmost) {
                /*
                 * Traditional (without sentinel) insertion sort,
                 * optimized for server VM, is used in case of
                 * the leftmost part.
                 */
                for (int i = left, j = i; i < right; j = ++i) {
                    int ai = a[i + 1];
                    while (ai < a[j]) {
                        a[j + 1] = a[j];
                        if (j-- == left) {
                            break;
                        }
                    }
                    a[j + 1] = ai;
                }
            } else {
                /*
                 * Skip the longest ascending sequence.
                 */
                do {
                    if (left>= right) {
                        return;
                    }
                } while (a[++left]>= a[left - 1]);
                /*
                 * Every element from adjoining part plays the role
                 * of sentinel, therefore this allows us to avoid the
                 * left range check on each iteration. Moreover, we use
                 * the more optimized algorithm, so called pair insertion
                 * sort, which is faster (in the context of Quicksort)
                 * than traditional implementation of insertion sort.
                 */
                for (int k = left; ++left <= right; k = ++left) {
                    int a1 = a[k], a2 = a[left];
                    if (a1 < a2) {
                        a2 = a1; a1 = a[left];
                    }
                    while (a1 < a[--k]) {
                        a[k + 2] = a[k];
                    }
                    a[++k + 1] = a1;
                    while (a2 < a[--k]) {
                        a[k + 1] = a[k];
                    }
                    a[k + 1] = a2;
                }
                int last = a[right];
                while (last < a[--right]) {
                    a[right + 1] = a[right];
                }
                a[right + 1] = last;
            }
            return;
        }
        // 源码太多, 在此省略
    }

由于插入排序并不需要遍历整个数组, 整个排序算法的核心性能消耗是移位, 所以在数组长度较小的时候, 插入排序的效果是非常好的.

可见我们的 JDK 排序利用了多个排序算法的各种优良属性, 避免出现我们前面讲到的各种糟糕情况. 所以我们不得不在此对各种算法进行一定的总结.

总结

相信从前面的文章中大家也有了自己对排序算法的认知, 目前并没有十全十美的排序算法, 有优点就会有缺点, 即使是快速排序法, 也只是在整体性能上优越, 它也存在排序不稳定, 需要大量辅助空间, 对少量数据排序无优势等不足. 因此我们必须从多个角度剖析一下它们的优劣.

为了省时间, 我们这里就直接搬运大话数据结构中的总结了.

希尔排序和堆排序, 在南尘的文章中没有讲, 感兴趣的直接 Google.

image.png

从算法的简单性来看, 我们将 7 种算法分为两类:

简单算法: 冒油, 简单选择, 直接插入.

改进算法: 希尔, 堆, 归并, 快速.

从平均情况来看, 显然最后 3 种改进算法要胜过希尔排序, 并远远胜过前 3 种简单算法 .

从最好情况看, 反而冒泡和直接插入排序要更胜一筹, 也就是说, 如果你的待排序序列总是基本有序, 反而不应该考虑 4 种复杂的改进算法.

从最坏情况看, 堆排序与归并排序又强过快速排序以及其他简单排序.

从这三组时间复杂度的数据对比中, 我们可以得出这样一个认识:

堆排序和归并排序就像两个参加奥数考试的优等生, 心理素质强, 发挥稳定. 而快速排序像是很情绪化的天才, 心情好时表现极佳, 碰到较糟糕环境会变得差强人意. 但是他们如果都来比赛计算个位数的加减法, 它们反而算不过成绩极普通的冒泡和直接插入.

从空间复杂度来说, 归并排序强调要马跑得快, 就得给马吃个饱. 快速排序也有相应的空间要求, 反而堆排序等却都是少量索取, 大量付出, 对空间要求是 O(1), 如果执行算法的软件所处的环境非常在乎内存使用量的多少时, 选择归并排序和快速排序就不是一个较好的决策了.

从稳定性来看, 归并排序独占鳌头, 我们前面也说过, 对于非常在乎排序稳定性的应用中, 归并排序是个好算法.

从待排序记录的个数上来说, 待排序的个数 n 越小, 采用简单排序方法越合适. 反之, n 越大, 采用改进排序方法越合适. 这也就是为什么在 JDK 中对快速排序优化时, 增加了一个阀值 47, 低于阀值时换作直接插入排序的原因.

好了, 本篇文章有点状态迷糊, 大家且将就看看, 我们下周再见.

来源: http://www.jianshu.com/p/135f13d77443

与本文相关文章

暂无,快来抢沙发吧！