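I've been working on a Machine Learning assignment that requires building a Neural Network with keras in Jupyter Notebook. I couldn't get even the simplest single-layer network to run. Stranger still, everything ran fine when I first tried the iris dataset, but as soon as I ran the fashion mnist data our professor provided, the server reported that the kernel had died and restarted it. Even stranger, the exact same code ran without a hitch on a classmate's computer, which had me convinced for a while that my aging MacBook was underpowered, and I nearly bought a new one >_< In class today, after a few rounds of debugging, my ML professor solved it completely. A CMU expert indeed! (A big shout-out to Prof here, even though he can't read Chinese ><) I only started learning python recently and am still not very comfortable with it, so I picked up a lot of new skills along the way.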
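The complete code that triggered the problem is below. It implements logistic regression with Keras as a simple single-layer network, but every time execution reached the last line the server died and the kernel restarted.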
- %matplotlib inline
- import numpy as np
- import matplotlib.pyplot as plt
- from sklearn.decomposition import PCA, FastICA
- from sklearn.linear_model import LogisticRegression
- from keras.models import Sequential
- from keras.layers import Dense, Activation, Conv2D
- from keras.utils import to_categorical
- from keras.datasets import fashion_mnist
- (x3_train, y_train), (x3_test, y_test) = fashion_mnist.load_data()
- n_classes = np.max(y_train) + 1
- # Vectorize image arrays, since most methods expect this format
- x_train = x3_train.reshape(x3_train.shape[0], np.prod(x3_train.shape[1:]))
- x_test = x3_test.reshape(x3_test.shape[0], np.prod(x3_test.shape[1:]))
- # Binary vector representation of targets (for one-hot or multinomial output networks)
- y3_train = to_categorical(y_train)
- y3_test = to_categorical(y_test)
- from sklearn import preprocessing
- scaler = preprocessing.StandardScaler()
- x_train_scaled = scaler.fit_transform(x_train)
- x_test_scaled = scaler.transform(x_test)  # reuse the scaling fitted on the training set
- n_output = y3_train.shape[1]
- n_input = x_train_scaled.shape[1]
- nn_lr = Sequential()
- nn_lr.add(Dense(units=n_output, input_dim= n_input, activation = 'softmax'))
- nn_lr.compile(optimizer = 'sgd', loss = 'categorical_crossentropy', metrics = ['accuracy'])
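Because Jupyter Notebook just kept restarting the kernel without showing any error message, I had no idea where to start. My professor pointed out that the terminal that opens automatically when you launch Jupyter Notebook logs what is going on (a first for this newbie..), including the details of how and why the kernel stopped and restarted: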
- [I 22:11:54.603 NotebookApp]Kernel interrupted: 7e7f6646-97b0-4ec7-951c-1dce783f60c4
- [I 22:13:49.160 NotebookApp]Saving file at /Documents/[Rutgers]Study/2019Spring/MACHINE LEARNING W APPLCTN LARGE DATASET/hw/Untitled1.ipynb
- 2019-03-28 22:13:49.829246: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
- 2019-03-28 22:13:49.829534: I tensorflow/core/common_runtime/process_util.cc:69] Creating new thread pool with default inter op setting: 4. Tune using inter_op_parallelism_threads for best performance.
- OMP: Error #15: Initializing libiomp5.dylib, but found libiomp5.dylib already initialized.
- OMP: Hint: This means that multiple copies of the OpenMP runtime have been linked into the program. That is dangerous, since it can degrade performance or cause incorrect results. The best thing to do is to ensure that only a single OpenMP runtime is linked into the process, e.g. by avoiding static linking of the OpenMP runtime in any library. As an unsafe, unsupported, undocumented workaround you can set the environment variable KMP_DUPLICATE_LIB_OK=TRUE to allow the program to continue to execute, but that may cause crashes or silently produce incorrect results. For more information, please see http://www.intel.com/software/products/support/.
- [I 22:13:51.049 NotebookApp]KernelRestarter: restarting kernel (1/5), keep random ports
- kernel c1114f5a-3829-432f-a26a-c2db6c330352 restarted
Another way is to copy the code into ipython, which prints similar information. Either way, the error I finally pinned down was:
OMP: Error #15: Initializing libiomp5.dylib, but found libiomp5.dylib already initialized.
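A rough sketch of the same idea in script form: run the suspect code in a separate Python process and capture its stderr, so the message survives even when the process aborts. The helper name `run_and_capture` and the stand-in snippet are made up for illustration and are not from the original notebook.

```python
import subprocess
import sys


def run_and_capture(code: str, timeout: float = 300.0):
    """Run `code` in a fresh Python process and return (exit_code, stderr_text)."""
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,  # collect stdout and stderr instead of printing them
        text=True,            # decode the captured bytes to str
        timeout=timeout,
    )
    # On POSIX, a negative exit code means the child was killed by a signal.
    return result.returncode, result.stderr


if __name__ == "__main__":
    # Stand-in snippet (hypothetical): replace it with the imports and model
    # code that make the notebook kernel die.
    snippet = "import sys; sys.stderr.write('example error\\n'); sys.exit(1)"
    exit_code, err = run_and_capture(snippet)
    print("exit code:", exit_code)
    print("stderr:", err)
```

Because the crash happens in a child process, the calling session stays alive and the captured text can be searched or saved.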
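I googled it and found a very detailed discussion thread on GitHub, though the original poster hit the problem while running XGBoost. That reminded me that installing XGBoost over winter break had been a rather bumpy process, so I may have accidentally ended up with the same file in different paths, causing a conflict when the program loaded its packages. The thread lists several possible causes and fixes:
1. Uninstall clang-omp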
- brew uninstall libiomp clang-omp
- as long as u got gcc v5 from brew it come with openmp
- follow steps in:
I tried uninstalling and reinstalling xgboost and then uninstalling clang-omp, which produced this error message:
- No such keg: /usr/local/Cellar/libiomp
- pip uninstall xgboost
- pip install xgboost
- brew uninstall libiomp clang-omp
2. Run this directly in jupyter notebook:
- # DANGER! DANGER!
- import os
- os.environ['KMP_DUPLICATE_LIB_OK']='True'
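My professor said this line tells the system to ignore the package conflict and pick one of the copies on its own. I tried it and it does work, but as the OMP hint itself warns, the workaround is unsafe and unsupported and may cause crashes or silently incorrect results, so I strongly recommend against it!
3. Find the duplicate libiomp5.dylib files and delete one of them
Finder did turn up two copies, one in ~/anaconda3/lib and one in ~/anaconda3/lib/python3.6/site-packages/_solib_darwin/_U@mkl_Udarwin_S_S_Cmkl_Ulibs_Udarwin___Uexternal_Smkl_Udarwin_Slib (????). But I wasn't sure which one to delete, and this approach also felt risky: delete the wrong one and nothing runs at all.
4. OpenMP conflict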
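For anyone who would rather not click around in Finder, here is a minimal sketch that lists every copy of the library under a directory. The helper name `find_library_copies` is made up for illustration, and `~/anaconda3` is only the install location from this post, so adjust it for your machine. It only lists files and deletes nothing.

```python
from pathlib import Path


def find_library_copies(root, filename="libiomp5.dylib"):
    """Return every path under `root` whose file name equals `filename`, sorted."""
    root_path = Path(root).expanduser()
    if not root_path.is_dir():
        return []
    # rglob walks the tree recursively; keep regular files and symlinks.
    return sorted(
        p for p in root_path.rglob(filename) if p.is_file() or p.is_symlink()
    )


if __name__ == "__main__":
    for path in find_library_copies("~/anaconda3"):
        # resolve() shows whether one "copy" is just a symlink to another.
        print(path, "->", path.resolve())
```

Printing the resolved path next to each hit helps tell two separate files apart from one file reached through a symlink.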
Hint: This means that multiple copies of the OpenMP runtime have been linked into the program
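Following the Hint in the message, I searched for TensorFlow OpenMP. OpenMP is an API for shared-memory multi-threaded parallel programming. TensorFlow seems to have its own parallel computing architecture and apparently does not need OpenMP (see )
5. Install nomkl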
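One way to check which OpenMP runtimes a Python process has actually loaded is the third-party `threadpoolctl` package. This is my own suggestion and was not part of the original debugging session. The sketch below assumes the package is installed and that `threadpool_info()` returns a list of dicts with `user_api` and `filepath` keys. The helper name `openmp_runtimes` is made up for illustration.

```python
def openmp_runtimes(pool_info):
    """Return the distinct file paths of the OpenMP runtimes in `pool_info`.

    `pool_info` is a list of dicts shaped like the output of
    threadpoolctl.threadpool_info(), each with "user_api" and "filepath" keys.
    """
    paths = {
        entry.get("filepath")
        for entry in pool_info
        if entry.get("user_api") == "openmp"
    }
    return sorted(p for p in paths if p)


if __name__ == "__main__":
    try:
        # Third-party package, assumed to be installed (not used in the original post).
        from threadpoolctl import threadpool_info
    except ImportError:
        print("threadpoolctl is not installed")
    else:
        # Import the libraries you want to inspect (e.g. numpy) before this call;
        # only runtimes that are already loaded can be reported.
        runtimes = openmp_runtimes(threadpool_info())
        for path in runtimes:
            print(path)
        if len(runtimes) > 1:
            print("More than one OpenMP runtime is loaded.")
```

Keep in mind that it only reports what has been loaded by the time it is called, so it may not catch a crash that happens during the very import that brings in the second runtime.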
I had the same error on my Mac with a python program using numpy, keras, and matplotlib. I solved it with 'conda install nomkl'.
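This is what finally worked! MKL stands for Math Kernel Library, a library developed by Intel to speed up math operations. Packages installed through conda use MKL automatically, and nomkl is the package that opts out of it. See the official Anaconda documentation for details: https://docs.anaconda.com/mkl-optimizations/ .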
To opt out, run conda install nomkl and then use conda install to install packages that would normally include MKL or depend on packages that include MKL, such as scipy, numpy, and pandas.
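Perhaps some conflict crept in when a package such as numpy was updated. Installing nomkl magically fixed it, and when I later uninstalled MKL as well, the program still ran fine.. The uninstall command is: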
conda remove mkl mkl-service
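Summary:
1. My professor is amazing and sorted the problem out in no time ><
2. As the expert reminded me, creating a virtual environment before running python is a good way to avoid problems like package conflicts. How-to: https://www.jianshu.com/p/d8e7135dca40 .
Source: https://www.cnblogs.com/sherrydatascience/p/10626474.html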
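As a minimal sketch of the virtual-environment idea, the standard-library `venv` module can create an isolated environment from Python itself. This is an assumption on my part: the linked how-to may use a different tool, and with Anaconda the usual route is a conda environment instead. The helper name `create_env` is made up for illustration.

```python
import venv
from pathlib import Path


def create_env(env_dir):
    """Create a bare virtual environment at `env_dir` and return its path."""
    env_path = Path(env_dir)
    # with_pip=False keeps this sketch fast and offline;
    # pass with_pip=True when you actually want to install packages.
    venv.create(str(env_path), with_pip=False)
    return env_path
```

Calling `create_env("ml-hw-env")` (a hypothetical name) creates the directory. On macOS or Linux you would then activate it with `source ml-hw-env/bin/activate` before installing the packages for one project only.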