My tensorflow + keras versions:
- print(tf.VERSION)           # '1.10.0'
- print(tf.keras.__version__) # '2.1.6-tf'
tf.keras does not implement AdamW, i.e. Adam with weight decay. The paper "DECOUPLED WEIGHT DECAY REGULARIZATION" shows that when using Adam, weight decay is not equivalent to L2 regularization. For details, see 当前训练神经网络最快的方式: AdamW 优化算法 + 超级收敛 or L2 正则 = Weight Decay? 并不是这样 (https://zhuanlan.zhihu.com/p/40814046).
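The gist of the paper's argument: with L2 regularization the decay term lam * w is folded into the gradient and therefore gets rescaled by Adam's adaptive denominator, whereas decoupled weight decay shrinks the weight directly. A toy numpy sketch of a single update step (my own illustration, not the author's; momentum and bias correction omitted, all values made up):
- import numpy as np
- eta, lam = 0.1, 0.01   # learning rate and decay coefficient
- w, g = 2.0, 0.5        # a weight and its loss gradient
- v = 4.0                # Adam's second-moment estimate (assumed)
- # L2 regularization: the decay term passes through the adaptive rescaling.
- w_l2 = w - eta * (g + lam * w) / np.sqrt(v)
- # Decoupled weight decay (AdamW): only the loss gradient is rescaled.
- w_decoupled = w - eta * g / np.sqrt(v) - eta * lam * w
- print(w_l2, w_decoupled)  # 1.974 vs 1.973; they differ whenever v != 1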
Keras itself does not provide an AdamW optimizer, but TensorFlow does (in tf.contrib.opt), so we can simply hand the TensorFlow optimizer to tf.keras, as shown below:
- import tensorflow as tf
- from tensorflow.contrib.opt import AdamWOptimizer
- mnist = tf.keras.datasets.mnist
- (x_train, y_train), (x_test, y_test) = mnist.load_data()
- x_train, x_test = x_train / 255.0, x_test / 255.0
- model = tf.keras.models.Sequential([
-     tf.keras.layers.Flatten(input_shape=(28, 28)),
-     tf.keras.layers.Dense(512, activation=tf.nn.relu),
-     tf.keras.layers.Dropout(0.2),
-     tf.keras.layers.Dense(10, activation=tf.nn.softmax)
- ])
- # adam = tf.train.AdamOptimizer()
- # Adam with weight decay:
- adamw = AdamWOptimizer(weight_decay=1e-4)
- model.compile(optimizer=adamw,
-               loss='sparse_categorical_crossentropy',
-               metrics=['accuracy'])
- model.fit(x_train, y_train, epochs=10, validation_split=0.1)
- print(model.evaluate(x_test, y_test))
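Note that when model.compile receives a native TensorFlow optimizer rather than a Keras one, tf.keras silently wraps it in a TFOptimizer adapter; this detail is the root of the error below. A quick way to observe this, continuing from the snippet above (the exact printed class path may vary by version):
- # After compile, the native optimizer has been wrapped by tf.keras:
- print(type(model.optimizer))
- # e.g. <class 'tensorflow.python.keras.optimizers.TFOptimizer'>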
Used as above, everything is fine. But if you add certain members of tf.keras.callbacks, such as tf.keras.callbacks.ReduceLROnPlateau(), you may hit AttributeError: 'TFOptimizer' object has no attribute 'lr'.
The following code raises exactly that AttributeError, and only because tf.keras.callbacks.ReduceLROnPlateau() was added; the other two callbacks do not trigger it.
- import tensorflow as tf
- from tensorflow.contrib.opt import AdamWOptimizer
- mnist = tf.keras.datasets.mnist
- (x_train, y_train), (x_test, y_test) = mnist.load_data()
- x_train, x_test = x_train / 255.0, x_test / 255.0
- model = tf.keras.models.Sequential([
-     tf.keras.layers.Flatten(input_shape=(28, 28)),
-     tf.keras.layers.Dense(512, activation=tf.nn.relu),
-     tf.keras.layers.Dropout(0.2),
-     tf.keras.layers.Dense(10, activation=tf.nn.softmax)
- ])
- # Save the weights whenever val_acc improves on the best value so far.
- ck_callback = tf.keras.callbacks.ModelCheckpoint(
-     'checkpoints/weights-improvement-{epoch:02d}-{val_acc:.2f}.hdf5',
-     monitor='val_acc', mode='max', verbose=1,
-     save_best_only=True, save_weights_only=True)
- # Monitor training with TensorBoard.
- tb_callback = tf.keras.callbacks.TensorBoard(log_dir='logs')
- # If the monitored val_loss has not improved within `patience` epochs,
- # reduce the learning rate: lr_new = factor * lr_old.
- lr_callback = tf.keras.callbacks.ReduceLROnPlateau(patience=3)
- adam = tf.train.AdamOptimizer()
- # Adam with weight decay triggers the same error:
- # adamw = AdamWOptimizer(weight_decay=1e-4)
- model.compile(optimizer=adam,
-               loss='sparse_categorical_crossentropy',
-               metrics=['accuracy'])
- model.fit(x_train, y_train, epochs=10, validation_split=0.1,
-           callbacks=[ck_callback, tb_callback, lr_callback])
- print(model.evaluate(x_test, y_test))
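The cause is visible in the Keras source: at the end of each epoch, ReduceLROnPlateau reads and writes the optimizer's lr attribute, which the TFOptimizer wrapper does not define. A simplified paraphrase of the callback's plateau branch (my own condensation; reduce_lr_on_plateau_step is a hypothetical helper name, not Keras API):
- from tensorflow.keras import backend as K
- def reduce_lr_on_plateau_step(optimizer, factor=0.1, min_lr=0.0):
-     """Roughly what ReduceLROnPlateau does once a plateau is detected."""
-     old_lr = float(K.get_value(optimizer.lr))  # AttributeError: TFOptimizer has no `lr`
-     new_lr = max(old_lr * factor, min_lr)
-     K.set_value(optimizer.lr, new_lr)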
The fix is shown below:
- import tensorflow as tf
- from tensorflow.contrib.opt import AdamWOptimizer
- from tensorflow.keras import backend as K
- # In TF 1.10, TFOptimizer is only found under tensorflow.python.keras,
- # not under tensorflow.keras (see the note below).
- from tensorflow.python.keras.optimizers import TFOptimizer
- mnist = tf.keras.datasets.mnist
- (x_train, y_train), (x_test, y_test) = mnist.load_data()
- x_train, x_test = x_train / 255.0, x_test / 255.0
- model = tf.keras.models.Sequential([
-     tf.keras.layers.Flatten(input_shape=(28, 28)),
-     tf.keras.layers.Dense(512, activation=tf.nn.relu),
-     tf.keras.layers.Dropout(0.2),
-     tf.keras.layers.Dense(10, activation=tf.nn.softmax)
- ])
- # Save the weights whenever val_acc improves on the best value so far.
- ck_callback = tf.keras.callbacks.ModelCheckpoint(
-     'checkpoints/weights-improvement-{epoch:02d}-{val_acc:.2f}.hdf5',
-     monitor='val_acc', mode='max', verbose=1,
-     save_best_only=True, save_weights_only=True)
- # Monitor training with TensorBoard.
- tb_callback = tf.keras.callbacks.TensorBoard(log_dir='logs')
- # If the monitored val_loss has not improved within `patience` epochs,
- # reduce the learning rate: lr_new = factor * lr_old.
- lr_callback = tf.keras.callbacks.ReduceLROnPlateau(patience=3)
- # Keep the learning rate in a Keras variable so the callback can update it.
- learning_rate = K.variable(0.001)
- # The same trick works for plain Adam:
- # adam = tf.train.AdamOptimizer(learning_rate=learning_rate)
- # adam = TFOptimizer(adam)
- # adam.lr = learning_rate
- # Adam with weight decay. Pass the variable in as the optimizer's actual
- # learning rate, then expose it as the `lr` attribute ReduceLROnPlateau
- # expects; otherwise the callback would update a variable the optimizer
- # never reads.
- adamw = AdamWOptimizer(weight_decay=1e-4, learning_rate=learning_rate)
- adamw = TFOptimizer(adamw)
- adamw.lr = learning_rate
- model.compile(optimizer=adamw,
-               loss='sparse_categorical_crossentropy',
-               metrics=['accuracy'])
- model.fit(x_train, y_train, epochs=10, validation_split=0.1,
-           callbacks=[ck_callback, tb_callback, lr_callback])
- print(model.evaluate(x_test, y_test))
Wrapping the native optimizer in TFOptimizer (and wiring a Keras variable in as its learning rate) is all it takes; tf.keras.callbacks.ReduceLROnPlateau() then works without problems.
When importing TFOptimizer, pay attention to where it lives. TensorFlow 1.10 exposes keras in two ways (tensorflow.keras and tensorflow.python.keras), which is rather confusing, and TFOptimizer can only be imported from the latter. (Strangely, TensorFlow 1.14 seems to have dropped the first import path, while TensorFlow 2.0 has it again...)
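To make the difference concrete (per the author's 1.10 setup; behavior differs across versions):
- from tensorflow.python.keras.optimizers import TFOptimizer  # works in TF 1.10
- # from tensorflow.keras.optimizers import TFOptimizer       # not available in TF 1.10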
References
- Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. https://arxiv.org/abs/1711.05101
- 当前训练神经网络最快的方式: AdamW 优化算法 + 超级收敛 -- 机器之心
- L2 正则 = Weight Decay? 并不是这样 -- 杨镒铭. https://zhuanlan.zhihu.com/p/40814046
- ReduceLROnPlateau with native optimizer: 'TFOptimizer' object has no attribute 'lr' (GitHub issue #20619)
Source: https://www.cnblogs.com/wuliytTaotao/p/10986952.html