This is the third post in my series of notes on the Keras source code. In the previous two posts we looked at how Keras handles the Tensor and Layer abstractions, and how they interact to form a directed acyclic graph. This post turns to the abstraction at the level of the multi-layer network model, i.e. the interface closest to the user. The source file is keras/models.py, and the classes under examination are Model and Sequential.
Part 1 of this series: [Source notes] Keras source analysis: Tensor, Node and Layer https://blog.ddlee.cn/posts/4943e1b8/
Part 2: [Source notes] Keras source analysis: Container https://blog.ddlee.cn/posts/ba61101c/
Model: a Container with training information added
Model.compile() mainly configures the optimizer, loss, and metrics; the operations actually executed by fit(), evaluate(), and friends are not set up during compile.
```python
def compile(self, optimizer, loss, metrics=None, loss_weights=None,
            sample_weight_mode=None, **kwargs):
    loss = loss or {}
    self.optimizer = optimizers.get(optimizer)
    self.sample_weight_mode = sample_weight_mode
    self.loss = loss
    self.loss_weights = loss_weights

    # One loss function per model output.
    loss_function = losses.get(loss)
    loss_functions = [loss_function for _ in range(len(self.outputs))]
    self.loss_functions = loss_functions
    # Abridged: the full source also handles dict/list loss_weights;
    # the default is a weight of 1.0 for every output.
    loss_weights_list = [1. for _ in range(len(self.outputs))]

    # Prepare targets of model: one placeholder per output.
    self.targets = []
    self._feed_targets = []
    for i in range(len(self.outputs)):
        shape = self.internal_output_shapes[i]
        name = self.output_names[i]
        target = K.placeholder(ndim=len(shape),
                               name=name + '_target',
                               sparse=K.is_sparse(self.outputs[i]),
                               dtype=K.dtype(self.outputs[i]))
        self.targets.append(target)
        self._feed_targets.append(target)

    # Prepare metrics.
    self.metrics = metrics
    self.metrics_names = ['loss']
    self.metrics_tensors = []

    # Compute total loss: the weighted sum of the per-output losses,
    # plus the regularization losses collected on the layers.
    total_loss = None
    for i in range(len(self.outputs)):
        y_true = self.targets[i]
        y_pred = self.outputs[i]
        loss_weight = loss_weights_list[i]
        # Abridged: sample weighting and masking omitted here.
        output_loss = loss_functions[i](y_true, y_pred)
        if total_loss is None:
            total_loss = loss_weight * output_loss
        else:
            total_loss += loss_weight * output_loss
    for loss_tensor in self.losses:
        total_loss += loss_tensor
    self.total_loss = total_loss
    self.sample_weights = sample_weights  # abridged: built earlier in the full source
```
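The total-loss accumulation at the end of compile() can be sketched in plain Python. This is a backend-free toy, not the actual Keras implementation; `total_loss` and the sample `mse` below are hypothetical names invented for illustration:

```python
def total_loss(loss_fns, loss_weights, y_trues, y_preds, reg_losses=()):
    # One loss function per output, each weighted, plus any
    # regularization losses collected on the layers.
    total = None
    for fn, w, y_true, y_pred in zip(loss_fns, loss_weights, y_trues, y_preds):
        term = w * fn(y_true, y_pred)
        total = term if total is None else total + term
    for extra in reg_losses:
        total += extra
    return total

mse = lambda t, p: sum((a - b) ** 2 for a, b in zip(t, p)) / len(t)
# Two outputs: the first matches exactly (loss 0), the second has
# mse 4 weighted by 0.5, and one regularization term of 0.1.
print(total_loss([mse, mse], [1.0, 0.5],
                 [[1.0, 2.0], [0.0]], [[1.0, 2.0], [2.0]],
                 reg_losses=[0.1]))  # → 2.1
```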
The Model object's fit() method wraps the internal _fit_loop() method, whose key ingredient, the train function, is produced by _make_train_function(). fit() returns a History object, which is maintained through the callback machinery.
```python
def fit(self, x=None, y=None, ...):
    self._make_train_function()
    f = self.train_function
    return self._fit_loop(f, ins, ...)
```
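The wrapper pattern here can be illustrated with a minimal, made-up stand-in. `TinyModel` and its sum-based "train function" are invented for illustration and are not Keras code:

```python
class TinyModel:
    # Toy skeleton of the fit() wrapper pattern: build the train
    # function lazily and only once, then hand it to the inner loop.
    def __init__(self):
        self.train_function = None

    def _make_train_function(self):
        if self.train_function is None:
            # Stand-in for the backend-compiled function: it just
            # "trains" by summing a batch.
            self.train_function = lambda batch: sum(batch)

    def _fit_loop(self, f, ins):
        # Apply the train function to every batch.
        return [f(batch) for batch in ins]

    def fit(self, ins):
        self._make_train_function()
        return self._fit_loop(self.train_function, ins)

print(TinyModel().fit([[1, 2], [3]]))  # → [3, 3]
```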
Inside _fit_loop(), the callbacks take care of monitoring and recording the training process, while the train function is applied to the incoming data:
```python
def _fit_loop(self, f, ins, out_labels=None, batch_size=32,
              epochs=100, verbose=1, callbacks=None,
              val_f=None, val_ins=None, shuffle=True,
              callback_metrics=None, initial_epoch=0):
    self.history = cbks.History()
    callbacks = [cbks.BaseLogger()] + (callbacks or []) + [self.history]
    callbacks = cbks.CallbackList(callbacks)
    out_labels = out_labels or []
    callbacks.set_model(callback_model)
    callbacks.set_params({
        'batch_size': batch_size,
        'epochs': epochs,
        'samples': num_train_samples,
        'verbose': verbose,
        'do_validation': do_validation,
        'metrics': callback_metrics or [],
    })
    callbacks.on_train_begin()
    callback_model.stop_training = False
    for epoch in range(initial_epoch, epochs):
        callbacks.on_epoch_begin(epoch)
        batches = _make_batches(num_train_samples, batch_size)
        epoch_logs = {}
        for batch_index, (batch_start, batch_end) in enumerate(batches):
            batch_ids = index_array[batch_start:batch_end]
            batch_logs = {}
            batch_logs['batch'] = batch_index
            batch_logs['size'] = len(batch_ids)
            callbacks.on_batch_begin(batch_index, batch_logs)
            # Apply the train function passed in as f.
            outs = f(ins_batch)
            callbacks.on_batch_end(batch_index, batch_logs)
        callbacks.on_epoch_end(epoch, epoch_logs)
    callbacks.on_train_end()
    return self.history
```
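The control flow above — slice the data into index ranges, apply the train function to each batch, and fire the callback hooks — can be reduced to a short, self-contained sketch. `make_batches` and `fit_loop` here are simplified stand-ins, not the Keras internals:

```python
def make_batches(size, batch_size):
    # Split [0, size) into (start, end) index pairs; the last
    # batch may be smaller than batch_size.
    return [(i, min(i + batch_size, size))
            for i in range(0, size, batch_size)]

def fit_loop(f, samples, batch_size, epochs, on_batch_end):
    # Simplified stand-in for _fit_loop: slice the data into
    # batches, apply f to each slice, and fire the callback hook.
    history = []
    for epoch in range(epochs):
        for batch_index, (start, end) in enumerate(
                make_batches(len(samples), batch_size)):
            out = f(samples[start:end])
            on_batch_end(batch_index, {'batch': batch_index,
                                       'size': end - start})
            history.append(out)
    return history

logs = []
outs = fit_loop(sum, list(range(5)), batch_size=2, epochs=1,
                on_batch_end=lambda i, l: logs.append(l))
print(outs)  # → [1, 5, 4]: batches [0, 1], [2, 3], [4]
```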
_make_train_function() obtains the weight-update operations from the optimizer and passes them into the function object provided by the backend:
```python
def _make_train_function(self):
    if self.train_function is None:
        inputs = self._feed_inputs + self._feed_targets + self._feed_sample_weights
        training_updates = self.optimizer.get_updates(
            self._collected_trainable_weights,
            self.constraints,
            self.total_loss)
        updates = self.updates + training_updates
        # Gets loss and metrics. Updates weights at each call.
        self.train_function = K.function(inputs,
                                         [self.total_loss] + self.metrics_tensors,
                                         updates=updates,
                                         name='train_function',
                                         **self._function_kwargs)
```
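What K.function returns here is a callable that, on every invocation, evaluates the loss and applies the optimizer's updates to the weights as a side effect. A backend-free toy analogue (`make_train_function` below is an invented illustration doing one step of gradient descent on a single scalar weight, not Keras code):

```python
def make_train_function(w, lr=0.1):
    # The closure plays the role of the compiled backend function:
    # each call returns the loss and mutates the weight state.
    state = {'w': w}

    def train_function(x, y):
        pred = state['w'] * x
        loss = (pred - y) ** 2
        grad = 2 * (pred - y) * x   # d(loss)/dw for the squared error
        state['w'] -= lr * grad     # weight update on every call
        return loss, state['w']

    return train_function

f = make_train_function(0.0)
for _ in range(3):
    loss, w = f(1.0, 1.0)  # fit w toward y/x = 1.0; loss shrinks each call
```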
Model's other methods, such as evaluate(), follow a structure similar to fit().
Sequential: the outer interface for building models
The Sequential object is a further wrapper around Model, and the interface the user deals with directly. Its compile(), fit(), predict() and other methods are nearly identical to Model's; what it adds is the add() method, the most basic operation we use to build a network.
The source of Sequential.add() is as follows:
```python
def add(self, layer):
    # The first layer must be connected to an InputLayer object.
    if not self.outputs:
        if not layer.inbound_nodes:
            x = Input(batch_shape=layer.batch_input_shape,
                      dtype=layer.dtype, name=layer.name + '_input')
            layer(x)
        self.outputs = [layer.inbound_nodes[0].output_tensors[0]]
        self.inputs = topology.get_source_inputs(self.outputs[0])
        topology.Node(outbound_layer=self, ...)
    else:
        output_tensor = layer(self.outputs[0])
        self.outputs = [output_tensor]
        self.inbound_nodes[0].output_tensors = self.outputs
    self.layers.append(layer)
```
As we can see, add() always ensures that the network's first layer is backed by an InputLayer object, and applies each newly added layer to outputs, updating it. In essence, then, adding a new layer to a Model amounts to updating the model's outputs.
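This outputs bookkeeping can be demonstrated with a tiny made-up class. `TinySequential` below models only the "apply the new layer to the current output and replace self.outputs" logic; it is an illustration, not Keras code:

```python
class TinySequential:
    def __init__(self):
        self.layers, self.outputs = [], []

    def add(self, layer):
        if not self.outputs:
            x = 'input'  # stands in for the implicit InputLayer tensor
            self.outputs = [layer(x)]
        else:
            # Apply the new layer to the current output tensor and
            # let the result become the model's new output.
            self.outputs = [layer(self.outputs[0])]
        self.layers.append(layer)

model = TinySequential()
model.add(lambda x: f'dense1({x})')
model.add(lambda x: f'dense2({x})')
print(model.outputs[0])  # → dense2(dense1(input))
```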
Source: https://juejin.im/entry/5bcab9cae51d450e4a1c2e55