# TensorFlow evaluation metrics and summary statistics

## Evaluation metrics

Metrics are used in evaluation to assess the quality of a model. Most are
"streaming" ops: they create variables to accumulate a running total, and
return both an update tensor that updates these variables and a value tensor
that reads the accumulated value. Example:

```python
value, update_op = metrics.streaming_mean_squared_error(
    predictions, targets, weight)
```

Most metric functions take a pair of tensors, `predictions` and ground truth
`targets` (`streaming_mean` is an exception: it takes a single value tensor,
usually a loss). Both tensors are assumed to have shape
`[batch_size, d1, ... dN]`, where `batch_size` is the number of samples in
the batch and `d1` ... `dN` are the remaining dimensions.

The `weight` parameter can be used to adjust the relative weight of samples
within the batch. The result of each metric is a scalar average over all
samples with non-zero weights.

Each metric returns two tensors, which should be used like the following for
each eval run:

```python
predictions = ...
labels = ...
value, update_op = some_metric(predictions, labels)

for step_num in range(max_steps):
    update_op.run()

print("evaluation score: ", value.eval())
```
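
To make the pattern above concrete, here is a minimal, self-contained sketch
of a full evaluation loop using the TF1-style
`tf.compat.v1.metrics.mean_squared_error` (the non-contrib counterpart of
`streaming_mean_squared_error`). The placeholder names and the toy `batches`
data are illustrative assumptions, not part of the library. Note that
streaming metrics accumulate in local variables, which must be initialized
before the first update:

```python
import numpy as np
import tensorflow.compat.v1 as tf  # TF1-style graph-mode API

tf.disable_eager_execution()

# Illustrative placeholders; shapes follow the [batch_size, ...] convention.
predictions = tf.placeholder(tf.float32, shape=[None])
labels = tf.placeholder(tf.float32, shape=[None])
weights = tf.placeholder(tf.float32, shape=[None])

# `value` reads the accumulated metric; `update_op` folds one batch into
# the metric's internal total/count variables.
value, update_op = tf.metrics.mean_squared_error(
    labels, predictions, weights=weights)

# Toy data: (predictions, labels, weights) per batch. The zero weight in
# the second batch excludes that sample from the average.
batches = [
    (np.array([0.5, 1.0]), np.array([0.0, 1.0]), np.array([1.0, 1.0])),
    (np.array([2.0, 2.0]), np.array([2.0, 3.0]), np.array([1.0, 0.0])),
]

with tf.Session() as sess:
    # Streaming metrics keep their running totals in *local* variables.
    sess.run(tf.local_variables_initializer())
    for preds, labs, w in batches:
        sess.run(update_op,
                 feed_dict={predictions: preds, labels: labs, weights: w})
    print("evaluation score:", sess.run(value))
```

Evaluating `value` between updates is also valid and returns the average
accumulated so far, which is what makes these ops convenient for monitoring
long evaluation runs.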