# Image Recognition

Our brains make vision seem easy. It doesn't take any effort for humans to
tell apart a lion and a jaguar, read a sign, or recognize a human's face.
But these are actually hard problems to solve with a computer: they only
seem easy because our brains are incredibly good at understanding images.

In the last few years, the field of machine learning has made tremendous
progress on addressing these difficult problems. In particular, we've
found that a kind of model called a deep
[convolutional neural network](https://colah.github.io/posts/2014-07-Conv-Nets-Modular/)
can achieve reasonable performance on hard visual recognition tasks --
matching or exceeding human performance in some domains.

Researchers have demonstrated steady progress
in computer vision by validating their work against
[ImageNet](http://www.image-net.org) -- an academic benchmark for computer
vision. Successive models continue to show improvements, each time achieving
a new state-of-the-art result:
[QuocNet], [AlexNet], [Inception (GoogLeNet)], [BN-Inception-v2].
Researchers both internal and external to Google have published papers
describing all these models, but the results are still hard to reproduce.
We're now taking the next step by releasing code for running image recognition
on our latest model, [Inception-v3].

[QuocNet]: https://static.googleusercontent.com/media/research.google.com/en//archive/unsupervised_icml2012.pdf
[AlexNet]: https://www.cs.toronto.edu/~fritz/absps/imagenet.pdf
[Inception (GoogLeNet)]: https://arxiv.org/abs/1409.4842
[BN-Inception-v2]: https://arxiv.org/abs/1502.03167
[Inception-v3]: https://arxiv.org/abs/1512.00567

Inception-v3 is trained for the [ImageNet] Large Visual Recognition Challenge
using the data from 2012. This is a standard task in computer vision,
where models try to classify entire
images into [1000 classes], like "Zebra", "Dalmatian", and "Dishwasher".
For example, here are the results from [AlexNet] classifying some images:

<div style="width:50%; margin:auto; margin-bottom:10px; margin-top:20px;">
<img style="width:100%" src="https://www.tensorflow.org/images/AlexClassification.png">
</div>

To compare models, we examine how often the model fails to predict the
correct answer as one of its top 5 guesses -- termed the "top-5 error rate"
(the short sketch at the end of this introduction makes the computation
concrete). [AlexNet] achieved a top-5 error rate of 15.3% on the 2012
validation data set; [Inception (GoogLeNet)] achieved 6.67%;
[BN-Inception-v2] achieved 4.9%; [Inception-v3] reaches 3.46%.

> How well do humans do on the ImageNet Challenge? There's a [blog post] by
Andrej Karpathy, who attempted to measure his own performance. He reached
a 5.1% top-5 error rate.

[ImageNet]: http://image-net.org/
[1000 classes]: http://image-net.org/challenges/LSVRC/2014/browse-synsets
[blog post]: https://karpathy.github.io/2014/09/02/what-i-learned-from-competing-against-a-convnet-on-imagenet/

This tutorial will teach you how to use [Inception-v3]. You'll learn how to
classify images into [1000 classes] in Python or C++. We'll also discuss how to
extract higher-level features from this model, which may be reused for other
vision tasks.

We're excited to see what the community will do with this model.
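To make the top-5 error rate concrete, here is a minimal standalone sketch of
the computation (the predictions and labels below are invented purely for
illustration; they don't come from any real model):

```C++
#include <array>
#include <cstdio>
#include <vector>

// Counts how often the true label is missing from a model's five
// highest-scoring guesses -- the "top-5 error rate".
int main() {
  // Each entry holds a model's top-5 class indices for one image.
  const std::vector<std::array<int, 5>> top5 = {
      {653, 668, 401, 716, 466},  // true label 653: a hit
      {12, 7, 99, 3, 42},         // true label 7: a hit
      {5, 9, 1, 2, 8},            // true label 100: a miss
  };
  const std::vector<int> truth = {653, 7, 100};

  int misses = 0;
  for (std::size_t i = 0; i < truth.size(); ++i) {
    bool hit = false;
    for (int guess : top5[i]) {
      if (guess == truth[i]) hit = true;
    }
    if (!hit) ++misses;
  }
  // Prints "top-5 error rate: 33.3%" for the toy data above.
  std::printf("top-5 error rate: %.1f%%\n", 100.0 * misses / truth.size());
  return 0;
}
```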
## Usage with Python API

`classify_image.py` downloads the trained model from `tensorflow.org`
when the program is run for the first time. You'll need about 200MB of free
space available on your hard disk.

Start by cloning the [TensorFlow models repo](https://github.com/tensorflow/models)
from GitHub. Run the following commands:

    cd models/tutorials/image/imagenet
    python classify_image.py

The above command will classify a supplied image of a panda bear.

<div style="width:15%; margin:auto; margin-bottom:10px; margin-top:20px;">
  <img style="width:100%" src="https://www.tensorflow.org/images/cropped_panda.jpg">
</div>

If the model runs correctly, the script will produce the following output:

    giant panda, panda, panda bear, coon bear, Ailuropoda melanoleuca (score = 0.88493)
    indri, indris, Indri indri, Indri brevicaudatus (score = 0.00878)
    lesser panda, red panda, panda, bear cat, cat bear, Ailurus fulgens (score = 0.00317)
    custard apple (score = 0.00149)
    earthstar (score = 0.00127)

If you wish to supply other JPEG images, you may do so by editing
the `--image_file` argument.

> If you download the model data to a different directory, you
will need to point `--model_dir` to the directory used.

## Usage with the C++ API

You can run the same [Inception-v3] model in C++ for use in production
environments. You can download the archive containing the GraphDef that defines
the model like this (running from the root directory of the TensorFlow
repository):

```bash
curl -L "https://storage.googleapis.com/download.tensorflow.org/models/inception_v3_2016_08_28_frozen.pb.tar.gz" |
  tar -C tensorflow/examples/label_image/data -xz
```

Next, we need to compile the C++ binary that includes the code to load and run
the graph. If you've followed
@{$install_sources$the instructions to download the source installation of TensorFlow}
for your platform, you should be able to build the example by
running this command from your shell terminal:

```bash
bazel build tensorflow/examples/label_image/...
```

That should create a binary executable that you can then run like this:

```bash
bazel-bin/tensorflow/examples/label_image/label_image
```

This uses the default example image that ships with the framework, and should
output something similar to this:

```
I tensorflow/examples/label_image/main.cc:206] military uniform (653): 0.834306
I tensorflow/examples/label_image/main.cc:206] mortarboard (668): 0.0218692
I tensorflow/examples/label_image/main.cc:206] academic gown (401): 0.0103579
I tensorflow/examples/label_image/main.cc:206] pickelhaube (716): 0.00800814
I tensorflow/examples/label_image/main.cc:206] bulletproof vest (466): 0.00535088
```

In this case, we're using the default image of
[Admiral Grace Hopper](https://en.wikipedia.org/wiki/Grace_Hopper), and you can
see the network correctly identifies that she's wearing a military uniform,
with a high score of 0.8.

<div style="width:45%; margin:auto; margin-bottom:10px; margin-top:20px;">
  <img style="width:100%" src="https://www.tensorflow.org/images/grace_hopper.jpg">
</div>

Next, try it out on your own images by supplying the `--image=` argument, e.g.

```bash
bazel-bin/tensorflow/examples/label_image/label_image --image=my_image.png
```

If you look inside the [`tensorflow/examples/label_image/main.cc`](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/label_image/main.cc)
file, you can find out how it works.
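Before diving into `main.cc`, it may help to see the pixel preprocessing the
model needs, written out as plain arithmetic. This is a standalone sketch with
illustrative values only -- the real defaults live in the flags at the top of
`main.cc`:

```C++
#include <cstdio>

// The preprocessing graph maps each 0-255 pixel value v to
// (v - input_mean) / input_std. The mean and std below are illustrative;
// check the flag defaults in main.cc for the model you are using.
int main() {
  const float input_mean = 128.0f;
  const float input_std = 128.0f;
  const int samples[] = {0, 128, 255};
  for (int v : samples) {
    std::printf("%3d -> %+.3f\n", v, (v - input_mean) / input_std);
  }
  return 0;
}
```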
We hope this code will help you integrate TensorFlow into
your own applications, so we will walk through the main functions step by step:

The command line flags control where the files are loaded from and the
properties of the input images.
The model expects to get square 299x299 RGB images, so those are the `input_width`
and `input_height` flags. We also need to scale the pixel values from integers
between 0 and 255 to the floating point values that the graph operates on.
We control the scaling with the `input_mean` and `input_std` flags: we first
subtract `input_mean` from each pixel value, then divide it by `input_std`.

These values probably look somewhat magical, but they are just defined by the
original model author based on what they wanted to use as input images for
training. If you have a graph that you've trained yourself, you'll just need
to adjust the values to match whatever you used during your training process.

You can see how they're applied to an image in the
[`ReadTensorFromImageFile()`](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/label_image/main.cc#L88)
function.

```C++
// Given an image file name, read in the data, try to decode it as an image,
// resize it to the requested size, and then scale the values as desired.
Status ReadTensorFromImageFile(string file_name, const int input_height,
                               const int input_width, const float input_mean,
                               const float input_std,
                               std::vector<Tensor>* out_tensors) {
  tensorflow::GraphDefBuilder b;
```

We start by creating a `GraphDefBuilder`, which is an object we can use to
specify a model to run or load.

```C++
  string input_name = "file_reader";
  string output_name = "normalized";
  tensorflow::Node* file_reader =
      tensorflow::ops::ReadFile(tensorflow::ops::Const(file_name, b.opts()),
                                b.opts().WithName(input_name));
```

We then start creating nodes for the small model we want to run
to load, resize, and scale the pixel values to get the result the main model
expects as its input. The first node we create is just a `Const` op that holds a
tensor with the file name of the image we want to load. That's then passed as the
first input to the `ReadFile` op. You might notice we're passing `b.opts()` as the last
argument to all the op creation functions. The argument ensures that the node is added to
the model definition held in the `GraphDefBuilder`. We also name the `ReadFile`
operator by making the `WithName()` call to `b.opts()`. This gives a name to the node,
which isn't strictly necessary since an automatic name will be assigned if you don't
do this, but it does make debugging a bit easier.

```C++
  // Now try to figure out what kind of file it is and decode it.
  const int wanted_channels = 3;
  tensorflow::Node* image_reader;
  if (tensorflow::StringPiece(file_name).ends_with(".png")) {
    image_reader = tensorflow::ops::DecodePng(
        file_reader,
        b.opts().WithAttr("channels", wanted_channels).WithName("png_reader"));
  } else {
    // Assume if it's not a PNG then it must be a JPEG.
    image_reader = tensorflow::ops::DecodeJpeg(
        file_reader,
        b.opts().WithAttr("channels", wanted_channels).WithName("jpeg_reader"));
  }
  // Now cast the image data to float so we can do normal math on it.
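  // (The DecodePng and DecodeJpeg ops above produce uint8 data, which the
  // float Sub and Div ops further down can't consume directly.)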
  tensorflow::Node* float_caster = tensorflow::ops::Cast(
      image_reader, tensorflow::DT_FLOAT, b.opts().WithName("float_caster"));
  // The convention for image ops in TensorFlow is that all images are expected
  // to be in batches, so that they're four-dimensional arrays with indices of
  // [batch, height, width, channel]. Because we only have a single image, we
  // have to add a batch dimension of 1 to the start with ExpandDims().
  tensorflow::Node* dims_expander = tensorflow::ops::ExpandDims(
      float_caster, tensorflow::ops::Const(0, b.opts()), b.opts());
  // Bilinearly resize the image to fit the required dimensions.
  tensorflow::Node* resized = tensorflow::ops::ResizeBilinear(
      dims_expander, tensorflow::ops::Const({input_height, input_width},
                                            b.opts().WithName("size")),
      b.opts());
  // Subtract the mean and divide by the scale.
  tensorflow::ops::Div(
      tensorflow::ops::Sub(
          resized, tensorflow::ops::Const({input_mean}, b.opts()), b.opts()),
      tensorflow::ops::Const({input_std}, b.opts()),
      b.opts().WithName(output_name));
```

We then keep adding more nodes, to decode the file data as an image, to cast the
integers into floating point values, to resize it, and then finally to run the
subtraction and division operations on the pixel values.

```C++
  // This runs the GraphDef network definition that we've just constructed, and
  // returns the results in the output tensor.
  tensorflow::GraphDef graph;
  TF_RETURN_IF_ERROR(b.ToGraphDef(&graph));
```

At the end of this we have
a model definition stored in the `b` variable, which we turn into a full graph
definition with the `ToGraphDef()` function.

```C++
  std::unique_ptr<tensorflow::Session> session(
      tensorflow::NewSession(tensorflow::SessionOptions()));
  TF_RETURN_IF_ERROR(session->Create(graph));
  TF_RETURN_IF_ERROR(session->Run({}, {output_name}, {}, out_tensors));
  return Status::OK();
}
```

Then we create a @{tf.Session}
object, which is the interface for actually running the graph, and run it,
specifying which node we want to get the output from, and where to put the
output data.

This gives us a vector of `Tensor` objects, which in this case we know will be
only a single object long. You can think of a `Tensor` as a multi-dimensional
array in this context, and it holds a 299 pixel high, 299 pixel wide, 3 channel
image as float values. If you already have your own image-processing framework
in your product, you should be able to use that instead, as long as you apply
the same transformations before you feed images into the main graph.

This is a simple example of creating a small TensorFlow graph dynamically in C++,
but for the pre-trained Inception model we want to load a much larger definition from
a file. You can see how we do that in the `LoadGraph()` function.

```C++
// Reads a model graph definition from disk, and creates a session object you
// can use to run it.
Status LoadGraph(string graph_file_name,
                 std::unique_ptr<tensorflow::Session>* session) {
  tensorflow::GraphDef graph_def;
  Status load_graph_status =
      ReadBinaryProto(tensorflow::Env::Default(), graph_file_name, &graph_def);
  if (!load_graph_status.ok()) {
    return tensorflow::errors::NotFound("Failed to load compute graph at '",
                                        graph_file_name, "'");
  }
```

If you've looked through the image loading code, a lot of the terms should seem
familiar. Rather than using a `GraphDefBuilder` to produce a `GraphDef` object,
we load a protobuf file that directly contains the `GraphDef`.
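As an aside, before we finish walking through `LoadGraph()`: if you want to poke
at a frozen graph on its own, a small standalone sketch like the following can
confirm the file parses and list a few of its nodes. The path is illustrative
-- point it at whichever GraphDef you downloaded:

```C++
#include <cstdio>
#include <string>

#include "tensorflow/core/framework/graph.pb.h"
#include "tensorflow/core/platform/env.h"

int main() {
  tensorflow::GraphDef graph_def;
  // Illustrative path; point this at the frozen GraphDef you downloaded.
  const std::string path =
      "tensorflow/examples/label_image/data/inception_v3_2016_08_28_frozen.pb";
  tensorflow::Status s =
      tensorflow::ReadBinaryProto(tensorflow::Env::Default(), path, &graph_def);
  if (!s.ok()) {
    std::fprintf(stderr, "%s\n", s.ToString().c_str());
    return 1;
  }
  // Print the node count and the first few node names and op types.
  std::printf("loaded %d nodes\n", graph_def.node_size());
  for (int i = 0; i < graph_def.node_size() && i < 5; ++i) {
    std::printf("  %s (%s)\n", graph_def.node(i).name().c_str(),
                graph_def.node(i).op().c_str());
  }
  return 0;
}
```

Back in `LoadGraph()`, the remaining step is to create a session from the
`GraphDef` we just read: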
```C++
  session->reset(tensorflow::NewSession(tensorflow::SessionOptions()));
  Status session_create_status = (*session)->Create(graph_def);
  if (!session_create_status.ok()) {
    return session_create_status;
  }
  return Status::OK();
}
```

Then we create a `Session` object from that `GraphDef` and
pass it back to the caller so that they can run it at a later time.

The `GetTopLabels()` function is a lot like the image loading, except that in this case
we want to take the results of running the main graph, and turn them into a sorted list
of the highest-scoring labels. Just like the image loader, it creates a
`GraphDefBuilder`, adds a couple of nodes to it, and then runs the short graph to get a
pair of output tensors. In this case they represent the sorted scores and index
positions of the highest results.

```C++
// Analyzes the output of the Inception graph to retrieve the highest scores and
// their positions in the tensor, which correspond to categories.
Status GetTopLabels(const std::vector<Tensor>& outputs, int how_many_labels,
                    Tensor* indices, Tensor* scores) {
  tensorflow::GraphDefBuilder b;
  string output_name = "top_k";
  tensorflow::ops::TopK(tensorflow::ops::Const(outputs[0], b.opts()),
                        how_many_labels, b.opts().WithName(output_name));
  // This runs the GraphDef network definition that we've just constructed, and
  // returns the results in the output tensors.
  tensorflow::GraphDef graph;
  TF_RETURN_IF_ERROR(b.ToGraphDef(&graph));
  std::unique_ptr<tensorflow::Session> session(
      tensorflow::NewSession(tensorflow::SessionOptions()));
  TF_RETURN_IF_ERROR(session->Create(graph));
  // The TopK node returns two outputs, the scores and their original indices,
  // so we have to append :0 and :1 to specify them both.
  std::vector<Tensor> out_tensors;
  TF_RETURN_IF_ERROR(session->Run({}, {output_name + ":0", output_name + ":1"},
                                  {}, &out_tensors));
  *scores = out_tensors[0];
  *indices = out_tensors[1];
  return Status::OK();
}
```

The `PrintTopLabels()` function takes those sorted results, and prints them out in a
friendly way. The `CheckTopLabel()` function is very similar, but just makes sure that
the top label is the one we expect, for debugging purposes.

At the end, [`main()`](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/label_image/main.cc#L252)
ties together all of these calls.

```C++
int main(int argc, char* argv[]) {
  // We need to call this to set up global state for TensorFlow.
  tensorflow::port::InitMain(argv[0], &argc, &argv);
  Status s = tensorflow::ParseCommandLineFlags(&argc, argv);
  if (!s.ok()) {
    LOG(ERROR) << "Error parsing command line flags: " << s.ToString();
    return -1;
  }

  // First we load and initialize the model.
  std::unique_ptr<tensorflow::Session> session;
  string graph_path = tensorflow::io::JoinPath(FLAGS_root_dir, FLAGS_graph);
  Status load_graph_status = LoadGraph(graph_path, &session);
  if (!load_graph_status.ok()) {
    LOG(ERROR) << load_graph_status;
    return -1;
  }
```

We load the main graph.

```C++
  // Get the image from disk as a float array of numbers, resized and normalized
  // to the specifications the main graph expects.
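  // (ReadTensorFromImageFile(), defined earlier, builds and runs the small
  // preprocessing graph; it returns one [1, height, width, 3] float tensor.)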
  std::vector<Tensor> resized_tensors;
  string image_path = tensorflow::io::JoinPath(FLAGS_root_dir, FLAGS_image);
  Status read_tensor_status = ReadTensorFromImageFile(
      image_path, FLAGS_input_height, FLAGS_input_width, FLAGS_input_mean,
      FLAGS_input_std, &resized_tensors);
  if (!read_tensor_status.ok()) {
    LOG(ERROR) << read_tensor_status;
    return -1;
  }
  const Tensor& resized_tensor = resized_tensors[0];
```

Load, resize, and process the input image.

```C++
  // Actually run the image through the model.
  std::vector<Tensor> outputs;
  Status run_status = session->Run({{FLAGS_input_layer, resized_tensor}},
                                   {FLAGS_output_layer}, {}, &outputs);
  if (!run_status.ok()) {
    LOG(ERROR) << "Running model failed: " << run_status;
    return -1;
  }
```

Here we run the loaded graph with the image as an input.

```C++
  // This is for automated testing to make sure we get the expected result with
  // the default settings. We know that label 866 (military uniform) should be
  // the top label for the Admiral Hopper image.
  if (FLAGS_self_test) {
    bool expected_matches;
    Status check_status = CheckTopLabel(outputs, 866, &expected_matches);
    if (!check_status.ok()) {
      LOG(ERROR) << "Running check failed: " << check_status;
      return -1;
    }
    if (!expected_matches) {
      LOG(ERROR) << "Self-test failed!";
      return -1;
    }
  }
```

For testing purposes, we can check here to make sure we get the output we
expect.

```C++
  // Do something interesting with the results we've generated.
  Status print_status = PrintTopLabels(outputs, FLAGS_labels);
```

Finally, we print the labels we found.

```C++
  if (!print_status.ok()) {
    LOG(ERROR) << "Running print failed: " << print_status;
    return -1;
  }
```

The error handling here uses TensorFlow's `Status`
object, which is very convenient because it lets you know whether any error has
occurred with the `ok()` checker, and can then be printed out to give a readable
error message.

In this case we are demonstrating object recognition, but you should be able to
use very similar code on other models you've found or trained yourself, across
all sorts of domains. We hope this small example gives you some ideas on how to
use TensorFlow within your own products.

> **EXERCISE**: Transfer learning is the idea that, if you know how to solve a task well, you
should be able to transfer some of that understanding to solving related
problems. One way to perform transfer learning is to remove the final
classification layer of the network and extract
the [next-to-last layer of the CNN](https://arxiv.org/abs/1310.1531), in this
case a 2048-dimensional vector. A sketch of how you might fetch that layer
appears at the end of this tutorial.

## Resources for Learning More

To learn about neural networks in general, Michael Nielsen's
[free online book](http://neuralnetworksanddeeplearning.com/chap1.html)
is an excellent resource. For convolutional neural networks in particular,
Chris Olah has some
[nice blog posts](https://colah.github.io/posts/2014-07-Conv-Nets-Modular/),
and Michael Nielsen's book has a
[great chapter](http://neuralnetworksanddeeplearning.com/chap6.html)
covering them.

To find out more about implementing convolutional neural networks, you can jump
to the TensorFlow @{$deep_cnn$deep convolutional networks tutorial},
or start a bit more gently with our [Estimator MNIST tutorial](../estimators/cnn.md).
Finally, if you want to get up to speed on research in this area, you can
read the recent papers referenced in this tutorial.
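For the transfer learning exercise above, here is a hedged sketch of fetching a
penultimate-layer vector with the same `Session` API used in this tutorial. The
feature layer name is an assumption you must verify against your own graph:
`pool_3` is the 2048-dimensional layer in the GraphDef that `classify_image.py`
downloads, but other Inception-v3 exports use different names, so inspect your
graph's nodes first.

```C++
#include <string>
#include <vector>

#include "tensorflow/core/framework/tensor.h"
#include "tensorflow/core/public/session.h"

// Runs the graph only up to a named penultimate layer and returns its output,
// a feature vector you can reuse for related vision tasks. `feature_layer`
// is graph-specific ("pool_3" in the graph classify_image.py downloads);
// inspect your own GraphDef to find the right name for your export.
tensorflow::Status ExtractFeatures(tensorflow::Session* session,
                                   const std::string& input_layer,
                                   const tensorflow::Tensor& image_tensor,
                                   const std::string& feature_layer,
                                   std::vector<tensorflow::Tensor>* features) {
  return session->Run({{input_layer, image_tensor}}, {feature_layer}, {},
                      features);
}
```

You could call a helper like this in place of the classification `Run()` call
in `main()` above, then train a small classifier of your own on the resulting
feature vectors.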