# Using GPUs

## Supported devices

On a typical system, there are multiple computing devices. In TensorFlow, the
supported device types are `CPU` and `GPU`. They are represented as
`strings`. For example:

* `"/cpu:0"`: The CPU of your machine.
* `"/gpu:0"`: The GPU of your machine, if you have one.
* `"/gpu:1"`: The second GPU of your machine, etc.

If a TensorFlow operation has both CPU and GPU implementations, the GPU
devices are given priority when the operation is assigned to a device. For
example, `matmul` has both CPU and GPU kernels. On a system with devices
`cpu:0` and `gpu:0`, `gpu:0` will be selected to run `matmul`.
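If you are unsure which devices TensorFlow has discovered on your machine, you
can enumerate them before placing any operations. A minimal sketch is shown
below; note that `device_lib` lives in the internal
`tensorflow.python.client` package rather than the public API, so treat it as
a debugging aid rather than a stable interface.

```python
# Lists the devices TensorFlow can see on this machine.
# Note: device_lib is an internal module, not part of the public API.
from tensorflow.python.client import device_lib

for device in device_lib.list_local_devices():
  # Each entry is a DeviceAttributes proto whose name is a device string
  # such as "/cpu:0" or "/gpu:0".
  print device.name, device.device_type
```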
## Logging device placement

To find out which devices your operations and tensors are assigned to, create
the session with the `log_device_placement` configuration option set to
`True`.

```python
# Creates a graph.
a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
c = tf.matmul(a, b)
# Creates a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Runs the op.
print sess.run(c)
```

You should see the following output:

```
Device mapping:
/job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: Tesla K40c, pci bus
id: 0000:05:00.0
b: /job:localhost/replica:0/task:0/gpu:0
a: /job:localhost/replica:0/task:0/gpu:0
MatMul: /job:localhost/replica:0/task:0/gpu:0
[[ 22.  28.]
 [ 49.  64.]]
```

## Manual device placement

If you would like a particular operation to run on a device of your choice
instead of the one selected automatically, you can use `with tf.device` to
create a device context: all operations constructed within that context get
the same device assignment.

```python
# Creates a graph.
with tf.device('/cpu:0'):
  a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
  b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
c = tf.matmul(a, b)
# Creates a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Runs the op.
print sess.run(c)
```

You will see that `a` and `b` are now assigned to `cpu:0`. Since no device
was specified for `MatMul`, it still runs on `gpu:0`.

```
Device mapping:
/job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: Tesla K40c, pci bus
id: 0000:05:00.0
b: /job:localhost/replica:0/task:0/cpu:0
a: /job:localhost/replica:0/task:0/cpu:0
MatMul: /job:localhost/replica:0/task:0/gpu:0
[[ 22.  28.]
 [ 49.  64.]]
```

## Using a single GPU on a multi-GPU system

If you have more than one GPU in your system, the GPU with the lowest ID is
selected by default. If you would like to run on a different GPU, you will
need to specify the preference explicitly:

```python
# Creates a graph.
with tf.device('/gpu:2'):
  a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
  b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
  c = tf.matmul(a, b)
# Creates a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Runs the op.
print sess.run(c)
```

If the device you have specified does not exist, you will get an
`InvalidArgumentError`:

```
InvalidArgumentError: Invalid argument: Cannot assign a device to node 'b':
Could not satisfy explicit device specification '/gpu:2'
   [[Node: b = Const[dtype=DT_FLOAT, value=Tensor<type: float shape: [3,2]
   values: 1 2 3...>, _device="/gpu:2"]()]]
```

If you would like TensorFlow to automatically choose an existing and
supported device when the specified one does not exist, set
`allow_soft_placement` to `True` in the configuration option when creating
the session.

```python
# Creates a graph.
with tf.device('/gpu:2'):
  a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
  b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
  c = tf.matmul(a, b)
# Creates a session with allow_soft_placement and log_device_placement set
# to True.
sess = tf.Session(config=tf.ConfigProto(
    allow_soft_placement=True, log_device_placement=True))
# Runs the op.
print sess.run(c)
```

## Using multiple GPUs

If you would like to run TensorFlow on multiple GPUs, you can construct your
model in a multi-tower fashion, where each tower is assigned to a different
GPU. For example:

```python
# Creates a graph.
c = []
for d in ['/gpu:2', '/gpu:3']:
  with tf.device(d):
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3])
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2])
    c.append(tf.matmul(a, b))
# Sums the per-GPU results on the CPU. (Named sum_op to avoid shadowing
# Python's built-in sum.)
with tf.device('/cpu:0'):
  sum_op = tf.add_n(c)
# Creates a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Runs the op.
print sess.run(sum_op)
```

You will see the following output:

```
Device mapping:
/job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: Tesla K20m, pci bus
id: 0000:02:00.0
/job:localhost/replica:0/task:0/gpu:1 -> device: 1, name: Tesla K20m, pci bus
id: 0000:03:00.0
/job:localhost/replica:0/task:0/gpu:2 -> device: 2, name: Tesla K20m, pci bus
id: 0000:83:00.0
/job:localhost/replica:0/task:0/gpu:3 -> device: 3, name: Tesla K20m, pci bus
id: 0000:84:00.0
Const_3: /job:localhost/replica:0/task:0/gpu:3
Const_2: /job:localhost/replica:0/task:0/gpu:3
MatMul_1: /job:localhost/replica:0/task:0/gpu:3
Const_1: /job:localhost/replica:0/task:0/gpu:2
Const: /job:localhost/replica:0/task:0/gpu:2
MatMul: /job:localhost/replica:0/task:0/gpu:2
AddN: /job:localhost/replica:0/task:0/cpu:0
[[  44.   56.]
 [  98.  128.]]
```

The [cifar10 tutorial](../../tutorials/deep_cnn/index.md) is a good example
demonstrating how to do training with multiple GPUs.
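To give a sense of how this multi-tower structure extends to training, here is
a minimal sketch of the pattern that tutorial follows: each GPU computes
gradients for a shared set of variables, and the averaged gradients are
applied once on the CPU. The `build_tower_loss` helper is hypothetical,
standing in for your own model-building code, and unlike this sketch the
cifar10 tutorial also pins the shared variables themselves to the CPU.

```python
# A minimal sketch of multi-tower training, assuming a hypothetical
# build_tower_loss(name) helper that builds one tower's model and returns
# its scalar loss.
import tensorflow as tf

optimizer = tf.train.GradientDescentOptimizer(0.01)
tower_grads = []
for i, d in enumerate(['/gpu:0', '/gpu:1']):
  # Reuse the same variables in every tower after the first.
  with tf.device(d), tf.variable_scope('model', reuse=(i > 0)):
    loss = build_tower_loss('tower_%d' % i)  # hypothetical helper
    # Each tower computes gradients for the shared variables on its own GPU.
    tower_grads.append(optimizer.compute_gradients(loss))

with tf.device('/cpu:0'):
  # Averages each variable's gradients across towers, then applies them once.
  avg_grads = []
  for grads_and_vars in zip(*tower_grads):
    grad = tf.add_n([g for g, _ in grads_and_vars]) / len(grads_and_vars)
    avg_grads.append((grad, grads_and_vars[0][1]))
  train_op = optimizer.apply_gradients(avg_grads)
```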