path: root/tensorflow/core/common_runtime/gpu/process_state.cc
Commit message | Author | Age
* Refactor ProcessState in support of NUMA. (A. Unique TensorFlower, 2018-07-02)

  ProcessState is a singleton that anchors per-process resources. Until now that
  meant only GPU-related memory allocators, since CPU allocation was usually done
  directly through Allocator::cpu_allocator. Accordingly, process_state.h lived in
  common_runtime/gpu and ProcessState was used only in GPU builds. With the
  upcoming introduction of NUMA-node-specific CPU allocators, it will be
  important that most of the TF runtime switch to requesting the proper
  NUMA-specific CPU allocator. These allocators will be owned by and obtained
  from the ProcessState singleton, which will exist in all builds. The
  GPU-specific functions are moved to a new GPUProcessState, also a singleton.
  PoolAllocator is also migrated out of common_runtime/gpu into common_runtime.

  PiperOrigin-RevId: 203002666
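The singleton-plus-per-NUMA-node-allocator arrangement this commit describes can be sketched roughly as follows. This is an illustrative simplification, not TensorFlow's actual code: the `ProcessState`/`GetCPUAllocator` names follow the commit's description, while `SimpleAllocator` is a made-up stand-in for the real NUMA-aware allocators.

```cpp
#include <cstddef>
#include <cstdlib>
#include <map>
#include <memory>
#include <mutex>

// Minimal allocator interface; stand-in for tensorflow::Allocator.
class Allocator {
 public:
  virtual ~Allocator() = default;
  virtual void* AllocateRaw(std::size_t alignment, std::size_t num_bytes) = 0;
  virtual void DeallocateRaw(void* ptr) = 0;
};

// Hypothetical allocator used here in place of a real NUMA-bound one.
class SimpleAllocator : public Allocator {
 public:
  void* AllocateRaw(std::size_t alignment, std::size_t num_bytes) override {
    // std::aligned_alloc requires the size to be a multiple of the alignment.
    std::size_t rounded = (num_bytes + alignment - 1) / alignment * alignment;
    return std::aligned_alloc(alignment, rounded);
  }
  void DeallocateRaw(void* ptr) override { std::free(ptr); }
};

class ProcessState {
 public:
  // Meyers singleton: constructed once, on first use, thread-safely.
  static ProcessState* singleton() {
    static ProcessState instance;
    return &instance;
  }

  // Returns (creating on first request) the allocator for `numa_node`,
  // so all callers asking for the same node share one allocator.
  Allocator* GetCPUAllocator(int numa_node) {
    std::lock_guard<std::mutex> lock(mu_);
    std::unique_ptr<Allocator>& slot = cpu_allocators_[numa_node];
    if (!slot) slot.reset(new SimpleAllocator);
    return slot.get();
  }

 private:
  ProcessState() = default;
  std::mutex mu_;
  std::map<int, std::unique_ptr<Allocator>> cpu_allocators_;
};
```

The key property is that repeated calls for the same NUMA node return the same allocator object, while different nodes get distinct allocators.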
* Fail gracefully with a helpful error message when provided with a conflicting
  visible_devices_list. (Asim Shankar, 2018-05-14)

  See #19083 and #18861. More generally, this change avoids assertion failures
  (which would bring the whole process down) on a few code paths that can be
  triggered by user input.

  PiperOrigin-RevId: 196572013
* Stop using gpu:: as an alias for stream_executor::. (Justin Lebar, 2018-04-25)

  Also do a few related namespace cleanups.

  PiperOrigin-RevId: 194252437
* Add a test-only method to reset ProcessState. (Guangda Lai, 2018-02-22)

  PiperOrigin-RevId: 186647005
* Split gpu_id.h and GpuIdManager out from build target
  //tensorflow/core:gpu_runtime. (Guangda Lai, 2018-02-09)

  This reduces the size of dependencies, so that when other lightweight
  libraries such as the grappler utils need the TfToCudaGpuId translation
  function, they do not need to depend on things like stream executor and the
  CUDA libraries.

  PiperOrigin-RevId: 185175757
* Cleanup: Ran clang-format on files in tensorflow/core/.../*.{cc,h}.
  (A. Unique TensorFlower, 2018-01-30)

  PiperOrigin-RevId: 183848459
* Performance improvements to some GPU code to use shared locks instead of
  unique locks for some hotspot cases. (Rohan Jain, 2018-01-26)

  PiperOrigin-RevId: 183418559
* Remove a few unused constructions and simplify some code.
  (A. Unique TensorFlower, 2017-12-22)

  PiperOrigin-RevId: 179978470
* Added virtual GPU support. (Guangda Lai, 2017-12-18)

  PiperOrigin-RevId: 179504116
* Add a simple cudaMalloc-based allocator, useful for memory debugging with the
  existing CUDA toolchain. (A. Unique TensorFlower, 2017-08-29)

  PiperOrigin-RevId: 166911293
* Merge changes from github. (A. Unique TensorFlower, 2017-08-15)

  END_PUBLIC

  - Commit 9f81374c3 authored by raymondxyang, committed by Rasmus Munk Larsen:
    Add option for building more Python tests in CMake (#11853). Ignore
    Windows-built project; fix deprecated methods in tf.contrib.python; fix
    regex matches for Windows builds in contrib.keras and session_bundle; fix a
    compatibility issue with Python 3.x; add missing ops into the Windows build
    for test; enable more test cases for Windows builds; add a conditional CMake
    mode for enabling more unit test cases and for major contrib packages; add
    supplementary info in README for the new CMake option; update tf_tests after
    testing with TF 1.3; fix unsafe regex matches, clean up code, and update the
    exclude list after testing with the latest master branch; fix a missing
    module.
  - Commit 98f0e1efe authored by Yong Tang, committed by Rasmus Munk Larsen:
    Dynamic ksize and strides with MaxPool (#11875). This fix addresses the
    issue raised in 4746, where ksize is static (an attr) with max_pool; it
    changes ksize to an input tensor so that it is now dynamic. Also adds
    dynamic ksize to MaxPoolGrad and MaxPoolGradGrad, adds test cases for
    max_pool_v2, fixes a GPU Jenkins issue, enables MaxPoolV2 on GPU, and hides
    MaxPoolV2. Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
  - Commit 02d6bc185 authored by Bairen Yi, committed by Rasmus Munk Larsen:
    remove useless variable (#12212)
  - Commit ed6b0d905 authored by namrata-ibm, committed by Rasmus Munk Larsen:
    Adding support for s390x in calculation of cpu_frequency (#12201)
  - Commit 627dfc9dd authored by Taehoon Lee: Fix typos
  - Commit c0f9b0a91 authored by A. Unique TensorFlower: In fast-math mode emit
    a tanh that has a faster min/max. PiperOrigin-RevId: 164943597
  - Commit 87605f3d6 authored by Kay Zhu: [TF:XLA] Use HloEvaluator for
    ComputeConstant, removing the need for a dedicated compute-constant backend.
    PiperOrigin-RevId: 164940970
  - Commit 881de45c2 authored by Taehoon Lee, committed by Rasmus Munk Larsen:
    Add bool type support for GPU kernels (#11927), with bool type test code for
    GPU kernels.
  - Commit eeacdcdb1 authored by A. Unique TensorFlower: Add missing "CPU"
    suffix in registrations. PiperOrigin-RevId: 164939527
  - Commit de01be952 authored by namrata-ibm, committed by Rasmus Munk Larsen:
    Adding support for Big Endian in graph_constructor_test and wav_io (#12179)
  - Commit 26719d29f authored by QingYing Chen, committed by Rasmus Munk Larsen:
    Implement CRF decode (Viterbi decode) for tensor (#12056). Implements CRF
    decoding for tensors, adds test code and comments throughout the crf module,
    and replaces the crf_decode test code with a deterministic example.
  - Commit f9a81ca2f authored by Pete Warden, committed by gunan: Create CI
    build script for Raspberry Pi (#12190), then move the Pi build script's
    location.
  - Commit e2a163a90 authored by A. Unique TensorFlower: Merge code from PR
    #11940 with internal changes from cl/164796436, and update Python tests to
    also run on GPU. PiperOrigin-RevId: 164929133
  - Commit 08bbfa187 authored by Taehoon Lee, committed by Rasmus Munk Larsen:
    Fix typos (#12195)
  - Commit ab96f41fb authored by Luke Iwanski, committed by Rasmus Munk Larsen:
    [OpenCL] Extends matmul_benchmark.py to cover SYCL (#11697). Renames device
    strings from /gpu:N to /device:GPU:N across several tests and tools
    (control_flow_ops_py_test, run_metadata_test, tfprof_node, the profiler
    tests, debug_io_utils_test.cc, device_name_utils_test.cc), and fakes a GPU
    device when none is available. Also fixes device paths for names with more
    than one colon (#123): the device path is constructed from a device name by
    replacing all colons with underscores, so a name such as 'device:SYCL:0'
    gives the path 'device_SYCL_0', and the previous code converted this back to
    'device:SYCL_0' rather than the original name. An alternative fix would
    convert all underscores back to colons (i.e. remove the restriction inside
    `replace("_", ":", 1)`), but it is unclear whether any device names contain
    underscores.
  - Commit 35e7a3665 authored by Yong Tang, committed by Rasmus Munk Larsen:
    Remove unneeded casting of int64 for reverse_sequence (#12192). Removes the
    cast `lengths = math_ops.to_int64(lengths)`, as int32 has already been
    enabled for reverse_sequence. Signed-off-by: Yong Tang
    <yong.tang.github@outlook.com>
  - Commit 9fba8c185 authored by Anna R: Add benchmark dashboard link to the
    benchmarks doc, plus a link and description for the Benchmarks page on the
    Community index page. PiperOrigin-RevId: 164924906
  - Commit bb6f32fa7 authored by Mark Heffernan: Make HloAliasAnalysis updatable
    after changes to the HLO graph. As part of this change, make
    HloAliasAnalysis a thinner layer that basically only holds a map from
    HloValue to HloBuffer and vice versa. PiperOrigin-RevId: 164923041
  - Commit 9103096c1 authored by A. Unique TensorFlower, committed by Thomas
    Köppe: Merged commit includes the HloAliasAnalysis change above (164923041
    by meheff). PiperOrigin-RevId: 164923041
  - Commit 822603aed authored by A. Unique TensorFlower: Merging sibling fusion
    instructions using multi_output_fusion. PiperOrigin-RevId: 164920220
  - Commit c035aa2a8 authored by A. Unique TensorFlower: Go: Update generated
    wrapper functions for TensorFlow ops. PiperOrigin-RevId: 164917891
  - Commit e1e81d9ba authored by Luke Iwanski, committed by Rasmus Munk Larsen:
    [OpenCL] Fixes double memcpy bug (#151) (#12173). As the debug CopyOp is
    called on a Tensor without a type, we need to use the DataType enum to get
    type information and pass the type on to Eigen. This is a workaround for
    Eigen's need to have a type when calling memcpy; if the Eigen memcpy could
    be provided without a type requirement, the memcpy in sycl_util would be
    unnecessary. Also acts on review feedback from #12173.
  - Commit d9ca2d86d authored by A. Unique TensorFlower: Internal change.
    PiperOrigin-RevId: 164916465
  - Commit b8d13d218 authored by A. Unique TensorFlower: Remove more parts of
    DCASGD missed in the first pass. (47949b) PiperOrigin-RevId: 164914552
  - Commit 73b3d52c7 authored by Alexandre Passos: cmake fix.
    PiperOrigin-RevId: 164911656
  - Commit 2173b5b0a authored by A. Unique TensorFlower: Allow
    TFE_TensorHandleCopyToDevice to have the same device as source and
    destination; it will reuse the same underlying buffer in those cases.
    PiperOrigin-RevId: 164909906
  - Commit 13eb3b90e authored by Alexandre Passos: Experimental C and Python
    APIs to invoke TensorFlow kernels on concrete values.
    PiperOrigin-RevId: 164902588
  - Commit 7dfabcc01 authored by A. Unique TensorFlower: Initialize
    ExecutionOptions in ComputeConstant to default values.
    PiperOrigin-RevId: 164894867
  - Commit c8897e9bc authored by Benoit Steiner: Static required time
    computation. PiperOrigin-RevId: 164894645
  - Commit 076158f9b authored by A. Unique TensorFlower: Enable
    implicit->explicit conversion by default. PiperOrigin-RevId: 164890915
  - Commit 58c4a4cb1 authored by A. Unique TensorFlower: Bugfix: the number of
    input channels is not necessarily in the last dimension, after introduction
    of the data_format param. PiperOrigin-RevId: 164889729
  - Commit 8f9b1af8a authored by Igor Saprykin: Recover MonitoredSession when
    the Coordinator is requested to stop with one of the _PREEMPTION_ERRORS.
    When SyncReplicasOptimizer is used, a preemption in the Coordinator may
    result in two cases: (1) the session gets silently marked as complete, or
    (2) the session gets stuck. This CL aims to solve and verify solutions for
    both of these problems: Fix 1 changes the should_stop logic, and Fix 2
    changes the CoordinatedSession.run() logic. SyncReplicasOptimizer runs a
    separate set of threads using a Coordinator instance; those threads do
    FIFOQueue.enqueue while the main thread does a blocking FIFOQueue.dequeue.
    The `sync_token_q` FIFOQueue is on parameter servers. When one of the PS
    instances is preempted, an AbortedError causes the Coordinator to stop via
    request_stop(ex), which by itself changes the state of
    MonitoredSession.should_stop() to True (Fix 1). Results of the blocking
    Dequeue operation are sent to the chief worker via Recv; what happens next
    depends on the number of tokens in `sync_token_q`. If there are enough for
    the next Dequeue call to return, the low-level session run() call returns,
    and the next iteration of the `while not MonitoredSession.should_stop()`
    loop decides that training is complete (case 1). If there are not enough
    tokens, the blocking Dequeue keeps waiting; graph execution gets stuck, and
    the whole session is garbage collected after 10 minutes (case 2). We decided
    to fix that by re-creating a session after it gets garbage collected
    (Fix 2). An alternative was to try to cancel the pending Dequeue operation,
    but it is not clear that this is the right thing to do, and it is also not
    easy. PiperOrigin-RevId: 164888390
  - Commit 46e4de6e5 authored by A. Unique TensorFlower: Undo loop fusion
    changes for now, as they seem to be altering a few results.

  END_PUBLIC
  RELNOTES: n/a
  BEGIN_PUBLIC
  Automated g4 rollback of changelist 164825735

  PiperOrigin-RevId: 165340331
* Performance-related tweaks: don't copy loop variables; remove ineffective
  std::move casts. (A. Unique TensorFlower, 2017-06-05)

  PiperOrigin-RevId: 158017670
* Add an env variable to use BFCAllocator as a CPU allocator in ProcessState.
  (A. Unique TensorFlower, 2017-05-12)

  PiperOrigin-RevId: 155912848
* Fix all 64/32-bit warnings in core/common_runtime.
  (Suharsh Sivakumar, 2017-04-04)

  Change: 152141388
* Switch GetCUDAHostAllocator() to return the standard CPU allocator when there
  are no GPUs, instead of using PoolAllocator. (Vijay Vasudevan, 2017-01-06)

  I believe this allocator was being used in non-GPU situations under the
  following conditions: (1) GPUCompatibleCPUDevice is linked in; (2) its
  GetAllocator() function returns GetCUDAHostAllocator(0) if
  AllocationAttr.gpu_compatible() is true; (3) no GPU device is available in the
  process. This changes the code so that, under these legitimate non-GPU
  situations, we use the normal CPU allocator. Also remove the dead code, since
  we use the BFCAllocator for CUDA host memory now.

  Change: 143823507
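The fallback this commit describes amounts to a device-count check at allocator-selection time. A minimal sketch, assuming hypothetical stand-ins (`VisibleGPUCount`, `CpuAllocator`, `PinnedHostAllocator`) rather than the real TensorFlow functions:

```cpp
#include <cstdlib>

// Trivial allocator used for both roles in this sketch; in a GPU build the
// pinned variant would register pages with the CUDA driver.
class Allocator {
 public:
  virtual ~Allocator() = default;
  void* Allocate(std::size_t n) { return std::malloc(n); }
  void Deallocate(void* p) { std::free(p); }
};

// Plain process-wide CPU allocator.
Allocator* CpuAllocator() {
  static Allocator a;
  return &a;
}

// Stand-in for a cudaMallocHost-backed pinned-memory allocator.
Allocator* PinnedHostAllocator() {
  static Allocator a;
  return &a;
}

// Hypothetical GPU discovery; returns 0 when no device is visible.
int VisibleGPUCount() { return 0; }

Allocator* GetCUDAHostAllocator() {
  // With no GPU visible in the process, pinned pages buy nothing, so hand
  // out the ordinary CPU allocator (the behavior this commit adds) instead
  // of going through a pool of pinned host memory.
  if (VisibleGPUCount() == 0) return CpuAllocator();
  return PinnedHostAllocator();
}
```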
* Merge changes from github. (Xiaoqiang Zheng, 2016-10-28)

  Change: 137532946
* Replace enum BusAdjacency with protobuf DeviceLocality for describing the
  topological neighborhood of a device. (A. Unique TensorFlower, 2016-10-19)

  Change: 136663586
* Add an environment variable to control the CUDA host memory limit.
  (Xiaoqiang Zheng, 2016-10-12)

  Change: 135978018
* Make the CUDA host allocator fetch one of the available stream executors in
  the process, not the 0th one, which may not be visible to the process.
  (Vijay Vasudevan, 2016-09-02)

  Fixes #1888 (for real this time?)

  Change: 132128469
* Update copyright for 3p/tf/core. (A. Unique TensorFlower, 2016-06-02)

  Change: 123900938
* Change CUDA pinned memory allocation to the BFC allocator.
  (Xiaoqiang Zheng, 2016-03-18)

  Move the GPU-neutral code to common_runtime.

  Change: 117591254
* TensorFlow: allow growth in the GPU BFC allocator. (Vijay Vasudevan,
  2016-03-11)

  This adds an option to start the BFC allocator small and grow it over time as
  needed. This can lead to increased fragmentation, but the benefit is that only
  as much memory as "needed" is reserved. The option defaults to off, but can be
  turned on by passing an option to the first Session.

  This is done by adding one more layer of indirection between mapping a
  ChunkHandle to a pointer, by introducing the concept of AllocationRegions:
  contiguous memory regions that mimic the previous implementation in their
  indexing (constant-time indexing within an AllocationRegion). The drawback is
  that we must introduce one more lookup to find out which allocation region a
  pointer is a part of. This implementation uses a sorted vector and upper_bound
  to do a binary search based on end_ptr. Its impact is relatively low based on
  the microbenchmarks below, and if it were a cause for later concern, we could
  try to map the 'page tables' of multiple regions into one very large
  AllocationRegion, and hope that there are no holes between address spaces so
  that the ChunkHandle map is not too large for memory.

  That said, this change appears not to slow down the ptb_word_lm benchmark,
  which was the initial impetus for most of the recent changes to this class, so
  it appears safe. Microbenchmarks I ran showed no real difference, even when
  there were multiple regions, and the ptb_word_lm benchmark also didn't change.
  The following numbers bear this out.

  At HEAD (consumes 5.8GiB on my Titan Black):

  Epoch: 1 Learning rate: 1.000 | 0.004 perplexity: 6119.287 speed: 679 wps | 0.104 perplexity: 849.526 speed: 5743 wps | 0.204 perplexity: 629.677 speed: 6935 wps | 0.304 perplexity: 509.189 speed: 7461 wps | 0.404 perplexity: 438.585 speed: 7760 wps | 0.504 perplexity: 392.459 speed: 7953 wps | 0.604 perplexity: 352.998 speed: 8081 wps | 0.703 perplexity: 325.909 speed: 8182 wps | 0.803 perplexity: 304.531 speed: 8261 wps | 0.903 perplexity: 284.988 speed: 8322 wps
  Epoch: 1 Train Perplexity: 270.398 | Valid Perplexity: 178.860
  Epoch: 2 Learning rate: 1.000 | 0.004 perplexity: 212.458 speed: 8836 wps | 0.104 perplexity: 151.131 speed: 9039 wps | 0.204 perplexity: 158.768 speed: 8950 wps | 0.304 perplexity: 153.650 speed: 8925 wps | 0.404 perplexity: 150.586 speed: 8910 wps | 0.504 perplexity: 148.136 speed: 8817 wps | 0.604 perplexity: 143.511 speed: 8778 wps | 0.703 perplexity: 141.382 speed: 8773 wps | 0.803 perplexity: 139.401 speed: 8775 wps | 0.903 perplexity: 135.706 speed: 8777 wps
  Epoch: 2 Train Perplexity: 133.618 | Valid Perplexity: 143.462
  Epoch: 3 Learning rate: 1.000 | 0.004 perplexity: 146.292 speed: 8947 wps | 0.104 perplexity: 104.901 speed: 9325 wps | 0.204 perplexity: 114.335 speed: 9108 wps | 0.304 perplexity: 111.434 speed: 9046 wps | 0.404 perplexity: 110.328 speed: 9014 wps | 0.504 perplexity: 109.455 speed: 8995 wps | 0.604 perplexity: 106.877 speed: 8984 wps | 0.703 perplexity: 106.158 speed: 8978 wps | 0.803 perplexity: 105.532 speed: 8966 wps | 0.903 perplexity: 103.284 speed: 8965 wps
  Epoch: 3 Train Perplexity: 102.326 | Valid Perplexity: 132.332
  Epoch: 4 Learning rate: 1.000 | 0.004 perplexity: 116.748 speed: 8990 wps | 0.104 perplexity: 85.032 speed: 9172 wps | 0.204 perplexity: 93.827 speed: 9051 wps | 0.304 perplexity: 91.716 speed: 9010 wps | 0.404 perplexity: 91.088 speed: 8966 wps | 0.504 perplexity: 90.654 speed: 8955 wps | 0.604 perplexity: 88.841 speed: 8952 wps | 0.703 perplexity: 88.550 speed: 8943 wps | 0.803 perplexity: 88.268 speed: 8932 wps | 0.903 perplexity: 86.610 speed: 8924 wps
  Epoch: 4 Train Perplexity: 86.030 | Valid Perplexity: 127.415
  Epoch: 5 Learning rate: 1.000 | 0.004 perplexity: 98.907 speed: 8952 wps | 0.104 perplexity: 73.707 speed: 9238 wps | 0.204 perplexity: 81.525 speed: 9112 wps | 0.304 perplexity: 79.768 speed: 9074 wps | 0.404 perplexity: 79.366 speed: 9060 wps | 0.504 perplexity: 79.199 speed: 9039 wps | 0.604 perplexity: 77.728 speed: 9037 wps | 0.703 perplexity: 77.630 speed: 9037 wps | 0.803 perplexity: 77.596 speed: 9033 wps | 0.903 perplexity: 76.270 speed: 9005 wps
  Epoch: 5 Train Perplexity: 75.907 | Valid Perplexity: 126.183
  Epoch: 6 Learning rate: 0.500 | 0.004 perplexity: 88.458 speed: 8816 wps | 0.104 perplexity: 64.231 speed: 9143 wps | 0.204 perplexity: 69.896 speed: 9050 wps | 0.304 perplexity: 67.342 speed: 9016 wps | 0.404 perplexity: 66.162 speed: 8989 wps | 0.504 perplexity: 65.290 speed: 8952 wps | 0.604 perplexity: 63.331 speed: 8945 wps | 0.703 perplexity: 62.617 speed: 8942 wps | 0.803 perplexity: 61.883 speed: 8943 wps | 0.903 perplexity: 60.149 speed: 8934 wps
  Epoch: 6 Train Perplexity: 59.222 | Valid Perplexity: 119.635
  Epoch: 7 Learning rate: 0.250 | 0.004 perplexity: 73.009 speed: 8941 wps | 0.104 perplexity: 53.369 speed: 9241 wps | 0.204 perplexity: 58.193 speed: 9115 wps | 0.304 perplexity: 55.957 speed: 9091 wps | 0.404 perplexity: 54.885 speed: 9073 wps | 0.504 perplexity: 54.052 speed: 9059 wps | 0.604 perplexity: 52.298 speed: 9053 wps | 0.703 perplexity: 51.598 speed: 9036 wps | 0.803 perplexity: 50.858 speed: 9024 wps

  With this change (consumes 700MiB on my Titan Black):

  Epoch: 1 Learning rate: 1.000 | 0.004 perplexity: 6220.805 speed: 649 wps | 0.104 perplexity: 847.498 speed: 5631 wps | 0.204 perplexity: 628.919 speed: 6853 wps | 0.304 perplexity: 506.395 speed: 7391 wps | 0.404 perplexity: 435.559 speed: 7675 wps | 0.504 perplexity: 389.903 speed: 7883 wps | 0.604 perplexity: 351.013 speed: 8033 wps | 0.703 perplexity: 324.474 speed: 8144 wps | 0.803 perplexity: 303.551 speed: 8230 wps | 0.903 perplexity: 284.267 speed: 8300 wps
  Epoch: 1 Train Perplexity: 269.826 | Valid Perplexity: 178.575
  Epoch: 2 Learning rate: 1.000 | 0.004 perplexity: 214.660 speed: 8880 wps | 0.104 perplexity: 152.258 speed: 9222 wps | 0.204 perplexity: 159.331 speed: 9072 wps | 0.304 perplexity: 154.358 speed: 9036 wps | 0.404 perplexity: 151.455 speed: 9019 wps | 0.504 perplexity: 148.906 speed: 9008 wps | 0.604 perplexity: 144.203 speed: 8990 wps | 0.703 perplexity: 142.134 speed: 8979 wps | 0.803 perplexity: 140.096 speed: 8971 wps | 0.903 perplexity: 136.424 speed: 8968 wps
  Epoch: 2 Train Perplexity: 134.372 | Valid Perplexity: 144.896
  Epoch: 3 Learning rate: 1.000 | 0.004 perplexity: 146.571 speed: 9008 wps | 0.104 perplexity: 105.991 speed: 9277 wps | 0.204 perplexity: 114.965 speed: 9151 wps | 0.304 perplexity: 112.041 speed: 9101 wps | 0.404 perplexity: 110.948 speed: 9057 wps | 0.504 perplexity: 110.141 speed: 9050 wps | 0.604 perplexity: 107.539 speed: 9043 wps | 0.703 perplexity: 106.877 speed: 9040 wps | 0.803 perplexity: 106.181 speed: 9040 wps | 0.903 perplexity: 103.940 speed: 9025 wps
  Epoch: 3 Train Perplexity: 103.023 | Valid Perplexity: 132.966
  Epoch: 4 Learning rate: 1.000 | 0.004 perplexity: 117.296 speed: 8990 wps | 0.104 perplexity: 85.532 speed: 9764 wps | 0.204 perplexity: 94.076 speed: 9784 wps | 0.304 perplexity: 91.875 speed: 9773 wps | 0.404 perplexity: 91.423 speed: 9689 wps | 0.504 perplexity: 91.090 speed: 9546 wps | 0.604 perplexity: 89.244 speed: 9460 wps | 0.703 perplexity: 89.004 speed: 9399 wps | 0.803 perplexity: 88.732 speed: 9352 wps | 0.903 perplexity: 87.097 speed: 9312 wps
  Epoch: 4 Train Perplexity: 86.571 | Valid Perplexity: 128.440
  Epoch: 5 Learning rate: 1.000 | 0.004 perplexity: 100.152 speed: 8973 wps | 0.104 perplexity: 74.050 speed: 9271 wps | 0.204 perplexity: 81.658 speed: 9157 wps | 0.304 perplexity: 79.822 speed: 9115 wps | 0.404 perplexity: 79.594 speed: 9061 wps | 0.504 perplexity: 79.486 speed: 9020 wps | 0.604 perplexity: 78.066 speed: 8990 wps | 0.703 perplexity: 78.046 speed: 8974 wps | 0.803 perplexity: 77.968 speed: 8963 wps | 0.903 perplexity: 76.702 speed: 8946 wps
  Epoch: 5 Train Perplexity: 76.386 | Valid Perplexity: 127.245

  Change: 117032081
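The region lookup described in the commit above (a vector of AllocationRegions kept sorted by end pointer and searched with upper_bound) can be sketched like this. Class and field names are illustrative, not the actual BFC allocator code; the point is the O(log n) "first region whose end_ptr exceeds the pointer" search.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// A contiguous memory region; `end` is one past the last byte.
struct AllocationRegion {
  std::uintptr_t start;
  std::uintptr_t end;
};

class RegionManager {
 public:
  // Regions must be disjoint; insertion keeps the vector sorted by `end`.
  void AddRegion(AllocationRegion r) {
    auto it = std::upper_bound(
        regions_.begin(), regions_.end(), r.end,
        [](std::uintptr_t e, const AllocationRegion& reg) { return e < reg.end; });
    regions_.insert(it, r);
  }

  // Binary search on end pointers: the first region whose end is strictly
  // greater than `p` is the only candidate that can contain `p`.
  const AllocationRegion* FindRegion(std::uintptr_t p) const {
    auto it = std::upper_bound(
        regions_.begin(), regions_.end(), p,
        [](std::uintptr_t ptr, const AllocationRegion& reg) { return ptr < reg.end; });
    if (it != regions_.end() && p >= it->start) return &*it;
    return nullptr;  // p falls in a hole between regions, or past the last one
  }

 private:
  std::vector<AllocationRegion> regions_;
};
```

Within a found region, chunk indexing can then stay constant-time as the commit describes; only this cross-region lookup pays the logarithmic cost.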
* Add optional comprehensive logging of memory allocation/deallocation events.
  (A. Unique TensorFlower, 2016-03-01)

  When enabled, the following events are recorded:
  - The start of a step, with the numerical step_id and a textual handle
    describing the step.
  - A Tensor allocation, including the step_id, the name of the OpKernel, the
    data type, shape, allocation size, allocation_id, data pointer location, and
    allocator used (the allocation_id is local to an allocator).
  - A Tensor deallocation, including the allocation_id and allocator used.
  - A raw memory allocation, including the step_id, the name of the component
    (e.g. Eigen), the number of bytes, data pointer location, allocation_id and
    allocator used.
  - A raw memory deallocation, including the step_id, the name of the component
    (e.g. Eigen), allocation_id and allocator used.

  For now, many Tensor allocations show 'unknown' for the kernel and step_id.
  These mostly come from Tensors allocated by the system from protocol buffers,
  and from Tensors allocated by Ops using the Tensor constructor directly
  instead of calling OpKernelContext::allocate_temp. The latter can in principle
  be cleaned up one by one as necessary; the former would require some plumbing
  to associate an allocation with the appropriate step_id.

  With this CL, memory logging is enabled by raising the VLOG level to 1. Once
  there is an ability to set process-wide options programmatically, it would
  make sense to update the machinery to do that. Currently recorded events are
  logged as INFO, and they can all be retrieved by filtering the log for lines
  including __LOG_MEMORY__. Some example lines are as follows:

  I0301 13:38:55.797563 81179 log_memory.cc:18] __LOG_MEMORY__ MemoryLogTensorAllocation { step_id: -6 kernel_name: "Unknown (from Proto)" tensor { dtype: DT_FLOAT shape { } allocation_description { requested_bytes: 4 allocated_bytes: 4 allocator_name: "cuda_host" allocation_id: 2 has_single_reference: true ptr: 8717861408 } } }
  I0301 13:38:55.802245 81179 log_memory.cc:18] __LOG_MEMORY__ MemoryLogTensorAllocation { step_id: -6 kernel_name: "Unknown" tensor { dtype: DT_FLOAT shape { } allocation_description { requested_bytes: 4 allocated_bytes: 256 allocator_name: "gpu_bfc" allocation_id: 1 has_single_reference: true ptr: 47378989056 } } }
  I0301 13:38:55.802347 81179 log_memory.cc:18] __LOG_MEMORY__ MemoryLogTensorDeallocation { allocation_id: 2 allocator_name: "cuda_host" }
  [...]
  I0301 13:38:55.806454 81179 log_memory.cc:18] __LOG_MEMORY__ MemoryLogStep { step_id: 1 handle: "->/init;0" }
  I0301 13:38:55.806659 81220 log_memory.cc:18] __LOG_MEMORY__ MemoryLogTensorOutput { step_id: 1 kernel_name: "random_normal/shape" tensor { dtype: DT_INT32 shape { dim { size: 4 } } allocation_description { requested_bytes: 16 allocated_bytes: 16 allocator_name: "cuda_host" allocation_id: 1 ptr: 8717860896 } } }
  [...]
  I0301 13:38:56.362898 81218 log_memory.cc:18] __LOG_MEMORY__ MemoryLogTensorAllocation { step_id: 1 kernel_name: "conv1/truncated_normal" tensor { dtype: DT_FLOAT shape { dim { size: 11 } dim { size: 11 } dim { size: 3 } dim { size: 96 } } allocation_description { requested_bytes: 139392 allocated_bytes: 139520 allocator_name: "gpu_bfc" allocation_id: 36 has_single_reference: true ptr: 47379030016 } } }
  I0301 13:38:56.362894 81217 log_memory.cc:18] __LOG_MEMORY__ MemoryLogTensorDeallocation { allocation_id: 24 allocator_name: "gpu_bfc" }
  I0301 13:38:56.362903 81213 log_memory.cc:18] __LOG_MEMORY__ MemoryLogTensorOutput { step_id: 1 kernel_name: "conv5/truncated_normal/mul" tensor { dtype: DT_FLOAT shape { dim { size: 3 } dim { size: 3 } dim { size: 1024 } dim { size: 1024 } } allocation_description { requested_bytes: 37748736 allocated_bytes: 37748736 allocator_name: "gpu_bfc" allocation_id: 34 ptr: 48512711168 } } }
  [...]
  I0229 16:39:57.482980 76558 log_memory.cc:18] __LOG_MEMORY__ MemoryLogRawAllocation { step_id: 13 operation: "xentropy/EigenAllocator" num_bytes: 64 ptr: 47386857472 allocation_id: 625 allocator_name: "gpu_bfc" }
  I0229 16:39:57.483147 76558 log_memory.cc:18] __LOG_MEMORY__ MemoryLogRawDeallocation { step_id: 13 operation: "xentropy/EigenAllocator" allocation_id: 625 allocator_name: "gpu_bfc" deferred: true }
  I0229 16:39:57.483197 76558 log_memory.cc:18] __LOG_MEMORY__ MemoryLogRawDeallocation { step_id: 13 operation: "xentropy/EigenAllocator" allocation_id: 625 allocator_name: "gpu_bfc" }

  Change: 116065112
* Remove GPURegionAllocator and make the BFC allocator the default.
  (A. Unique TensorFlower, 2016-02-16)

  Change: 114828490
* Change gpu_count to gpu_device_enabled. (Xiaoqiang Zheng, 2016-02-10)

  - Change gpu_count to gpu_device_enabled.
  - Switch to having the BaseGPUDevice constructor enable GPU support, instead
    of relying on the entry point. This makes it possible for TensorFlow to use
    pinned memory for GPU/CPU memory copies for all entry points.

  Change: 114364920
* Global search & replace to move to the new location for tensorflow/core/
  files and build targets. (Josh Levenberg, 2016-01-26)

  Change: 113080048
* Run our linter on a lot of files. (Vijay Vasudevan, 2016-01-24)

  Change: 112920860
* Move #include <vector> out of port.h to users of std::vector<>.
  (Josh Levenberg, 2016-01-21)

  After this we can replace port.h with types.h.

  Change: 112727463
* Many tensorflow/core build cleanups. (Josh Levenberg, 2016-01-20)

  Change: 112523833
* Replace the reference 'names' variable with an 'example_names' variable.
  (A. Unique TensorFlower, 2016-01-20)

  Change: 112481326
* TensorFlow: Get rid of legacy command line flags use in TensorFlow.
  (Vijay Vasudevan, 2016-01-13)

  Change: 112105282
* #include "tensorflow/core/platform/mutex.h" directly so we can drop it from
  port.h. (Josh Levenberg, 2016-01-07)

  Change: 111613643
* Change: 110286830 (A. Unique TensorFlower, 2015-12-15)
* TensorFlow: Improve performance of Alexnet. (Manjunath Kudlur, 2015-11-20)

  Changes (fixes to):
  - error message that refers to removed `DefaultSession` method.
  - -Wnull-conversion warnings
  - the "_start_time" attr for recvs when the flag
    "--brain_enable_scheduling_for_recvs" is set.
  - typo in tutorial data download progress message.
  - a typo ("however their installing" => "however installing").
  - typo, rename "TensorFlow Mechanics" to "How To" to be consistent with the
    website.
  - a typo ("subtact" => "subtract").
  - protobuf examples in comments in tensorflow::Example.proto.
  - formula formatting in MNIST beginner tutorial
  - negative fraction-of-queue-full stats
  - protobuf inclusion path so that Android demo will build under Blaze.
  - small typo (moderatly > moderately)
  - Session.run() to check that tensor arguments come from the session's graph.
  - another six import
  - seq2seq typo in bazel command

  Base CL: 108349164
* TensorFlow: Doc and linter fixes, some additional tests and error handling,
  updates to website. (Vijay Vasudevan, 2015-11-16)

  Changes:
  - Removes redundant reshape from image models by @mrry
  - Default TensorBoard to localhost by @danmane
  - Reformatting of tensorflow/core by @josh11b
  - Make tutorials backwards compatible to 0.5.0 by @girving
  - Improve print documentation (md files not updated).
  - Add proper scrolling to sitemap by @martinwicke

  Base CL: 107956254
* TensorFlow: Upstream changes from afternoon. (Vijay Vasudevan, 2015-11-12)

  Changes:
  - Ptrdiff -> DenseIndex change by @jiayq
  - Fix to scoping the logging in logging.py by @dga
  - Improvement to Conv2DBackpropFilter on CPU by Andy
  - Remove lookup table wrappers for the time being (wasn't in our public API
    yet) by Yukata
  - Add a check similar to numpy to make sure the user isn't in the tensorflow
    src directory by @vrv
  - More changes for python 3 compat by @girving
  - Make dropout preserve shape info from input (@mrry)
  - Significant speed improvements by @zheng-xq to BFC allocator to bring it on
    par (CPU overhead-wise) with the region allocator. Make BFC allocator the
    default now that it's working well for a variety of models.
  - Fix a bunch of typos reported by users (@vrv)
  - Enable concat for bfloat16 on GPU by Ashish.

  Base CL: 107733123
* TensorFlow: upstream changes from the afternoon. (Vijay Vasudevan, 2015-11-11)

  Changes:
  - futurize --stage2 changes for Python 3 compatibility by @girving.
  - Small updates to documentation by @vrv, schuster and others
  - Account for failure of std::thread::hardware_concurrency by @ebrevdo.
  - More changes for backwards-compatibility tests by Josh
  - Updates to python op doc generation by Josh
  - Added support for using the best-fit allocator via ConfigProto by @vrv.
  - Rename LocalSession to DirectSession, since local was a bad name for it.
  - Enable tf.nn.moments() to work with tensors of unknown shape by @mrry.
    GITHUB_ISSUE: 139
  - Changes for Android build by Andrew.

  Base CL: 107645181
* TensorFlow: Initial commit of TensorFlow library. (Manjunath Kudlur,
  2015-11-06)

  TensorFlow is an open source software library for numerical computation using
  data flow graphs.

  Base CL: 107276108