[OpenCL] Implementation improvements (#9117)

* OpenCL Improvements
* Registers Scatter and ScatterNd Ops for SYCL
* Registers Stack op for SYCL
* Fixes No sycl buffer found error for debug ops
* Registers MatMul and Transpose Ops to SYCL device for double
* Extends analyzer_cli_test.py test to cover SYCL
* Fixes Transpose Op for double when on SYCL
* Bumps Eigen version to fix double precision issue on SYCL
* Extends SessionDebugTestBase to cover SYCL
* Register SYCL implementations for random ops
* Avoid functions that might not be defined on SYCL device (#51)
* Avoid functions that might not be defined on SYCL device
* Simplify by using Eigen math functions
* OpenCL improvements
  - Bumps Eigen Version
  - Refactors Ops registration
  - Introduces workaround for Const Op related to the difference between CUDA which uses pointers and OpenCL that uses buffers/accessors
  - Extends memory types to cover DEVICE_SYCL as well
  - Introduces GetSYCLDevice() method that returns list of supported devices with GPU device having the highest priority (doesn't include blacklisted devices)
  - ::internal::Transpose -> tensorflow::internal::Transpose in order to avoid compilation reported error
  - re-introduces fix for bugged string replacement causing a lot of compilation warnings -c -> --include
  - Adds sycl_runtime to Bazel's ARRAY_DEPS
  - Replicates TF_CALL_GPU_PROXY_TYPES for SYCL
* [OpenCL] Fixes an issue caused by switch to aligned allocator for sycl buffer (#53)
* [Build] Use gcc/g++ as a host compiler to avoid https://github.com/tensorflow/tensorflow/issues/8394 (#54)
* [OpenCL] Fixes Scatter Op
* Fix testSimple and testConst in stack_op_test (#3)
* Fix testSimple and testConst in stack_op_test
* Create a specialisation of DoParallelConcatUpdate for SyclDevice and register it
* Guard all code in TENSORFLOW_USE_SYCL
* Do not use sycl device for int32
* Registration of the Sycl version is now looking like the one for the GPU
* Remove added empty line
* Register batch normalization kernels for OpenCL (#61)
* [OpenCL] RandomGamma has no GPU friendly implementation (#57)
* [OpenCL] Compatibility fixes for TensorFlow 1.1.0-rc1
* [OpenCL] Implements BatchMatmul Op for SYCL
* Lowercase the device name when GPU or SYCL returned
* [OpenCL] kernel_estimator_test.py assertEqual-> assertAlmostEqual due to floating point representation on the device
* [Eigen] Version bump
* GPU device name string manipulation is not needed anymore
* [OpenCL] Adds SYCL to device backwards compatibility
* [OpenCL] Extends core_rnn_test.py to run for SYCL device
* [OpenCL] Minor optimizations for build script
* [OpenCL] Enables skip folder list in build script
* [OpenCL] Fixes ApplyAdamOp for Sycl device
* [OpenCL] SYCL device improvements
* [OpenCL] Fixes debug_ops's SEGFAULT for SYCL device
* [Build] Adds hexagon to skipped folders list
* [OpenCL] Removes EnterLameDuckMode from SYCL device and allocator
* [OpenCL] Registers Unique Op for SYCL device
* [OpenCL][Temporary] Disables tests for SYCL target due to features not being implemented yet
  Tests affected:
  - tensorflow/contrib/memory_stats/python/kernel_tests/memory_stats_ops_test.py
  - tensorflow/contrib/rnn/python/kernel_tests/core_rnn_test.py
  - tensorflow/python/kernel_tests/conv_ops_test.py
  - tensorflow/python/kernel_tests/depthwise_conv_op_test.py
  - tensorflow/python/kernel_tests/pooling_ops_3d_test.py
  - tensorflow/python/kernel_tests/pooling_ops_test.py
  - tensorflow/python/kernel_tests/scatter_nd_ops_test.py
  - tensorflow/python/training/adam_test.py
  - tensorflow/python/training/localhost_cluster_performance_test.py
  - tensorflow/python/training/training_ops_test.py
* [OpenCL][Temporary] Disables failing tests for SYCL in order to establish regression baseline
  Tests affected:
  - tensorflow/python/debug/cli/analyzer_cli_test.py
  - tensorflow/python/debug/lib/session_debug_testlib.py
  - tensorflow/python/debug/lib/stepper_test.py
  - tensorflow/python/kernel_tests/unstack_op_test.py
  - tensorflow/python/ops/image_ops_test.py
* [OpenCL] Take options.config.device_count() into consideration
* [OpenCL] Fixes compilation warning
* [OpenCL] device:SYCL:0 -> sycl:0
* [OpenCL] Removes unwanted flags in building script
  Removes flags given to computecpp that enable SIMD instructions
  Removes duplicate flags
* bool -> const bool
* [OpenCL] sycl in test_util.gpu_device_name() -> is_sycl_enabled()
* [OpenCL][Temporary] Disables failing tests for SYCL in order to establish regression baseline
  Test affected:
  - tensorflow/contrib/stateless/python/kernel_tests/stateless_random_ops_test.py
* Imports test_util from tensorflow.python.framework
* [OpenCL] Fixes formatting in Python code
* [OpenCL] Extends session_test.py to cover SYCL device
* [OpenCL] Cleans singleton class
* [OpenCL] Keeping CUDA happy
* [OpenCL][Temporary] Disables failing tests for SYCL in order to establish regression baseline
  Test affected:
  - tensorflow/contrib/rnn/python/kernel_tests/core_rnn_cell_test.py
  - tensorflow/contrib/seq2seq/python/kernel_tests/beam_search_ops_test.py
* Added support for building with SYCL on ARM.
* Acts on the review feedback from:
  - https://github.com/tensorflow/tensorflow/pull/9117#discussion_r113608975
  - https://github.com/tensorflow/tensorflow/pull/9117#discussion_r113609173
* [OpenCL] Fixes scatter_nd_op_test
* Fixes auto-merge mistake
* [OpenCL] struct SyclDevice -> class SyclDevice
* Revert "[OpenCL] struct SyclDevice -> class SyclDevice"
  This reverts commit addd43348c374a5379f67bb1e5ad084715722fc2.
* [OpenCL] Reverting refactoring commit.
  As requested in the review https://github.com/tensorflow/tensorflow/pull/9117#issuecomment-298454466
  This change set will be re-introduced in smaller chunks.
* Revert "[OpenCL] device:SYCL:0 -> sycl:0"
  This reverts commit cf16e60340b62d16c3764d71b716fe03d35f87a9.
* Revert "[OpenCL] Adds SYCL to device backwards compatibility"
  This reverts commit b8401b5164199b7a169be1c1d8dea5001195c390.
* Acts on the feedback from https://github.com/tensorflow/tensorflow/pull/9117#discussion_r115036905
* control_flow_ops_py_test.py expects device name to be lower cased
* Acts on the feedback from https://github.com/tensorflow/tensorflow/pull/9117#discussion_r115037222
* Removes debug print
* Removes not needed partial specialisation
* [OpenCL] Registers ScatterNdFunctor for SYCL device
* [OpenCL] Make it compile
* [OpenCL] Follow gpu_device changes
* [OpenCL] Adds cxx_builtin_include_directory for python lib
  Fixes Bazel's missing undeclared inclusions that appeared after merge with TensorFlow upstream
* [OpenCL] Fixes Constant Op
* [OpenCL] gXX-4.8 -> gXX
* [OpenCL] Removes -D_GLIBCXX_USE_CXX11_ABI=0 as it breaks default compiler setup for Ubuntu 16.04
* Revert "[OpenCL] kernel_estimator_test.py assertEqual-> assertAlmostEqual due to floating point representation on the device"
  This reverts commit 06c50c0a485f40c30a436f02c3fa7794e370c49d.
* [OpenCL] CPU allocator is a singleton; we should not delete it
author: Luke Iwanski <luke@codeplay.com> 2017-05-31 06:15:23 +0100
committer: Benoit Steiner <benoitsteiner@users.noreply.github.com> 2017-05-30 22:15:23 -0700
commit: fe589d9e7caea582beed160ad47e55f537a8403e (patch)
tree: f4ba7a7e9a53e2d6bea83685daedf5b59f1993cb /third_party/sycl/crosstool/computecpp.tpl
parent: a365082c40875a218635b37333827b4fe64eae37 (diff)
1 file changed, 69 insertions, 63 deletions
diff --git a/third_party/sycl/crosstool/computecpp.tpl b/third_party/sycl/crosstool/computecpp.tpl
index 595e7136a6..94c5e6aaad 100755
--- a/third_party/sycl/crosstool/computecpp.tpl
+++ b/third_party/sycl/crosstool/computecpp.tpl
@@ -1,8 +1,9 @@
 #!/usr/bin/env python
 
 import os
-import subprocess
 import sys
+import tempfile
+from subprocess import call, Popen, PIPE
 
 CPU_CXX_COMPILER = ('%{host_cxx_compiler}')
 CPU_C_COMPILER = ('%{host_c_compiler}')
@@ -13,76 +14,81 @@ COMPUTECPP_DRIVER= COMPUTECPP_ROOT + 'bin/compute++'
 COMPUTECPP_INCLUDE = COMPUTECPP_ROOT + 'include'
 
 def main():
-  compiler_flags = []
-
-  # remove -fsamotoze-coverage from string
-  if CPU_CXX_COMPILER.find("g++") != -1:
-    compiler_flags = [flag for flag in sys.argv[1:] if not flag.startswith(('-Wl,--no-undefined', '-fsanitize-coverage', '-Wno-unused-but-set-variable', '-Wignored-attributes'))]
-  else:
-    compiler_flags = [flag for flag in sys.argv[1:] if not flag.startswith(('-Wl,--no-undefined', '-Wno-unused-but-set-variable', '-Wignored-attributes'))]
+  remove_flags = ('-Wl,--no-undefined', '-Wno-unused-but-set-variable', '-Wignored-attributes')
+  # with g++, also remove -fsanitize-coverage
+  if 'g++' in CPU_CXX_COMPILER:
+    remove_flags += ('-fsanitize-coverage',)
+  compiler_flags = [flag for flag in sys.argv[1:] if not flag.startswith(remove_flags)]
 
   output_file_index = compiler_flags.index('-o') + 1
   output_file_name = compiler_flags[output_file_index]
 
-  if(output_file_index == 1):
+  if output_file_index == 1:
     # we are linking
-    return subprocess.call([CPU_CXX_COMPILER] + compiler_flags + ['-Wl,--no-undefined'])
+    return call([CPU_CXX_COMPILER] + compiler_flags + ['-Wl,--no-undefined'])
 
   # find what we compile
-  compiling_cpp = 0
-  if('-c' in compiler_flags):
-      compiled_file_index = compiler_flags.index('-c') + 1
-      compited_file_name = compiler_flags[compiled_file_index]
-      if(compited_file_name.endswith(('.cc', '.c++', '.cpp', '.CPP', '.C', '.cxx'))):
-          compiling_cpp = 1;
-
-  compiler_flags = compiler_flags + ['-D_GLIBCXX_USE_CXX11_ABI=0', '-DEIGEN_USE_SYCL=1', '-DTENSORFLOW_USE_SYCL', '-DEIGEN_HAS_C99_MATH']
-
-  if(compiling_cpp == 1):
-      # create a blacklist of folders that will be skipped when compiling with ComputeCpp
-      _skip = ["external", "llvm", ".cu.cc"]
-      # if compiling external project skip computecpp
-      if any(_folder in _skip for _folder in output_file_name):
-        return subprocess.call([CPU_CXX_COMPILER] + compiler_flags)
-
-  if(compiling_cpp == 1):
-      # this is an optimisation that will check if compiled file has to be compiled with ComputeCpp
-
-      _tmp_flags = [flag for flag in compiler_flags if not flag.startswith(('-o', output_file_name))]
-      # create preprocessed of the file
-      _cmd = " ".join([CPU_CXX_COMPILER] + _tmp_flags + ["-E"])
-      # check if it has parallel_for< in it
-      _cmd += " | grep \".parallel_for\" > /dev/null"
-      ps = subprocess.call(_cmd, shell=True)
-      # if not call CXX compiler
-      if(ps != 0):
-          return subprocess.call([CPU_CXX_COMPILER] + compiler_flags)
-
-  if(compiling_cpp == 1):
-      filename, file_extension = os.path.splitext(output_file_name)
-      bc_out = filename + '.sycl'
-
-      # strip asan for the device
-      computecpp_device_compiler_flags = ['-sycl-compress-name', '-DTENSORFLOW_USE_SYCL', '-Wno-unused-variable', '-I', COMPUTECPP_INCLUDE, '-isystem',
-          COMPUTECPP_INCLUDE, '-std=c++11', '-sycl', '-emit-llvm', '-no-serial-memop', '-Xclang', '-cl-denorms-are-zero', '-Xclang', '-cl-fp32-correctly-rounded-divide-sqrt']
-      computecpp_device_compiler_flags += [flag for flag in compiler_flags if not flag.startswith(('-fsanitize', '-march=native', '-mavx'))]
-
-      x = subprocess.call([COMPUTECPP_DRIVER] + computecpp_device_compiler_flags )
-      if(x == 0):
-          # dont want that in case of compiling with computecpp first
-          host_compiler_flags = [flag for flag in compiler_flags
-                                    if not flag.startswith(('-MF', '-MD',))
-                                    if not '.d' in flag
-                                ]
-
-          host_compiler_flags[host_compiler_flags.index('-c')] = "--include"
-
-          host_compiler_flags = ['-xc++', '-D_GLIBCXX_USE_CXX11_ABI=0', '-DTENSORFLOW_USE_SYCL', '-Wno-unused-variable', '-I', COMPUTECPP_INCLUDE, '-c', bc_out] + host_compiler_flags
-          x = subprocess.call([CPU_CXX_COMPILER] + host_compiler_flags)
-      return x
-  else:
+  compiling_cpp = False
+  if '-c' in compiler_flags:
+    compiled_file_index = compiler_flags.index('-c') + 1
+    compiled_file_name = compiler_flags[compiled_file_index]
+    compiling_cpp = compiled_file_name.endswith(('.cc', '.c++', '.cpp', '.CPP', '.C', '.cxx'))
+
+  # add -D_GLIBCXX_USE_CXX11_ABI=0 to the command line if you have a custom installation of GCC/Clang
+  compiler_flags = compiler_flags + ['-DEIGEN_USE_SYCL=1', '-DTENSORFLOW_USE_SYCL', '-DEIGEN_HAS_C99_MATH']
+
+  if not compiling_cpp:
     # compile for C
-    return subprocess.call([CPU_C_COMPILER] + compiler_flags)
+    return call([CPU_C_COMPILER] + compiler_flags)
+
+  # create a blacklist of folders that will be skipped when compiling with ComputeCpp
+  skip_extensions = [".cu.cc"]
+  skip_folders = ["tensorflow/compiler", "tensorflow/docs_src", "tensorflow/tensorboard", "third_party", "external", "hexagon"]
+  skip_folders = [(folder + '/') for folder in skip_folders]
+  # if compiling external project skip computecpp
+  if any(compiled_file_name.endswith(_ext) for _ext in skip_extensions) or any(_folder in output_file_name for _folder in skip_folders):
+    return call([CPU_CXX_COMPILER] + compiler_flags)
+
+  # this is an optimisation that will check if compiled file has to be compiled with ComputeCpp
+  flags_without_output = list(compiler_flags)
+  del flags_without_output[output_file_index]   # remove output_file_name
+  del flags_without_output[output_file_index - 1] # remove '-o'
+  # preprocess the file and keep the output for later use
+  pipe = Popen([CPU_CXX_COMPILER] + flags_without_output + ["-E"], stdout=PIPE)
+  preprocessed_file_str = pipe.communicate()[0]
+  if pipe.returncode != 0:
+    return pipe.returncode
+
+  # check if it has parallel_for in it
+  if not '.parallel_for' in preprocessed_file_str:
+    # call CXX compiler like usual
+    with tempfile.NamedTemporaryFile(suffix=".ii") as preprocessed_file: # Force '.ii' extension so that g++ does not preprocess the file again
+      preprocessed_file.write(preprocessed_file_str)
+      preprocessed_file.flush()
+      compiler_flags[compiled_file_index] = preprocessed_file.name
+      return call([CPU_CXX_COMPILER] + compiler_flags)
+  del preprocessed_file_str   # save some memory as this string can be quite big
+
+  filename, file_extension = os.path.splitext(output_file_name)
+  bc_out = filename + '.sycl'
+
+  # strip asan for the device
+  computecpp_device_compiler_flags = ['-sycl-compress-name', '-Wno-unused-variable',
+                                      '-I', COMPUTECPP_INCLUDE, '-isystem', COMPUTECPP_INCLUDE,
+                                      '-std=c++11', '-sycl', '-emit-llvm', '-no-serial-memop',
+                                      '-Xclang', '-cl-denorms-are-zero', '-Xclang', '-cl-fp32-correctly-rounded-divide-sqrt']
+  # disable flags enabling SIMD instructions
+  computecpp_device_compiler_flags += [flag for flag in compiler_flags if \
+    not any(x in flag.lower() for x in ('-fsanitize', '=native', '=core2', 'msse', 'vectorize', 'mavx', 'mmmx', 'm3dnow', 'fma'))]
+
+  x = call([COMPUTECPP_DRIVER] + computecpp_device_compiler_flags)
+  if x == 0:
+    # don't want dependency-file flags (-MF, -MD, .d files) when compiling with computecpp first
+    host_compiler_flags = [flag for flag in compiler_flags if (not flag.startswith(('-MF', '-MD',)) and not '.d' in flag)]
+    host_compiler_flags[host_compiler_flags.index('-c')] = "--include"
+    host_compiler_flags = ['-xc++', '-Wno-unused-variable', '-I', COMPUTECPP_INCLUDE, '-c', bc_out] + host_compiler_flags
+    x = call([CPU_CXX_COMPILER] + host_compiler_flags)
+  return x
 
 if __name__ == '__main__':
   sys.exit(main())
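The new `main()` is a chain of early returns. The following side-effect-free sketch reproduces its dispatch decisions without invoking any compiler, which makes the order of the checks easier to see. `filter_flags`, `classify`, and the returned labels are hypothetical names. The flag prefixes, source suffixes, and skip folders are copied from the diff, and, like the script, the sketch assumes that a command line starting with `-o` is a link step.

```python
REMOVE_FLAGS = ('-Wl,--no-undefined', '-Wno-unused-but-set-variable', '-Wignored-attributes')
CPP_EXTENSIONS = ('.cc', '.c++', '.cpp', '.CPP', '.C', '.cxx')
SKIP_EXTENSIONS = ('.cu.cc',)
SKIP_FOLDERS = tuple(folder + '/' for folder in (
    'tensorflow/compiler', 'tensorflow/docs_src', 'tensorflow/tensorboard',
    'third_party', 'external', 'hexagon'))


def filter_flags(argv, cxx_compiler):
    """Drop the flags the wrapper strips before forwarding anything."""
    remove = REMOVE_FLAGS
    if 'g++' in cxx_compiler:          # same substring test as the script
        remove += ('-fsanitize-coverage',)
    # str.startswith() accepts a tuple and matches if any prefix matches
    return [flag for flag in argv if not flag.startswith(remove)]


def classify(flags, preprocessed=b''):
    """Name the path main() would take for an already-filtered command line.

    `preprocessed` stands in for the host compiler's -E output (bytes);
    it only matters for C++ sources that survive the skip lists.
    """
    output_index = flags.index('-o') + 1
    if output_index == 1:
        return 'link'                  # '-o' comes first: treated as a link step
    if '-c' not in flags:
        return 'c'
    source = flags[flags.index('-c') + 1]
    if not source.endswith(CPP_EXTENSIONS):
        return 'c'                     # handed to the host C compiler
    output_name = flags[output_index]
    if source.endswith(SKIP_EXTENSIONS) or any(f in output_name for f in SKIP_FOLDERS):
        return 'host-cxx'              # skip list: never goes through ComputeCpp
    if b'.parallel_for' not in preprocessed:
        return 'host-cxx'              # no SYCL kernel: plain host compile
    return 'computecpp'
```

One quirk carries over from the script: `'g++' in CPU_CXX_COMPILER` is a substring test, so it also matches `clang++`, and `-fsanitize-coverage` is then stripped for Clang as well. Also note that, if the wrapper were run under Python 3, `Popen.communicate()` would return `bytes` (no text mode is requested), and the diff's `'.parallel_for' in preprocessed_file_str` would raise `TypeError`; the sketch therefore tests against a `bytes` literal.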
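When a file does contain `.parallel_for`, the script makes two passes. First it runs `compute++`, after filtering out flags that match a list of SIMD- and sanitizer-related substrings. Then it runs the host compiler on the generated `.sycl` file, with the original `-c` replaced by `--include` so the real source is pulled in as a forced include. The following minimal sketch covers only that flag rewriting, using the same substring lists as the diff. `device_flags`, `host_flags`, and the example paths are hypothetical, and no compiler is invoked.

```python
# Substrings the diff uses to keep SIMD/sanitizer flags away from compute++
DEVICE_FLAG_BLOCKLIST = ('-fsanitize', '=native', '=core2', 'msse', 'vectorize',
                         'mavx', 'mmmx', 'm3dnow', 'fma')


def device_flags(flags):
    """Host flags that are also passed on to the device compiler."""
    return [f for f in flags
            if not any(marker in f.lower() for marker in DEVICE_FLAG_BLOCKLIST)]


def host_flags(flags, bc_out, include_dir):
    """Second (host) pass: compile the .sycl file, force-including the source."""
    # Drop dependency-file flags; '.d' is a plain substring test, as in the diff
    kept = [f for f in flags if not f.startswith(('-MF', '-MD')) and '.d' not in f]
    # '-c source.cc' becomes '--include source.cc'; bc_out is what gets compiled
    kept[kept.index('-c')] = '--include'
    return ['-xc++', '-Wno-unused-variable', '-I', include_dir, '-c', bc_out] + kept
```

Because the host pass starts from the unfiltered flags, SIMD options such as `-mavx2` still reach the host compiler and are withheld only from `compute++`. The `'.d' in flag` test is broader than its comment suggests: it also drops unrelated arguments such as `-Ibuild.dir`. A check like `flag.endswith('.d')` would be narrower.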