author    Luke Iwanski <luke@codeplay.com>    2017-05-31 06:15:23 +0100
committer Benoit Steiner <benoitsteiner@users.noreply.github.com>    2017-05-30 22:15:23 -0700
commit    fe589d9e7caea582beed160ad47e55f537a8403e
tree      f4ba7a7e9a53e2d6bea83685daedf5b59f1993cb
parent    a365082c40875a218635b37333827b4fe64eae37
[OpenCL] Implementation improvements (#9117)
* OpenCL Improvements
* Registers Scatter and ScatterNd Ops for SYCL
* Registers Stack op for SYCL
* Fixes No sycl buffer found error for debug ops
* Registers MatMul and Transpose Ops to SYCL device for double
* Extends analyzer_cli_test.py test to cover SYCL
* Fixes Transpose Op for double when on SYCL
* Bumps Eigen version to fix double precision issue on SYCL
* Extends SessionDebugTestBase to cover SYCL
* Register SYCL implementations for random ops
* Avoid functions that might not be defined on SYCL device (#51)
* Avoid functions that might not be defined on SYCL device
* Simplify by using Eigen math functions
* OpenCL improvements
  - Bumps Eigen Version
  - Refactors Ops registration
  - Introduces workaround for Const Op related to the difference between CUDA which uses pointers and OpenCL that uses buffers/accessors
  - Extends memory types to cover DEVICE_SYCL as well
  - Introduces GetSYCLDevice() method that returns list of supported devices with GPU device having the highest priority (doesn't include blacklisted devices)
  - ::internal::Transpose -> tensorflow::internal::Transpose in order to avoid compilation reported error
  - re-introduces fix for bugged string replacement causing a lot of compilation warnings: -c -> --include
  - Adds sycl_runtime to bazels ARRAY_DEPS
  - Replicates TF_CALL_GPU_PROXY_TYPES for SYCL
* [OpenCL] Fixes an issue caused by switch to aligned allocator for sycl buffer (#53)
* [Build] Use gcc/g++ as a host compiler to avoid https://github.com/tensorflow/tensorflow/issues/8394 (#54)
* [OpenCL] Fixes Scatter Op
* Fix testSimple and testConst in stack_op_test (#3)
* Fix testSimple and testConst in stack_op_test
* Create a specialisation of DoParallelConcatUpdate for SyclDevice and register it
* Guard all code in TENSORFLOW_USE_SYCL
* Do not use sycl device for int32
* Registration of the Sycl version is now looking like the one for the GPU
* Remove added empty line
* Register batch normalization kernels for OpenCL (#61)
* [OpenCL] RandomGamma has no GPU friendly implementation (#57)
* [OpenCL] Compatibility fixes for TensorFlow 1.1.0-rc1
* [OpenCL] Implements BatchMatmul Op for SYCL
* Lowercase the device name when GPU or SYCL returned
* [OpenCL] kernel_estimator_test.py assertEqual -> assertAlmostEqual due to floating point representation on the device
* [Eigen] Version bump
* GPU device name string manipulation is not needed anymore
* [OpenCL] Adds SYCL to device backwards compatibility
* [OpenCL] Extends core_rnn_test.py to run for SYCL device
* [OpenCL] Minor optimizations for build script
* [OpenCL] Enables skip folder list in build script
* [OpenCL] Fixes ApplyAdamOp for Sycl device
* [OpenCL] SYCL device improvements
* [OpenCL] Fixes debug_ops's SEGFAULT for SYCL device
* [Build] Adds hexagon to skipped folders list
* [OpenCL] Removes EnterLameDuckMode from SYCL device and allocator
* [OpenCL] Registers Unique Op for SYCL device
* [OpenCL][Temporary] Disables tests for SYCL target due to features not being implemented yet. Tests affected:
  - tensorflow/contrib/memory_stats/python/kernel_tests/memory_stats_ops_test.py
  - tensorflow/contrib/rnn/python/kernel_tests/core_rnn_test.py
  - tensorflow/python/kernel_tests/conv_ops_test.py
  - tensorflow/python/kernel_tests/depthwise_conv_op_test.py
  - tensorflow/python/kernel_tests/pooling_ops_3d_test.py
  - tensorflow/python/kernel_tests/pooling_ops_test.py
  - tensorflow/python/kernel_tests/scatter_nd_ops_test.py
  - tensorflow/python/training/adam_test.py
  - tensorflow/python/training/localhost_cluster_performance_test.py
  - tensorflow/python/training/training_ops_test.py
* [OpenCL][Temporary] Disables failing tests for SYCL in order to establish regression baseline. Tests affected:
  - tensorflow/python/debug/cli/analyzer_cli_test.py
  - tensorflow/python/debug/lib/session_debug_testlib.py
  - tensorflow/python/debug/lib/stepper_test.py
  - tensorflow/python/kernel_tests/unstack_op_test.py
  - tensorflow/python/ops/image_ops_test.py
* [OpenCL] Take options.config.device_count() into consideration
* [OpenCL] Fixes compilation warning
* [OpenCL] device:SYCL:0 -> sycl:0
* [OpenCL] Removes unwanted flags in building script
  - Removes flags given to computecpp that enable SIMD instructions
  - Removes duplicate flags
* bool -> const bool
* [OpenCL] sycl in test_util.gpu_device_name() -> is_sycl_enabled()
* [OpenCL][Temporary] Disables failing tests for SYCL in order to establish regression baseline. Test affected:
  - tensorflow/contrib/stateless/python/kernel_tests/stateless_random_ops_test.py
* Imports test_util from tensorflow.python.framework
* [OpenCL] Fixes formatting in Python code
* [OpenCL] Extends session_test.py to cover SYCL device
* [OpenCL] Cleans singleton class
* [OpenCL] Keeping CUDA happy
* [OpenCL][Temporary] Disables failing tests for SYCL in order to establish regression baseline. Tests affected:
  - tensorflow/contrib/rnn/python/kernel_tests/core_rnn_cell_test.py
  - tensorflow/contrib/seq2seq/python/kernel_tests/beam_search_ops_test.py
* Added support for building with SYCL on ARM.
* Acts on the review feedback from:
  - https://github.com/tensorflow/tensorflow/pull/9117#discussion_r113608975
  - https://github.com/tensorflow/tensorflow/pull/9117#discussion_r113609173
* [OpenCL] Fixes scatter_nd_op_test
* Fixes auto-merge mistake
* [OpenCL] struct SyclDevice -> class SyclDevice
* Revert "[OpenCL] struct SyclDevice -> class SyclDevice". This reverts commit addd43348c374a5379f67bb1e5ad084715722fc2.
* [OpenCL] Reverting refactoring commit, as requested in the review https://github.com/tensorflow/tensorflow/pull/9117#issuecomment-298454466. This change set will be re-introduced in smaller chunks.
* Revert "[OpenCL] device:SYCL:0 -> sycl:0". This reverts commit cf16e60340b62d16c3764d71b716fe03d35f87a9.
* Revert "[OpenCL] Adds SYCL to device backwards compatibility". This reverts commit b8401b5164199b7a169be1c1d8dea5001195c390.
* Acts on the feedback from https://github.com/tensorflow/tensorflow/pull/9117#discussion_r115036905
* control_flow_ops_py_test.py expects device name to be lower cased
* Acts on the feedback from https://github.com/tensorflow/tensorflow/pull/9117#discussion_r115037222
* Removes debug print
* Removes not needed partial specialisation
* [OpenCL] Registers ScatterNdFunctor for SYCL device
* [OpenCL] Make it compile
* [OpenCL] Follow gpu_device changes
* [OpenCL] Adds cxx_builtin_include_directory for python lib. Fixes bazels missing undeclared inclusions that appeared after merge with TensorFlow upstream
* [OpenCL] Fixes Constant Op
* [OpenCL] gXX-4.8 -> gXX
* [OpenCL] Removes -D_GLIBCXX_USE_CXX11_ABI=0 as it breaks default compiler setup for Ubuntu 16.04
* Revert "[OpenCL] kernel_estimator_test.py assertEqual -> assertAlmostEqual due to floating point representation on the device". This reverts commit 06c50c0a485f40c30a436f02c3fa7794e370c49d.
* [OpenCL] CPU allocator is a singleton we should not delete it
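
Much of the message above concerns registering ops (MatMul, Transpose, Scatter, BatchMatmul, ...) for the SYCL device and extending the Python tests to exercise it. As a rough illustration only (not code from this commit), the sketch below shows how a TensorFlow 1.x script might detect a SYCL device and place a double-precision MatMul on it; the 'SYCL' device-type string and the CPU fallback are assumptions based on the commit message.

# Illustrative only -- not part of this change. Assumes a SYCL-enabled TF 1.x build.
import tensorflow as tf
from tensorflow.python.client import device_lib

def first_sycl_device():
    # list_local_devices() reports every device registered with the local runtime
    for dev in device_lib.list_local_devices():
        if dev.device_type == 'SYCL':   # device-type string assumed from the commit message
            return dev.name
    return None

sycl = first_sycl_device()
with tf.device(sycl if sycl else '/cpu:0'):
    a = tf.constant([[1.0, 2.0], [3.0, 4.0]], dtype=tf.float64)
    b = tf.matmul(a, a)   # double-precision MatMul is among the ops registered for SYCL

with tf.Session() as sess:
    print(sess.run(b))
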
Diffstat (limited to 'third_party/sycl/crosstool/computecpp.tpl')
-rwxr-xr-x  third_party/sycl/crosstool/computecpp.tpl  132
1 file changed, 69 insertions, 63 deletions
diff --git a/third_party/sycl/crosstool/computecpp.tpl b/third_party/sycl/crosstool/computecpp.tpl
index 595e7136a6..94c5e6aaad 100755
--- a/third_party/sycl/crosstool/computecpp.tpl
+++ b/third_party/sycl/crosstool/computecpp.tpl
@@ -1,8 +1,9 @@
#!/usr/bin/env python
import os
-import subprocess
import sys
+import tempfile
+from subprocess import call, Popen, PIPE
CPU_CXX_COMPILER = ('%{host_cxx_compiler}')
CPU_C_COMPILER = ('%{host_c_compiler}')
@@ -13,76 +14,81 @@ COMPUTECPP_DRIVER= COMPUTECPP_ROOT + 'bin/compute++'
COMPUTECPP_INCLUDE = COMPUTECPP_ROOT + 'include'
def main():
- compiler_flags = []
-
- # remove -fsamotoze-coverage from string
- if CPU_CXX_COMPILER.find("g++") != -1:
- compiler_flags = [flag for flag in sys.argv[1:] if not flag.startswith(('-Wl,--no-undefined', '-fsanitize-coverage', '-Wno-unused-but-set-variable', '-Wignored-attributes'))]
- else:
- compiler_flags = [flag for flag in sys.argv[1:] if not flag.startswith(('-Wl,--no-undefined', '-Wno-unused-but-set-variable', '-Wignored-attributes'))]
+ remove_flags = ('-Wl,--no-undefined', '-Wno-unused-but-set-variable', '-Wignored-attributes')
+ # remove -fsanitize-coverage from the flags when compiling with g++
+ if 'g++' in CPU_CXX_COMPILER:
+ remove_flags += ('-fsanitize-coverage',)
+ compiler_flags = [flag for flag in sys.argv[1:] if not flag.startswith(remove_flags)]
output_file_index = compiler_flags.index('-o') + 1
output_file_name = compiler_flags[output_file_index]
- if(output_file_index == 1):
+ if output_file_index == 1:
# we are linking
- return subprocess.call([CPU_CXX_COMPILER] + compiler_flags + ['-Wl,--no-undefined'])
+ return call([CPU_CXX_COMPILER] + compiler_flags + ['-Wl,--no-undefined'])
# find what we compile
- compiling_cpp = 0
- if('-c' in compiler_flags):
- compiled_file_index = compiler_flags.index('-c') + 1
- compited_file_name = compiler_flags[compiled_file_index]
- if(compited_file_name.endswith(('.cc', '.c++', '.cpp', '.CPP', '.C', '.cxx'))):
- compiling_cpp = 1;
-
- compiler_flags = compiler_flags + ['-D_GLIBCXX_USE_CXX11_ABI=0', '-DEIGEN_USE_SYCL=1', '-DTENSORFLOW_USE_SYCL', '-DEIGEN_HAS_C99_MATH']
-
- if(compiling_cpp == 1):
- # create a blacklist of folders that will be skipped when compiling with ComputeCpp
- _skip = ["external", "llvm", ".cu.cc"]
- # if compiling external project skip computecpp
- if any(_folder in _skip for _folder in output_file_name):
- return subprocess.call([CPU_CXX_COMPILER] + compiler_flags)
-
- if(compiling_cpp == 1):
- # this is an optimisation that will check if compiled file has to be compiled with ComputeCpp
-
- _tmp_flags = [flag for flag in compiler_flags if not flag.startswith(('-o', output_file_name))]
- # create preprocessed of the file
- _cmd = " ".join([CPU_CXX_COMPILER] + _tmp_flags + ["-E"])
- # check if it has parallel_for< in it
- _cmd += " | grep \".parallel_for\" > /dev/null"
- ps = subprocess.call(_cmd, shell=True)
- # if not call CXX compiler
- if(ps != 0):
- return subprocess.call([CPU_CXX_COMPILER] + compiler_flags)
-
- if(compiling_cpp == 1):
- filename, file_extension = os.path.splitext(output_file_name)
- bc_out = filename + '.sycl'
-
- # strip asan for the device
- computecpp_device_compiler_flags = ['-sycl-compress-name', '-DTENSORFLOW_USE_SYCL', '-Wno-unused-variable', '-I', COMPUTECPP_INCLUDE, '-isystem',
- COMPUTECPP_INCLUDE, '-std=c++11', '-sycl', '-emit-llvm', '-no-serial-memop', '-Xclang', '-cl-denorms-are-zero', '-Xclang', '-cl-fp32-correctly-rounded-divide-sqrt']
- computecpp_device_compiler_flags += [flag for flag in compiler_flags if not flag.startswith(('-fsanitize', '-march=native', '-mavx'))]
-
- x = subprocess.call([COMPUTECPP_DRIVER] + computecpp_device_compiler_flags )
- if(x == 0):
- # dont want that in case of compiling with computecpp first
- host_compiler_flags = [flag for flag in compiler_flags
- if not flag.startswith(('-MF', '-MD',))
- if not '.d' in flag
- ]
-
- host_compiler_flags[host_compiler_flags.index('-c')] = "--include"
-
- host_compiler_flags = ['-xc++', '-D_GLIBCXX_USE_CXX11_ABI=0', '-DTENSORFLOW_USE_SYCL', '-Wno-unused-variable', '-I', COMPUTECPP_INCLUDE, '-c', bc_out] + host_compiler_flags
- x = subprocess.call([CPU_CXX_COMPILER] + host_compiler_flags)
- return x
- else:
+ compiling_cpp = False
+ if '-c' in compiler_flags:
+ compiled_file_index = compiler_flags.index('-c') + 1
+ compiled_file_name = compiler_flags[compiled_file_index]
+ compiling_cpp = compiled_file_name.endswith(('.cc', '.c++', '.cpp', '.CPP', '.C', '.cxx'))
+
+ # add -D_GLIBCXX_USE_CXX11_ABI=0 to the command line if you have a custom installation of GCC/Clang
+ compiler_flags = compiler_flags + ['-DEIGEN_USE_SYCL=1', '-DTENSORFLOW_USE_SYCL', '-DEIGEN_HAS_C99_MATH']
+
+ if not compiling_cpp:
# compile for C
- return subprocess.call([CPU_C_COMPILER] + compiler_flags)
+ return call([CPU_C_COMPILER] + compiler_flags)
+
+ # create a blacklist of folders that will be skipped when compiling with ComputeCpp
+ skip_extensions = [".cu.cc"]
+ skip_folders = ["tensorflow/compiler", "tensorflow/docs_src", "tensorflow/tensorboard", "third_party", "external", "hexagon"]
+ skip_folders = [(folder + '/') for folder in skip_folders]
+ # if compiling an external project, skip computecpp
+ if any(compiled_file_name.endswith(_ext) for _ext in skip_extensions) or any(_folder in output_file_name for _folder in skip_folders):
+ return call([CPU_CXX_COMPILER] + compiler_flags)
+
+ # optimisation: check whether the compiled file actually has to be compiled with ComputeCpp
+ flags_without_output = list(compiler_flags)
+ del flags_without_output[output_file_index] # remove output_file_name
+ del flags_without_output[output_file_index - 1] # remove '-o'
+ # create a preprocessed version of the file and store it for later use
+ pipe = Popen([CPU_CXX_COMPILER] + flags_without_output + ["-E"], stdout=PIPE)
+ preprocessed_file_str = pipe.communicate()[0]
+ if pipe.returncode != 0:
+ return pipe.returncode
+
+ # check if it has parallel_for in it
+ if not '.parallel_for' in preprocessed_file_str:
+ # call the CXX compiler as usual
+ with tempfile.NamedTemporaryFile(suffix=".ii") as preprocessed_file: # Force '.ii' extension so that g++ does not preprocess the file again
+ preprocessed_file.write(preprocessed_file_str)
+ preprocessed_file.flush()
+ compiler_flags[compiled_file_index] = preprocessed_file.name
+ return call([CPU_CXX_COMPILER] + compiler_flags)
+ del preprocessed_file_str # save some memory as this string can be quite big
+
+ filename, file_extension = os.path.splitext(output_file_name)
+ bc_out = filename + '.sycl'
+
+ # strip asan for the device
+ computecpp_device_compiler_flags = ['-sycl-compress-name', '-Wno-unused-variable',
+ '-I', COMPUTECPP_INCLUDE, '-isystem', COMPUTECPP_INCLUDE,
+ '-std=c++11', '-sycl', '-emit-llvm', '-no-serial-memop',
+ '-Xclang', '-cl-denorms-are-zero', '-Xclang', '-cl-fp32-correctly-rounded-divide-sqrt']
+ # disable flags enabling SIMD instructions
+ computecpp_device_compiler_flags += [flag for flag in compiler_flags if \
+ not any(x in flag.lower() for x in ('-fsanitize', '=native', '=core2', 'msse', 'vectorize', 'mavx', 'mmmx', 'm3dnow', 'fma'))]
+
+ x = call([COMPUTECPP_DRIVER] + computecpp_device_compiler_flags)
+ if x == 0:
+ # don't want the dependency (-MF/-MD) flags when compiling with computecpp first
+ host_compiler_flags = [flag for flag in compiler_flags if (not flag.startswith(('-MF', '-MD',)) and not '.d' in flag)]
+ host_compiler_flags[host_compiler_flags.index('-c')] = "--include"
+ host_compiler_flags = ['-xc++', '-Wno-unused-variable', '-I', COMPUTECPP_INCLUDE, '-c', bc_out] + host_compiler_flags
+ x = call([CPU_CXX_COMPILER] + host_compiler_flags)
+ return x
if __name__ == '__main__':
sys.exit(main())
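
For reference, here is a standalone sketch (not the template itself) of the preprocess-once pattern the new code introduces: the preprocessor output is captured with Popen/PIPE, searched for the '.parallel_for' marker that indicates a SYCL kernel launch, and, when the marker is absent, fed back to the host compiler through a temporary file with a forced '.ii' suffix so g++ does not preprocess it again. The compiler name, file name, and flags below are placeholders, not values from the real build.

# Simplified sketch of the preprocess-once optimisation; not the computecpp.tpl wrapper.
import tempfile
from subprocess import call, Popen, PIPE

def compile_host_only_if_no_sycl(source, flags, cxx='g++'):
    # run only the preprocessor and capture its output
    pipe = Popen([cxx, '-E', source] + flags, stdout=PIPE)
    preprocessed = pipe.communicate()[0]
    if pipe.returncode != 0:
        return pipe.returncode
    if b'.parallel_for' in preprocessed:
        return None   # a SYCL kernel launch was found; hand the file to compute++ instead
    # no SYCL code: reuse the preprocessed text so the preprocessor does not run twice
    with tempfile.NamedTemporaryFile(suffix='.ii') as tmp:   # '.ii' = already-preprocessed C++
        tmp.write(preprocessed)
        tmp.flush()
        return call([cxx, '-c', tmp.name, '-o', source + '.o'] + flags)

if __name__ == '__main__':
    # example invocation with a placeholder source file and flags
    print(compile_host_only_if_no_sycl('example.cc', ['-std=c++11']))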