author | 2017-05-31 06:15:23 +0100 | |
---|---|---|
committer | 2017-05-30 22:15:23 -0700 | |
commit | fe589d9e7caea582beed160ad47e55f537a8403e (patch) | |
tree | f4ba7a7e9a53e2d6bea83685daedf5b59f1993cb /third_party/sycl/crosstool/computecpp.tpl | |
parent | a365082c40875a218635b37333827b4fe64eae37 (diff) |
[OpenCL] Implementation improvements (#9117)
* OpenCL Improvements
* Registers Scatter and ScatterNd Ops for SYCL
* Registers Stack op for SYCL
* Fixes No sycl buffer found error for debug ops
* Registers MatMul and Transpose Ops to SYCL device for double
* Extends analyzer_cli_test.py test to cover SYCL
* Fixes Transpose Op for double when on SYCL
* Bumps Eigen version to fix double precision issue on SYCL
* Extends SessionDebugTestBase to cover SYCL
* Register SYCL implementations for random ops
* Avoid functions that might not be defined on SYCL device (#51)
* Avoid functions that might not be defined on SYCL device
* Simplify by using Eigen math functions
* OpenCL improvements
- Bumps Eigen Version
- Refactors Ops registration
- Introduces a workaround for Const Op related to the difference between CUDA, which uses pointers, and OpenCL, which uses buffers/accessors
- Extends memory types to cover DEVICE_SYCL as well
- Introduces GetSYCLDevice() method that returns a list of supported devices, with the GPU device having the highest priority (blacklisted devices are not included)
- ::internal::Transpose -> tensorflow::internal::Transpose in order to avoid a reported compilation error
- Re-introduces the fix for a buggy string replacement (-c -> --include) that caused a lot of compilation warnings
- Adds sycl_runtime to Bazel's ARRAY_DEPS
- Replicates TF_CALL_GPU_PROXY_TYPES for SYCL
* [OpenCL] Fixes an issue caused by switch to aligned allocator for sycl buffer (#53)
* [Build] Use gcc/g++ as a host compiler to avoid https://github.com/tensorflow/tensorflow/issues/8394 (#54)
* [OpenCL] Fixes Scatter Op
* Fix testSimple and testConst in stack_op_test (#3)
* Fix testSimple and testConst in stack_op_test
* Create a specialisation of DoParallelConcatUpdate for SyclDevice and register it
* Guard all code in TENSORFLOW_USE_SYCL
* Do not use sycl device for int32
* Registration of the SYCL version now looks like the one for the GPU
* Remove added empty line
* Register batch normalization kernels for OpenCL (#61)
* [OpenCL] RandomGamma has no GPU friendly implementation (#57)
* [OpenCL] Compatibility fixes for TensorFlow 1.1.0-rc1
* [OpenCL] Implements BatchMatmul Op for SYCL
* Lowercase the device name when GPU or SYCL returned
* [OpenCL] kernel_estimator_test.py assertEqual-> assertAlmostEqual due to floating point representation on the device
* [Eigen] Version bump
* GPU device name string manipulation is not needed anymore
* [OpenCL] Adds SYCL to device backwards compatibility
* [OpenCL] Extends core_rnn_test.py to run for SYCL device
* [OpenCL] Minor optimizations for build script
* [OpenCL] Enables skip folder list in build script
* [OpenCL] Fixes ApplyAdamOp for Sycl device
* [OpenCL] SYCL device improvements
* [OpenCL] Fixes debug_ops's SEGFAULT for SYCL device
* [Build] Adds hexagon to skipped folders list
* [OpenCL] Removes EnterLameDuckMode from SYCL device and allocator
* [OpenCL] Registers Unique Op for SYCL device
* [OpenCL][Temporary] Disables tests for SYCL target due to features not being implemented yet
Tests affected:
- tensorflow/contrib/memory_stats/python/kernel_tests/memory_stats_ops_test.py
- tensorflow/contrib/rnn/python/kernel_tests/core_rnn_test.py
- tensorflow/python/kernel_tests/conv_ops_test.py
- tensorflow/python/kernel_tests/depthwise_conv_op_test.py
- tensorflow/python/kernel_tests/pooling_ops_3d_test.py
- tensorflow/python/kernel_tests/pooling_ops_test.py
- tensorflow/python/kernel_tests/scatter_nd_ops_test.py
- tensorflow/python/training/adam_test.py
- tensorflow/python/training/localhost_cluster_performance_test.py
- tensorflow/python/training/training_ops_test.py
* [OpenCL][Temporary] Disables failing tests for SYCL in order to establish regression baseline
Tests affected:
- tensorflow/python/debug/cli/analyzer_cli_test.py
- tensorflow/python/debug/lib/session_debug_testlib.py
- tensorflow/python/debug/lib/stepper_test.py
- tensorflow/python/kernel_tests/unstack_op_test.py
- tensorflow/python/ops/image_ops_test.py
* [OpenCL] Take options.config.device_count() into consideration
* [OpenCL] Fixes compilation warning
* [OpenCL] device:SYCL:0 -> sycl:0
* [OpenCL] Removes unwanted flags in building script
Removes flags given to computecpp that enable SIMD instructions
Removes duplicate flags
* bool -> const bool
* [OpenCL] sycl in test_util.gpu_device_name() -> is_sycl_enabled()
* [OpenCL][Temporary] Disables failing tests for SYCL in order to establish regression baseline
Test affected:
- tensorflow/contrib/stateless/python/kernel_tests/stateless_random_ops_test.py
* Imports test_util from tensorflow.python.framework
* [OpenCL] Fixes formatting in Python code
* [OpenCL] Extends session_test.py to cover SYCL device
* [OpenCL] Cleans singleton class
* [OpenCL] Keeping CUDA happy
* [OpenCL][Temporary] Disables failing tests for SYCL in order to establish regression baseline
Tests affected:
- tensorflow/contrib/rnn/python/kernel_tests/core_rnn_cell_test.py
- tensorflow/contrib/seq2seq/python/kernel_tests/beam_search_ops_test.py
* Added support for building with SYCL on ARM.
* Acts on the review feedback from:
- https://github.com/tensorflow/tensorflow/pull/9117#discussion_r113608975
- https://github.com/tensorflow/tensorflow/pull/9117#discussion_r113609173
* [OpenCL] Fixes scatter_nd_op_test
* Fixes auto-merge mistake
* [OpenCL] struct SyclDevice -> class SyclDevice
* Revert "[OpenCL] struct SyclDevice -> class SyclDevice"
This reverts commit addd43348c374a5379f67bb1e5ad084715722fc2.
* [OpenCL] Reverting refactoring commit.
As requested in the review https://github.com/tensorflow/tensorflow/pull/9117#issuecomment-298454466
This change set will be re-introduced in smaller chunks.
* Revert "[OpenCL] device:SYCL:0 -> sycl:0"
This reverts commit cf16e60340b62d16c3764d71b716fe03d35f87a9.
* Revert "[OpenCL] Adds SYCL to device backwards compatibility"
This reverts commit b8401b5164199b7a169be1c1d8dea5001195c390.
* Acts on the feedback from https://github.com/tensorflow/tensorflow/pull/9117#discussion_r115036905
* control_flow_ops_py_test.py expects device name to be lower cased
* Acts on the feedback from https://github.com/tensorflow/tensorflow/pull/9117#discussion_r115037222
* Removes debug print
* Removes not needed partial specialisation
* [OpenCL] Registers ScatterNdFunctor for SYCL device
* [OpenCL] Make it compile
* [OpenCL] Follow gpu_device changes
* [OpenCL] Adds cxx_builtin_include_directory for python lib
Fixes Bazel's undeclared-inclusion errors that appeared after the merge with TensorFlow upstream
* [OpenCL] Fixes Constant Op
* [OpenCL] gXX-4.8 -> gXX
* [OpenCL] Removes -D_GLIBCXX_USE_CXX11_ABI=0 as it breaks default compiler setup for Ubuntu 16.04
* Revert "[OpenCL] kernel_estimator_test.py assertEqual-> assertAlmostEqual due to floating point representation on the device"
This reverts commit 06c50c0a485f40c30a436f02c3fa7794e370c49d.
* [OpenCL] CPU allocator is a singleton; we should not delete it
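The "Removes flags given to computecpp that enable SIMD instructions" item above can be illustrated with a short sketch. The substring list is copied from the new computecpp.tpl in this commit; the helper name `strip_device_unsafe_flags` and the sample flags are hypothetical, since the wrapper performs this filtering inline.

```python
# Substrings copied from the new computecpp.tpl in this commit. A flag is
# dropped from the device compile if any of them occurs anywhere in the
# lower-cased flag.
DEVICE_UNSAFE_SUBSTRINGS = ('-fsanitize', '=native', '=core2', 'msse',
                            'vectorize', 'mavx', 'mmmx', 'm3dnow', 'fma')


def strip_device_unsafe_flags(flags):
    """Return the flags that are forwarded to the device compiler.

    Hypothetical helper name; the wrapper does this in a list comprehension.
    """
    return [flag for flag in flags
            if not any(s in flag.lower() for s in DEVICE_UNSAFE_SUBSTRINGS)]


if __name__ == '__main__':
    # Illustrative host flags, not taken from a real build log.
    host_flags = ['-O2', '-march=native', '-mavx2', '-msse4.2',
                  '-fsanitize=address', '-DEIGEN_USE_SYCL=1', '-std=c++11']
    print(strip_device_unsafe_flags(host_flags))
```

Because the match is a plain substring test, any argument that merely contains one of these strings (for example an include path with "fma" in a directory name) would also be dropped from the device compile.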
Diffstat (limited to 'third_party/sycl/crosstool/computecpp.tpl')
-rwxr-xr-x | third_party/sycl/crosstool/computecpp.tpl | 132 |
1 files changed, 69 insertions, 63 deletions
diff --git a/third_party/sycl/crosstool/computecpp.tpl b/third_party/sycl/crosstool/computecpp.tpl
index 595e7136a6..94c5e6aaad 100755
--- a/third_party/sycl/crosstool/computecpp.tpl
+++ b/third_party/sycl/crosstool/computecpp.tpl
@@ -1,8 +1,9 @@
 #!/usr/bin/env python
 
 import os
-import subprocess
 import sys
+import tempfile
+from subprocess import call, Popen, PIPE
 
 CPU_CXX_COMPILER = ('%{host_cxx_compiler}')
 CPU_C_COMPILER = ('%{host_c_compiler}')
@@ -13,76 +14,81 @@ COMPUTECPP_DRIVER= COMPUTECPP_ROOT + 'bin/compute++'
 COMPUTECPP_INCLUDE = COMPUTECPP_ROOT + 'include'
 
 def main():
-  compiler_flags = []
-
-  # remove -fsamotoze-coverage from string
-  if CPU_CXX_COMPILER.find("g++") != -1:
-    compiler_flags = [flag for flag in sys.argv[1:] if not flag.startswith(('-Wl,--no-undefined', '-fsanitize-coverage', '-Wno-unused-but-set-variable', '-Wignored-attributes'))]
-  else:
-    compiler_flags = [flag for flag in sys.argv[1:] if not flag.startswith(('-Wl,--no-undefined', '-Wno-unused-but-set-variable', '-Wignored-attributes'))]
+  remove_flags = ('-Wl,--no-undefined', '-Wno-unused-but-set-variable', '-Wignored-attributes')
+  # remove -fsamotoze-coverage from string with g++
+  if 'g++' in CPU_CXX_COMPILER:
+    remove_flags += ('-fsanitize-coverage',)
+  compiler_flags = [flag for flag in sys.argv[1:] if not flag.startswith(remove_flags)]
 
   output_file_index = compiler_flags.index('-o') + 1
   output_file_name = compiler_flags[output_file_index]
 
-  if(output_file_index == 1):
+  if output_file_index == 1:
     # we are linking
-    return subprocess.call([CPU_CXX_COMPILER] + compiler_flags + ['-Wl,--no-undefined'])
+    return call([CPU_CXX_COMPILER] + compiler_flags + ['-Wl,--no-undefined'])
 
   # find what we compile
-  compiling_cpp = 0
-  if('-c' in compiler_flags):
-    compiled_file_index = compiler_flags.index('-c') + 1
-    compited_file_name = compiler_flags[compiled_file_index]
-    if(compited_file_name.endswith(('.cc', '.c++', '.cpp', '.CPP', '.C', '.cxx'))):
-      compiling_cpp = 1;
-
-  compiler_flags = compiler_flags + ['-D_GLIBCXX_USE_CXX11_ABI=0', '-DEIGEN_USE_SYCL=1', '-DTENSORFLOW_USE_SYCL', '-DEIGEN_HAS_C99_MATH']
-
-  if(compiling_cpp == 1):
-    # create a blacklist of folders that will be skipped when compiling with ComputeCpp
-    _skip = ["external", "llvm", ".cu.cc"]
-    # if compiling external project skip computecpp
-    if any(_folder in _skip for _folder in output_file_name):
-      return subprocess.call([CPU_CXX_COMPILER] + compiler_flags)
-
-  if(compiling_cpp == 1):
-    # this is an optimisation that will check if compiled file has to be compiled with ComputeCpp
-
-    _tmp_flags = [flag for flag in compiler_flags if not flag.startswith(('-o', output_file_name))]
-    # create preprocessed of the file
-    _cmd = " ".join([CPU_CXX_COMPILER] + _tmp_flags + ["-E"])
-    # check if it has parallel_for< in it
-    _cmd += " | grep \".parallel_for\" > /dev/null"
-    ps = subprocess.call(_cmd, shell=True)
-    # if not call CXX compiler
-    if(ps != 0):
-      return subprocess.call([CPU_CXX_COMPILER] + compiler_flags)
-
-  if(compiling_cpp == 1):
-    filename, file_extension = os.path.splitext(output_file_name)
-    bc_out = filename + '.sycl'
-
-    # strip asan for the device
-    computecpp_device_compiler_flags = ['-sycl-compress-name', '-DTENSORFLOW_USE_SYCL', '-Wno-unused-variable', '-I', COMPUTECPP_INCLUDE, '-isystem',
-    COMPUTECPP_INCLUDE, '-std=c++11', '-sycl', '-emit-llvm', '-no-serial-memop', '-Xclang', '-cl-denorms-are-zero', '-Xclang', '-cl-fp32-correctly-rounded-divide-sqrt']
-    computecpp_device_compiler_flags += [flag for flag in compiler_flags if not flag.startswith(('-fsanitize', '-march=native', '-mavx'))]
-
-    x = subprocess.call([COMPUTECPP_DRIVER] + computecpp_device_compiler_flags )
-    if(x == 0):
-      # dont want that in case of compiling with computecpp first
-      host_compiler_flags = [flag for flag in compiler_flags
-                              if not flag.startswith(('-MF', '-MD',))
-                              if not '.d' in flag
-                            ]
-
-      host_compiler_flags[host_compiler_flags.index('-c')] = "--include"
-
-      host_compiler_flags = ['-xc++', '-D_GLIBCXX_USE_CXX11_ABI=0', '-DTENSORFLOW_USE_SYCL', '-Wno-unused-variable', '-I', COMPUTECPP_INCLUDE, '-c', bc_out] + host_compiler_flags
-      x = subprocess.call([CPU_CXX_COMPILER] + host_compiler_flags)
-    return x
-  else:
+  compiling_cpp = False
+  if '-c' in compiler_flags:
+    compiled_file_index = compiler_flags.index('-c') + 1
+    compiled_file_name = compiler_flags[compiled_file_index]
+    compiling_cpp = compiled_file_name.endswith(('.cc', '.c++', '.cpp', '.CPP', '.C', '.cxx'))
+
+  # add -D_GLIBCXX_USE_CXX11_ABI=0 to the command line if you have custom installation of GCC/Clang
+  compiler_flags = compiler_flags + ['-DEIGEN_USE_SYCL=1', '-DTENSORFLOW_USE_SYCL', '-DEIGEN_HAS_C99_MATH']
+
+  if not compiling_cpp:
     # compile for C
-    return subprocess.call([CPU_C_COMPILER] + compiler_flags)
+    return call([CPU_C_COMPILER] + compiler_flags)
+
+  # create a blacklist of folders that will be skipped when compiling with ComputeCpp
+  skip_extensions = [".cu.cc"]
+  skip_folders = ["tensorflow/compiler", "tensorflow/docs_src", "tensorflow/tensorboard", "third_party", "external", "hexagon"]
+  skip_folders = [(folder + '/') for folder in skip_folders]
+  # if compiling external project skip computecpp
+  if any(compiled_file_name.endswith(_ext) for _ext in skip_extensions) or any(_folder in output_file_name for _folder in skip_folders):
+    return call([CPU_CXX_COMPILER] + compiler_flags)
+
+  # this is an optimisation that will check if compiled file has to be compiled with ComputeCpp
+  flags_without_output = list(compiler_flags)
+  del flags_without_output[output_file_index] # remove output_file_name
+  del flags_without_output[output_file_index - 1] # remove '-o'
+  # create preprocessed of the file and store it for later use
+  pipe = Popen([CPU_CXX_COMPILER] + flags_without_output + ["-E"], stdout=PIPE)
+  preprocessed_file_str = pipe.communicate()[0]
+  if pipe.returncode != 0:
+    return pipe.returncode
+
+  # check if it has parallel_for in it
+  if not '.parallel_for' in preprocessed_file_str:
+    # call CXX compiler like usual
+    with tempfile.NamedTemporaryFile(suffix=".ii") as preprocessed_file: # Force '.ii' extension so that g++ does not preprocess the file again
+      preprocessed_file.write(preprocessed_file_str)
+      preprocessed_file.flush()
+      compiler_flags[compiled_file_index] = preprocessed_file.name
+      return call([CPU_CXX_COMPILER] + compiler_flags)
+  del preprocessed_file_str # save some memory as this string can be quite big
+
+  filename, file_extension = os.path.splitext(output_file_name)
+  bc_out = filename + '.sycl'
+
+  # strip asan for the device
+  computecpp_device_compiler_flags = ['-sycl-compress-name', '-Wno-unused-variable',
+      '-I', COMPUTECPP_INCLUDE, '-isystem', COMPUTECPP_INCLUDE,
+      '-std=c++11', '-sycl', '-emit-llvm', '-no-serial-memop',
+      '-Xclang', '-cl-denorms-are-zero', '-Xclang', '-cl-fp32-correctly-rounded-divide-sqrt']
+  # disable flags enabling SIMD instructions
+  computecpp_device_compiler_flags += [flag for flag in compiler_flags if \
+    not any(x in flag.lower() for x in ('-fsanitize', '=native', '=core2', 'msse', 'vectorize', 'mavx', 'mmmx', 'm3dnow', 'fma'))]
+
+  x = call([COMPUTECPP_DRIVER] + computecpp_device_compiler_flags)
+  if x == 0:
+    # dont want that in case of compiling with computecpp first
+    host_compiler_flags = [flag for flag in compiler_flags if (not flag.startswith(('-MF', '-MD',)) and not '.d' in flag)]
+    host_compiler_flags[host_compiler_flags.index('-c')] = "--include"
+    host_compiler_flags = ['-xc++', '-Wno-unused-variable', '-I', COMPUTECPP_INCLUDE, '-c', bc_out] + host_compiler_flags
+    x = call([CPU_CXX_COMPILER] + host_compiler_flags)
+  return x
 
 if __name__ == '__main__':
   sys.exit(main())
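The early-exit logic of the rewritten wrapper can be summarised with a small, side-effect-free sketch. It mirrors the flag filtering and skip lists from the new version of the file, but `route`, its return labels, and the `host_cxx` parameter are hypothetical names introduced here for illustration; the actual script calls the compilers directly instead of returning a label.

```python
CXX_EXTENSIONS = ('.cc', '.c++', '.cpp', '.CPP', '.C', '.cxx')
SKIP_EXTENSIONS = ('.cu.cc',)
# As in the wrapper, a trailing '/' is appended so only whole folder names match.
SKIP_FOLDERS = tuple(folder + '/' for folder in (
    'tensorflow/compiler', 'tensorflow/docs_src', 'tensorflow/tensorboard',
    'third_party', 'external', 'hexagon'))


def route(argv, host_cxx='g++'):
    """Classify one compiler invocation (hypothetical helper, not in the commit).

    Returns 'link', 'c', 'host_cxx' or 'check_for_kernels'.
    """
    remove_flags = ('-Wl,--no-undefined', '-Wno-unused-but-set-variable',
                    '-Wignored-attributes')
    if 'g++' in host_cxx:
        remove_flags += ('-fsanitize-coverage',)
    # str.startswith accepts a tuple of prefixes, so one test covers them all.
    flags = [flag for flag in argv if not flag.startswith(remove_flags)]

    # Like the wrapper, this raises ValueError when no '-o' is present.
    output_index = flags.index('-o') + 1
    output_name = flags[output_index]
    if output_index == 1:
        return 'link'  # '-o' is the first remaining argument

    source_name = flags[flags.index('-c') + 1] if '-c' in flags else ''
    if not source_name.endswith(CXX_EXTENSIONS):
        return 'c'  # handed to the host C compiler
    if (source_name.endswith(SKIP_EXTENSIONS)
            or any(folder in output_name for folder in SKIP_FOLDERS)):
        return 'host_cxx'  # skipped folders and CUDA sources bypass ComputeCpp
    return 'check_for_kernels'


if __name__ == '__main__':
    print(route(['-o', 'a.out', 'x.o']))
    print(route(['-c', 'tensorflow/core/kernels/matmul_op.cc',
                 '-o', 'bazel-out/tensorflow/core/kernels/matmul_op.o']))
```

Only invocations in the last category reach the preprocessing step, where the wrapper runs the host compiler with `-E` and searches the output for `.parallel_for` before deciding whether to invoke compute++.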