author | 2017-02-21 11:00:19 -0800 | |
---|---|---|
committer | 2017-02-21 11:00:19 -0800 | |
commit | 2c8d0dca978a246f54c506aae4587dbce5d3bcf0 (patch) | |
tree | 9efcc4097cce2224d5cd0bb83698d52d5a5a5819 /third_party | |
parent | 43c71a03380d8de18202cc399563814b2f438cd2 (diff) |
OpenCL Improvements (#7596)
* OpenCL improvements
Added Tile, Transpose and Range Ops double support for SYCL device.
Moved gpu_device_name() to test_util.py so now it can be used in force_gpu to pull either GPU or SYCL depending on what is available in the system.
* Improvements to the SYCL device support
- Registration of Type Traits required for stride slice op
- Registration of ConcatOffset, _ListToArray, _ArrayToList
Pad, Reverse ( CPU ), ReverseV2 ( CPU ), Size, ExpandDims,
Squeeze, StridedSlice, StridedSliceGrad, StridedSliceAssign,
TileGrad, InvertPermutation, Transpose
- Registration of Sycl kernels only for essential data types
- Floor_div_real has been disabled for SYCL device
- Device in control_flow_ops_py_test.py needed to be lower cased
* SYCL support improvements (#31)
* Improvements to the SYCL device support
This commit reduces the number of failing tests when TensorFlow is
compiled with OpenCL support.
- Registration of Type Traits required for stride slice op
- Registration of ConcatOffset, _ListToArray, _ArrayToList
Pad, Reverse ( CPU ), ReverseV2 ( CPU ), Size, ExpandDims,
Squeeze, StridedSlice, StridedSliceGrad, StridedSliceAssign,
TileGrad, InvertPermutation, Transpose
- Registration of Sycl kernels only for essential data types
- Floor_div_real has been disabled for SYCL device
- Device in control_flow_ops_py_test.py needed to be lower cased
* Fixes & Version bump (#33)
* Fix Unbuntu typo. (#38)
unbuntu -> ubuntu
* Add problem descriptions and solutions (#35)
* Add ComputeCpp lib folder to LD_LIBRARY_PATH
* Add ImportError problem + solution
If you get the error message "ImportError: libComputeCpp.so: cannot open shared
object file: No such file or directory", make sure you have added the
path to ComputeCpp's lib folder to your `LD_LIBRARY_PATH`.
* Add another ImportError problem + solution
If you get the error message "ImportError: cannot import name
'pywrap_tensorflow'" you may be standing in the TensorFlow directory.
* Improvements to the SYCL device support
* Registers FloorDiv, FloorMod and SoftMax Ops for SYCL device
* Workaround for 0 bytes allocation for SYCL device (#42)
* Sycl improvements (#44)
- Eigen version bump
- Extends Cast and Cwise ops benchmark to cover Sycl device
- Extends device_lib_test.py to cover Sycl device
- Registers int32, string and ResourceHandler to run on host for
Enter and RefEnter Sycl Ops
- Enables ReduceMax op for Sycl since Eigen implementation is ready
- Registers Less op for Sycl device
* Improved the formatting of the SYCL code
* Fixed compilation error.
* Made sure that using test sessions with force_gpu=True forces the
placement on a gpu device even if none is detected.
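
The device selection described in the message above (use a GPU or a SYCL device, whichever is present, and still force GPU placement when `force_gpu=True`) can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from test_util.py: the `Device` record, both function names, and the fallback string `"/gpu:0"` are hypothetical stand-ins chosen so the example runs without TensorFlow.

```python
from collections import namedtuple

# Hypothetical stand-in for the device records TensorFlow reports;
# only the two fields used below are modelled.
Device = namedtuple("Device", ["name", "device_type"])


def pick_accelerator_name(devices):
    """Return the name of the first GPU or SYCL device, or "" if none exists."""
    for device in devices:
        if device.device_type in ("GPU", "SYCL"):
            return device.name
    return ""


def device_for_test(devices, force_gpu):
    """Choose a placement string for a test session.

    With force_gpu=True and no accelerator detected, an explicit GPU name is
    still returned, so placement is forced rather than silently skipped.
    The fallback string "/gpu:0" is an assumption made for this sketch.
    """
    if not force_gpu:
        return None  # leave placement to the runtime
    return pick_accelerator_name(devices) or "/gpu:0"


if __name__ == "__main__":
    detected = [Device("/cpu:0", "CPU"), Device("/device:SYCL:0", "SYCL")]
    print(device_for_test(detected, force_gpu=True))
```

Returning an explicit name when nothing is detected keeps a forced test from quietly running on the CPU.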
Diffstat (limited to 'third_party')
-rwxr-xr-x | third_party/sycl/crosstool/computecpp.tpl | 33 |
1 files changed, 28 insertions, 5 deletions
diff --git a/third_party/sycl/crosstool/computecpp.tpl b/third_party/sycl/crosstool/computecpp.tpl
index a5e6b9fe93..66dd9aea7b 100755
--- a/third_party/sycl/crosstool/computecpp.tpl
+++ b/third_party/sycl/crosstool/computecpp.tpl
@@ -26,9 +26,7 @@ def main():
 
   if(output_file_index == 1):
     # we are linking
-    return subprocess.call([CPU_CXX_COMPILER] + compiler_flags)
-
-  compiler_flags = compiler_flags + ['-D_GLIBCXX_USE_CXX11_ABI=0', '-DEIGEN_USE_SYCL=1']
+    return subprocess.call([CPU_CXX_COMPILER] + compiler_flags + ['-Wl,--no-undefined'])
 
   # find what we compile
   compiling_cpp = 0
@@ -38,6 +36,28 @@ def main():
     if(compited_file_name.endswith(('.cc', '.c++', '.cpp', '.CPP', '.C', '.cxx'))):
       compiling_cpp = 1;
 
+  compiler_flags = compiler_flags + ['-D_GLIBCXX_USE_CXX11_ABI=0', '-DEIGEN_USE_SYCL=1', '-DTENSORFLOW_USE_SYCL', '-DEIGEN_HAS_C99_MATH']
+
+  if(compiling_cpp == 1):
+    # create a blacklist of folders that will be skipped when compiling with ComputeCpp
+    _skip = ["external", "llvm", ".cu.cc"]
+    # if compiling external project skip computecpp
+    if any(_folder in _skip for _folder in output_file_name):
+      return subprocess.call([CPU_CXX_COMPILER] + compiler_flags)
+
+  if(compiling_cpp == 1):
+    # this is an optimisation that will check if compiled file has to be compiled with ComputeCpp
+
+    _tmp_flags = [flag for flag in compiler_flags if not flag.startswith(('-o', output_file_name))]
+    # create preprocessed of the file
+    _cmd = " ".join([CPU_CXX_COMPILER] + _tmp_flags + ["-E"])
+    # check if it has parallel_for< in it
+    _cmd += " | grep \".parallel_for\" > /dev/null"
+    ps = subprocess.call(_cmd, shell=True)
+    # if not call CXX compiler
+    if(ps != 0):
+      return subprocess.call([CPU_CXX_COMPILER] + compiler_flags)
+
   if(compiling_cpp == 1):
     filename, file_extension = os.path.splitext(output_file_name)
     bc_out = filename + '.sycl'
@@ -52,9 +72,12 @@ def main():
     # dont want that in case of compiling with computecpp first
     host_compiler_flags = [flag for flag in compiler_flags
                            if not flag.startswith(('-MF', '-MD',))
-                           if not '.d' in flag]
+                           if not '.d' in flag
+    ]
+
+    host_compiler_flags[host_compiler_flags.index('-c')] = "--include"
 
-    host_compiler_flags = ['-D_GLIBCXX_USE_CXX11_ABI=0', '-DTENSORFLOW_USE_SYCL', '-Wno-unused-variable', '-I', COMPUTECPP_INCLUDE, '--include', bc_out] + host_compiler_flags
+    host_compiler_flags = ['-xc++', '-D_GLIBCXX_USE_CXX11_ABI=0', '-DTENSORFLOW_USE_SYCL', '-Wno-unused-variable', '-I', COMPUTECPP_INCLUDE, '-c', bc_out] + host_compiler_flags
     x = subprocess.call([CPU_CXX_COMPILER] + host_compiler_flags)
     return x
   else:
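
The two early exits added by this patch (skip ComputeCpp for blacklisted paths, and skip it when the preprocessed source contains no `.parallel_for`) can be sketched in isolation as below. One observation motivates the first helper: the expression `any(_folder in _skip for _folder in output_file_name)` iterates over the characters of `output_file_name`, and a single character never equals an entry such as `"external"`, so that test appears unable to succeed. The sketch tests each fragment against the path instead, which seems to be what the comment in the patch intends. The function names are hypothetical and not part of computecpp.tpl. The substring test is also literal, whereas the `.` in the patch's `grep` pattern matches any character.

```python
import subprocess

# Path fragments marking sources that should bypass ComputeCpp,
# copied from the _skip list in the patch.
SKIP_FRAGMENTS = ("external", "llvm", ".cu.cc")


def should_skip_computecpp(output_file_name):
    """True if any blacklisted fragment occurs in the output path."""
    return any(fragment in output_file_name for fragment in SKIP_FRAGMENTS)


def uses_parallel_for(preprocessed_source):
    """True if the preprocessed text contains a literal '.parallel_for'."""
    return ".parallel_for" in preprocessed_source


def preprocess(compiler, flags):
    """Run `compiler <flags> -E` without a shell and return stdout as text.

    Assumes `flags` already excludes the '-o <file>' pair, as _tmp_flags
    does in the patch, so the preprocessor writes to stdout.
    """
    result = subprocess.run(
        [compiler] + list(flags) + ["-E"],
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    return result.stdout


if __name__ == "__main__":
    print(should_skip_computecpp("bazel-out/external/llvm/foo.o"))
    print(uses_parallel_for("cgh.parallel_for<Kernel>(range, functor);"))
```

Passing the argument list directly to `subprocess.run` avoids the quoting problems that can arise when flags are joined into a single shell string.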