author Benoit Steiner <benoitsteiner@users.noreply.github.com> 2017-02-21 11:00:19 -0800
committer Vijay Vasudevan <vrv@google.com> 2017-02-21 11:00:19 -0800
commit 2c8d0dca978a246f54c506aae4587dbce5d3bcf0
tree 9efcc4097cce2224d5cd0bb83698d52d5a5a5819 /third_party
parent 43c71a03380d8de18202cc399563814b2f438cd2
OpenCL Improvements (#7596)

* OpenCL improvements

Added Tile, Transpose and Range Ops double support for SYCL device.
Moved gpu_device_name() to test_util.py so now it can be used in
force_gpu to pull either GPU or SYCL depending on what is available in
the system.

* Improvements to the SYCL device support

- Registration of Type Traits required for stride slice op
- Registration of ConcatOffset, _ListToArray, _ArrayToList
  Pad, Reverse ( CPU ), ReverseV2 ( CPU ), Size, ExpandDims,
  Squeeze, StridedSlice, StridedSliceGrad, StridedSliceAssign,
  TileGrad, InvertPermutation, Transpose
- Registration of Sycl kernels only for essential data types
- Floor_div_real has been disabled for SYCL device
- Device in control_flow_ops_py_test.py needed to be lower cased

* SYCL support improvements (#31)

* Improvements to the SYCL device support

This commit reduces number of failing tests when TensorFlow compiles for
OpenCL support.

- Registration of Type Traits required for stride slice op
- Registration of ConcatOffset, _ListToArray, _ArrayToList
  Pad, Reverse ( CPU ), ReverseV2 ( CPU ), Size, ExpandDims,
  Squeeze, StridedSlice, StridedSliceGrad, StridedSliceAssign,
  TileGrad, InvertPermutation, Transpose
- Registration of Sycl kernels only for essential data types
- Floor_div_real has been disabled for SYCL device
- Device in control_flow_ops_py_test.py needed to be lower cased

* Fixes & Version bump (#33)

* Fix Unbuntu typo. (#38)

unbuntu -> ubuntu

* Add problem descriptions and solutions (#35)

* Add ComputeCpp lib folder to LD_LIBRARY_PATH

* Add ImportError problem + solution

If you get the error message "ImportError: libComputeCpp.so: cannot open
shared object file: No such file or directory", make sure you have added
the path to ComputeCpp's lib folder to your `LD_LIBRARY_PATH`.

* Add another ImportError problem + solution

If you get the error message "ImportError: cannot import name
'pywrap_tensorflow'" you may be standing in the TensorFlow directory.

* Improvements to the SYCL device support

* Registers FloorDiv, FloorMod and SoftMax Ops for SYCL device

* Workaround for 0 bytes allocation for SYCL device (#42)

* Sycl improvements (#44)

- Eigen version bump
- Extends Cast and Cwise ops benchmark to cover Sycl device
- Extends device_lib_test.py to cover Sycl device
- Registers int32, string and ResourceHandler to run on host for
  Enter and RefEnter Sycl Ops
- Enables RecudeMax op for Sycl since Eigen implementation is ready
- Registers Less op for Sycl device

* Improved the formatting of the SYCL code

* Fixed compilation error.

* Made sure that using test sessions with force_gpu=True forces the
  placement on a gpu device even if none is detected.

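
The first ImportError entry in the message above comes down to whether `libComputeCpp.so` can be found through `LD_LIBRARY_PATH`. A minimal sketch of that check follows. The helper name `dirs_providing` is hypothetical and not part of TensorFlow or ComputeCpp, and it only inspects the given search path, not the dynamic loader's other lookup locations.

```python
import os


def dirs_providing(lib_name, search_path):
    """Return the directories of a path-separated search path that contain lib_name.

    Hypothetical helper for illustration; it does not consult ld.so.cache
    or any rpath, only the directories passed in.
    """
    hits = []
    for directory in search_path.split(os.pathsep):
        # Empty entries can appear with leading, trailing, or doubled separators.
        if directory and os.path.isfile(os.path.join(directory, lib_name)):
            hits.append(directory)
    return hits


if __name__ == "__main__":
    found = dirs_providing("libComputeCpp.so",
                           os.environ.get("LD_LIBRARY_PATH", ""))
    print(found or "libComputeCpp.so is not on LD_LIBRARY_PATH")
```

An empty result suggests the ComputeCpp lib folder still needs to be appended to `LD_LIBRARY_PATH` before importing TensorFlow.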
Diffstat (limited to 'third_party')
-rwxr-xr-x third_party/sycl/crosstool/computecpp.tpl 33
1 files changed, 28 insertions, 5 deletions
diff --git a/third_party/sycl/crosstool/computecpp.tpl b/third_party/sycl/crosstool/computecpp.tpl
index a5e6b9fe93..66dd9aea7b 100755
--- a/third_party/sycl/crosstool/computecpp.tpl
+++ b/third_party/sycl/crosstool/computecpp.tpl
@@ -26,9 +26,7 @@ def main():
if(output_file_index == 1):
# we are linking
- return subprocess.call([CPU_CXX_COMPILER] + compiler_flags)
-
- compiler_flags = compiler_flags + ['-D_GLIBCXX_USE_CXX11_ABI=0', '-DEIGEN_USE_SYCL=1']
+ return subprocess.call([CPU_CXX_COMPILER] + compiler_flags + ['-Wl,--no-undefined'])
# find what we compile
compiling_cpp = 0
@@ -38,6 +36,28 @@ def main():
if(compited_file_name.endswith(('.cc', '.c++', '.cpp', '.CPP', '.C', '.cxx'))):
compiling_cpp = 1;
+ compiler_flags = compiler_flags + ['-D_GLIBCXX_USE_CXX11_ABI=0', '-DEIGEN_USE_SYCL=1', '-DTENSORFLOW_USE_SYCL', '-DEIGEN_HAS_C99_MATH']
+
+ if(compiling_cpp == 1):
+ # create a blacklist of folders that will be skipped when compiling with ComputeCpp
+ _skip = ["external", "llvm", ".cu.cc"]
+ # if compiling external project skip computecpp
+ if any(_folder in _skip for _folder in output_file_name):
+ return subprocess.call([CPU_CXX_COMPILER] + compiler_flags)
+
+ if(compiling_cpp == 1):
+ # this is an optimisation that will check if compiled file has to be compiled with ComputeCpp
+
+ _tmp_flags = [flag for flag in compiler_flags if not flag.startswith(('-o', output_file_name))]
+ # create preprocessed of the file
+ _cmd = " ".join([CPU_CXX_COMPILER] + _tmp_flags + ["-E"])
+ # check if it has parallel_for< in it
+ _cmd += " | grep \".parallel_for\" > /dev/null"
+ ps = subprocess.call(_cmd, shell=True)
+ # if not call CXX compiler
+ if(ps != 0):
+ return subprocess.call([CPU_CXX_COMPILER] + compiler_flags)
+
if(compiling_cpp == 1):
filename, file_extension = os.path.splitext(output_file_name)
bc_out = filename + '.sycl'
@@ -52,9 +72,12 @@ def main():
# dont want that in case of compiling with computecpp first
host_compiler_flags = [flag for flag in compiler_flags
if not flag.startswith(('-MF', '-MD',))
- if not '.d' in flag]
+ if not '.d' in flag
+ ]
+
+ host_compiler_flags[host_compiler_flags.index('-c')] = "--include"
- host_compiler_flags = ['-D_GLIBCXX_USE_CXX11_ABI=0', '-DTENSORFLOW_USE_SYCL', '-Wno-unused-variable', '-I', COMPUTECPP_INCLUDE, '--include', bc_out] + host_compiler_flags
+ host_compiler_flags = ['-xc++', '-D_GLIBCXX_USE_CXX11_ABI=0', '-DTENSORFLOW_USE_SYCL', '-Wno-unused-variable', '-I', COMPUTECPP_INCLUDE, '-c', bc_out] + host_compiler_flags
x = subprocess.call([CPU_CXX_COMPILER] + host_compiler_flags)
return x
else:
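
The two early-exit checks added by this patch (the skip list and the `parallel_for` probe) can be sketched as small pure functions. This is an illustration under stated assumptions, not the code in the patch. The function names are hypothetical. The skip check assumes a substring test on the output path was intended: the patched expression `any(_folder in _skip for _folder in output_file_name)` iterates over the characters of the string, so as written it appears never to match. The probe takes already-preprocessed text as a string instead of shelling out to the compiler and `grep`.

```python
import re

# Markers taken from the _skip list in the patch.
SKIP_MARKERS = ("external", "llvm", ".cu.cc")


def should_skip_computecpp(output_file_name, markers=SKIP_MARKERS):
    """True when the output path contains any skip marker as a substring.

    Assumed intent of the patch's skip list; hypothetical helper.
    """
    return any(marker in output_file_name for marker in markers)


def strip_output_flags(compiler_flags, output_file_name):
    """Drop the '-o' flag and the output path, as the patch does before running -E."""
    return [flag for flag in compiler_flags
            if not flag.startswith(('-o', output_file_name))]


def needs_device_compiler(preprocessed_source):
    """True when the preprocessed text matches the pattern the patch greps for.

    The unescaped '.' mirrors grep ".parallel_for": any one character
    followed by the literal text 'parallel_for'.
    """
    return re.search(r".parallel_for", preprocessed_source) is not None
```

With these helpers, a wrapper would fall back to the host compiler when `should_skip_computecpp(...)` is true or `needs_device_compiler(...)` is false, and only otherwise take the two-pass ComputeCpp route.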