Fix alignment crashes in AVX512 builds (#19121)

* Fix issue #15588 by simplifying the code

  The allocator.h code tried to be clever and use 32 byte alignment for SSE/AVX2/etc use, and 64 byte alignment for AVX512. Unfortunately, the #ifdef in use (from EIGEN) is not useful; the bazel BUILD files do not propagate the tf_copts() compiler flags when the allocator.cc/allocator.h files get compiled, so EIGEN does not see the compiler flags that actually enable AVX512.

  Rather than changing compiler flag propagation throughout a whole bunch of code, there's an opportunity to just simplify the code and always use 64 byte alignment. Yes, it wastes a bit of space, but on the other hand these allocations are now cache line aligned, which isn't a bad thing, and an #ifdef can be dropped.

  Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>

* Set EIGEN_MAX_ALIGN_BYTES=64

  This patch sets a 64 byte upper bound on the alignment of memory allocated by eigen. This is necessary to prevent crashes during the execution of the unit tests when they are compiled with AVX512 support.

  Signed-off-by: Mark Ryan <mark.d.ryan@intel.com>

* Update the tensorflow/compiler/aot tests for 64 byte alignment

  Modifying tensorflow/core/framework/allocator.h to always use 64 byte alignment causes failures in the tensorflow/compiler/aot unit tests. This patch updates these tests so that they pass with 64 byte aligned allocated memory.

  Signed-off-by: Mark Ryan <mark.d.ryan@intel.com>

* Update Tensor.Slice_Basic for 64 byte alignment

  The test case //tensorflow/core:framework_tensor_test:Tensor.Slice_Basic fails with EIGEN_MAX_ALIGN_BYTES set to 64. The reason is that the slices it takes of the sample tensor are 32 byte aligned, not 64 byte aligned. This commit increases one of the dimensions of the original tensor to ensure that the slices taken by the test cases are indeed 64 byte aligned.

  Signed-off-by: Mark Ryan <mark.d.ryan@intel.com>

* Update ScopedAllocatorConcatOpTest.Reshape for 64 byte alignment

  The ScopedAllocatorConcatOpTest.Reshape test requires that the elements of the field_shapes parameter of ExecOp are multiples of Allocator::kAllocatorAlignment in size. If they are not, the backing tensor allocated by PrepOp will have too many elements and reshaping will fail. This commit modifies the test case, making the elements 64 bytes in size, the new value for Allocator::kAllocatorAlignment.

  Signed-off-by: Mark Ryan <mark.d.ryan@intel.com>
author: Mark Ryan <mark.d.ryan@intel.com> 2018-05-17 18:17:39 +0100
committer: Rasmus Munk Larsen <rmlarsen@google.com> 2018-05-17 10:17:39 -0700
commit: ba30ba07b213687d0014a2149963780a26c59e64 (patch)
tree: 49ab6db0aa671ac60fd08cb4ea2d3fb77f788c12 /tensorflow/compiler/aot
parent: 9b41e5158e8bcc8e1853161dd24738afbd3573f5 (diff)
3 files changed, 12 insertions, 12 deletions
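
The Tensor.Slice_Basic item in the commit message turns on when a slice of an aligned buffer is itself aligned. A minimal sketch of that arithmetic follows, assuming a row-major tensor sliced along its first dimension. The helper names and the row sizes in the note below are hypothetical and are not taken from the TensorFlow test.

```cpp
#include <cstddef>

// Hypothetical helper: byte offset of a dim-0 slice that starts at row
// `start`, for a row-major tensor whose rows each hold `row_elems` elements
// of `elem_size` bytes.
inline std::size_t SliceByteOffsetSketch(std::size_t start,
                                         std::size_t row_elems,
                                         std::size_t elem_size) {
  return start * row_elems * elem_size;
}

// Hypothetical helper: a slice of a buffer whose base is `align`-byte aligned
// stays aligned only if its byte offset is a multiple of `align`.
inline bool IsAlignedSketch(std::size_t byte_offset, std::size_t align) {
  return byte_offset % align == 0;
}
```

With illustrative rows of 8 floats (32 bytes), a slice starting at row 1 is 32 byte aligned but not 64 byte aligned. Widening the row to 16 floats (64 bytes) makes every row boundary 64 byte aligned, which is the kind of adjustment the commit message describes.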
diff --git a/tensorflow/compiler/aot/codegen_test_h.golden b/tensorflow/compiler/aot/codegen_test_h.golden
index 6e050cf564..6641d45e83 100644
--- a/tensorflow/compiler/aot/codegen_test_h.golden
+++ b/tensorflow/compiler/aot/codegen_test_h.golden
@@ -56,9 +56,9 @@ namespace bar {
 //
 // Memory stats:
 //   arg bytes total:    104
-//   arg bytes aligned:  128
+//   arg bytes aligned:  192
 //   temp bytes total:   126
-//   temp bytes aligned: 224
+//   temp bytes aligned: 320
 class MyClass : public tensorflow::XlaCompiledCpuFunction {
  public:
   // Number of input arguments for the compiled computation.
diff --git a/tensorflow/compiler/aot/runtime.h b/tensorflow/compiler/aot/runtime.h
index d085864f00..d1a669ceb1 100644
--- a/tensorflow/compiler/aot/runtime.h
+++ b/tensorflow/compiler/aot/runtime.h
@@ -25,8 +25,8 @@ namespace tensorflow {
 namespace tfcompile {
 namespace runtime {
 
-// Align to 32-bytes, to mimic tensorflow::Allocator::kAllocatorAlignment.
-static constexpr size_t kAlign = 32;
+// Align to 64-bytes, to mimic tensorflow::Allocator::kAllocatorAlignment.
+static constexpr size_t kAlign = 64;
 
 // aligned_buffer_bytes returns the sum of each size in `sizes`, skipping -1
 // values.  There are `n` entries in `sizes`.  Each buffer is aligned to kAlign
diff --git a/tensorflow/compiler/aot/runtime_test.cc b/tensorflow/compiler/aot/runtime_test.cc
index 6d603a02eb..06ec623eb2 100644
--- a/tensorflow/compiler/aot/runtime_test.cc
+++ b/tensorflow/compiler/aot/runtime_test.cc
@@ -24,7 +24,7 @@ namespace runtime {
 namespace {
 
 TEST(Runtime, AlignmentValue) {
-  // We've chosen 32 byte alignment for the tfcompile runtime to mimic the
+  // We've chosen 64 byte alignment for the tfcompile runtime to mimic the
   // regular tensorflow allocator, which was chosen to play nicely with Eigen.
   // The tfcompile runtime also has a requirement that comes from the xla
   // generated code, on the relation: buffer_size >= 16 ? 2 * sizeof(void*) : 8
@@ -39,13 +39,13 @@ TEST(Runtime, AlignedBufferBytes) {
   EXPECT_EQ(aligned_buffer_bytes(sizesA, 1), 0);
 
   static constexpr intptr_t sizesB[1] = {3};
-  EXPECT_EQ(aligned_buffer_bytes(sizesB, 1), 32);
+  EXPECT_EQ(aligned_buffer_bytes(sizesB, 1), 64);
 
   static constexpr intptr_t sizesC[1] = {32};
-  EXPECT_EQ(aligned_buffer_bytes(sizesC, 1), 32);
+  EXPECT_EQ(aligned_buffer_bytes(sizesC, 1), 64);
 
   static constexpr intptr_t sizesD[7] = {1, -1, 32, -1, 64, 2, 3};
-  EXPECT_EQ(aligned_buffer_bytes(sizesD, 7), 192);
+  EXPECT_EQ(aligned_buffer_bytes(sizesD, 7), 320);
 }
 
 void* add_ptr(void* base, uintptr_t delta) {
@@ -101,11 +101,11 @@ TEST(Runtime, MallocFreeContiguousBuffers) {
   EXPECT_NE(base, nullptr);
   EXPECT_EQ(bufD[0], add_ptr(base, 0));
   EXPECT_EQ(bufD[1], nullptr);
-  EXPECT_EQ(bufD[2], add_ptr(base, 32));
+  EXPECT_EQ(bufD[2], add_ptr(base, 64));
   EXPECT_EQ(bufD[3], nullptr);
-  EXPECT_EQ(bufD[4], add_ptr(base, 64));
-  EXPECT_EQ(bufD[5], add_ptr(base, 128));
-  EXPECT_EQ(bufD[6], add_ptr(base, 160));
+  EXPECT_EQ(bufD[4], add_ptr(base, 128));
+  EXPECT_EQ(bufD[5], add_ptr(base, 192));
+  EXPECT_EQ(bufD[6], add_ptr(base, 256));
   for (int i = 0; i < 7; ++i) {
     const intptr_t size = sizesD[i];
     if (size != -1) {
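
The updated expectations in runtime_test.cc follow from rounding each buffer size up to a multiple of kAlign. The sketch below reproduces that arithmetic under the assumption that every non-skipped buffer is padded to the alignment and laid out back to back. It is not the code from the tfcompile runtime, and the `*Sketch` names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Rounds n up to the next multiple of align; align must be a power of two.
inline std::size_t RoundUpSketch(std::size_t n, std::size_t align) {
  assert(align != 0 && (align & (align - 1)) == 0);
  return (n + align - 1) & ~(align - 1);
}

// Sum of the rounded-up sizes, skipping entries equal to -1.
inline std::size_t AlignedBytesSketch(const std::intptr_t* sizes,
                                      std::size_t n, std::size_t align) {
  std::size_t total = 0;
  for (std::size_t i = 0; i < n; ++i) {
    if (sizes[i] == -1) continue;
    total += RoundUpSketch(static_cast<std::size_t>(sizes[i]), align);
  }
  return total;
}

// Offset of each buffer from the start of one contiguous block.
// A value of -1 marks a skipped entry (nullptr in the real test).
inline std::vector<std::intptr_t> OffsetsSketch(const std::intptr_t* sizes,
                                                std::size_t n,
                                                std::size_t align) {
  std::vector<std::intptr_t> offsets(n, -1);
  std::size_t pos = 0;
  for (std::size_t i = 0; i < n; ++i) {
    if (sizes[i] == -1) continue;
    offsets[i] = static_cast<std::intptr_t>(pos);
    pos += RoundUpSketch(static_cast<std::size_t>(sizes[i]), align);
  }
  return offsets;
}
```

For the sizes `{1, -1, 32, -1, 64, 2, 3}` used by the test, an alignment of 64 gives a total of 320 bytes and offsets 0, 64, 128, 192, 256, while an alignment of 32 gives 192 bytes and offsets 0, 32, 64, 128, 160. These match the new and old values in the diff above.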