Reason: Allow `LOG(ERROR) << shape` (currently disallowed).
PiperOrigin-RevId: 189687162
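
A minimal sketch of what this enables: an ostream operator<< overload for
the shape type, so log and check macros can print it (MiniShape below is a
stand-in, not the real TensorShape):

#include <cstddef>
#include <iostream>
#include <vector>

struct MiniShape {
  std::vector<long long> dims;
};

// Streaming support is all LOG(ERROR) << shape needs.
std::ostream& operator<<(std::ostream& os, const MiniShape& s) {
  os << '[';
  for (std::size_t i = 0; i < s.dims.size(); ++i)
    os << (i ? "," : "") << s.dims[i];
  return os << ']';
}

int main() {
  MiniShape s{{2, 3, 5}};
  std::cerr << "unexpected shape: " << s << "\n";  // prints [2,3,5]
}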
PiperOrigin-RevId: 186674197
convenience, some kernels call RemoveDim multiple times, or even in a loop, which ends up being an O(dims()*k) operation instead of O(dims()).
PiperOrigin-RevId: 169473109
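
A sketch of the cost difference, with a plain vector standing in for the
shape's dimension storage (RemoveDimRange here is a hypothetical batched
helper, not necessarily the exact interface this change adds):

#include <cstdint>
#include <vector>

void RemoveDim(std::vector<int64_t>& dims, int d) {
  dims.erase(dims.begin() + d);  // shifts the whole tail: O(dims())
}

void RemoveDimRange(std::vector<int64_t>& dims, int begin, int end) {
  // One shift for the whole [begin, end) range: O(dims()) total, versus
  // O(dims() * k) for k single-dim removals in a loop.
  dims.erase(dims.begin() + begin, dims.begin() + end);
}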
The goal is to make kernels mostly independent of proto headers, which will let us lock down our .so imports.
RELNOTES: Remove proto.h includes from tensorflow/core headers. This may break users who have written custom C++ ops.
PiperOrigin-RevId: 166237236
The goal is to make kernels mostly independent of proto headers, which will let
us lock down our .so imports. This CL does not remove any actual headers, but
changes a bunch of files so that header removal is possible in a followup CL.
It also marks the headers that will be removed with
// TODO(b/62899350): Remove
RELNOTES: n/a
PiperOrigin-RevId: 160552878
The current behavior, which relies on a TensorShape to store the dense shape,
can lead to CHECK failures if a SparseTensor is created with a dense_shape that is
too large.
PiperOrigin-RevId: 158521473
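
A sketch of the safer pattern, assuming a validation helper (the name
DenseShapeIsValid is illustrative): reject an oversized dense_shape up
front and report an error, instead of letting a CHECK inside the shape
class abort the process:

#include <cstdint>
#include <vector>

bool DenseShapeIsValid(const std::vector<int64_t>& dims) {
  uint64_t n = 1;
  for (int64_t d : dims) {
    if (d < 0) return false;
    // Refuse to multiply if the running product would overflow.
    if (d != 0 && n > UINT64_MAX / static_cast<uint64_t>(d)) return false;
    n *= static_cast<uint64_t>(d);
  }
  return n <= static_cast<uint64_t>(INT64_MAX);
}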
Once we use a custom NodeInfo class to represent NodeDef in memory, shape
attrs can be represented using a native C++ class. Unfortunately, NodeInfo
won't know whether a shape attr is supposed to be partial or not. After this
change, both classes have the same memory representation as their common
base class TensorShapeRep. TensorShapeRep is always safely static_castable
to PartialTensorShape, and can be safely cast to TensorShape as long as
num_elements() != -1.
As a benefit, code that uses PartialTensorShape will be faster.
RELNOTES: Unify memory representations of TensorShape and PartialTensorShape. As a consequence, tensors now have a maximum of 254 dimensions, not 255.
PiperOrigin-RevId: 156628598
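
A stand-in sketch of the cast rules stated above: two subclasses that add
behavior but no data members to a shared representation, so the casts only
reinterpret the same bytes (types here are illustrative):

#include <cassert>
#include <cstdint>

struct Rep {
  int64_t num_elements_ = 0;  // -1 marks a partially known shape
};
struct PartialShape : Rep {};  // adds behavior, no data
struct FullShape : Rep {};     // adds behavior, no data

PartialShape& AsPartial(Rep& r) {
  return static_cast<PartialShape&>(r);  // safe for any rep
}

FullShape& AsFull(Rep& r) {
  assert(r.num_elements_ != -1);  // only valid when fully defined
  return static_cast<FullShape&>(r);
}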
The limit was preventing valid uses of TensorShape as the dense shape of very
large sparse tensors. There's no security advantage to the limit, since a
memory allocation of 2**40 bytes is already far beyond a reasonable machine
size. The new limit is std::numeric_limits<int64>::max().
In addition, the previous TensorShape code did not check for overflow when
multiplying, which meant an operation as simple as
tf.gather(tf.zeros([2**5, 2**60 + 1]), 7).eval()
would appear as valid during TensorShape construction and then crash.
A new MultiplyWithoutOverflow function does the correct overflow checking.
Fixes #8494.
Change: 151778176
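
A self-contained sketch of the overflow-checked multiply (the real
MultiplyWithoutOverflow lives in TensorFlow's overflow utilities and may
differ in detail):

#include <cstdint>

// Returns x * y, or -1 if either input is negative or the product
// overflows int64.
int64_t MultiplyWithoutOverflow(int64_t x, int64_t y) {
  if (x < 0 || y < 0) return -1;
  if (x == 0 || y == 0) return 0;
  const uint64_t ux = static_cast<uint64_t>(x);
  const uint64_t uy = static_cast<uint64_t>(y);
  const uint64_t prod = ux * uy;
  // The unsigned multiply wrapped iff dividing the product back does not
  // recover the other factor.
  if (prod / ux != uy) return -1;
  if (prod > static_cast<uint64_t>(INT64_MAX)) return -1;
  return static_cast<int64_t>(prod);
}

With this check, the example shape [2**5, 2**60 + 1] above is rejected
during construction instead of crashing later.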
than the maximum allowed number of elements
Change: 150898785
- grow the table when it is full.
- add support for checkpointing and restoring.
- allow vectors as values.
- allow vectors as keys, including corresponding changes to common lookup code.
- make sure the op is placed on parameter servers.
- register the kernel for more data types.
Change: 136541899
Change: 135747447
The second argument is compiled as an "int" while the first one is an
"unsigned int", which can cause a compile error under -Wsign-compare.
Change the literal to "256u" to avoid the issue.
Error log (partial):
./base/logging.h:592:32: error: comparison of integers of different signs: 'const unsigned int' and 'const int' [-Werror,-Wsign-compare]
DEFINE_CHECK_OP_IMPL(Check_LT, < )
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~
./third_party/tensorflow/core/framework/tensor_shape.h:203:5: note: in instantiation of function template specialization 'Check_LTImpl<unsigned int, int>' requested here
DCHECK_LT(static_cast<uint32>(dt), 256);
^
Change: 132219253
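
A reproduction sketch: under -Werror,-Wsign-compare the commented line
fails to compile because 256 is an int while dt is unsigned, and the
unsigned literal fixes it:

#include <cstdint>

bool DataTypeFits(uint32_t dt) {
  // return dt < 256;   // error: comparison of integers of different signs
  return dt < 256u;     // both operands unsigned: no warning
}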
than using the private data_type() of TensorShape.
Change: 130441713
make it more flexible by allowing matrix operations with any number of input
and output arguments. Get rid of binary_linalg_ops_common.*, which was mostly
code duplicated from UnaryLinearAlgebraOp.
By providing common code for shape validation and the cost model, a lot of
duplicated code in the individual ops can be deleted.
Update all linear algebra ops to use the new interface.
This is in preparation for adding linear algebra ops with multiple outputs, such as SVD.
Change: 128502636
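
A rough sketch of the shared-base pattern this describes: the base class
owns input validation and dispatch, and each op overrides only the
per-matrix computation (names are illustrative, not the real
LinearAlgebraOp interface):

#include <cstddef>
#include <utility>
#include <vector>

struct Matrix {
  std::size_t rows = 0, cols = 0;
  std::vector<double> data;  // row-major, rows * cols entries
};

class LinalgOpBase {
 public:
  virtual ~LinalgOpBase() = default;

  std::vector<Matrix> Compute(const std::vector<Matrix>& inputs) {
    ValidateInputs(inputs);           // common checks written once
    std::vector<Matrix> outputs;
    ComputeMatrix(inputs, &outputs);  // op-specific work
    return outputs;
  }

 protected:
  virtual void ValidateInputs(const std::vector<Matrix>&) {}
  virtual void ComputeMatrix(const std::vector<Matrix>& inputs,
                             std::vector<Matrix>* outputs) = 0;
};

// An op with any number of inputs and outputs fills in only ComputeMatrix.
class TransposeOp : public LinalgOpBase {
 protected:
  void ComputeMatrix(const std::vector<Matrix>& in,
                     std::vector<Matrix>* out) override {
    Matrix t{in[0].cols, in[0].rows,
             std::vector<double>(in[0].data.size())};
    for (std::size_t r = 0; r < in[0].rows; ++r)
      for (std::size_t c = 0; c < in[0].cols; ++c)
        t.data[c * t.cols + r] = in[0].data[r * in[0].cols + c];
    out->push_back(std::move(t));
  }
};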
conv3d_backprop_{input,filter}, respectively. Fixes bug #2467.
Change: 124848019
Change: 123900938
(MakeShape now takes an int64 instead of an int, avoiding
some of the casting ugliness and reducing the need for callers
to do their own, redundant checks).
Fixing additional int32->64 warnings
Change: 121498517
- Cache result of LogMemory::IsEnabled.
- Init Tensor for logging only if logging is enabled.
- Add move constructor to Tensor and use it in ProcessOutputs.
Also added benchmark for copy vs move in tensor_test.cc
Change: 121487202
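
A sketch of why the move constructor matters on a hot path like
ProcessOutputs: a move transfers the buffer pointer in O(1), while this
stand-in's copy duplicates the buffer (the real Tensor bumps a refcount
instead, which is still more work than a move):

#include <algorithm>
#include <cstddef>
#include <cstdint>

struct MiniTensor {
  int64_t* buf = nullptr;
  std::size_t n = 0;

  MiniTensor() = default;
  MiniTensor(const MiniTensor& o)
      : buf(o.n ? new int64_t[o.n] : nullptr), n(o.n) {
    if (n) std::copy(o.buf, o.buf + n, buf);  // deep copy
  }
  MiniTensor(MiniTensor&& o) noexcept : buf(o.buf), n(o.n) {
    o.buf = nullptr;  // steal the buffer; leave the source empty
    o.n = 0;
  }
  ~MiniTensor() { delete[] buf; }
};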
Change: 119423048
Change: 119416366
for mobile apps):
(1) Change many interfaces in node_def_builder.h, node_def_util.h,
op_kernel.h, node_builder.h, mirror_pad_mode.h, padding.h to use
'StringPiece', rather than 'const string&'. The interfaces that
were changed tend to be heavily used in the registration of ops and
kernels, and often caused extra string construction code to be emitted
in the macro expansion of each op or kernel registration.
(2) Move some repetitive CHECK operations into non-inlined routines in
tensor.cc, rather than having them in inlined or templated routines in
tensor.h (new Tensor::CheckDataType, Tensor::CheckTypeAndIsAligned,
and Tensor::CheckIsAlignedAndSingleElement routines)
(3) Factored out internal template<size_t NDIMS>
Tensor::FillDimsAndValidateCompatibleShape routine, to be shared
across more specialized templated routines (typically specialized on
both DataType and NDIMS).
(4) Added new non-inlined TensorShape::CheckDimsMatch(int NDIMS) routine in
tensor_shape.cc, that can be called from various TensorShape routines templated
on NDIMS.
(5) Don't inline single-argument StrCat, since it involves a string
creation, etc.
(6) Remove inline keyword from template <typename... AV> StrCat
version that handles 5 or more arguments.
Reduces text size for
third_party/tensorflow/core/libandroid_tensorflow_lib.so built in
Google build environment by 1.43%, as
measured by:
% blaze build -c opt --config=android_arm \
third_party/tensorflow/core:android_tensorflow_lib
% size blaze-bin/third_party/tensorflow/core/libandroid_tensorflow_lib.so
Change: 118036659
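
A sketch of the effect described in (1), with std::string_view standing in
for StringPiece: a string literal binds to a view directly, while a
const string& parameter makes every registration call site emit
std::string construction and destruction code:

#include <string>
#include <string_view>

void RegisterByRef(const std::string& name) { /* ... */ }
void RegisterByView(std::string_view name) { /* ... */ }

void RegisterAll() {
  RegisterByRef("MatMul");   // temporary std::string built at the call site
  RegisterByView("MatMul");  // just a pointer and a length
}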
Also move some functions from header to C++.
Change: 117255894
Also move some functions from header to C++.
Change: 117167529
Also move some functions from header to C++.
Change: 117148807
(1) Don't inline KernelDefBuilder::TypeConstraint
(2) Don't inline AttrSlice constructors
(3) Don't inline OpKernelConstruction::SetStatus (it's only called on error
paths anyway)
(4) Moved the slow path of OpKernelContext::record_tensor_reference into
an out-of-line really_record_tensor_reference helper
(5) Don't inline large OpKernelContext routines:
OpKernelContext::input(int index);
OpKernelContext::mutable_input(int index, bool lock_held);
OpKernelContext::replace_ref_input(int index, const Tensor& tensor,
bool lock_held);
OpKernelContext::forward_ref_input_to_ref_output(int input_index,
int output_index);
OpKernelContext::delete_ref_input(int index, bool lock_held)
(6) Add CtxFailure and CtxFailureWithWarning helper routines, and call
these from the OP_REQUIRES macros. Since these macros are called
in lots of places, this significantly reduces code space. This
involved moving the OP_REQUIRES macros from lib/core/errors.h to
lib/framework/op_kernel.h.
(7) Don't inline some of the templated routines in Tensor for getting various
Eigen views of the Tensor data.
(8) Made the uncommon part of the TensorShape destructor
(for handling REP_OUT_OF_LINE case) be out of line.
(9) Don't inline constructors for NodeBuilder::NodeOut nested type.
(10) Don't inline NodeBuilder::Attr templated routine
(11) Moved more shared code into the BinaryOpShared::BinaryOpState helper struct,
to avoid it being replicated in every BinaryOp for every numeric type.
(12) Moved some op validation routines to be out of line helper static functions
rather than being inlined into templated code bodies
(relu_op.cc, scatter_op.cc, segment_reduction_ops.cc)
(13) Moved QueueBase destructor out of line.
(14) Reworked some of the template <int NDIMS> Operate methods for gradients
that did not depend on NDIMS to call an OperateNoTemplate routine,
reducing the amount of duplication by NDIMS:
(softplus_op.cc, softsign_op.cc, relu_op.cc)
Text size reductions of 2.1% for
third_party/tensorflow/cc/tutorials_example_trainer binary when built
using Google build rules (which links in lots of other Google-related
libraries). Reduction of symbols in the tensorflow:: namespace is
12.2% (8814545 bytes down to 7740233 bytes).
Performance differences are in the noise when measured using
ptb_word_lm:
GPU build (--config=cuda -c opt --copt=-mavx):
Baseline: 8237 words per second
This cl: 8245 words per second
CPU build (-c opt --copt=-mavx):
Baseline: 3568 words per second
This cl: 3551 words per second
Change: 116040603
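
A sketch of item (6): the macro body shrinks to a branch and a single call
into a non-inlined failure helper, so each expansion stays small
(MiniContext and OP_REQUIRES_SKETCH are stand-ins for OpKernelContext and
the real macro):

#include <iostream>
#include <string>

struct MiniContext {
  void CtxFailure(const std::string& msg);  // deliberately out of line
};

// The error-path code lives once in CtxFailure, not at every expansion.
#define OP_REQUIRES_SKETCH(ctx, cond, msg) \
  do {                                     \
    if (!(cond)) {                         \
      (ctx)->CtxFailure(msg);              \
      return;                              \
    }                                      \
  } while (0)

void MiniContext::CtxFailure(const std::string& msg) {
  std::cerr << "kernel failure: " << msg << "\n";
}

void Compute(MiniContext* ctx, int rank) {
  OP_REQUIRES_SKETCH(ctx, rank >= 0, "rank must be non-negative");
  // ... kernel body runs only if the check passed ...
}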
considerably. Shrinks text size of example_trainer binary by ~1.5%.
Change: 115578002
Change: 115243253
Fixes #655.
Change: 114542161
representation to the Tensor class, and use this ability to store the
DataType in the extra byte that is unused inside the TensorShape
representation. This makes sizeof(Tensor) == 32 bytes on a machine
with 8-byte pointers, instead of 40 bytes.
Changed the value field in AllocatorAttributes from uint32 to uint8, and
rearranged the order of the fields in ExecutorState::Entry so that it fits
in the same word as the has_value boolean. After the Tensor change above,
Entry was 72 bytes, and this change makes it 64 bytes.
Change: 114358788
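
An illustrative layout for the byte-packing trick (field names and exact
positions are a sketch, not the real classes):

#include <cstdint>

struct ShapeRepSketch {
  uint8_t dims_[13];   // inline dimension storage
  uint8_t datatype_;   // unused by the shape itself; Tensor stashes its
                       // DataType here
  uint8_t ndims_;      // number of dimensions
  uint8_t tag_;        // which representation is active
  int64_t num_elements_;
};

struct TensorSketch {
  void* buffer_;          // pointer to the shared data buffer
  ShapeRepSketch shape_;  // carries the dtype in its spare byte
};

static_assert(sizeof(ShapeRepSketch) == 24, "16-byte rep + 8-byte count");
static_assert(sizeof(TensorSketch) == 32, "matches the 32 bytes above");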
Change: 114224625
for the common cases (not very many dimensions, dimension sizes fairly
small).
The old TensorShape implementation was 56 bytes, while this new
implementation is 24 bytes in the common cases (and Tensor objects
have a contained TensorShape object).
The implementation has a 16 byte buffer, plus an int64 num_elements_ field.
It switches between three representations that use the 16 byte buffer
in different ways.
The last two bytes of the 16 bytes are always a tag byte, indicating
which representation is in use, and a number of dimensions byte that
indicates the number of dimensions in the shape. This change also
introduces a new restriction: tensor shapes can have at most 256
dimensions.
The different representations are:
REP16: up to 7 dimensions stored inline, where each dimension value is
less than 32768.
REP32: up to 3 dimensions stored inline, where each dimension value is
less than 2^31 - 1.
REP_OUT_OF_LINE: the fallback case, where the inline bytes are used to
store a pointer to a heap allocated gtl::InlinedVector<int64, 4> that
holds the shape dimensions.
Changed the signature of TensorShape::dim_sizes() to return an actual
gtl::InlinedVector<int64, 4> rather than an ArraySlice<int64> (the old
signature was problematic because it placed unnecessary requirements
on the implementation, namely that it store the dimensions as a
contiguous array of int64 values, which the new representation does not
do for REP16 and REP32).
Preserved the old implementation by renaming it TensorShapeOld and
moving it to the tensor_shape_test.cc unittest, where it is used for a
randomized test to ensure that the behavior is the same.
Most of the benefit is that the TensorShape object is smaller. Assignment
is also generally faster, except for the rare REP_OUT_OF_LINE case, which
is a bit slower (benchmark case /4):
Run on machine with (40 X 2801 MHz CPUs); 2016/01/29-11:46:08
CPU: Intel Ivybridge with HyperThreading (20 cores) dL1:32KB dL2:256KB dL3:25MB
Benchmark Time(ns) CPU(ns) Iterations
-----------------------------------------------------------
BM_TensorShape_Assign/0 1 1 580669339
BM_TensorShape_Assign/1 1 1 588612006
BM_TensorShape_Assign/2 1 1 586977731
BM_TensorShape_Assign/3 1 1 564029390
BM_TensorShape_Assign/4 61 62 10000000
BM_TensorShapeOld_Assign/0 4 4 183910667
BM_TensorShapeOld_Assign/1 6 6 100000000
BM_TensorShapeOld_Assign/2 6 6 100000000
BM_TensorShapeOld_Assign/3 8 8 88330961
BM_TensorShapeOld_Assign/4 39 39 17701395
Change: 114162871
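
A sketch of the representation choice described above (the real class packs
the dims into its 16-byte buffer; this shows only the selection rule):

#include <algorithm>
#include <cstdint>
#include <vector>

enum class Rep : uint8_t { kRep16, kRep32, kRepOutOfLine };

Rep ChooseRep(const std::vector<int64_t>& dims) {
  auto all_below = [&dims](int64_t limit) {
    return std::all_of(dims.begin(), dims.end(),
                       [limit](int64_t d) { return d < limit; });
  };
  if (dims.size() <= 7 && all_below(int64_t{1} << 15))
    return Rep::kRep16;       // 7 x uint16 dims stored inline
  if (dims.size() <= 3 && all_below((int64_t{1} << 31) - 1))
    return Rep::kRep32;       // 3 x uint32 dims stored inline
  return Rep::kRepOutOfLine;  // pointer to heap InlinedVector<int64, 4>
}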
Adds a test that failed before this change, but passes after.
TensorShapeIter is only used by TensorShape::begin() and end(),
which are only used by AppendShape, which is only used by queue_base.h
on shapes that are unlikely to be 64-bit large, which is why
we never triggered this.
Discovered by jeff
Change: 113105906
The two functions already have the same behavior.
Change: 112959229
The two functions already have the same behavior, and ShortDebugString
will disappear soon.
Change: 112793490
we copy the original files to their new location and make the public/
versions #include the new location. Once all references are updated
to point to the new location, we can delete the originals in public/.
Change: 112622561