PiperOrigin-RevId: 216230391
- EncodeArg in C instead of Python.
- Also caches parsed device specs and device spec hashes.
- Adds a common way to register Python types in C.
- Fast-paths canonicalization of function inputs when no kwargs are passed.
- Sets the func name attr directly instead of creating an op to wrap it.
- Rewrites IsAttrsHelper without caching.
Before:
entry {
  name: "MicroBenchmarks.benchmark_defun_matmul_2_by_2_CPU"
  iters: 30000
  wall_time: 101.803263028
  extras {
    key: "examples_per_sec"
    value {
      double_value: 9822.86785562
    }
  }
}
After:
entry {
  name: "MicroBenchmarks.benchmark_defun_matmul_2_by_2_CPU"
  iters: 30000
  wall_time: 47.2899993261
  extras {
    key: "examples_per_sec"
    value {
      double_value: 21146.1199884
    }
  }
}
PiperOrigin-RevId: 215272962
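As a quick sanity check on the benchmark entries above, the wall-time ratio and the examples/sec ratio should agree (plain Python, values copied verbatim from the output):

```python
# Figures copied from the before/after benchmark entries above.
before_wall, after_wall = 101.803263028, 47.2899993261
before_eps, after_eps = 9822.86785562, 21146.1199884

# Wall time dropped by the same ~2.15x factor that throughput rose.
wall_speedup = before_wall / after_wall
eps_speedup = after_eps / before_eps
print(round(wall_speedup, 2), round(eps_speedup, 2))
```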
Tests added to pywrap_tfe_test.py would fail
(segmentation fault / infinite loop)
without the corresponding fixes to pywrap_tfe.i and pywrap_tfe_src.cc.
Other statements that would fail ungracefully without this fix
(with eager execution enabled) include:
tf.split(value=0, num_or_size_splits=-1)
tf.dynamic_partition(data=0, partitions=0, num_partitions=-1)
tf.split(value=0, num_or_size_splits=1.23, num=-1)
tf.unstack(value=0, num=-1)
PiperOrigin-RevId: 212731927
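The class of fix can be illustrated generically in plain Python (validate_num_splits is a hypothetical stand-in, not TensorFlow's actual code): validate arguments at the binding layer and raise a normal Python exception instead of letting lower-level code segfault or loop forever.

```python
# Hypothetical sketch of the fix: reject invalid arguments up front
# with a Python exception rather than crashing in native code.
def validate_num_splits(num_or_size_splits):
    if isinstance(num_or_size_splits, int) and num_or_size_splits < 0:
        raise ValueError(
            "num_or_size_splits must be non-negative, got "
            f"{num_or_size_splits}")
    return num_or_size_splits

try:
    validate_num_splits(-1)
except ValueError as e:
    message = str(e)
print(message)
```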
GradientTape.
For more complex use cases, this allows fine-grained control over what is
tracked by the tape.
PiperOrigin-RevId: 211948236
I don't believe there is currently a use case for a different VSpace (and it doesn't seem to be controllable through any public method).
If it is a use case we want to support, it should be simple enough to add an overload of TFE_Py_TapeGradient.
PiperOrigin-RevId: 211917235
This allows fine-grained control over recording in some cases, for example the
following, where we want d2y but not d2z:
x1 = tf.Variable(2.0, trainable=False)
x2 = tf.Variable(2.0, trainable=False)
with tf.GradientTape() as tape1:
  with tf.GradientTape() as tape2:
    tape1.watch(x1)
    tape2.watch([x1, x2])
    y = x1 ** 3
    z = x2 ** 2
  dy, dz = tape2.gradient([y, z], [x1, x2])
d2y, d2z = tape1.gradient([dy, dz], [x1, x2])
assert d2z is None
PiperOrigin-RevId: 211206506
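The math in the example checks out: for y = x1 ** 3, the second derivative is 6 * x1 = 12.0 at x1 = 2.0, while tape1 never watches x2, so d2z is None. A quick finite-difference check of the d2y value in plain Python (no TensorFlow, illustrative only):

```python
# Central finite difference for the second derivative of f at x.
def second_derivative(f, x, h=1e-3):
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)

# For f(x) = x**3, f''(x) = 6x, so f''(2.0) = 12.0.
d2y = second_derivative(lambda x: x ** 3, 2.0)
print(round(d2y, 3))
```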
Adopts a minimal sensible policy for step containers: starting a gradient tape
creates a step container; inner tapes do nothing; popping out of the outermost
tape resets that step container. This should allow us to have reasonable
behavior in the presence of step-container-scoped things for a while. Ideally
we'll move away from them in favor of lists, but the infrastructure isn't ready
yet.
PiperOrigin-RevId: 207911091
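The policy above can be sketched with a depth counter in plain Python (the Context class and its fields here are hypothetical stand-ins, not TensorFlow's actual implementation):

```python
# Hypothetical sketch of the "outermost tape owns the step container" policy.
class Context:
    def __init__(self):
        self.tape_depth = 0
        self.step_container = None   # stands in for a real step container

    def push_tape(self):
        if self.tape_depth == 0:     # outermost tape: create the container
            self.step_container = object()
        self.tape_depth += 1         # inner tapes only bump the counter

    def pop_tape(self):
        self.tape_depth -= 1
        if self.tape_depth == 0:     # popping the outermost tape resets it
            self.step_container = None

ctx = Context()
ctx.push_tape()
ctx.push_tape()                      # nested tape shares the container
ctx.pop_tape()
assert ctx.step_container is not None  # inner pop keeps the container
ctx.pop_tape()
assert ctx.step_container is None      # outermost pop resets it
```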
Any time the server def is updated, the context is effectively "reset" by
clearing all of its caches.
- Check that the returned FLR is not a nullptr instead of segfaulting.
- Consolidate caches within the context object.
PiperOrigin-RevId: 207308086
the grpc_tensorflow_server.
PiperOrigin-RevId: 201198350
This change includes the following steps to make the
EagerTensor profiler work:
- Add a PaddedShapeFn to XlaDevice::Metadata. We need a
backend-independent way to get a fully-padded shape and
its layout on the device. This function is set during
device construction. CPU and GPU devices effectively get
an identity function since they neither change the layout
nor pad. TPU gets the appropriate function.
- Add a TFE_TensorDebugInfo struct and C API methods for it.
These methods are necessary to fetch the shape and layout
from under the C API to the Python level. This can be a home
for more debug information later.
- Make EagerTensor weak-referenceable. This involves adding a
pointer to the list of current weak references. This addition
should have negligible overhead when the profiler is not used.
The only operations on this field are setting it to null on
construction and checking if it is null on destruction.
- Add C++ functions callable from Python to register an instance
of EagerTensorProfiler and retrieve debug information for a given
EagerTensor. These functions are used in the new "inspect" module.
- Finally, write the actual profiler.
PiperOrigin-RevId: 198098380
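The weak-reference behavior can be illustrated at the Python level (FakeTensor is a plain-Python stand-in; classes defined in Python are weak-referenceable by default, which is the property this change adds to the C-defined EagerTensor type):

```python
import weakref

# Stand-in for EagerTensor; pure-Python classes already carry the
# __weakref__ slot that the commit adds to the C-level type.
class FakeTensor:
    pass

t = FakeTensor()
ref = weakref.ref(t)     # profiler-style bookkeeping holds only weak refs
assert ref() is t        # target alive: dereference returns the object
del t
assert ref() is None     # target collected: dereference returns None
```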
(_magic_gradient_function was renamed to _gradient_function)
Before:
entry {
  name: "MicroBenchmarks.benchmark_tf_gradient_forward_identity"
  iters: 30000
  wall_time: 5.88456789653
  extras {
    key: "examples_per_sec"
    value {
      double_value: 169936.011885
    }
  }
}
After:
entry {
  name: "MicroBenchmarks.benchmark_tf_gradient_forward_identity"
  iters: 30000
  wall_time: 5.04853725433
  extras {
    key: "examples_per_sec"
    value {
      double_value: 198077.175551
    }
  }
}
PiperOrigin-RevId: 197972668
- (pywrap_tfe.i) Improve the error message for easier debugging of TFE_Py_Execute failures.
- (pywrap_tfe_src.cc) Accept a _value of None.
- (base.i) Remove an unnecessary temporary.
PiperOrigin-RevId: 197073571
PiperOrigin-RevId: 196995160
It turns out regular functions need to manually copy handle data in
addition to eager GraphModeFunctions, so I moved the C extensions to
python_api.h from eager/c_api.h.
This also cleans up function_test.py to assume the C API is enabled.
PiperOrigin-RevId: 194158700
193422827 by yifeif:
Fix buildifier error.
--
193421691 by skyewm:
Make GraphModeFunctions work with _USE_C_SHAPES=True.
Tensor._handle_data is going away. This change adds special hooks for
propagating the resource handle shape information through
EagerTensors.
--
193421473 by A. Unique TensorFlower:
Register dynamic_stitch for DT_VARIANT type.
--
193421175 by nolivia:
disabling flaky tsan test
--
193420117 by nolivia:
disabling flaky test in tensorflow that has no apparent culprit
--
PiperOrigin-RevId: 193422827
Before:
TFE_Context would check nullptr, and the function would fail straight away.
Now:
TFE_Context is nullptr, so it skips down to checking the status, and an error
is raised.
I couldn't find in the SWIG documentation how to order typemaps in the
generated code; ideally, I'd order it to check the status typemap first. This
change makes the code independent of that ordering either way.
PiperOrigin-RevId: 191905893
Minor fixes to make this work.
PiperOrigin-RevId: 191457070
PiperOrigin-RevId: 189101670
PiperOrigin-RevId: 187676012
PiperOrigin-RevId: 186522240
MatMul benchmarks:
entry {
  name: "MicroBenchmarks.benchmark_gen_math_ops_matmul_2_by_2_CPU"
  iters: 30000
  wall_time: 11.580435435
  extras {
    key: "examples_per_sec"
    value {
      double_value: 86352.538781
    }
  }
}
entry {
  name: "MicroBenchmarks.benchmark_tfe_py_fastpath_execute_matmul_2_by_2_CPU"
  iters: 30000
  wall_time: 7.02576637268
  extras {
    key: "examples_per_sec"
    value {
      double_value: 142333.227004
    }
  }
}
PiperOrigin-RevId: 184734289
Fixes #16106
PiperOrigin-RevId: 183137298
PiperOrigin-RevId: 183093407
Benchmarks:
(new) tfe_py_fastpath_execute_matmul: 7.86213080088 walltime
(old) gen_math_ops_matmul: 11.2566947937 walltime
(The slowdown is due to adding the record_gradient callback.)
This will be a 3-step process:
1. (This CL) Add a function that allows execution of an op on the fastpath.
2. Update the Python generation code to create two new Python functions
(_op/_op_gradient_callback and _op_slowpath_fallback):
-- the first function (_op) checks if it is in graph mode; if so, it does the
normal thing, else it calls out to the function added in step 1.
-- the second function does the else part, similar to today (calling out to
args_to_matching_eager etc.).
3. Rename the first function generated above to be the canonical _op function.
PiperOrigin-RevId: 182791741
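The two-function split in step 2 can be sketched in plain Python (all names here, such as in_graph_mode, fastpath_execute, and slowpath_fallback, are hypothetical stand-ins for the generated code, not TensorFlow's actual generated functions):

```python
# Hypothetical sketch of the generated-code dispatch described above.
IN_GRAPH_MODE = True  # stand-in for the real graph/eager mode check

def slowpath_fallback(x, y):
    # Stands in for the existing path (args_to_matching_eager etc.).
    return ("slow", x + y)

def fastpath_execute(x, y):
    # Stands in for the new C fastpath added in step 1.
    return ("fast", x + y)

def matmul_op(x, y):
    # The generated _op function: graph mode takes the normal path,
    # eager mode calls out to the fastpath.
    if IN_GRAPH_MODE:
        return slowpath_fallback(x, y)
    return fastpath_execute(x, y)

assert matmul_op(1, 2) == ("slow", 3)
IN_GRAPH_MODE = False
assert matmul_op(1, 2) == ("fast", 3)
```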
PiperOrigin-RevId: 181317960
The set of tapes needs to be global to enable multithreaded programming
(when it's natural for tensors to cross threads during reduction operations),
but each thread still needs to be able to locally pause recording while
it does gradient-related bookkeeping (like custom gradients or initialization).
Also removes a mutex from the thread-local structure, since it's unnecessary:
we're always holding the GIL while calling across the Python/C boundary
unless we explicitly release it.
PiperOrigin-RevId: 181246570
"big hammer" required for reproducibility.
PiperOrigin-RevId: 180961787
Rolls back the rollback with some swiggery to get Python 3 to work.
PiperOrigin-RevId: 177470328
PiperOrigin-RevId: 177418947
PiperOrigin-RevId: 177375237
PiperOrigin-RevId: 175704617
PiperOrigin-RevId: 175531148
Neutral-to-positive on all benchmarks. Also reduces overhead of should_record.
PiperOrigin-RevId: 175057104
TFE_Py_TensorShapeSlice takes a list of EagerTensors and returns a list
of their i'th dimensions. This utility is fairly niche, but it is simple
and reduces SPINN training time by over 12%.
PiperOrigin-RevId: 174065044
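The behavior can be sketched in plain Python over shape tuples (tensor_shape_slice is a hypothetical stand-in; the real TFE_Py_TensorShapeSlice operates on EagerTensors in C):

```python
# Hypothetical pure-Python stand-in for TFE_Py_TensorShapeSlice:
# given tensors (represented here by their shape tuples), return a
# list holding each tensor's i'th dimension.
def tensor_shape_slice(shapes, i):
    return [shape[i] for shape in shapes]

shapes = [(32, 128), (32, 64), (32, 10)]
assert tensor_shape_slice(shapes, 0) == [32, 32, 32]
assert tensor_shape_slice(shapes, 1) == [128, 64, 10]
```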
PiperOrigin-RevId: 172943398
PiperOrigin-RevId: 172818175
The tape stack is still in Python, as is the backprop code.
PiperOrigin-RevId: 172151189
Removes TFE_NewOp and TFE_OpGetAttrType from pywrap_tensorflow, adds TFE_OpNameGetAttrType.
PiperOrigin-RevId: 171302338
PiperOrigin-RevId: 170617321
PiperOrigin-RevId: 169043830
Python/NumPy to C/C++.
PiperOrigin-RevId: 168983402
PiperOrigin-RevId: 165467540
PiperOrigin-RevId: 165379414
tf.name_scope, tf.device, tf.control_dependencies eager-friendly.
PiperOrigin-RevId: 165269572
PiperOrigin-RevId: 164902588