author      Philip Langdale <philipl@overt.org>    2018-09-29 18:00:19 -0700
committer   sfan5 <sfan5@live.de>                  2018-10-22 21:35:48 +0200
commit      da1073c247523d07d0485348447fcc02000afee8 (patch)
tree        42034a476202e2e03d6dd06a705b3ef1ef95da32 /wscript
parent      621389134afd3026b7e3508dba070442c4eeefa0 (diff)
vo_gpu: vulkan: hwdec_cuda: Add support for Vulkan interop
Despite their place in the tree, hwdecs can be loaded and used just
fine by the vulkan GPU backend.
In this change we add Vulkan interop support to the cuda/nvdec hwdec.
The overall process is mostly straightforward, so the main observation
here is that I had to implement it using an intermediate Vulkan buffer,
because direct VkImage usage is blocked by a bug in the nvidia
driver. When that gets fixed, I will revisit this.
Nevertheless, the intermediate buffer copy is very cheap as it's all
device memory from start to finish. Overall CPU utilisation is pretty
much the same as with the OpenGL GPU backend.
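
To make that copy path concrete, here is a minimal sketch of the CUDA
driver API side of the interop, assuming the intermediate Vulkan buffer
was allocated with exportable device memory and its fd was obtained via
vkGetMemoryFdKHR. The helper names are hypothetical and not the actual
hwdec_cuda code:

    #include <cuda.h>
    #include <stddef.h>

    /* Hypothetical helper: import the exported fd of a Vulkan buffer's
     * device memory into CUDA and return a device pointer that the
     * copies below can target. */
    static CUdeviceptr import_vk_buffer(int fd, size_t size,
                                        CUexternalMemory *ext_mem)
    {
        CUDA_EXTERNAL_MEMORY_HANDLE_DESC mem_desc = {
            .type = CU_EXTERNAL_MEMORY_HANDLE_TYPE_OPAQUE_FD,
            .handle.fd = fd,
            .size = size,
        };
        if (cuImportExternalMemory(ext_mem, &mem_desc) != CUDA_SUCCESS)
            return 0;

        CUDA_EXTERNAL_MEMORY_BUFFER_DESC buf_desc = {
            .offset = 0,
            .size = size,
        };
        CUdeviceptr ptr = 0;
        if (cuExternalMemoryGetMappedBuffer(&ptr, *ext_mem, &buf_desc) != CUDA_SUCCESS)
            return 0;
        return ptr;
    }

    /* Copy one decoded plane (a CUDA array mapped from the NVDEC frame)
     * into the intermediate buffer. Source and destination are both
     * device memory, which is why the extra hop stays cheap. */
    static CUresult copy_plane(CUarray src, CUdeviceptr dst, size_t pitch,
                               size_t width_bytes, size_t height,
                               CUstream stream)
    {
        CUDA_MEMCPY2D cpy = {
            .srcMemoryType = CU_MEMORYTYPE_ARRAY,
            .srcArray      = src,
            .dstMemoryType = CU_MEMORYTYPE_DEVICE,
            .dstDevice     = dst,
            .dstPitch      = pitch,
            .WidthInBytes  = width_bytes,
            .Height        = height,
        };
        return cuMemcpy2DAsync(&cpy, stream);
    }

The buffer is then handed to the normal texture upload path on the
Vulkan side.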
Note that we cannot use a single intermediate buffer - rather there
is a pool of them. This is done because the cuda memcpys are not
explicitly synchronised with the texture uploads.
In the basic case, this doesn't matter because the hwdec is not
asked to map and copy the next frame until after the previous one
is rendered. In the interpolation case, we need extra future frames
available immediately, so we'll be asked to map/copy those frames
and Vulkan will be asked to render them. So far, harmless, right? No.
All the Vulkan rendering, including the upload steps, is batched
together and ends up running very asynchronously from the CUDA copies.
The end result is that all the copies happen one after another, and
only then do the uploads happen, which means all the textures are
uploaded with the same, final frame data. Whoops. Unsurprisingly, this
results in jerky motion because every 3 out of 4 frames are identical.
The buffer pool ensures that we do not overwrite a buffer that is
still waiting to be uploaded. The ra_buf_pool implementation
automatically checks if existing buffers are available for use and
only creates a new one if it really has to. It's hard to say for sure
what the maximum number of buffers might be, but we believe it won't
be so large as to make this strategy unusable. The highest I've seen
is 12 when using interpolation with tscale=bicubic.
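
To illustrate the reuse-or-grow behaviour described above, here is a
simplified, self-contained sketch. The names and types are hypothetical
and do not match the real ra_buf_pool interface; the caller is assumed
to mark a buffer in flight when it queues the upload and to clear the
flag once the upload's fence signals:

    #include <stdbool.h>
    #include <stdlib.h>

    struct buf {
        bool in_flight;   /* stands in for polling the buffer's fence */
        /* the actual buffer handle would live here */
    };

    struct buf_pool {
        struct buf **bufs;
        int num_bufs;
        int index;        /* round-robin position of the oldest buffer */
    };

    /* Hand out the oldest buffer if its previous upload has finished;
     * otherwise grow the pool by one. */
    static struct buf *pool_get(struct buf_pool *pool)
    {
        if (pool->num_bufs > 0) {
            struct buf *candidate = pool->bufs[pool->index];
            if (!candidate->in_flight) {
                pool->index = (pool->index + 1) % pool->num_bufs;
                return candidate;
            }
        }

        /* Every existing buffer is still queued for upload, so allocate
         * another one. In practice the pool stops growing quickly
         * (about 12 buffers with tscale=bicubic interpolation, as noted
         * above). */
        struct buf **grown = realloc(pool->bufs,
                                     (pool->num_bufs + 1) * sizeof(*grown));
        if (!grown)
            return NULL;
        pool->bufs = grown;

        struct buf *fresh = calloc(1, sizeof(*fresh));
        if (!fresh)
            return NULL;
        pool->bufs[pool->num_bufs++] = fresh;
        return fresh;
    }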
A future optimisation here is to synchronise the CUDA copies with
respect to the Vulkan uploads. This can be done with shared semaphores
that would ensure the copy of the second frame only happens after the
upload of the first frame, and so on. This isn't trivial to implement,
as I'd first have to adjust the hwdec code to use asynchronous CUDA;
without that, there's no way to use the semaphores for synchronisation.
This should result in fewer intermediate buffers being required.
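
For reference, a sketch of what that could look like on the CUDA side,
assuming the Vulkan semaphores are exported as opaque fds (e.g. via
vkGetSemaphoreFdKHR). This is the hypothetical future optimisation, not
what this commit implements, and it only helps once the hwdec runs its
copies on an asynchronous CUDA stream:

    #include <cuda.h>

    /* Import a Vulkan semaphore that was exported as an opaque fd. */
    static CUresult import_vk_semaphore(int fd, CUexternalSemaphore *sem)
    {
        CUDA_EXTERNAL_SEMAPHORE_HANDLE_DESC desc = {
            .type = CU_EXTERNAL_SEMAPHORE_HANDLE_TYPE_OPAQUE_FD,
            .handle.fd = fd,
        };
        return cuImportExternalSemaphore(sem, &desc);
    }

    /* Before copying frame N+1 into a buffer, wait for the semaphore
     * that the Vulkan side signals once the upload of frame N from that
     * buffer has completed. */
    static CUresult wait_for_upload(CUexternalSemaphore sem, CUstream stream)
    {
        CUDA_EXTERNAL_SEMAPHORE_WAIT_PARAMS params = {0};
        return cuWaitExternalSemaphoresAsync(&sem, &params, 1, stream);
    }

    /* After the copy, signal a second semaphore that the Vulkan upload
     * of this buffer waits on, closing the loop in the other direction. */
    static CUresult signal_copy_done(CUexternalSemaphore sem, CUstream stream)
    {
        CUDA_EXTERNAL_SEMAPHORE_SIGNAL_PARAMS params = {0};
        return cuSignalExternalSemaphoresAsync(&sem, &params, 1, stream);
    }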
Diffstat (limited to 'wscript')
-rw-r--r--   wscript   4
1 file changed, 2 insertions, 2 deletions
@@ -846,11 +846,11 @@ hwaccel_features = [
     }, {
         'name': 'ffnvcodec',
         'desc': 'CUDA Headers and dynamic loader',
-        'func': check_pkg_config('ffnvcodec >= 8.1.24.1'),
+        'func': check_pkg_config('ffnvcodec >= 8.2.15.3'),
     }, {
         'name': '--cuda-hwaccel',
         'desc': 'CUDA hwaccel',
-        'deps': 'gl && ffnvcodec',
+        'deps': '(gl || vulkan) && ffnvcodec',
         'func': check_true,
     }
 ]