vo_gpu: vulkan: support split command pools

Instead of using a single primary queue, we generate multiple
vk_cmdpools and pick the right one dynamically based on the intent.
This has a number of immediate benefits:

1. We can use async texture uploads
2. We can use the DMA engine for buffer updates
3. We can benefit from async compute on AMD GPUs

Unfortunately, the major downside is that due to the lack of QF
ownership tracking, we need to use CONCURRENT sharing for all resources
(buffers *and* images!). In theory, we could try figuring out a way to
get rid of the concurrent sharing for buffers (which is only needed for
compute shader UBOs), but even so, the concurrent sharing mode doesn't
really seem to have a significant impact over here (nvidia). It's
possible that other platforms may disagree.

Our deadlock-avoidance strategy is stupidly simple: Just flush the
command every time we need to switch queues, and make sure all
submission and callbacks happen in FIFO order. This required lifting the
cmds_pending and cmds_queued out from vk_cmdpool to mpvk_ctx, and some
functions died/got moved as a result, but that's a relatively minor
change.

On my hardware this is a fairly significant performance boost, mainly
due to async transfers. (Nvidia doesn't expose separate compute queues
anyway). On AMD, this should be a performance boost as well due to async
compute.
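
The pool selection and the flush-on-switch rule described in the message can be illustrated with a minimal sketch. This is not the code from this commit: the `cmd_intent` enum, the `pool` and `ctx` structs, and the `pick_pool` and `begin_cmd` helpers are hypothetical names invented for illustration, and the "flush" is reduced to a counter that stands in for submitting the pending command.

```c
// Hypothetical intent categories (illustrative names, not mpv's).
enum cmd_intent { INTENT_GRAPHICS, INTENT_COMPUTE, INTENT_TRANSFER };

// Hypothetical stand-in for a command pool tied to one queue family.
struct pool { int qf; const char *name; };

struct ctx {
    struct pool *pool_graphics;  // always present
    struct pool *pool_compute;   // may alias pool_graphics (no async compute)
    struct pool *pool_transfer;  // may alias pool_graphics (no DMA queue)
    struct pool *last_pool;      // pool of the command recorded most recently
    int flushes;                 // counts flushes forced by queue switches
};

// Map an intent to the pool that should record the command.
static struct pool *pick_pool(struct ctx *c, enum cmd_intent intent)
{
    switch (intent) {
    case INTENT_COMPUTE:  return c->pool_compute;
    case INTENT_TRANSFER: return c->pool_transfer;
    default:              return c->pool_graphics;
    }
}

// Start a command for the given intent. If this means moving to a
// different pool than the previous command used, "flush" first so that
// work is submitted in the order it was recorded.
static struct pool *begin_cmd(struct ctx *c, enum cmd_intent intent)
{
    struct pool *p = pick_pool(c, intent);
    if (c->last_pool && c->last_pool != p)
        c->flushes++;  // stands in for submitting the pending command
    c->last_pool = p;
    return p;
}
```

With a context whose compute pool aliases the graphics pool, a graphics command followed by a compute command causes no flush, while a following transfer command on a separate pool forces one.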
author: Niklas Haas <git@haasn.xyz> 2017-09-24 15:05:24 +0200
committer: Martin Herkt <652892+lachs0r@users.noreply.github.com> 2017-12-25 00:47:53 +0100
commit: bded247fb53558dd5cba26560d1f24e9234ae24e (patch)
tree: 1e2c1819fc009acf9eea0d481a003799f7ffda8c /video/out/vulkan/malloc.c
parent: a3c9685257e60e32646bb54a895ef7574a945f69 (diff)
1 files changed, 12 insertions, 0 deletions
diff --git a/video/out/vulkan/malloc.c b/video/out/vulkan/malloc.c
index f6cb1143bb..a9aced33d8 100644
--- a/video/out/vulkan/malloc.c
+++ b/video/out/vulkan/malloc.c
@@ -133,11 +133,23 @@ static struct vk_slab *slab_alloc(struct mpvk_ctx *vk, struct vk_heap *heap,
 
     uint32_t typeBits = heap->typeBits ? heap->typeBits : UINT32_MAX;
     if (heap->usage) {
+        // FIXME: Since we can't keep track of queue family ownership properly,
+        // and we don't know in advance what types of queue families this buffer
+        // will belong to, we're forced to share all of our buffers between all
+        // command pools.
+        uint32_t qfs[3] = {0};
+        for (int i = 0; i < vk->num_pools; i++)
+            qfs[i] = vk->pools[i]->qf;
+
         VkBufferCreateInfo binfo = {
             .sType = VK_STRUCTURE_TYPE_BUFFER_CREATE_INFO,
             .size  = slab->size,
             .usage = heap->usage,
+            .sharingMode = vk->num_pools > 1 ? VK_SHARING_MODE_CONCURRENT
+                                             : VK_SHARING_MODE_EXCLUSIVE,
             .sharingMode = VK_SHARING_MODE_EXCLUSIVE,
+            .queueFamilyIndexCount = vk->num_pools,
+            .pQueueFamilyIndices = qfs,
         };
 
         VK(vkCreateBuffer(vk->dev, &binfo, MPVK_ALLOCATOR, &slab->buffer));
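
The sharing-mode choice in the hunk above can be sketched in isolation. The following is a hedged illustration and not part of the commit: `sharing_mode`, `sharing_info` and `make_sharing_info` are hypothetical stand-ins so the example builds without the Vulkan headers. It also removes duplicate queue family indices, which the diff does not do; that step is an assumption added here because the Vulkan specification requires the indices to be unique, and more than one of them, when concurrent sharing is used.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_POOLS 3  // mirrors the size of the qfs[3] array in the diff

// Hypothetical stand-in for VkSharingMode.
enum sharing_mode { SHARING_EXCLUSIVE, SHARING_CONCURRENT };

// Hypothetical bundle of the three sharing-related fields that would be
// copied into a VkBufferCreateInfo.
struct sharing_info {
    enum sharing_mode mode;
    uint32_t count;           // number of unique queue families
    uint32_t qfs[MAX_POOLS];  // the unique queue family indices
};

// Collect the unique queue families of all pools and choose concurrent
// sharing only when more than one distinct family is involved.
static struct sharing_info make_sharing_info(const uint32_t *pool_qfs,
                                             int num_pools)
{
    struct sharing_info info = { .mode = SHARING_EXCLUSIVE };
    for (int i = 0; i < num_pools && i < MAX_POOLS; i++) {
        bool seen = false;
        for (uint32_t j = 0; j < info.count; j++)
            seen |= info.qfs[j] == pool_qfs[i];
        if (!seen)
            info.qfs[info.count++] = pool_qfs[i];
    }
    if (info.count > 1)
        info.mode = SHARING_CONCURRENT;
    return info;
}
```

The result maps onto `sharingMode`, `queueFamilyIndexCount` and `pQueueFamilyIndices`. In a C designated initializer, a later initializer for the same member overrides an earlier one, so `sharingMode` should be set only once in the struct literal.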