mpv - video player based on MPlayer/mplayer2

	Commit message	Author	Age
...
*	vo_gpu: refactor HDR peak detection algorithm	Niklas Haas	2018-02-11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The major changes are as follows: 1. Use `uint32_t` instead of `unsigned int` for the SSBO size calculation. This doesn't really matter, since a too-big buffer will still work just fine, but since `uint` is a 32-bit integer by definition this is the correct way to do it. 2. Pre-divide the frame_sum by the num_wg immediately at the end of a frame. This change was made to prevent overflow. At 4K screen size, this code is already at high risk of overflow, especially once I started playing with longer averaging sizes. Pre-dividing this out makes it just about fit into 32-bit even for worst-case PQ content. (It's technically also faster and easier this way, so I should have done it to begin with). Rename `frame_sum` to `frame_avg` to clearly signal the change in semantics. 3. Implement a scene transition detection algorithm. This basically compares the current frame's average brightness against the (averaged) value of the past frames. If it exceeds a threshold, which I experimentally configured, we reset the peak detection SSBO's state immediately - so that it just contains the current frame. This prevents annoying "eye adaptation"-like effects on scene transitions. 4. As a result of the previous change, we can now use a much larger buffer size by default, which results in a more stable and less flickery result. I experimented with values between 20 and 256 and settled on the new value of 64. (I also switched to a power-of-2 array size, because I like powers of two)
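The pre-division (item 2) and the scene transition reset (item 3) can be sketched in plain C as follows. This is only an illustration, not the GLSL compute shader the commit changes: the names `peak_state`, `peak_update` and `SCENE_THRESHOLD`, as well as the threshold value, are hypothetical. Only the buffer size of 64 is taken from the commit message.

```c
#include <assert.h>
#include <stdint.h>

#define PEAK_FRAMES 64        /* buffer size named in the commit message */
#define SCENE_THRESHOLD 50u   /* hypothetical value, not the one mpv uses */

struct peak_state {
    uint32_t frame_avg[PEAK_FRAMES]; /* per-frame averages (already divided) */
    uint32_t count;                  /* number of valid entries */
    uint32_t next;                   /* ring buffer write index */
    uint32_t total;                  /* sum of the valid entries */
};

/* Feed one frame and return the running average over the stored frames.
 * frame_sum is the sum over all work groups. Dividing it by num_wg right
 * away keeps each stored value small, so the sum over PEAK_FRAMES entries
 * has far more headroom in 32 bits than a sum of undivided values. */
static uint32_t peak_update(struct peak_state *s, uint32_t frame_sum,
                            uint32_t num_wg)
{
    assert(num_wg > 0);
    uint32_t avg = frame_sum / num_wg;

    if (s->count > 0) {
        uint32_t past = s->total / s->count;
        uint32_t diff = avg > past ? avg - past : past - avg;
        if (diff > SCENE_THRESHOLD) {
            /* Scene change: drop the history so only this frame counts. */
            s->count = s->next = s->total = 0;
        }
    }

    if (s->count == PEAK_FRAMES)
        s->total -= s->frame_avg[s->next]; /* evict the oldest entry */
    else
        s->count++;

    s->frame_avg[s->next] = avg;
    s->total += avg;
    s->next = (s->next + 1) % PEAK_FRAMES;
    return s->total / s->count;
}
```

Resetting the history on a large jump is what makes a long averaging window tolerable: without it, a 64-frame window would take many frames to adapt after a cut.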
*	wayland_common: fix idle_inhibitor protocol segfault	Rostislav Pehlivanov	2018-02-09
\| \| \| \|	The pointer is used as a state and wasn't zeroed after seeks.
*	drmprime interop : Add frames triple buffering	LongChair	2018-02-07
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently using the drmprime interop with external mpv integration can lead to rendering issues because the current frame is being released too early. Typically using this with Qt results in one frame shift because Qt will do waitforvsync and swap, rather than swap and waitforvsync. This leads to tearing as the framebuffer is released while being displayed on screen. In order to avoid releasing the framebuffer that is displayed, we keep the framebuffer alive for one more frame with triple buffering to make sure that whatever rendering process is used, the framebuffer will not be released when it's still on screen. This was tested on RockChip Rock64
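The idea of holding on to a framebuffer for one extra frame can be sketched as a small ring of three slots. This is a simplified illustration in plain C, not the interop code itself: `fb_ring`, `fb_ring_push` and the use of integer ids for framebuffers are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

#define FB_SLOTS 3  /* triple buffering: three framebuffers stay alive */

struct fb_ring {
    int held[FB_SLOTS]; /* framebuffer ids, 0 means "empty slot" */
    size_t next;        /* slot that receives the next framebuffer */
};

/* Store the newest framebuffer and return the id that may now be released,
 * or 0 if no framebuffer has to be released yet. */
static int fb_ring_push(struct fb_ring *r, int fb_id)
{
    assert(fb_id != 0);
    int released = r->held[r->next];
    r->held[r->next] = fb_id;
    r->next = (r->next + 1) % FB_SLOTS;
    return released;
}
```

With three slots, a framebuffer is only handed back for release once two newer ones have been queued, so it cannot be freed while it may still be on screen regardless of whether the embedder swaps before or after waiting for vsync.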
*	vo_gpu: port HDR tone mapping algorithm from libplacebo	Niklas Haas	2018-02-05
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The current peak detection algorithm was very bugged (which contributed to the excessive cross-frame flicker without long normalization) and also didn't take into account the frame average brightness level. The new algorithm both takes into account frame average brightness (in addition to peak brightness), and also computes the values in a more stable/correct way. (The old path was basically undefined behavior) In addition to improving the algorithm, we also switch to hable tone mapping by default, and try to enable peak computation automatically whenever possible (compute shaders + SSBOs supported). We also make the desaturation milder, after extensive testing during libplacebo development. I also had to compensate a bit for the representational differences between mpv and libplacebo (libplacebo treats 1.0 as the reference peak, but mpv treats it as the nominal peak), but it shouldn't have caused any problems. This is still not quite the same as libplacebo, since libplacebo also allows tagging the desired scene average brightness on the output, and it also supports reading the scene average brightness from static metadata (MaxFALL) where available. But those changes are a bit more involved. It's possible we could also read this from metadata in the future, but we have problems communicating with AVFrames as it is and I don't want to touch the mpv colorimetry structs for the time being.
*	vo_gpu: add RA_CAP for gl_NumWorkGroups	Niklas Haas	2018-02-05
\| \| \| \| \|	SPIRV-Cross doesn't support this for the time being. It's possible this could go away again at a later date.
*	vo_gpu: vulkan: correctly enable textureGatherOffset	Niklas Haas	2018-02-05
\| \| \| \|	This also requires a vulkan feature / SPIR-V capability to function
*	vo_gpu: vulkan: don't issue queries for unused timers	Niklas Haas	2018-02-05
\| \| \| \| \| \| \|	The vulkan validation layers warn you if you try requesting a query result from a timer that hasn't even been started yet, so we have to do some extra bit of work to keep track of which indices we've seen so far, and avoid the queries on them.
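Keeping track of which timer indices have been started can be as small as a bitmask. The sketch below is a hypothetical illustration in plain C (the names `timer_pool`, `timer_start`, `timer_can_query` and the pool size are assumptions); the actual Vulkan query calls are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_TIMERS 64  /* hypothetical pool size */

struct timer_pool {
    uint64_t started;  /* bit i is set once timer i has been started */
};

/* Mark a timer index as used. Real code would also record a timestamp. */
static void timer_start(struct timer_pool *p, unsigned idx)
{
    assert(idx < MAX_TIMERS);
    p->started |= UINT64_C(1) << idx;
}

/* Results may only be requested for timers that were actually started. */
static bool timer_can_query(const struct timer_pool *p, unsigned idx)
{
    assert(idx < MAX_TIMERS);
    return (p->started >> idx) & 1;
}
```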
*	vo_gpu: vulkan: try enabling required features	Niklas Haas	2018-02-05
\| \| \| \| \| \| \|	Instead of enabling every feature under the sun, make an effort to just whitelist the ones we actually might use. Turns out the extended storage format support is needed for some of the storage formats we use, in particular rgba16.
*	vo_gpu: vulkan: add missing buffer barrier fields	Niklas Haas	2018-02-05
\| \| \| \|	These were accidentally omitted.
*	video: rewrite filtering glue code	wm4	2018-01-30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Get rid of the old vf.c code. Replace it with a generic filtering framework, which can potentially handle more than just --vf. At least reimplementing --af with this code is planned. This changes some --vf semantics (including runtime behavior and the "vf" command). The most important ones are listed in interface-changes. vf_convert.c is renamed to f_swscale.c. It is now an internal filter that can not be inserted by the user manually. f_lavfi.c is a refactor of player/lavfi.c. The latter will be removed once --lavfi-complex is reimplemented on top of f_lavfi.c. (which is conceptually easy, but a big mess due to the data flow changes). The existing filters are all changed heavily. The data flow of the new filter framework is different. Especially EOF handling changes - EOF is now a "frame" rather than a state, and must be passed through exactly once. Another major thing is that all filters must support dynamic format changes. The filter reconfig() function goes away. (This sounds complex, but since all filters need to handle EOF draining anyway, they can use the same code, and it removes the mess with reconfig() having to predict the output format, which completely breaks with libavfilter anyway.) In addition, there is no automatic format negotiation or conversion. libavfilter's primitive and insufficient API simply doesn't allow us to do this in a reasonable way. Instead, filters can use f_autoconvert as sub-filter, and tell it which formats they support. This filter will in turn add actual conversion filters, such as f_swscale, to perform necessary format changes. vf_vapoursynth.c uses the same basic principle of operation as before, but with worryingly different details in data flow. Still appears to work. The hardware deint filters (vf_vavpp.c, vf_d3d11vpp.c, vf_vdpaupp.c) are heavily changed. 
Fortunately, they all used refqueue.c, which is for sharing the data flow logic (especially for managing future/past surfaces and such). It turns out it can be used to factor out most of the data flow. Some of these filters accepted software input. Instead of having ad-hoc upload code in each filter, surface upload is now delegated to f_autoconvert, which can use f_hwupload to perform this. Exporting VO capabilities is still a big mess (mp_stream_info stuff). The D3D11 code drops the redundant image formats, and all code uses the hw_subfmt (sw_format in FFmpeg) instead. Although that too seems to be a big mess for now. f_async_queue is unused.
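The "EOF is a frame, not a state" rule can be illustrated with a toy filter. Everything here is hypothetical (`frame_type`, `frame`, `double_filter` are made-up names, and the real framework's types and data flow are more involved); the sketch only shows one way a filter could forward EOF exactly once.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum frame_type { FRAME_NONE, FRAME_VIDEO, FRAME_EOF };

struct frame {
    enum frame_type type;
    int value;             /* stand-in for real image data */
};

struct double_filter {
    bool eof_sent;         /* true once EOF has been forwarded */
};

/* Toy filter: doubles the payload of video frames and forwards EOF once. */
static struct frame double_filter_process(struct double_filter *f,
                                          struct frame in)
{
    assert(f != NULL);
    struct frame none = { FRAME_NONE, 0 };

    switch (in.type) {
    case FRAME_VIDEO:
        f->eof_sent = false;   /* data after EOF starts a new stream */
        in.value *= 2;
        return in;
    case FRAME_EOF:
        if (f->eof_sent)
            return none;       /* do not emit a second EOF */
        f->eof_sent = true;
        return in;             /* pass the EOF frame downstream */
    default:
        return none;
    }
}
```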
*	vo_gpu: check for RA_CAP_FRAGCOORD in dumb mode too	James Ross-Gowan	2018-01-30
\| \| \| \| \| \| \|	The RA_CAP_FRAGCOORD checks apply to dumb mode as well, but they were after the check for dumb mode, which returns early, so they never ran. Fixes #5436
*	video: fix crash with vdpau when reinitializing rendering	wm4	2018-01-27
\| \| \| \| \| \| \| \| \| \|	Using vdpau will allocate additional textures for the reinterleaving step, which uninit_rendering() will free. This is a problem because the hwdec image remains mapped when reinitializing, so the reinterleaving textures are turned into dangling pointers. Fix this by freeing the reinterleave textures on full uninit instead. Fixes #5447.
*	hwdec: detach d3d and d3d9 hwaccel from angle	myfreeer	2018-01-25
\| \| \| \|	Fix https://github.com/mpv-player/mpv/issues/5420
*	video: change some remaining vo_opengl mentions to vo_gpu	Akemi	2018-01-20
\|
*	osx: code cleanups and cosmetic fixes	Akemi	2018-01-20
\|
*	ta: introduce talloc_dup() and use it in some places	wm4	2018-01-18
\| \| \| \| \| \| \|	It was actually already implemented as ta_dup_ptrtype(), but that seems like a clunky name. Also we still use the talloc_ names throughout the source, and I'd rather use an old name instead of mixing inconsistent naming conventions.
*	sws_utils: don't force callers to provide option struct	wm4	2018-01-18
\| \| \| \| \| \| \|	mp_sws_set_from_cmdline() exists only to respect the --sws- command line options. Instead of forcing callers to get the option struct containing these, let callers pass mpv_global, and get it from the option core code directly. This avoids minor annoyances later on.
*	vo: log reconfig calls	wm4	2018-01-18
\| \| \| \|	Helpful for debugging, sometimes.
*	video: avoid some unnecessary vf.h includes	wm4	2018-01-18
\|
*	vo_gpu: skip DR for unsupported image formats	wm4	2018-01-18
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	DR (direct rendering) works by having the decoder decode into the GPU staging buffers, instead of copying the video data on texture upload. We did this even for formats unsupported by the GPU or the renderer. This "worked" because the staging memory is untyped, and the video frame was converted by libswscale to a supported format, and then uploaded with a copy using the normal non-DR texture upload path. Even though it "works", we don't gain anything from using the staging buffers for decoding, since we can't use them for upload anyway. Also, staging memory might be potentially limited (what really happens is up to the driver). It's easy to avoid, so just skip it in these cases.
*	vo_gpu: fix broken 10 bit via integer textures playback	wm4	2018-01-17
\| \| \| \| \| \| \| \| \| \| \|	The check_gl_features(p) call here checks whether dumb mode can be used. It uses the field use_integer_conversion, which is set _after_ the call in the same function. Move check_gl_features() to the end of the function, when use_integer_conversion is finally set. Fixes that it tried to use bilinear filtering with integer textures. The bug disabled the code that is supposed to convert it to non-integer textures.
*	vo_gpu: rpi: defer gl_ctx_resize until after gl_ctx_init	Niklas Haas	2018-01-15
\| \| \| \| \| \| \| \|	This segfaults otherwise. The conditional is needed to break a circular dependency (gl_init depends on mpgl_load_functions which depends on recreate_dispmanx which calls gl_ctx_resize). Fixes #5398
*	video: change some mp_image_pool semantics	wm4	2018-01-13
\| \| \| \| \| \| \| \| \| \|	Remove the max_count creation parameter, because it's pointless and rarely ever did anything. Add a talloc parent parameter instead (which is something completely different, but convenient, and all callers need to be changed anyway). Instead of clearing the pool when the now removed maximum is reached, clear it on image parameter changes instead.
*	vo_gpu: hwdec_dxva2dxgi: initial implementation	James Ross-Gowan	2018-01-06
\| \| \| \| \| \| \| \| \| \| \| \| \|	This enables DXVA2 hardware decoding with ra_d3d11. It should be useful for Windows 7, where D3D11VA is not available. Images are transferred from D3D9 to D3D11 using D3D9Ex surface sharing[1]. Following Microsoft's recommendations, it uses a queue of shared surfaces, similar to Microsoft's ISurfaceQueue. This will hopefully prevent surface sharing from impacting parallelism and allow multiple D3D11 frames to be in-flight at once. [1]: https://msdn.microsoft.com/en-us/library/windows/desktop/ee913554.aspx
*	vo_gpu: d3d11: check for NULL backbuffer in start_frame	James Ross-Gowan	2018-01-04
\| \| \| \| \| \| \| \| \| \| \| \| \|	In a lost device scenario, resize() will fail and p->backbuffer will be NULL. We can't recover from lost devices yet, but we should still check for a NULL backbuffer in start_frame() rather than crashing. Also remove a NULL check for p->swapchain. This was a red herring, since p->swapchain never becomes NULL in an error condition, but p->backbuffer actually does. This should fix the crash in #5320, but it doesn't fix the underlying reason for the lost device (which is probably a driver bug.)
*	vo_gpu: d3d11: don't use a bgra8 swapchain	James Ross-Gowan	2018-01-04
\| \| \| \| \| \| \| \| \| \|	Previously, mpv would attempt to use a BGRA swapchain in the hope that it would give better performance, since the Windows desktop is also composited in BGRA. In practice, it seems like there is no noticeable performance difference between RGBA and BGRA swapchains and BGRA swapchains cause trouble with a42b8b1142fd, which attempts to use the swapchain format for intermediate FBOs, even though D3D11 does not guarantee BGRA surfaces will work with UAV typed stores.
*	vo_gpu/context_android: replace both options with android-surface-size	sfan5	2018-01-02
\| \| \| \|	This allows us to automatically trigger a VOCTRL_RESIZE (also contained).
*	vo_gpu/android: fallback to EGL_WIDTH/HEIGHT	Aman Gupta	2018-01-01
\| \| \| \| \| \| \| \| \| \|	Uses the EGL width/height by default when the user fails to set the android-surface-width/android-surface-height options. This means the vo-resize command is optional, and does not need to be implemented on android devices which do not support rotation. Signed-off-by: Aman Gupta <aman@tmm1.net>
*	vo_gpu: d3d11: avoid copying staging buffers to cbuffers	James Ross-Gowan	2018-01-01
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Apparently some Intel drivers have a bug where copying from staging buffers to constant buffers does not work. We used to keep a copy of the buffer data in a staging buffer to enable partial constant buffer updates. To work around this bug, keep the copy in talloc-allocated system memory instead. There doesn't seem to be any noticeable performance difference from keeping the copy in system memory. Our cbuffers are probably too small for it to matter anyway. See also: https://crbug.com/593024 Fixes #5293
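The workaround amounts to keeping a shadow copy of the constant buffer in system memory, patching that copy on partial updates, and uploading from it. A minimal sketch in plain C follows. It uses `calloc`/`free` instead of talloc to stay self-contained, the names `cbuf_shadow*` are hypothetical, and the GPU upload itself is left to the caller.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct cbuf_shadow {
    unsigned char *data; /* system-memory copy of the whole constant buffer */
    size_t size;
};

/* Allocate a zeroed shadow copy. Returns 0 on success, -1 on failure. */
static int cbuf_shadow_init(struct cbuf_shadow *c, size_t size)
{
    c->data = calloc(1, size);
    c->size = c->data ? size : 0;
    return c->data ? 0 : -1;
}

/* Apply a partial update to the shadow copy. The caller would then upload
 * c->data to the GPU constant buffer (assumed here to be a full upload). */
static void cbuf_shadow_update(struct cbuf_shadow *c, size_t offset,
                               const void *src, size_t len)
{
    assert(offset <= c->size && len <= c->size - offset);
    memcpy(c->data + offset, src, len);
}

static void cbuf_shadow_free(struct cbuf_shadow *c)
{
    free(c->data);
    c->data = NULL;
    c->size = 0;
}
```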
*	player: add internal `vo-resize` command	sfan5	2017-12-27
\| \| \| \|	Intended to be used with the properties from previous commit.
*	vo_gpu/context: Let embedding application handle surface resizes	sfan5	2017-12-27
\| \| \| \| \|	The callbacks for this are Java-only and EGL does not reliably return the correct values.
*	vo_gpu: EGL: provide SwapInterval to generic code	wm4	2017-12-27
\| \| \| \| \| \| \|	This means that we now explicitly set an interval of 1. Although that should be the EGL default, some drivers could possibly ignore this (unconfirmed). In any case, this commit also allows disabling vsync, for users who want it.
*	vo_gpu: vulkan: fix segfault due to index mismatch	Niklas Haas	2017-12-25
\| \| \| \| \| \| \| \|	The queue family index and the queue info index are not necessarily the same, so we're forced to do a check based on the queue family index itself. Fixes #5049
*	vo_gpu: vulkan: fix some image barrier oddities	Niklas Haas	2017-12-25
\| \| \| \| \| \| \| \| \| \| \|	A vulkan validation layer update pointed out that this was wrong; we still need to use the access type corresponding to the stage mask, even if it means our code won't be able to skip the pipeline barrier (which would be wrong anyway). In addition to this, we're also not allowed to specify any source access mask when transitioning from top_of_pipe, which doesn't make any sense anyway.
*	vo_gpu: vulkan: omit needless #define	Niklas Haas	2017-12-25
\|
*	vo_gpu: vulkan: fix sharing mode on malloc'd buffers	Niklas Haas	2017-12-25
\| \| \| \|	Might explain some of the issues in multi-queue scenarios?
*	vo_gpu: vulkan: fix dummyPass creation	Niklas Haas	2017-12-25
\| \| \| \|	This violates vulkan spec
*	vo_gpu: vulkan: fix the rgb565a1 names -> rgb5a1	Niklas Haas	2017-12-25
\| \| \| \|	This is 5 bits per channel, not 565
*	vo_gpu: vulkan: allow disabling async tf/comp	Niklas Haas	2017-12-25
\| \| \| \| \| \| \| \| \|	Async compute in particular seems to cause problems on some drivers, and even when supported the benefits are not that massive from the tests I have seen, so it's probably safe to keep off by default. Async transfer on the other hand seems to work better and offers a more substantial improvement, so it's kept on.
*	vo_gpu: vulkan: refine queue family selection algorithm	Niklas Haas	2017-12-25
\| \| \| \| \| \|	This gets confused by e.g. SPARSE_BIT on the TRANSFER_BIT, leading to situations where "more specialized" is ambiguous and the logic breaks down. So to fix it, only compare the subset we care about.
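Masking the capability flags before comparing them can be sketched as follows. This is a standalone illustration, not the selection code from the commit: the `Q_*` constants are stand-ins for the real `VkQueueFlagBits` values, and `pick_queue_family` with its "fewest relevant capabilities wins" rule is an assumed interpretation of "more specialized".

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in flag values for illustration; real code uses VkQueueFlagBits. */
enum {
    Q_GRAPHICS = 1 << 0,
    Q_COMPUTE  = 1 << 1,
    Q_TRANSFER = 1 << 2,
    Q_SPARSE   = 1 << 3,
};

/* The only capabilities that take part in the comparison. */
#define Q_RELEVANT (Q_GRAPHICS | Q_COMPUTE | Q_TRANSFER)

static int popcount32(uint32_t v)
{
    int n = 0;
    for (; v; v &= v - 1)
        n++;
    return n;
}

/* Return the index of the family that supports `want` while having the
 * fewest other relevant capabilities, or -1 if no family qualifies. */
static int pick_queue_family(const uint32_t *flags, int num, uint32_t want)
{
    assert(flags != 0 || num == 0);
    int best = -1, best_bits = 0;
    for (int i = 0; i < num; i++) {
        uint32_t f = flags[i] & Q_RELEVANT; /* ignore SPARSE and friends */
        if ((f & want) != want)
            continue;
        int bits = popcount32(f);
        if (best < 0 || bits < best_bits) {
            best = i;
            best_bits = bits;
        }
    }
    return best;
}
```

Because the sparse bit is masked out first, a family advertising TRANSFER plus SPARSE still counts as a pure transfer family and is preferred over a general-purpose one.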
*	vo_gpu: vulkan: prefer vkCmdCopyImage over vkCmdBlitImage	Niklas Haas	2017-12-25
\| \| \| \| \| \|	blit() implies scaling, copy() is the equivalent command to use when the formats are compatible (same pixel size) and the rects have the same dimensions.
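The condition described above can be written as a small predicate. This is a hypothetical helper in plain C (`rect` and `can_use_copy` are names invented for the example); it only encodes the two requirements stated in the message, equal pixel size and equal rectangle dimensions.

```c
#include <assert.h>
#include <stdbool.h>

struct rect {
    int x0, y0, x1, y1;
};

/* A plain copy is enough when the formats have the same pixel size and the
 * source and destination rectangles have identical dimensions, so that no
 * scaling is involved. Otherwise a blit is needed. */
static bool can_use_copy(int src_pixel_size, int dst_pixel_size,
                         struct rect src, struct rect dst)
{
    assert(src_pixel_size > 0 && dst_pixel_size > 0);
    return src_pixel_size == dst_pixel_size &&
           src.x1 - src.x0 == dst.x1 - dst.x0 &&
           src.y1 - src.y0 == dst.y1 - dst.y0;
}
```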
*	vo_gpu: attempt re-using the FBO format for p->output_tex	Niklas Haas	2017-12-25
\| \| \| \| \| \| \| \| \|	This allows RAs with support for non-opaque FBO formats to use a more appropriate FBO format for the output tex, possibly enabling a more efficient blit operation. This requires distinguishing between real formats (which can be used to create textures) and fake formats (e.g. ra_gl's FBO hack).
*	vo_gpu: vulkan: properly depend on the swapchain acquire semaphore	Niklas Haas	2017-12-25
\| \| \| \| \|	This is now associated with the ra_tex directly and used in the correct way, rather than hackily done from submit_frame.
*	vo_gpu: vulkan: use correct access flag for present	Niklas Haas	2017-12-25
\| \| \| \|	This needs VK_ACCESS_MEMORY_READ_BIT (spec)
*	vo_gpu: vulkan: make the swapchain more robust	Niklas Haas	2017-12-25
\| \| \| \| \|	Now handles both VK_ERROR_OUT_OF_DATE_KHR and VK_SUBOPTIMAL_KHR for both vkAcquireNextImageKHR and vkQueuePresentKHR in the correct way.
*	vo_gpu: aggressively prefer async compute	Niklas Haas	2017-12-25
\| \| \| \| \| \| \| \| \| \|	On AMD devices, we only get one graphics pipe but several compute pipes which can (in theory) run independently. As such, we should prefer compute shaders over fragment shaders in scenarios where we expect them to be better for parallelism. This is amusingly trivial to do, and actually improves performance even in a single-queue scenario.
*	vo_gpu: vulkan: support split command pools	Niklas Haas	2017-12-25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Instead of using a single primary queue, we generate multiple vk_cmdpools and pick the right one dynamically based on the intent. This has a number of immediate benefits: 1. We can use async texture uploads 2. We can use the DMA engine for buffer updates 3. We can benefit from async compute on AMD GPUs Unfortunately, the major downside is that due to the lack of QF ownership tracking, we need to use CONCURRENT sharing for all resources (buffers and images!). In theory, we could try figuring out a way to get rid of the concurrent sharing for buffers (which is only needed for compute shader UBOs), but even so, the concurrent sharing mode doesn't really seem to have a significant impact over here (nvidia). It's possible that other platforms may disagree. Our deadlock-avoidance strategy is stupidly simple: Just flush the command every time we need to switch queues, and make sure all submission and callbacks happen in FIFO order. This required lifting the cmds_pending and cmds_queued out from vk_cmdpool to mpvk_ctx, and some functions died/got moved as a result, but that's a relatively minor change. On my hardware this is a fairly significant performance boost, mainly due to async transfers. (Nvidia doesn't expose separate compute queues anyway). On AMD, this should be a performance boost as well due to async compute.
*	vo_gpu: invalidate fbotex before drawing	Niklas Haas	2017-12-25
\| \| \| \| \|	Don't discard the OSD or pass_draw_to_screen passes though. Could be faster on some hardware.
*	vo_gpu: allow invalidating FBO in renderpass_run	Niklas Haas	2017-12-25
\| \| \| \| \| \| \| \| \|	This is especially interesting for vulkan since it allows completely skipping the layout transition as part of the renderpass. Unfortunately, that also means it needs to be put into renderpass_params, as opposed to renderpass_run_params (unlike #4777). Closes #4777.
*	vo_gpu: vulkan: properly track image dependencies	Niklas Haas	2017-12-25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This uses the new vk_signal mechanism to order all access to textures. This has several advantages: 1. It allows real synchronization of image access across multiple frames when using multiple queues for parallelism. 2. It allows using events instead of pipeline barriers, which is a finer-grained synchronization primitive that allows for more efficient layout transitions over longer durations. This commit also restructures some of the implicit transition code for renderpasses to be more flexible and correct. (Note: this technically drops the ability to transition the image out of undefined layout when not blending, but that was a bug anyway and needs to be done properly) vo_gpu: vulkan: remove no-longer-true optimization The change to the output_tex format makes this no longer true, and it actually seems to hurt performance now as well. So just don't do it anymore. I also realized it hurts performance when drawing an OSD, so it's probably not a good idea anyway.