quvi does not seem to be maintained (its last upstream release was in 2013),
and it supports far fewer videos than youtube-dl does.

The difficulty with using youtube-dl is that, by design, it does not
provide a way to probe whether it supports an url, other than running it
and seeing if it finds a video at the url. This would make `git annex
addurl` significantly slower if it ran youtube-dl to probe every url.

It is possible to use youtube-dl to download arbitrary non-video files;
it stores the file to disk just as wget or curl would. But, that's well
outside its intended use case, and so it does not feel like a good idea to
make git-annex depend on using youtube-dl to download generic urls.
(Also, youtube-dl has bugs when downloading non-video urls;
see for example <http://bugs.debian.org/874321>.)

So, switching to youtube-dl would probably need a new switch, like `git
annex addurl --rip` that enables using it.

Currently `git annex importfeed` automatically tests for video urls with
quvi; it would also need to support `--rip`.

Both of those changes would require changes to users' workflows and cron jobs.
git-annex could keep supporting quvi for some time, and warn when it uses
quvi, to help with the transition.

> Alternatively, git-annex addurl could download the url first, and then
> check the file to see if it looks like html. If so, run youtube-dl (which
> unfortunately has to download it again) and see if it manages to rip
> media from it. This way, addurl of non-html files does not have extra
> overhead, and the redundant download is fairly small compared to ripping
> the media. Only the unusual case where addurl is being used on html that
> does not contain media becomes more expensive.
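
The download-then-check approach can be sketched roughly as follows. This is
only an illustration in Python, not git-annex code: `looks_like_html` is a
deliberately crude heuristic, and `download` and `rip_media` are hypothetical
callables standing in for the plain download and for running youtube-dl.

```python
def looks_like_html(head: bytes) -> bool:
    """Crude heuristic: does the start of a downloaded file look like html?"""
    # Skip a UTF-8 byte order mark if one is present.
    if head.startswith(b"\xef\xbb\xbf"):
        head = head[3:]
    # Ignore leading whitespace and letter case before comparing.
    text = head.lstrip().lower()
    return text.startswith((b"<!doctype html", b"<html", b"<head", b"<body"))


def add_url(url, download, rip_media):
    """Download url first; only try ripping media if the result looks like html.

    download(url) -> path of the downloaded file (hypothetical helper)
    rip_media(url) -> path of ripped media, or None (hypothetical helper)
    """
    path = download(url)
    with open(path, "rb") as f:
        head = f.read(1024)
    if looks_like_html(head):
        media_path = rip_media(url)
        if media_path is not None:
            return media_path
    # Not html, or html without media: keep the plain download.
    return path
```

A real check would probably need to be more forgiving (html that starts with
a comment or an XML declaration is missed by this sketch), but the control
flow is the point: non-html urls never reach `rip_media`.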

Another gotcha is playlists. youtube-dl downloads playlists automatically.
But, git-annex needs to record an url that downloads a single file so that
`git annex get` works right. So, playlists will need to be disabled when
git-annex runs youtube-dl. But, `--no-playlist` does not always disable
playlists. The best option seems to be `--playlist-items 0`, which works for
non-playlists, and downloads only 1 item from playlists (hopefully a fairly
stable item, but who knows..).
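
As a minimal sketch, the youtube-dl invocation described above could be
assembled like this. The function name and its parameters are made up for
illustration; only the `--playlist-items` and `-o` options come from
youtube-dl itself, and the value `0` is the one suggested above.

```python
def youtube_dl_command(url, output_template):
    """Build an argument list that asks youtube-dl for a single item.

    Hypothetical helper: nothing is executed here, the list is only
    meant to be handed to something like subprocess.run().
    """
    return [
        "youtube-dl",
        # Suggested above as more reliable than --no-playlist.
        "--playlist-items", "0",
        # Output filename template.
        "-o", output_template,
        url,
    ]
```

Passing the result to `subprocess.run` as a list avoids shell quoting
problems with unusual urls.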

Another gotcha is that youtube-dl's `-o` option does not fully determine the
filename it downloads to. Sometimes it will tack on an additional extension
(seen with youtube videos where it added a ".mkv").
And `--get-filename` does not report the actual filename when that happens.
This seems to be due to format merging by ffmpeg; with `-f best`, it does
not merge and so does not add the extension.
<https://github.com/rg3/youtube-dl/issues/14864>
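
One possible way to cope with the extra extension, sketched here under the
assumption that youtube-dl is run in a work directory holding nothing else,
is to look at what actually landed there instead of trusting the requested
name. `find_downloaded_file` is a hypothetical helper, not something
git-annex or youtube-dl provides.

```python
import os


def find_downloaded_file(workdir, expected_name):
    """Locate the file youtube-dl wrote, allowing for a tacked-on extension.

    Returns the path, or None if there is no single unambiguous candidate.
    """
    exact = os.path.join(workdir, expected_name)
    if os.path.isfile(exact):
        return exact
    candidates = [
        name for name in sorted(os.listdir(workdir))
        # Accept "<expected_name>.<something>", e.g. an added ".mkv".
        if name.startswith(expected_name + ".")
        # Skip partial downloads, which youtube-dl names with ".part".
        and not name.endswith(".part")
        and os.path.isfile(os.path.join(workdir, name))
    ]
    if len(candidates) == 1:
        return os.path.join(workdir, candidates[0])
    return None
```

Using `-f best` as noted above would avoid the problem at the source; the
directory scan is only a fallback for when merging cannot be ruled out.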