quvi does not seem to be maintained (last upstream release in 2013),
and it supports far fewer videos than youtube-dl does.

The difficulty with using youtube-dl is that, by design, it does not
provide a way to probe whether it supports an url, other than running it
and seeing if it finds a video at the url. This would make `git annex
addurl` significantly slower if it ran youtube-dl to probe every url.

It is possible to use youtube-dl to download arbitrary non-video files;
it stores the file to disk just as wget or curl would. But, that's well
outside its intended use case, and so it does not feel like a good idea to
make git-annex depend on using youtube-dl to download generic urls.
(Also, youtube-dl has bugs with downloading non-video
urls; see for example <http://bugs.debian.org/874321>)

So, switching to youtube-dl would probably need a new switch, like `git
annex addurl --rip` that enables using it.

Currently `git annex importfeed` automatically tests for video urls with
quvi; it would also need to support `--rip`.

Both of those would need changes to users' workflows and cron jobs.
git-annex could keep supporting quvi for some time, and warn when it uses
quvi, to help with the transition.

> Alternatively, git-annex addurl could download the url first, and then
> check the file to see if it looks like html. If so, run youtube-dl (which
> unfortunately has to download it again) and see if it manages to rip
> media from it. This way, addurl of non-html files does not have extra
> overhead, and the redundant download is fairly small compared to ripping
> the media. Only the unusual case where addurl is being used on html that
> does not contain media becomes more expensive.
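
The "looks like html" check above could be sketched roughly as follows. This is only an illustration in Python, not how git-annex does or would implement it; the function names, the 4096 byte sniff size, and the heuristic itself are all assumptions made for the example.

```python
UTF8_BOM = b"\xef\xbb\xbf"


def looks_like_html(head: bytes) -> bool:
    """Guess whether the first bytes of a file are HTML (heuristic only)."""
    if head.startswith(UTF8_BOM):
        head = head[len(UTF8_BOM):]
    text = head.lstrip().lower()
    # Typical pages start with a doctype or <html>; also accept an <html tag
    # that shows up a little later, e.g. after a leading comment.
    return text.startswith(b"<!doctype html") or b"<html" in text[:1024]


def should_try_youtube_dl(path: str, sniff_bytes: int = 4096) -> bool:
    """Read only the start of an already-downloaded file and sniff it."""
    with open(path, "rb") as f:
        return looks_like_html(f.read(sniff_bytes))
```

Only when `should_try_youtube_dl` returns true would youtube-dl be run on the url; anything else is kept as the plain download, so non-html files pay no extra cost.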

Another gotcha is playlists. youtube-dl downloads playlists automatically.
But, git-annex needs to record an url that downloads a single file so that
`git annex get` works right. So, playlists will need to be disabled when
git-annex runs youtube-dl. But, `--no-playlist` does not always disable
playlists. Best option seems to be `--playlist-items 0` which works for
non-playlists, and downloads only 1 item from playlists (hopefully a fairly
stable item, but who knows..).
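
As a small sketch of the invocation described above, the command line could be built like this. The helper name is made up for the example; the only youtube-dl option used is the `--playlist-items 0` mentioned in the text.

```python
from typing import List


def single_item_args(url: str) -> List[str]:
    """Build a youtube-dl command line that should fetch at most one item."""
    # "--playlist-items 0" is the option the text settles on, since
    # --no-playlist does not always disable playlists.
    return ["youtube-dl", "--playlist-items", "0", url]
```

Passing the result to something like `subprocess.run` as a list avoids any shell quoting of the url.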

(`git annex importfeed` handles youtube playlist downloads, but needs the
user to find the url to the rss feed for the playlist. Youtube still has
these, although it makes them hard to find.)

Another gotcha is that youtube-dl's -o option does not fully determine the
filename it downloads to. Sometimes it will tack on an additional extension
(seen with youtube videos where it added a ".mkv").
And --get-filename does not report the actual filename when that happens.
This seems to be due to format merging by ffmpeg; with -f best, it does
not merge and so does not do that.
<https://github.com/rg3/youtube-dl/issues/14864>
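
One way to cope with the unpredictable filename, sketched here as an assumption rather than a settled design, is to point youtube-dl at a fresh empty directory and then look at what actually appeared there, instead of trusting `-o` or `--get-filename`. The function name is hypothetical, and skipping `.part` files assumes that partial downloads are left under that suffix.

```python
import os
from typing import Optional


def find_downloaded_file(workdir: str) -> Optional[str]:
    """Return the single finished file in workdir, or None if ambiguous."""
    candidates = sorted(
        name
        for name in os.listdir(workdir)
        if os.path.isfile(os.path.join(workdir, name))
        # Assumption: incomplete downloads carry a ".part" suffix.
        and not name.endswith(".part")
    )
    if len(candidates) == 1:
        return os.path.join(workdir, candidates[0])
    # Zero files (nothing was ripped) or several (cannot tell which is it).
    return None
```

Whatever extension got tacked on, the caller sees the real name, and a `None` result can be treated as youtube-dl not having produced a usable file.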

Checking free disk space will need a different technique than
git-annex normally uses, because youtube-dl does not provide an easy way to
query for size. Could use --dump-json, but that would require downloading
the web page yet again, so too expensive.. and, the json seems to have
"filesize: null" for youtube videos. What does work is the --max-filesize
option, which makes youtube-dl abort if the file is too big.
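
A rough sketch of turning the free space into a `--max-filesize` limit follows. The function names and the `reserve_bytes` parameter are invented for the example, and it assumes `--max-filesize` accepts a plain byte count, which would be worth verifying against youtube-dl before relying on it.

```python
import shutil
from typing import List


def max_filesize_args(free_bytes: int, reserve_bytes: int = 0) -> List[str]:
    """Turn a free-space figure into youtube-dl arguments capping the size."""
    budget = free_bytes - reserve_bytes
    if budget <= 0:
        # Nothing can be downloaded safely; the caller should not start.
        raise ValueError("no free space left for the download")
    # Assumption: the option takes a plain number of bytes.
    return ["--max-filesize", str(budget)]


def max_filesize_args_for(path: str, reserve_bytes: int = 0) -> List[str]:
    """Same, using the free space of the filesystem that holds path."""
    return max_filesize_args(shutil.disk_usage(path).free, reserve_bytes)
```

The result would be appended to the rest of the youtube-dl command line, so the download aborts rather than filling the disk.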