You can use git-annex as a podcatcher, to download podcast contents.
No additional software is required, but your git-annex must be built
with the Feeds feature (run `git annex version` to check).

All you need to do is put something like this in a cron job:

`cd somerepo && git annex importfeed http://url/to/podcast http://other/podcast/url`

This downloads the urls, and parses them as RSS, Atom, or RDF feeds.
All enclosures are downloaded and added to the repository, the same as if you
had manually run `git annex addurl` on each of them.

git-annex will avoid downloading a file from a feed if its url has already
been stored in the repository before. So once a file is downloaded,
you can move it around, delete it, `git annex drop` its content, etc,
and it will not be downloaded again by repeated runs of
`git annex importfeed`. This is just how a podcatcher should behave. (git-annex versions
since 2015 also track the podcast `guid` values as metadata, to help avoid
duplication if the media file url changes; use `git annex metadata ...` to inspect them.)

## templates

To control the filenames used for items downloaded from a feed,
there's a `--template` option. The default is
`--template='${feedtitle}/${itemtitle}${extension}'`

Other available template variables:  
feedauthor, itemauthor, itemsummary, itemdescription, itemrights, itemid,
itempubdate, author, title.
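
As an illustration, here is a minimal sketch of a template that puts the publication date before the item title. It uses only variables from the list above; the wrapper name `import_by_date` is hypothetical and not part of git-annex, and the exact form of `${itempubdate}` depends on the feed and on git-annex.

```shell
# Single quotes keep the shell from expanding ${...}; git-annex must
# receive the variable names literally and fill them in itself.
template='${feedtitle}/${itempubdate}-${itemtitle}${extension}'

# import_by_date is a hypothetical wrapper, not a git-annex command.
# It passes any feed urls given to it on to importfeed.
import_by_date() {
    git annex importfeed --template="$template" "$@"
}
```

It would then be called as `import_by_date http://url/to/podcast`.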

## catching up

To catch up on a feed without downloading its contents,
use `git annex importfeed --relaxed`, and delete the symlinks it creates.
Next time you run `git annex importfeed` it will only fetch any new items.

## fast mode

To add a feed without downloading its contents right now,
use `git annex importfeed --fast`. Then you can use `git annex get` as
usual to download the content of an item.

## storing the podcast list in git

You can check the list of podcast urls into git right next to the
files it downloads. Just make a file named feeds and add one podcast url
per line.

Then you can run git-annex on all the feeds:

`xargs git-annex importfeed < feeds`
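
If you would like to annotate the feeds file, one possible approach is to filter it before handing it to xargs. This is only a sketch: the helper name `list_feeds` is hypothetical, and treating lines that start with `#` as comments is an assumed convention, not something git-annex defines.

```shell
# list_feeds is a hypothetical helper, not part of git-annex.
# Print the urls in the given file, skipping blank lines and
# lines whose first non-blank character is '#'.
list_feeds() {
    grep -v -e '^[[:space:]]*$' -e '^[[:space:]]*#' "$1"
}
```

It could then be used as `list_feeds feeds | xargs git-annex importfeed`.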

## recreating lost episodes

If git-annex refuses to download files that you are certain are in the podcast, it is quite possibly because they have already been downloaded before. In any case, you can use `--force` to download them again:

`git-annex importfeed --force http://example.com/feed`

## distributed podcatching

A nice benefit of using git-annex as a podcatcher is that you can
run `git annex importfeed` on the same url in different clones
of a repository, and `git annex sync` will sync it all up.

## centralized podcatching

You can also have a designated machine which always fetches all podcasts
to local disk and stores them. That way, you can archive podcasts whose
upstream content is deleted after some time. You can also work around slow
upstream downloads by podcatching to a server with ample bandwidth, or work
around a slow local Internet connection by podcatching to your home server
and transferring to your laptop on demand.

## youtube channels

You can also use `git annex importfeed` on youtube channels.
It will use [youtube-dl](https://rg3.github.io/youtube-dl/) to automatically
download the videos.

To download a youtube channel, you need to find the feed associated with that
channel, and pass it to `git annex importfeed`. There does not seem to be
an easy link anywhere to get the feed, but you can construct its url
manually. For a channel url like
"https://www.youtube.com/channel/$foo", the
feed is "https://www.youtube.com/feeds/videos.xml?channel_id=$foo"
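
The url rewriting described above can be sketched as a small shell function. The name `youtube_feed_url` is hypothetical, and the function assumes the channel url contains `/channel/` followed by the channel id.

```shell
# youtube_feed_url is a hypothetical helper, not part of git-annex.
# Given a url like https://www.youtube.com/channel/$foo, print the
# matching feed url.
youtube_feed_url() {
    channel_id=${1##*/channel/}    # keep what follows "/channel/"
    channel_id=${channel_id%%/*}   # drop a trailing slash or sub-path
    printf 'https://www.youtube.com/feeds/videos.xml?channel_id=%s\n' "$channel_id"
}
```

The result can be passed straight to importfeed, for example with `git annex importfeed "$(youtube_feed_url "$channel_url")"`, where `channel_url` holds the channel url.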

## metadata

As well as storing the urls for items imported from a feed, git-annex can
store additional [[metadata]], like the author, and itemdescription.
This can then be looked up later, used in [[metadata_driven_views]], etc.

To store all available metadata from the feed:
`git config annex.genmetadata true`