From e6a7285b64fc5030fc759ba1bafc4071034b83fc Mon Sep 17 00:00:00 2001 From: ewen Date: Tue, 21 Mar 2017 08:48:05 +0000 Subject: Added a comment: Track GUIDs to avoid duplicate downloads --- .../comment_25_211b8f829070021e977c6de9eebf829f._comment | 10 ++++++++++ 1 file changed, 10 insertions(+) create mode 100644 doc/tips/downloading_podcasts/comment_25_211b8f829070021e977c6de9eebf829f._comment diff --git a/doc/tips/downloading_podcasts/comment_25_211b8f829070021e977c6de9eebf829f._comment b/doc/tips/downloading_podcasts/comment_25_211b8f829070021e977c6de9eebf829f._comment new file mode 100644 index 000000000..7588598d6 --- /dev/null +++ b/doc/tips/downloading_podcasts/comment_25_211b8f829070021e977c6de9eebf829f._comment @@ -0,0 +1,10 @@ +[[!comment format=mdwn + username="ewen" + avatar="http://cdn.libravatar.org/avatar/605b2981cb52b4af268455dee7a4f64e" + subject="Track GUIDs to avoid duplicate downloads" + date="2017-03-21T08:48:04Z" + content=""" +While tracking podcast media URLs *usually* works to avoid duplicate downloads, when it fails it usually fails spectacularly. In particular if a podcast feed decides to update *all* the URLs (for old and new podcasts) to use a different URL scheme, then suddenly that looks like a huge volume of new URLs, and all of them get downloaded -- even if the content has actually already been retrieved from a different URL. For instance the `acast.com` service has changed their URL scheme a couple of times in the last 1-2 years, rewriting all the historical URLs, so I have three copies of many of the episodes on podcasts on their service :-( (Many downloaded; some skipped once I caught the bulk download and stopped it/reran with `--fast` or `--relaxed` to make placeholders instead. `acast.com` seem to have managed to cause even more confusion by rewriting many of the older `mp3` files with new `id3) + +Some (all?) podcast feeds also have a `guid` field, which specifies what should be a unique per-episode +"""]] -- cgit v1.2.3