author Joey Hess <joeyh@joeyh.name> 2015-07-20 14:56:57 -0400
committer Joey Hess <joeyh@joeyh.name> 2015-07-20 14:56:57 -0400
commit 25f5ee379cebcc7ea4cb0b338f43f3c0e7477400
tree 9589c405aa007c24ebd51fb16362e455e93d3795
parent 347d025c0930ac7994aa00e92fdfe8b54a2258e2
importfeed: Look at not only permalinks, but now also guids to identify previously downloaded files.
I've seen rss feeds that have no permalinks, only guids (which are sometimes in the form of permalinks, argh/sigh). I had previously avoided trusting guids to be globally unique, because a survey of the rss feeds I subscribe to turned up a lot of pretty bad "guids" like "2 at http://serialpodcast.org" or, even worse, "oth20150401-hq". The worry was that when podcasts generate guids this badly, there's no guarantee they're actually globally unique, so guids from two different feeds could collide. But I'm seeing too many url changes that result in redundant files, so let's try this. If feeds are so broken that their guids overlap, they could just as well incorrectly call them permalinks too.
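To illustrate the behaviour change described above, here is a minimal, self-contained sketch rather than the actual git-annex code. It assumes, based on the comment removed in the diff, that the Bool in the item id pair is True when the feed marks the id as a permalink. The names `ItemId`, `knownBefore`, `knownAfter` and `exampleKnown` are hypothetical and exist only for this example.

```haskell
import qualified Data.Set as S

-- Hypothetical stand-in for a feed item id (a permalink or a guid).
type ItemId = String

-- Old behaviour: an id only counts as known when the feed flagged it
-- as a permalink (the Bool is True).
knownBefore :: S.Set ItemId -> Maybe (Bool, ItemId) -> Bool
knownBefore known (Just (True, itemid)) = S.member itemid known
knownBefore _ _ = False

-- New behaviour: the permalink flag is ignored, so plain guids are
-- looked up in the set of known items too.
knownAfter :: S.Set ItemId -> Maybe (Bool, ItemId) -> Bool
knownAfter known (Just (_, itemid)) = S.member itemid known
knownAfter _ Nothing = False

-- Example set of ids recorded from earlier downloads (illustrative).
exampleKnown :: S.Set ItemId
exampleKnown = S.fromList
    [ "2 at http://serialpodcast.org"
    , "http://example.com/ep1"
    ]
```

With this sketch, a non-permalink guid such as "2 at http://serialpodcast.org" is treated as unknown by `knownBefore` but as already downloaded by `knownAfter`, which is what avoids a redundant file when only the item's url has changed.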
-rw-r--r-- Command/ImportFeed.hs 3
-rw-r--r-- debian/changelog 5
2 files changed, 6 insertions, 2 deletions
diff --git a/Command/ImportFeed.hs b/Command/ImportFeed.hs
index 5afbb192a..46e1b6dbe 100644
--- a/Command/ImportFeed.hs
+++ b/Command/ImportFeed.hs
@@ -219,8 +219,7 @@ performDownload opts cache todownload = case location todownload of
             | otherwise = a
 
     knownitemid = case getItemId (item todownload) of
-        -- only when it's a permalink
-        Just (True, itemid) -> S.member itemid (knownitems cache)
+        Just (_, itemid) -> S.member itemid (knownitems cache)
         _ -> False
 
     rundownload url extension getter = do
diff --git a/debian/changelog b/debian/changelog
index e8064900b..4ffbf1151 100644
--- a/debian/changelog
+++ b/debian/changelog
@@ -20,6 +20,11 @@ git-annex (5.20150714) UNRELEASED; urgency=medium
   * sync --content: Fix bug that caused files to be uploaded to eg,
     more archive remotes than wanted copies, only to later be dropped
     to satisfy the preferred content settings.
+  * importfeed: Improve detection of known items whose url has changed,
+    and avoid adding redundant files. Where before this only looked at
+    permalinks in rss feeds, it now also looks at guids.
+  * importfeed: Look at not only permalinks, but now also guids
+    to identify previously downloaded files.
 
  -- Joey Hess <id@joeyh.name>  Fri, 10 Jul 2015 16:36:42 -0400