From ede2198520dded21d580a9c199a0909c2b04923a Mon Sep 17 00:00:00 2001
From: Joey Hess
Date: Tue, 28 Nov 2017 12:50:30 -0400
Subject: add Utility.HtmlDetect

This will be used in youtube-dl integration, to tell when a html page has
been downloaded by addurl, in which case it is worth running youtube-dl
to see if it can extract media from it.

tagsoup is an almost free dependency, because yesod depends on it. So,
this only really adds a dep when git-annex is built without the webapp.

I'd like this to as closely as possible match how browsers decide if
a page is html or not. Unfortunately, that is fairly heuristic, in order
to support malformed html. And, we don't want to falsely detect something
as html just because it has something that looks like a html tag embedded
somewhere in it. Probably any major video hosting site is going to be
serving html documents that at least start with a tag, so requiring
that or a DOCTYPE should be good enough.

This commit was sponsored by Jeff Goeke-Smith on Patreon.
---
 git-annex.cabal | 2 ++
 1 file changed, 2 insertions(+)

(limited to 'git-annex.cabal')

diff --git a/git-annex.cabal b/git-annex.cabal
index 5d46caed3..780961d88 100644
--- a/git-annex.cabal
+++ b/git-annex.cabal
@@ -347,6 +347,7 @@ Executable git-annex
    persistent,
    persistent-template,
    aeson,
+   tagsoup,
    unordered-containers,
    feed (>= 0.3.9),
    regex-tdfa,
@@ -1001,6 +1002,7 @@ Executable git-annex
    Utility.Glob
    Utility.Gpg
    Utility.Hash
+   Utility.HtmlDetect
    Utility.HumanNumber
    Utility.HumanTime
    Utility.InodeCache
-- 
cgit v1.2.3
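
The heuristic described in the commit message could be sketched roughly as
follows. This is not the Utility.HtmlDetect module added by the commit (its
source is outside the diff shown here). The name looksLikeHtml, the
64-character limit, and the reading of "a tag" as an opening html tag are
all assumptions made for illustration, and the sketch uses plain string
matching from base instead of tagsoup.

```haskell
import Data.Char (isSpace, toLower)
import Data.List (stripPrefix)

-- | Hypothetical sketch, not the code from this commit.
-- Input counts as html only when, after optional leading whitespace, it
-- starts with a DOCTYPE html declaration or an opening html tag.
-- Markup embedded later in the input is deliberately ignored, so a file
-- that merely contains something tag-like is not misdetected.
looksLikeHtml :: String -> Bool
looksLikeHtml s = any matches ["<!doctype html", "<html"]
  where
    -- Only a short prefix is inspected (64 is an arbitrary, assumed limit),
    -- lowercased so the comparison is case-insensitive.
    start = map toLower (take 64 (dropWhile isSpace s))
    -- The marker must be followed by whitespace or '>', so that a name
    -- such as "<htmlfoo>" does not count. A marker cut off at the very
    -- end of the input is rejected too.
    matches marker = case stripPrefix marker start of
        Just (c:_) -> isSpace c || c == '>'
        _ -> False

main :: IO ()
main = mapM_ (print . looksLikeHtml)
    [ "<!DOCTYPE html>\n<html><body>hi</body></html>"  -- True
    , "  \n<html lang=\"en\">"                         -- True
    , "<p>this</p>"                                    -- False
    , "binary data with <html> embedded"               -- False
    ]
```

A tag parser such as tagsoup would cope better with variations this sketch
misses, for example extra whitespace inside the DOCTYPE declaration or a
comment placed before the first tag.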