summaryrefslogtreecommitdiff
path: root/doc/bugs/addurl_fails_on_the_internet_archive.mdwn
blob: 556575db64a2b1efdcc8832596d64ae49be2eff0 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
### Please describe the problem.

`addurl` doesn't support the internet archive:

1. it doesn't actually accept the proper URL as a secondary source of content
2. it doesn't parse the HTML from the video page (the "details page")

### What steps will reproduce the problem?

    # download eben moglen's excellent re:publica presentation from youtube
    git annex addurl https://www.youtube.com/watch?v=sKOk4Y4inVY
    # copy that file aside
    cp -L re_publica_2012___Eben_Moglen___Freedom_of_Thought_Requires_Free_Media.webm re_publica_2012___Eben_Moglen___Freedom_of_Thought_Requires_Free_Media.webm-bak
    # drop it so we can try again
    git annex drop re_publica_2012___Eben_Moglen___Freedom_of_Thought_Requires_Free_Media.webm
    # add the IA URL for the same video, failing
    git annex addurl --file=re_publica_2012___Eben_Moglen___Freedom_of_Thought_Requires_Free_Media.webm http://ia601009.us.archive.org/9/items/Republica2012-EbenMoglen-FreedomOfThoughtRequiresFreeMedia/re_publica_2012___Eben_Moglen___Freedom_of_Thought_Requires_Free_Media.webm
    # try again with --relaxed to skip some checks
    git annex addurl --relaxed --file=re_publica_2012___Eben_Moglen___Freedom_of_Thought_Requires_Free_Media.webm http://ia601009.us.archive.org/9/items/Republica2012-EbenMoglen-FreedomOfThoughtRequiresFreeMedia/re_publica_2012___Eben_Moglen___Freedom_of_Thought_Requires_Free_Media.webm
    # observe both files are the same size and checksum

The files should look like this:

[[!format txt """
anarcat@angela:presentations$ ls -alL re_publica_2012___Eben_Moglen___Freedom_of_Thought_Requires_Free_Media.webm*
-r--r--r-- 1 anarcat anarcat 419359123 oct  9 23:41 re_publica_2012___Eben_Moglen___Freedom_of_Thought_Requires_Free_Media.webm
-r--r--r-- 1 anarcat anarcat 419359123 oct 11 19:40 re_publica_2012___Eben_Moglen___Freedom_of_Thought_Requires_Free_Media.webm-bak
anarcat@angela:presentations$ md5sum re_publica_2012___Eben_Moglen___Freedom_of_Thought_Requires_Free_Media.webm*
7892df24a9e1c40e2587be1035728ef0  re_publica_2012___Eben_Moglen___Freedom_of_Thought_Requires_Free_Media.webm
7892df24a9e1c40e2587be1035728ef0  re_publica_2012___Eben_Moglen___Freedom_of_Thought_Requires_Free_Media.webm-bak
"""]]

There are two separate bugs here: one is the above need to use --relaxed even though the file is the same.

The second is probably simply that quvi doesn't support the internet archive, and maybe that one should be moved to a separate [[todo]]/[[wishlist]]. I was expecting this to "do the right thing" (ie. download the video):

    git annex addurl http://archive.org/details/Republica2012-EbenMoglen-FreedomOfThoughtRequiresFreeMedia

... but instead it downloads the HTML.
### What version of git-annex are you using? On what operating system?

my good old faithful `4.20130921-g434dc22` i compiled manually some time ago. :)

This is [[done]] in git-annex version: 4.20131011-g2c0badc. Thanks!

### Please provide any additional information below.

[[!format sh """
anarcat@marcos:presentations$ git annex addurl --file=re_publica_2012___Eben_Moglen___Freedom_of_Thought_Requires_Free_Media.webm http://archive.org/download/Republica2012-EbenMoglen-FreedomOfThoughtRequiresFreeMedia/re_publica_2012___Eben_Moglen___Freedom_of_Thought_Requires_Free_Media.webm
addurl re_publica_2012___Eben_Moglen___Freedom_of_Thought_Requires_Free_Media.webm
  failed to verify url exists: http://archive.org/download/Republica2012-EbenMoglen-FreedomOfThoughtRequiresFreeMedia/re_publica_2012___Eben_Moglen___Freedom_of_Thought_Requires_Free_Media.webm
failed
git-annex: addurl: 1 failed
anarcat@marcos:presentations$ git annex addurl --debug --file=re_publica_2012___Eben_Moglen___Freedom_of_Thought_Requires_Free_Media.webm http://ia601009.us.archive.org/9/items/Republica2012-EbenMoglen-FreedomOfThoughtRequiresFreeMedia/re_publica_2012___Eben_Moglen___Freedom_of_Thought_Requires_Free_Media.webm
[2013-10-09 18:26:30 EDT] call: quvi ["-v","mute","--support","http://ia601009.us.archive.org/9/items/Republica2012-EbenMoglen-FreedomOfThoughtRequiresFreeMedia/re_publica_2012___Eben_Moglen___Freedom_of_Thought_Requires_Free_Media.webm"]
addurl re_publica_2012___Eben_Moglen___Freedom_of_Thought_Requires_Free_Media.webm [2013-10-09 18:26:30 EDT] read: curl ["-s","--head","-L","http://ia601009.us.archive.org/9/items/Republica2012-EbenMoglen-FreedomOfThoughtRequiresFreeMedia/re_publica_2012___Eben_Moglen___Freedom_of_Thought_Requires_Free_Media.webm","-w","%{http_code}"]

  failed to verify url exists: http://ia601009.us.archive.org/9/items/Republica2012-EbenMoglen-FreedomOfThoughtRequiresFreeMedia/re_publica_2012___Eben_Moglen___Freedom_of_Thought_Requires_Free_Media.webm
failed
git-annex: addurl: 1 failed
"""]]

Originally reported in [[tips/Internet_Archive_via_S3]]. --[[anarcat]]

> [[done]] --[[Joey]]