summaryrefslogtreecommitdiff
path: root/doc/bugs/s3_special_remote_does_not_resume_uploads_even_with_new_chunking.mdwn
blob: 1241a0096aa6a6c43f13fb61555a1c761e2a1824 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
I'm trying to upload large files into s3 remote.  I'm using a very recent version of git-annex:

    git-annex version: 5.20150616-g4d7683b
    build flags: Assistant Webapp Webapp-secure Pairing Testsuite S3 WebDAV FsEvents XMPP DNS Feeds Quvi TDFA TorrentParser
    key/value backends: SHA256E SHA1E SHA512E SHA224E SHA384E SKEIN256E SKEIN512E MD5E SHA256 SHA1 SHA512 SHA224 SHA384 SKEIN256 SKEIN512 MD5 WORM URL
    remote types: git gcrypt S3 bup directory rsync web bittorrent webdav tahoe glacier ddar hook external

Here's how my chunking is set up:

    xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx bucket=mybucket chunk=256MiB cipher=xxxxxx cipherkeys=xxxxxx datacenter=US
    host=s3.amazonaws.com name=mybucket port=80 s3creds=xxxxxx storageclass=STANDARD type=S3 timestamp=xxxxxx

If I run an upload and `^C` it in the middle of the upload, then start it again, it will always resume from the beginning.

I've proven this to myself by using the `--debug` switch, please see blow. I've renamed certain things for security reasons, however GPGHMACSHA1--1111111111 always refers to the same chunk and GPGHMACSHA1--2222222222 always refers to the same chunk, etc.

You can see that even after it uploads the same chunk once, it tries again.

This is consistent with the behavior of letting it sit there for an hour and upload half of the large file, and then interrupting it, and having it start from scratch again.

    $ git annex copy --debug * --to mybucket

    [2015-06-23 15:24:07 PDT] read: git ["--git-dir=../../.git","--work-tree=../..","--literal-pathspecs","show-ref","git-annex"]
    [2015-06-23 15:24:07 PDT] read: git ["--git-dir=../../.git","--work-tree=../..","--literal-pathspecs","show-ref","--hash","refs/heads/git-annex"]
    [2015-06-23 15:24:07 PDT] read: git ["--git-dir=../../.git","--work-tree=../..","--literal-pathspecs","log","refs/heads/git-annex..xxx","-n1","--pretty=%H"]
    [2015-06-23 15:24:07 PDT] chat: git ["--git-dir=../../.git","--work-tree=../..","--literal-pathspecs","cat-file","--batch"]
    [2015-06-23 15:24:07 PDT] read: git ["--git-dir=../../.git","--work-tree=../..","--literal-pathspecs","ls-files","--cached","-z","--","aaa.tgz","xxx.tgz","xxx.tgz","xxx.tgz","xxx.tgz","xxx.tgz"]
    copy aaa.tgz [2015-06-23 15:24:07 PDT] chat: gpg ["--quiet","--trust-model","always","--decrypt"]
    (checking mybucket...) [2015-06-23 15:24:07 PDT] String to sign: "HEAD\n\n\nTue, 23 Jun 2015 22:24:07 GMT\n/mybucket/GPGHMACSHA1--1111111111"
    [2015-06-23 15:24:07 PDT] Host: "mybucket.s3.amazonaws.com"
    [2015-06-23 15:24:07 PDT] Path: "/GPGHMACSHA1--1111111111"
    [2015-06-23 15:24:07 PDT] Query string: ""
    [2015-06-23 15:24:07 PDT] Response status: Status {statusCode = 404, statusMessage = "Not Found"}
    [2015-06-23 15:24:07 PDT] Response header 'x-amz-request-id': 'xxx'
    [2015-06-23 15:24:07 PDT] Response header 'x-amz-id-2': 'xxx'
    [2015-06-23 15:24:07 PDT] Response header 'Content-Type': 'application/xml'
    [2015-06-23 15:24:07 PDT] Response header 'Transfer-Encoding': 'chunked'
    [2015-06-23 15:24:07 PDT] Response header 'Date': 'Tue, 23 Jun 2015 22:24:03 GMT'
    [2015-06-23 15:24:07 PDT] Response header 'Server': 'AmazonS3'
    [2015-06-23 15:24:07 PDT] Response metadata: S3: request ID=<none>, x-amz-id-2=<none>
    (to mybucket...)
    0%            0.0 B/s 0s[2015-06-23 15:24:07 PDT] chat: gpg ["--quiet","--trust-model","always","--batch","--passphrase-fd","18","--symmetric","--force-mdc","--no-textmode"]
    [2015-06-23 15:24:19 PDT] String to sign: "PUT\n\n\nTue, 23 Jun 2015 22:24:19 GMT\nx-amz-storage-class:STANDARD\n/mybucket/GPGHMACSHA1--2222222222"
    [2015-06-23 15:24:19 PDT] Host: "mybucket.s3.amazonaws.com"
    [2015-06-23 15:24:19 PDT] Path: "/GPGHMACSHA1--2222222222"
    [2015-06-23 15:24:19 PDT] Query string: ""
    3%        636.3KB/s 3h0m[2015-06-23 15:31:01 PDT] Response status: Status {statusCode = 200, statusMessage = "OK"}
    [2015-06-23 15:31:01 PDT] Response header 'x-amz-id-2': 'xxx'
    [2015-06-23 15:31:01 PDT] Response header 'x-amz-request-id': 'xxx'
    [2015-06-23 15:31:01 PDT] Response header 'Date': 'Tue, 23 Jun 2015 22:24:17 GMT'
    [2015-06-23 15:31:01 PDT] Response header 'ETag': '"xxx"'
    [2015-06-23 15:31:01 PDT] Response header 'Content-Length': '0'
    [2015-06-23 15:31:01 PDT] Response header 'Server': 'AmazonS3'
    [2015-06-23 15:31:01 PDT] Response metadata: S3: request ID=xxx, x-amz-id-2=xxx
    3%        633.2KB/s 3h1m[2015-06-23 15:31:01 PDT] chat: gpg ["--quiet","--trust-model","always","--batch","--passphrase-fd","18","--symmetric","--force-mdc","--no-textmode"]
    [2015-06-23 15:31:13 PDT] String to sign: "PUT\n\n\nTue, 23 Jun 2015 22:31:13 GMT\nx-amz-storage-class:STANDARD\n/mybucket/GPGHMACSHA1--3333333333"
    [2015-06-23 15:31:13 PDT] Host: "mybucket.s3.amazonaws.com"
    [2015-06-23 15:31:13 PDT] Path: "/GPGHMACSHA1--3333333333"
    [2015-06-23 15:31:13 PDT] Query string: ""
    3%        617.2KB/s 3h6m^C

    $ git annex copy --debug * --to mybucket

    [2015-06-23 15:31:25 PDT] read: git ["--git-dir=../../.git","--work-tree=../..","--literal-pathspecs","show-ref","git-annex"]
    [2015-06-23 15:31:25 PDT] read: git ["--git-dir=../../.git","--work-tree=../..","--literal-pathspecs","show-ref","--hash","refs/heads/git-annex"]
    [2015-06-23 15:31:25 PDT] read: git ["--git-dir=../../.git","--work-tree=../..","--literal-pathspecs","log","refs/heads/git-annex..xxx","-n1","--pretty=%H"]
    [2015-06-23 15:31:25 PDT] chat: git ["--git-dir=../../.git","--work-tree=../..","--literal-pathspecs","cat-file","--batch"]
    [2015-06-23 15:31:25 PDT] read: git ["--git-dir=../../.git","--work-tree=../..","--literal-pathspecs","ls-files","--cached","-z","--","aaa.tgz","xxx.tgz","xxx.tgz","xxx.tgz","xxx.tgz","xxx.tgz"]
    copy aaa.tgz [2015-06-23 15:31:25 PDT] chat: gpg ["--quiet","--trust-model","always","--decrypt"]
    (checking mybucket...) [2015-06-23 15:31:25 PDT] String to sign: "HEAD\n\n\nTue, 23 Jun 2015 22:31:25 GMT\n/mybucket/GPGHMACSHA1--1111111111"
    [2015-06-23 15:31:25 PDT] Host: "mybucket.s3.amazonaws.com"
    [2015-06-23 15:31:25 PDT] Path: "/GPGHMACSHA1--1111111111"
    [2015-06-23 15:31:25 PDT] Query string: ""
    [2015-06-23 15:31:25 PDT] Response status: Status {statusCode = 404, statusMessage = "Not Found"}
    [2015-06-23 15:31:25 PDT] Response header 'x-amz-request-id': 'xxx'
    [2015-06-23 15:31:25 PDT] Response header 'x-amz-id-2': 'xxx'
    [2015-06-23 15:31:25 PDT] Response header 'Content-Type': 'application/xml'
    [2015-06-23 15:31:25 PDT] Response header 'Transfer-Encoding': 'chunked'
    [2015-06-23 15:31:25 PDT] Response header 'Date': 'Tue, 23 Jun 2015 22:31:21 GMT'
    [2015-06-23 15:31:25 PDT] Response header 'Server': 'AmazonS3'
    [2015-06-23 15:31:25 PDT] Response metadata: S3: request ID=<none>, x-amz-id-2=<none>
    (to mybucket...)
    0%            0.0 B/s 0s[2015-06-23 15:31:25 PDT] chat: gpg ["--quiet","--trust-model","always","--batch","--passphrase-fd","18","--symmetric","--force-mdc","--no-textmode"]
    [2015-06-23 15:31:37 PDT] String to sign: "PUT\n\n\nTue, 23 Jun 2015 22:31:37 GMT\nx-amz-storage-class:STANDARD\n/mybucket/GPGHMACSHA1--2222222222"
    [2015-06-23 15:31:37 PDT] Host: "mybucket.s3.amazonaws.com"
    [2015-06-23 15:31:37 PDT] Path: "/GPGHMACSHA1--2222222222"
    [2015-06-23 15:31:37 PDT] Query string: ""
    0%       350.1KB/s 5h40m^C

> [[fixed|done]] --[[Joey]]