author | 2015-07-16 15:01:55 -0400
---|---
committer | 2015-07-16 15:01:55 -0400
commit | a63e44eef44d7a0c424b8ddf0a7d0a8832a313f6 (patch)
tree | 7230de298d43bdaac2908c2d0434a55e3ef47bf3 /doc/bugs
parent | 1396113ccbb6be895b1deb7c7eec228323e47078 (diff)
move forum bug report to bugs, and close
Diffstat (limited to 'doc/bugs')
5 files changed, 145 insertions(+), 0 deletions(-)
diff --git a/doc/bugs/s3_special_remote_does_not_resume_uploads_even_with_new_chunking.mdwn b/doc/bugs/s3_special_remote_does_not_resume_uploads_even_with_new_chunking.mdwn
new file mode 100644
index 000000000..1241a0096
--- /dev/null
+++ b/doc/bugs/s3_special_remote_does_not_resume_uploads_even_with_new_chunking.mdwn
@@ -0,0 +1,90 @@
+I'm trying to upload large files to an S3 remote. I'm using a very recent version of git-annex:
+
+    git-annex version: 5.20150616-g4d7683b
+    build flags: Assistant Webapp Webapp-secure Pairing Testsuite S3 WebDAV FsEvents XMPP DNS Feeds Quvi TDFA TorrentParser
+    key/value backends: SHA256E SHA1E SHA512E SHA224E SHA384E SKEIN256E SKEIN512E MD5E SHA256 SHA1 SHA512 SHA224 SHA384 SKEIN256 SKEIN512 MD5 WORM URL
+    remote types: git gcrypt S3 bup directory rsync web bittorrent webdav tahoe glacier ddar hook external
+
+Here's how my chunking is set up:
+
+    xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx bucket=mybucket chunk=256MiB cipher=xxxxxx cipherkeys=xxxxxx datacenter=US
+    host=s3.amazonaws.com name=mybucket port=80 s3creds=xxxxxx storageclass=STANDARD type=S3 timestamp=xxxxxx
+
+If I start an upload, `^C` it in the middle, and then start it again, it always starts over from the beginning.
+
+I've verified this using the `--debug` switch; please see below. I've renamed certain things for security reasons; however, GPGHMACSHA1--1111111111 always refers to the same chunk, GPGHMACSHA1--2222222222 always refers to the same chunk, and so on.
+
+You can see that even after it uploads a chunk once, it tries to upload it again.
+
+This is consistent with letting it sit for an hour and upload half of the large file, then interrupting it, and having it start from scratch again.
+
+    $ git annex copy --debug * --to mybucket
+
+    [2015-06-23 15:24:07 PDT] read: git ["--git-dir=../../.git","--work-tree=../..","--literal-pathspecs","show-ref","git-annex"]
+    [2015-06-23 15:24:07 PDT] read: git ["--git-dir=../../.git","--work-tree=../..","--literal-pathspecs","show-ref","--hash","refs/heads/git-annex"]
+    [2015-06-23 15:24:07 PDT] read: git ["--git-dir=../../.git","--work-tree=../..","--literal-pathspecs","log","refs/heads/git-annex..xxx","-n1","--pretty=%H"]
+    [2015-06-23 15:24:07 PDT] chat: git ["--git-dir=../../.git","--work-tree=../..","--literal-pathspecs","cat-file","--batch"]
+    [2015-06-23 15:24:07 PDT] read: git ["--git-dir=../../.git","--work-tree=../..","--literal-pathspecs","ls-files","--cached","-z","--","aaa.tgz","xxx.tgz","xxx.tgz","xxx.tgz","xxx.tgz","xxx.tgz"]
+    copy aaa.tgz [2015-06-23 15:24:07 PDT] chat: gpg ["--quiet","--trust-model","always","--decrypt"]
+    (checking mybucket...)
+    [2015-06-23 15:24:07 PDT] String to sign: "HEAD\n\n\nTue, 23 Jun 2015 22:24:07 GMT\n/mybucket/GPGHMACSHA1--1111111111"
+    [2015-06-23 15:24:07 PDT] Host: "mybucket.s3.amazonaws.com"
+    [2015-06-23 15:24:07 PDT] Path: "/GPGHMACSHA1--1111111111"
+    [2015-06-23 15:24:07 PDT] Query string: ""
+    [2015-06-23 15:24:07 PDT] Response status: Status {statusCode = 404, statusMessage = "Not Found"}
+    [2015-06-23 15:24:07 PDT] Response header 'x-amz-request-id': 'xxx'
+    [2015-06-23 15:24:07 PDT] Response header 'x-amz-id-2': 'xxx'
+    [2015-06-23 15:24:07 PDT] Response header 'Content-Type': 'application/xml'
+    [2015-06-23 15:24:07 PDT] Response header 'Transfer-Encoding': 'chunked'
+    [2015-06-23 15:24:07 PDT] Response header 'Date': 'Tue, 23 Jun 2015 22:24:03 GMT'
+    [2015-06-23 15:24:07 PDT] Response header 'Server': 'AmazonS3'
+    [2015-06-23 15:24:07 PDT] Response metadata: S3: request ID=<none>, x-amz-id-2=<none>
+    (to mybucket...)
+    0% 0.0 B/s 0s[2015-06-23 15:24:07 PDT] chat: gpg ["--quiet","--trust-model","always","--batch","--passphrase-fd","18","--symmetric","--force-mdc","--no-textmode"]
+    [2015-06-23 15:24:19 PDT] String to sign: "PUT\n\n\nTue, 23 Jun 2015 22:24:19 GMT\nx-amz-storage-class:STANDARD\n/mybucket/GPGHMACSHA1--2222222222"
+    [2015-06-23 15:24:19 PDT] Host: "mybucket.s3.amazonaws.com"
+    [2015-06-23 15:24:19 PDT] Path: "/GPGHMACSHA1--2222222222"
+    [2015-06-23 15:24:19 PDT] Query string: ""
+    3% 636.3KB/s 3h0m[2015-06-23 15:31:01 PDT] Response status: Status {statusCode = 200, statusMessage = "OK"}
+    [2015-06-23 15:31:01 PDT] Response header 'x-amz-id-2': 'xxx'
+    [2015-06-23 15:31:01 PDT] Response header 'x-amz-request-id': 'xxx'
+    [2015-06-23 15:31:01 PDT] Response header 'Date': 'Tue, 23 Jun 2015 22:24:17 GMT'
+    [2015-06-23 15:31:01 PDT] Response header 'ETag': '"xxx"'
+    [2015-06-23 15:31:01 PDT] Response header 'Content-Length': '0'
+    [2015-06-23 15:31:01 PDT] Response header 'Server': 'AmazonS3'
+    [2015-06-23 15:31:01 PDT] Response metadata: S3: request ID=xxx, x-amz-id-2=xxx
+    3% 633.2KB/s 3h1m[2015-06-23 15:31:01 PDT] chat: gpg ["--quiet","--trust-model","always","--batch","--passphrase-fd","18","--symmetric","--force-mdc","--no-textmode"]
+    [2015-06-23 15:31:13 PDT] String to sign: "PUT\n\n\nTue, 23 Jun 2015 22:31:13 GMT\nx-amz-storage-class:STANDARD\n/mybucket/GPGHMACSHA1--3333333333"
+    [2015-06-23 15:31:13 PDT] Host: "mybucket.s3.amazonaws.com"
+    [2015-06-23 15:31:13 PDT] Path: "/GPGHMACSHA1--3333333333"
+    [2015-06-23 15:31:13 PDT] Query string: ""
+    3% 617.2KB/s 3h6m^C
+
+    $ git annex copy --debug * --to mybucket
+
+    [2015-06-23 15:31:25 PDT] read: git ["--git-dir=../../.git","--work-tree=../..","--literal-pathspecs","show-ref","git-annex"]
+    [2015-06-23 15:31:25 PDT] read: git ["--git-dir=../../.git","--work-tree=../..","--literal-pathspecs","show-ref","--hash","refs/heads/git-annex"]
+    [2015-06-23 15:31:25 PDT] read: git ["--git-dir=../../.git","--work-tree=../..","--literal-pathspecs","log","refs/heads/git-annex..xxx","-n1","--pretty=%H"]
+    [2015-06-23 15:31:25 PDT] chat: git ["--git-dir=../../.git","--work-tree=../..","--literal-pathspecs","cat-file","--batch"]
+    [2015-06-23 15:31:25 PDT] read: git ["--git-dir=../../.git","--work-tree=../..","--literal-pathspecs","ls-files","--cached","-z","--","aaa.tgz","xxx.tgz","xxx.tgz","xxx.tgz","xxx.tgz","xxx.tgz"]
+    copy aaa.tgz [2015-06-23 15:31:25 PDT] chat: gpg ["--quiet","--trust-model","always","--decrypt"]
+    (checking mybucket...)
+    [2015-06-23 15:31:25 PDT] String to sign: "HEAD\n\n\nTue, 23 Jun 2015 22:31:25 GMT\n/mybucket/GPGHMACSHA1--1111111111"
+    [2015-06-23 15:31:25 PDT] Host: "mybucket.s3.amazonaws.com"
+    [2015-06-23 15:31:25 PDT] Path: "/GPGHMACSHA1--1111111111"
+    [2015-06-23 15:31:25 PDT] Query string: ""
+    [2015-06-23 15:31:25 PDT] Response status: Status {statusCode = 404, statusMessage = "Not Found"}
+    [2015-06-23 15:31:25 PDT] Response header 'x-amz-request-id': 'xxx'
+    [2015-06-23 15:31:25 PDT] Response header 'x-amz-id-2': 'xxx'
+    [2015-06-23 15:31:25 PDT] Response header 'Content-Type': 'application/xml'
+    [2015-06-23 15:31:25 PDT] Response header 'Transfer-Encoding': 'chunked'
+    [2015-06-23 15:31:25 PDT] Response header 'Date': 'Tue, 23 Jun 2015 22:31:21 GMT'
+    [2015-06-23 15:31:25 PDT] Response header 'Server': 'AmazonS3'
+    [2015-06-23 15:31:25 PDT] Response metadata: S3: request ID=<none>, x-amz-id-2=<none>
+    (to mybucket...)
+    0% 0.0 B/s 0s[2015-06-23 15:31:25 PDT] chat: gpg ["--quiet","--trust-model","always","--batch","--passphrase-fd","18","--symmetric","--force-mdc","--no-textmode"]
+    [2015-06-23 15:31:37 PDT] String to sign: "PUT\n\n\nTue, 23 Jun 2015 22:31:37 GMT\nx-amz-storage-class:STANDARD\n/mybucket/GPGHMACSHA1--2222222222"
+    [2015-06-23 15:31:37 PDT] Host: "mybucket.s3.amazonaws.com"
+    [2015-06-23 15:31:37 PDT] Path: "/GPGHMACSHA1--2222222222"
+    [2015-06-23 15:31:37 PDT] Query string: ""
+    0% 350.1KB/s 5h40m^C
+
+> [[fixed|done]] --[[Joey]]
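What chunking is meant to enable here, as the comments below discuss, is resuming at chunk granularity: before uploading each chunk, ask the remote whether that chunk's key is already present, and skip it if so. A minimal Haskell sketch of that check, using hypothetical `checkPresent` and `uploadChunk` actions in place of the remote's real operations (this is not git-annex's actual code):

    import Control.Monad (forM_, unless)

    -- Resume at chunk granularity: ask the remote whether each chunk key
    -- is already present, and upload only the missing chunks.
    resumeChunks
        :: (String -> IO Bool)  -- presence check for one chunk key
        -> (String -> IO ())    -- upload one chunk
        -> [String]             -- chunk keys of the file, in order
        -> IO ()
    resumeChunks checkPresent uploadChunk chunkKeys =
        forM_ chunkKeys $ \k -> do
            present <- checkPresent k
            unless present (uploadChunk k)

If `checkPresent` never reports a chunk as present, every interrupted transfer starts over from the first chunk, which is exactly the symptom in the log above.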
diff --git a/doc/bugs/s3_special_remote_does_not_resume_uploads_even_with_new_chunking/comment_1_f6b1991e259bf4b3d2c85a08f465aa4a._comment b/doc/bugs/s3_special_remote_does_not_resume_uploads_even_with_new_chunking/comment_1_f6b1991e259bf4b3d2c85a08f465aa4a._comment
new file mode 100644
index 000000000..79f0fe873
--- /dev/null
+++ b/doc/bugs/s3_special_remote_does_not_resume_uploads_even_with_new_chunking/comment_1_f6b1991e259bf4b3d2c85a08f465aa4a._comment
@@ -0,0 +1,7 @@
+[[!comment format=mdwn
+ username="anarcat"
+ subject="comment 1"
+ date="2015-06-24T00:31:43Z"
+ content="""
+Did a single chunk get transferred correctly? I believe git-annex can only resume at chunk granularity... that is what chunking is for, no? --[[anarcat]]
+"""]]
diff --git a/doc/bugs/s3_special_remote_does_not_resume_uploads_even_with_new_chunking/comment_2_db57d14b983a957c454968477d9de634._comment b/doc/bugs/s3_special_remote_does_not_resume_uploads_even_with_new_chunking/comment_2_db57d14b983a957c454968477d9de634._comment
new file mode 100644
index 000000000..df12abc03
--- /dev/null
+++ b/doc/bugs/s3_special_remote_does_not_resume_uploads_even_with_new_chunking/comment_2_db57d14b983a957c454968477d9de634._comment
@@ -0,0 +1,11 @@
+[[!comment format=mdwn
+ username="digiuser"
+ subject="yes"
+ date="2015-06-24T00:49:22Z"
+ content="""
+Yes, a single chunk did get transferred correctly.
+
+In fact, every time I've run this experiment, many chunks got transferred correctly. I've even verified that they are in S3, yet git-annex tries to re-upload them.
+
+(I haven't checked their contents in S3, but the filenames and sizes are there.)
+"""]]
diff --git a/doc/bugs/s3_special_remote_does_not_resume_uploads_even_with_new_chunking/comment_3_85989f505931ec695d7f3de74db0f5a1._comment b/doc/bugs/s3_special_remote_does_not_resume_uploads_even_with_new_chunking/comment_3_85989f505931ec695d7f3de74db0f5a1._comment
new file mode 100644
index 000000000..aac24a023
--- /dev/null
+++ b/doc/bugs/s3_special_remote_does_not_resume_uploads_even_with_new_chunking/comment_3_85989f505931ec695d7f3de74db0f5a1._comment
@@ -0,0 +1,7 @@
+[[!comment format=mdwn
+ username="digiuser"
+ subject="any updates?"
+ date="2015-06-29T03:05:53Z"
+ content="""
+Sorry to post again, but I was wondering whether this report got lost. Does anyone have a solution? Thanks!
+"""]]
diff --git a/doc/bugs/s3_special_remote_does_not_resume_uploads_even_with_new_chunking/comment_4_bd631d470ee0365a11483c9a2e563b32._comment b/doc/bugs/s3_special_remote_does_not_resume_uploads_even_with_new_chunking/comment_4_bd631d470ee0365a11483c9a2e563b32._comment
new file mode 100644
index 000000000..ecac7917c
--- /dev/null
+++ b/doc/bugs/s3_special_remote_does_not_resume_uploads_even_with_new_chunking/comment_4_bd631d470ee0365a11483c9a2e563b32._comment
@@ -0,0 +1,30 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 4"""
+ date="2015-07-16T17:57:44Z"
+ content="""
+This should have been filed as a bug report... I will move the thread to
+bugs after posting this comment.
+
+In your obfuscated log, git-annex HEADs GPGHMACSHA1--1111111111,
+and when that fails, it PUTs GPGHMACSHA1--2222222222. From this we can
+deduce that GPGHMACSHA1--1111111111 is not the first chunk but the full,
+non-chunked key, and GPGHMACSHA1--2222222222 is actually the first chunk.
+
+For testing, I modified the S3 remote to make file uploads succeed but then
+report to git-annex that they failed. So `git annex copy` uploads the first
+chunk and then fails, the same as if it had been interrupted there. Repeating
+the copy, I see the same thing: it HEADs the full key, does not HEAD the
+first chunk, so it doesn't notice the chunk was uploaded before, and
+re-uploads it.
+
+The HEAD of the full key is only done for backwards compatibility.
+The problem is that it does not check whether the chunk it is about to
+upload is already present in the remote. But there is code in seekResume
+that is supposed to do that very check: `tryNonAsync (checker k)`
+
+Aha, the problem seems to be in the checkpresent action that's passed to
+it. It looks like a dummy checkpresent action is being passed in.
+
+I've fixed this in git, and now it resumes properly in my test case.
+"""]]
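Joey's test technique and diagnosis can be acted out with a small self-contained sketch: an in-memory set stands in for the S3 bucket, every upload really lands in it (matching his modified remote that uploads successfully but reports failure), and a dummy presence check, which is effectively what the buggy code passed to seekResume, is compared against a real one. All names here (`uploadChunk`, `realCheck`, `dummyCheck`) are hypothetical stand-ins, not the actual S3 remote code:

    import Control.Monad (filterM)
    import Data.IORef (modifyIORef', newIORef, readIORef)
    import qualified Data.Set as Set

    main :: IO ()
    main = do
        -- An in-memory stand-in for the S3 bucket.
        remote <- newIORef (Set.empty :: Set.Set String)
        let uploadChunk k = modifyIORef' remote (Set.insert k)
            realCheck k = Set.member k <$> readIORef remote
            dummyCheck _ = return False  -- the buggy checkpresent: never finds anything
            chunks = ["chunk1", "chunk2", "chunk3"]

        -- First run: one chunk gets through before the "interruption".
        uploadChunk "chunk1"

        -- With the dummy check, a retry re-uploads chunk1 even though
        -- the remote already has it, matching the log above.
        d <- filterM (fmap not . dummyCheck) chunks
        putStrLn ("dummy checkpresent would re-upload: " ++ show d)

        -- With a real presence check, the retry skips chunk1 and resumes.
        r <- filterM (fmap not . realCheck) chunks
        putStrLn ("real checkpresent would upload only: " ++ show r)

Substituting the remote's real presence check for the dummy one restores resumption at chunk granularity, which matches the fix described above.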