author	Joey Hess <joeyh@joeyh.name>	2015-06-16 13:50:48 -0400
committer	Joey Hess <joeyh@joeyh.name>	2015-06-16 13:50:48 -0400
commit	2b6a47ce27ccc1ada2cb15c7e0e021d8b23a726b (patch)
tree	b6768232ed20282fda36c35a59c5b40b9160a0ad
parent	ac374e91966be477a5833a8937c3a8cbeddc8669 (diff)
parent	3f446a6d5a158178c97818b3224c5499c50502f2 (diff)
Merge branch 'master' of ssh://git-annex.branchable.com
-rw-r--r--  doc/bugs/Corrupted_.git__47__annex__47__index_when_running_assistant/comment_5_81e2f37e7adbd8f24734b67a1dd209f9._comment | 7
-rw-r--r--  doc/bugs/s3_InternalIOException__63__.mdwn | 2
-rw-r--r--  doc/bugs/s3_InternalIOException__63__/comment_2_994dd3ebcf7eaacb0b9e06f1bc14a2d4._comment | 7
-rw-r--r--  doc/bugs/s3_InternalIOException__63__/comment_3_a69927ec705efa31aacb5941bf8d8f9d._comment | 14
-rw-r--r--  doc/bugs/s3_InternalIOException__63__/comment_4_ee791ad24d5d2c0ad4f82ecf6fc434a9._comment | 28
-rw-r--r--  doc/bugs/transfer_in_progress_not_present_in_json_output/comment_1_ca13b034f7034deea6a8b3a295b8fdd3._comment | 7
-rw-r--r--  doc/todo/S3_fsck_support.mdwn | 55
7 files changed, 120 insertions, 0 deletions
diff --git a/doc/bugs/Corrupted_.git__47__annex__47__index_when_running_assistant/comment_5_81e2f37e7adbd8f24734b67a1dd209f9._comment b/doc/bugs/Corrupted_.git__47__annex__47__index_when_running_assistant/comment_5_81e2f37e7adbd8f24734b67a1dd209f9._comment
new file mode 100644
index 000000000..71d7a7338
--- /dev/null
+++ b/doc/bugs/Corrupted_.git__47__annex__47__index_when_running_assistant/comment_5_81e2f37e7adbd8f24734b67a1dd209f9._comment
@@ -0,0 +1,7 @@
+[[!comment format=mdwn
+ username="eigengrau"
+ subject="comment 5"
+ date="2015-06-16T13:20:07Z"
+ content="""
+You’re right, but the problem might be that I have somehow developed a perhaps overzealous habit of occasionally running `--aggressive --prune=now`. From what you said, I would surmise that this could be a fairly good explanation for the missing object in my case.
+"""]]
diff --git a/doc/bugs/s3_InternalIOException__63__.mdwn b/doc/bugs/s3_InternalIOException__63__.mdwn
index ec153b708..5a5674f66 100644
--- a/doc/bugs/s3_InternalIOException__63__.mdwn
+++ b/doc/bugs/s3_InternalIOException__63__.mdwn
@@ -8,6 +8,8 @@ Then subsequent transfers seem to fail with:
InternalIOException send: resource vanished (Connection reset by peer)
+Workaround: restart the assistant.
+
### What steps will reproduce the problem?
It's unclear. The assistant is trying to sync a lot of stuff to S3 right now, as files are regularly added to the repository and the assistant migrates them all there. The repository is set up as a "source" repository to make sure it doesn't keep files locally and instead sends them all to s3.
diff --git a/doc/bugs/s3_InternalIOException__63__/comment_2_994dd3ebcf7eaacb0b9e06f1bc14a2d4._comment b/doc/bugs/s3_InternalIOException__63__/comment_2_994dd3ebcf7eaacb0b9e06f1bc14a2d4._comment
new file mode 100644
index 000000000..4d3172ea3
--- /dev/null
+++ b/doc/bugs/s3_InternalIOException__63__/comment_2_994dd3ebcf7eaacb0b9e06f1bc14a2d4._comment
@@ -0,0 +1,7 @@
+[[!comment format=mdwn
+ username="anarcat"
+ subject="comment 2"
+ date="2015-06-15T19:15:12Z"
+ content="""
+i have no idea, really. i probably did *not* disable chunking explicitly on the repo, if that's any help...
+"""]]
diff --git a/doc/bugs/s3_InternalIOException__63__/comment_3_a69927ec705efa31aacb5941bf8d8f9d._comment b/doc/bugs/s3_InternalIOException__63__/comment_3_a69927ec705efa31aacb5941bf8d8f9d._comment
new file mode 100644
index 000000000..c32621d6e
--- /dev/null
+++ b/doc/bugs/s3_InternalIOException__63__/comment_3_a69927ec705efa31aacb5941bf8d8f9d._comment
@@ -0,0 +1,14 @@
+[[!comment format=mdwn
+ username="anarcat"
+ subject="comment 3"
+ date="2015-06-15T20:02:56Z"
+ content="""
+okay, after disabling chunking, it still doesn't work.
+
+it seems the assistant is completely stuck: i haven't seen it transfer any new files yet. using `git annex move` seems to work, though:
+
+<pre>
+www-data@ip-10-87-135-88:/persistent/media$ git annex move --to s3
+move video/mp4_sd/-_cineastas_indigenas_ntsc_-_eng_0.mov.mp4 (checking s3...) ok
+</pre>
+"""]]
diff --git a/doc/bugs/s3_InternalIOException__63__/comment_4_ee791ad24d5d2c0ad4f82ecf6fc434a9._comment b/doc/bugs/s3_InternalIOException__63__/comment_4_ee791ad24d5d2c0ad4f82ecf6fc434a9._comment
new file mode 100644
index 000000000..fd65fa8f8
--- /dev/null
+++ b/doc/bugs/s3_InternalIOException__63__/comment_4_ee791ad24d5d2c0ad4f82ecf6fc434a9._comment
@@ -0,0 +1,28 @@
+[[!comment format=mdwn
+ username="anarcat"
+ subject="comment 4"
+ date="2015-06-15T20:05:13Z"
+ content="""
+never mind that: `move` doesn't actually work either; it just skipped files that seemed to have already been transferred. the remaining files are still untransferable:
+
+<pre>
+www-data@ip-10-87-135-88:/persistent/media$ git annex move --to s3
+move video/original/a_gente_luta_-_eng_0.mov (checking s3...) (to s3...)
+0% 0.0 B/s 0s
+ InternalIOException send: resource vanished (Broken pipe)
+failed
+move video/original/a_gente_luta_-_esp_0.mov (checking s3...) (to s3...)
+0% 255.9KB/s 9h23m
+ InternalIOException send: resource vanished (Broken pipe)
+failed
+move video/original/kinja_iakaha_-_dvcam_en.mov (checking s3...) (to s3...)
+0% 0.0 B/s 0s
+ InternalIOException send: resource vanished (Broken pipe)
+failed
+move video/original/quartet_for_deafblind_h264kbs18000_24.mov (checking s3...) (to s3...)
+0% 0.0 B/s 0s
+ InternalIOException send: resource vanished (Broken pipe)
+failed
+git-annex: move: 4 failed
+</pre>
+"""]]
diff --git a/doc/bugs/transfer_in_progress_not_present_in_json_output/comment_1_ca13b034f7034deea6a8b3a295b8fdd3._comment b/doc/bugs/transfer_in_progress_not_present_in_json_output/comment_1_ca13b034f7034deea6a8b3a295b8fdd3._comment
new file mode 100644
index 000000000..054d6a029
--- /dev/null
+++ b/doc/bugs/transfer_in_progress_not_present_in_json_output/comment_1_ca13b034f7034deea6a8b3a295b8fdd3._comment
@@ -0,0 +1,7 @@
+[[!comment format=mdwn
+ username="anarcat"
+ subject="comment 1"
+ date="2015-06-15T19:48:46Z"
+ content="""
+it seems this is deliberate: there's a `nojson` tag in the source code. I am just unclear why that is... and there doesn't seem to be any documentation in the source about the nojson function or why it is desired...
+"""]]
diff --git a/doc/todo/S3_fsck_support.mdwn b/doc/todo/S3_fsck_support.mdwn
new file mode 100644
index 000000000..328cd0d7d
--- /dev/null
+++ b/doc/todo/S3_fsck_support.mdwn
@@ -0,0 +1,55 @@
+I have (i think?) noticed that the s3 remote doesn't really do an fsck:
+
+http://source.git-annex.branchable.com/?p=source.git;a=blob;f=Remote/S3.hs;hb=HEAD#l86
+
+Besides, unless S3 does something magical and amazingly fast, a real checksum would be far too slow to fit in the time the remote check takes:
+
+<pre>
+$ time git annex fsck -f s3 video/original/quartet_for_deafblind_h264kbs18000_24.mov
+fsck video/original/quartet_for_deafblind_h264kbs18000_24.mov (checking s3...) ok
+(recording state in git...)
+
+real 0m1.188s
+user 0m0.444s
+sys 0m0.324s
+$ time git annex fsck video/original/quartet_for_deafblind_h264kbs18000_24.mov
+fsck video/original/quartet_for_deafblind_h264kbs18000_24.mov (checksum...)
+ok
+(recording state in git...)
+
+real 3m14.478s
+user 1m55.679s
+sys 0m8.325s
+</pre>
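+
+for scale, some rough arithmetic (the file size comes from the key visible in the trace below):
+
+<pre>
+file size       : 11855411701 bytes  (~11.9 GB, from the SHA256E-s... key)
+local checksum  : ~194 s   =>  ~61 MB/s
+remote "check"  : ~1.2 s   =>  would have to read ~10 GB/s, network included
+</pre>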
+
+1s is barely enough time for git-annex to do an HTTP request to amazon, and what is returned doesn't seem to include a checksum of any kind:
+
+<pre>
+fsck video/original/quartet_for_deafblind_h264kbs18000_24.mov (checking s3...) [2015-06-16 00:31:46 UTC] String to sign: "HEAD\n\n\nTue, 16 Jun 2015 00:31:46 GMT\n/isuma-files/SHA256E-s11855411701--ba268f1c401321db08d4cb149d73a51a10f02968687cb41f06051943b4720465.mov"
+[2015-06-16 00:31:46 UTC] Host: "isuma-files.s3.amazonaws.com"
+[2015-06-16 00:31:46 UTC] Response header 'x-amz-request-id': '9BF7B64EB5A619F3'
+[2015-06-16 00:31:46 UTC] Response header 'x-amz-id-2': '84ZO7IZ0dqJeEghADjt7hTGKGqGAWwbwwaCFVft3ama+oDOVJrvpiFjqn8EY3Z0R'
+[2015-06-16 00:31:46 UTC] Response header 'Content-Type': 'application/xml'
+[2015-06-16 00:31:46 UTC] Response header 'Transfer-Encoding': 'chunked'
+[2015-06-16 00:31:46 UTC] Response header 'Date': 'Tue, 16 Jun 2015 00:32:10 GMT'
+[2015-06-16 00:31:46 UTC] Response header 'Server': 'AmazonS3'
+[2015-06-16 00:31:46 UTC] Response metadata: S3: request ID=<none>, x-amz-id-2=<none>
+ok
+</pre>
+
+did i miss something? are there fsck checks for s3 remotes?
+
+if not, i think it would be useful to leverage the "md5summing" functionality that the S3 API provides. there are two relevant stackoverflow responses here:
+
+http://stackoverflow.com/questions/1775816/how-to-get-the-md5sum-of-a-file-on-amazons-s3
+http://stackoverflow.com/questions/8618218/amazon-s3-checksum
+
+... to paraphrase: when a file is `PUT` on S3, one can provide a `Content-MD5` header that S3 will check the uploaded content against, catching corruption during the upload itself. then there is some talk about how the `ETag` header *may* hold the MD5, but that seems inconclusive. There's also a specific API call for getting the MD5 sum:
+
+https://docs.aws.amazon.com/AWSAndroidSDK/latest/javadoc/com/amazonaws/services/s3/model/ObjectMetadata.html#getContentMD5()
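+
+here is a rough sketch of how the two pieces could fit together (python/boto3 just to illustrate the raw S3 API, not what git-annex would actually use; bucket, key and file names below are made up):
+
+<pre>
+# illustration only: bucket/key/path are placeholders, and the final ETag
+# comparison assumes a plain single-part, non-SSE-KMS upload, where S3's
+# ETag is the hex MD5 of the object body.
+import base64
+import hashlib
+
+import boto3
+
+BUCKET = "example-bucket"   # placeholder
+KEY = "example-key"         # placeholder
+PATH = "local-file.mov"     # placeholder
+
+s3 = boto3.client("s3")
+
+# compute the local MD5 once
+md5 = hashlib.md5()
+with open(PATH, "rb") as f:
+    for chunk in iter(lambda: f.read(1024 * 1024), b""):
+        md5.update(chunk)
+
+# upload with Content-MD5; S3 verifies the body against it and rejects the
+# PUT with a BadDigest error on mismatch
+with open(PATH, "rb") as f:
+    s3.put_object(
+        Bucket=BUCKET,
+        Key=KEY,
+        Body=f,
+        ContentMD5=base64.b64encode(md5.digest()).decode("ascii"),
+    )
+
+# later, a cheap fsck: HEAD the object and compare its ETag to the local MD5
+etag = s3.head_object(Bucket=BUCKET, Key=KEY)["ETag"].strip('"')
+print("remote matches local:", etag == md5.hexdigest())
+</pre>
+
+note that the ETag comparison only works for single-part uploads without SSE-KMS; for multipart uploads the ETag is an MD5 of the part MD5s plus a part count, so git-annex would probably have to remember the MD5 it computed at upload time, or rely on the `Content-MD5` check alone.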
+
+the android client also happens to check with that API on downloads:
+
+https://github.com/aws/aws-sdk-android/blob/4de3a3146d66d9ab5684eb5e71d5a2cef9f4dec9/aws-android-sdk-s3/src/main/java/com/amazonaws/services/s3/AmazonS3Client.java#L1302
+
+now of course MD5 is a pile of dung nowadays, but having that checksum beats not having any checksum at all. *and* it is at no cost on the client side... --[[anarcat]]