author Joey Hess <joeyh@joeyh.name> 2017-08-31 12:16:22 -0400
committer Joey Hess <joeyh@joeyh.name> 2017-08-31 12:16:22 -0400
commit 5fd44282fb15c51a59a2616d01988ae98fe58da4
tree 85d462cbe3b6b9801ae3f8d5ce7f0cb101bdf843
parent 44d71938c6535f207932799ea3e231dc78bcb8de
parent c2c45406ff7e0a94617429d5ea95acb4c23a0f86
Merge branch 'master' into export
-rw-r--r-- doc/bugs/get_-J___34__fails__34___to_get_files_with_the_same_key.mdwn 48
-rw-r--r-- doc/design/exporting_trees_to_special_remotes.mdwn 30
-rw-r--r-- doc/devblog/day_466__export_prototype.mdwn 6
-rw-r--r-- doc/forum/huge_text_files___40__not_binary__41___-_compress/comment_2_44eb5f14dbf3c5061afeb847958f27fe._comment 9
-rw-r--r-- doc/internals.mdwn 15
-rw-r--r-- doc/todo/Invert_remote_selection/comment_2_9ad4c9b2217f739e67198d16d14d32e7._comment 12
6 files changed, 109 insertions, 11 deletions
diff --git a/doc/bugs/get_-J___34__fails__34___to_get_files_with_the_same_key.mdwn b/doc/bugs/get_-J___34__fails__34___to_get_files_with_the_same_key.mdwn
new file mode 100644
index 000000000..fade3b331
--- /dev/null
+++ b/doc/bugs/get_-J___34__fails__34___to_get_files_with_the_same_key.mdwn
@@ -0,0 +1,48 @@
+### What steps will reproduce the problem?
+
+Ask `git annex get` to fetch, in parallel, files which point to the same key.
+
+### What version of git-annex are you using? On what operating system?
+
+6.20170815+gitg22da64d0f-1~ndall+1
+
+### Please provide any additional information below.
+
+[[!format sh """
+# works in serial mode
+
+$> git annex get rh.white{,_avg}
+get rh.white (from web...)
+/mnt/btrfs/scrap/tmp/ds0001 100%[===========================================>] 360.31K --.-KB/s in 0.1s
+2017-08-30 10:08:02 URL:https://dl.dropboxusercontent.com/s/0lww4tomnwfanwd/rh.white_avg?dl=0 [368962/368962] -> "/mnt/btrfs/scrap/tmp/ds000114/derivatives/freesurfer/.git/annex/tmp/MD5E-s368962--99a4db61cedffee686aef99b2d197794" [1]
+(checksum...) ok
+(recording state in git...)
+(dev)2 10016.....................................:Wed 30 Aug 2017 10:08:02 AM EDT:.
+(git)smaug:…/btrfs/scrap/tmp/ds000114/derivatives/freesurfer[master]fsaverage5/surf
+$> git annex drop --fast rh.white{,_avg}
+drop rh.white (checking https://dl.dropbox.com/s/0lww4tomnwfanwd/rh.white_avg?dl=0...) ok
+(recording state in git...)
+
+# "fails" in parallel
+$> git annex get -J2 rh.white{,_avg}
+get rh.white get rh.white_avg (transfer already in progress, or unable to take transfer lock)
+ Unable to access these remotes: web
+(from web...)
+
+ Try making some of these repositories available:
+ 00000000-0000-0000-0000-000000000001 -- web
+ 5e47b3f3-f09c-4969-8885-920a49ff8a45 -- yoh@smaug:/mnt/btrfs/datasets/datalad/crawl/workshops/nih-workshop-2017/ds000114/derivatives/freesurfer
+failed
+/mnt/btrfs/scrap/tmp/ds0001 100%[===========================================>] 360.31K 1.63MB/s in 0.2s
+2017-08-30 10:08:21 URL:https://dl.dropboxusercontent.com/s/0lww4tomnwfanwd/rh.white_avg?dl=0 [368962/368962] -> "/mnt/btrfs/scrap/tmp/ds000114/derivatives/freesurfer/.git/annex/tmp/MD5E-s368962--99a4db61cedffee686aef99b2d197794" [1]
+(checksum...) ok
+(recording state in git...)
+git-annex: get: 1 failed
+(dev)2 10018 ->1.....................................:Wed 30 Aug 2017 10:08:21 AM EDT:.
+
+"""]]
+
+So in the end the git-annex run exits with error 1, and in json mode the error(s) are reported as well.
+I wondered whether annex should first analyze the passed paths to determine the actual keys to be fetched?
+
+[[!meta author=yoh]]
diff --git a/doc/design/exporting_trees_to_special_remotes.mdwn b/doc/design/exporting_trees_to_special_remotes.mdwn
index 9327b475f..ce7431141 100644
--- a/doc/design/exporting_trees_to_special_remotes.mdwn
+++ b/doc/design/exporting_trees_to_special_remotes.mdwn
@@ -35,6 +35,11 @@ To export a treeish, the user can run:
That does all necessary uploads etc to make the special remote contain
the tree of files. The treeish can be a tag, a branch, or a tree.
+If a file's content is not present, it won't be exported. Re-running the
+same export later should export files whose content has become present.
+(This likely means a second pass, and needs location tracking to track
+which files are in the export.)
+
Users may sometimes want to export multiple treeishes to a single special
remote. For example, exporting several tags. This interface could be
complicated to support that, putting the treeishes in subdirectories on the
@@ -144,9 +149,13 @@ when using any of the above.
## location tracking
+Since not all the files in an exported treeish may have content
+present when the export is done, location tracking will be needed so that
+getting the files and exporting again transfers their content.
+
Does a copy of a file exported to a special remote count as a copy
of a file as far as [[numcopies]] goes? Should git-annex get download
-a file from an export? Or should exporting not update location tracking?
+a file from an export?
The problem is that special remotes with exports are not
key/value stores. The content of a file can change, and if multiple
@@ -206,22 +215,23 @@ there would be a merge conflict. Union merging would *scramble* the exported
tree, so even if a smart merge is added, old versions of git-annex would
corrupt the exported tree.
-To avoid that problem, add a log file `exported/uuid.log` that lists
-the sha1 of the exported tree and the uuid of the repository that exported it.
+To avoid that problem, add a log file `export.log` that contains the uuid
+of the remote that was exported to, and the sha1 of the exported tree.
To avoid the exported tree being GCed, do graft it in to the git-annex
branch, but follow that with a commit that removes the tree again,
and only update `refs/heads/git-annex` after making both commits.
-If `exported/uuid.log` contains multiple active exports, there was an
-export conflict. Short of downloading the whole export to checksum it,
-or deleting the whole export, what can be done to resolve it?
+If `export.log` contains multiple active exports of different trees,
+there was an export conflict. Short of downloading the whole export to
+checksum it, or deleting the whole export, what can be done to resolve it?
In this case, git-annex knows both exported trees. Have the user provide
a tree that resolves the conflict as they desire (it could be the same as
-one of the exported trees, or some merge of them). Then diff each exported
-tree in turn against the resolving tree. If a file differs, re-export that
-file. In some cases this will do unncessary re-uploads, but it's reasonably
-efficient.
+one of the exported trees, or some merge of them or an entirely new tree).
+The UI to do this can just be another `git annex export $tree --to remote`.
+To resolve, diff each exported tree in turn against the resolving tree. If a
+file differs, re-export that file. In some cases this will do unnecessary
+re-uploads, but it's reasonably efficient.
The documentation should suggest strongly only exporting to a given special
remote from a single repository, or having some other rule that avoids
diff --git a/doc/devblog/day_466__export_prototype.mdwn b/doc/devblog/day_466__export_prototype.mdwn
new file mode 100644
index 000000000..cdc1926f8
--- /dev/null
+++ b/doc/devblog/day_466__export_prototype.mdwn
@@ -0,0 +1,6 @@
+Put together a prototype of `git annex export` in the "export" branch.
+Exporting to a directory special remote is basically working, but this is
+only the beginning.
+
+Today's work was sponsored by Jake Vosloo on
+[Patreon](https://patreon.com/joeyh/)
diff --git a/doc/forum/huge_text_files___40__not_binary__41___-_compress/comment_2_44eb5f14dbf3c5061afeb847958f27fe._comment b/doc/forum/huge_text_files___40__not_binary__41___-_compress/comment_2_44eb5f14dbf3c5061afeb847958f27fe._comment
new file mode 100644
index 000000000..a1bee43df
--- /dev/null
+++ b/doc/forum/huge_text_files___40__not_binary__41___-_compress/comment_2_44eb5f14dbf3c5061afeb847958f27fe._comment
@@ -0,0 +1,9 @@
+[[!comment format=mdwn
+ username="vgp"
+ avatar="http://cdn.libravatar.org/avatar/b332bfc1d3f49c196e1bff84b53d0f8b"
+ subject="comment 2"
+ date="2017-08-30T12:42:22Z"
+ content="""
+Thanks for your comments joey!
+In fact, compressing files in the working tree is not mandatory. The main question is compressing them on the git server (for quota reasons). When we were using only git, it was slow (because of the huge files) but the files were compressed. Now, using git-annex, the operations are faster but the size of the repository increases a lot (due to the lack of compression), and that is a problem now that we've reached the disk quota on the git server.
+"""]]
diff --git a/doc/internals.mdwn b/doc/internals.mdwn
index 4ed8001d4..7d39b1068 100644
--- a/doc/internals.mdwn
+++ b/doc/internals.mdwn
@@ -176,10 +176,23 @@ File format is identical to preferred-content.log.
Contains standard preferred content settings for groups. (Overriding or
supplementing the ones built into git-annex.)
-The file format is one line per group, staring with a timestamp, then a
+The file format is one line per group, starting with a timestamp, then a
space, then the group name followed by a space and then the preferred
content expression.
+## `export.log`
+
+Tracks what trees have been exported to special remotes by
+[[git-annex-export]](1).
+
+Each line starts with a timestamp, then the uuid of the special remote,
+followed by the sha1 of the tree that was exported to that special remote.
+
+(The exported tree is also grafted into the git-annex branch, at
+`export.tree`, to prevent git from garbage collecting it. However, the head
+of the git-annex branch should never contain such a grafted in tree;
+the grafted tree is removed in the same commit that updates `export.log`.)
+
## `aaa/bbb/*.log`
These log files record [[location_tracking]] information
diff --git a/doc/todo/Invert_remote_selection/comment_2_9ad4c9b2217f739e67198d16d14d32e7._comment b/doc/todo/Invert_remote_selection/comment_2_9ad4c9b2217f739e67198d16d14d32e7._comment
new file mode 100644
index 000000000..5acf39f95
--- /dev/null
+++ b/doc/todo/Invert_remote_selection/comment_2_9ad4c9b2217f739e67198d16d14d32e7._comment
@@ -0,0 +1,12 @@
+[[!comment format=mdwn
+ username="supernaught"
+ avatar="http://cdn.libravatar.org/avatar/55f92a50f2617099e2dc7509130ce158"
+ subject="comment 2"
+ date="2017-08-28T22:01:23Z"
+ content="""
+It's not very ergonomic to type out so much for each sync, but I suppose it technically accomplishes the idea.
+
+Still -- wouldn't making '\!x' alias to '-c remote.x.annex-sync=false' have minimal impact and provide a bit more symmetry with the matching-options?
+
+I'm not familiar with Haskell, but could probably fumble my way through this one. Would you accept a patch?
+"""]]
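
The `export.log` format described in the `doc/internals.mdwn` hunk above (a timestamp, then the uuid of the special remote, then the sha1 of the exported tree) can be illustrated with a small parser. This is only a sketch, not git-annex's implementation: it assumes the three fields are whitespace-separated, keeps the timestamp as an opaque string, and treats every line as an active export. The names `ExportRecord`, `parse_export_log`, and `conflicting_exports` are hypothetical, and the sample log uses placeholder values rather than real log contents.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class ExportRecord(NamedTuple):
    """One export.log line (hypothetical type, for illustration only)."""
    timestamp: str    # kept opaque; the exact timestamp format is not assumed
    remote_uuid: str  # uuid of the special remote that was exported to
    tree_sha1: str    # sha1 of the tree that was exported


def parse_export_log(text: str) -> List[ExportRecord]:
    """Parse log text, assuming three whitespace-separated fields per line."""
    records = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        records.append(ExportRecord(*fields))
    return records


def conflicting_exports(records: List[ExportRecord]) -> Dict[str, List[str]]:
    """Map each remote uuid to its trees when more than one distinct tree
    is recorded for it, which the design doc calls an export conflict."""
    trees = defaultdict(set)
    for record in records:
        trees[record.remote_uuid].add(record.tree_sha1)
    return {uuid: sorted(shas) for uuid, shas in trees.items() if len(shas) > 1}


# Placeholder data: two different trees recorded for remote-uuid-1.
sample_log = "\n".join([
    "1504195000s remote-uuid-1 " + "a" * 40,
    "1504195100s remote-uuid-1 " + "b" * 40,
    "1504195200s remote-uuid-2 " + "a" * 40,
])

if __name__ == "__main__":
    for uuid, shas in conflicting_exports(parse_export_log(sample_log)).items():
        print(uuid, "has conflicting exported trees:", ", ".join(shas))
```

Under these assumptions only `remote-uuid-1` is reported. Per the design doc, such a conflict would be resolved by running another `git annex export $tree --to remote` with a resolving tree and re-exporting the files that differ from each previously exported tree.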