author | Joey Hess <joeyh@joeyh.name> | 2017-09-06 15:33:40 -0400 |
---|---|---|
committer | Joey Hess <joeyh@joeyh.name> | 2017-09-06 15:44:10 -0400 |
commit | 6fcefbdb6629c3e94c41bc05a6b7c224ade99ba0 | |
tree | dba0a4b10efa30c3fe491c5163a2942eda56eb69 /doc | |
parent | 9dd2651e8e5efbbf3a9cc59cab3afa1fef7446f2 | |
export file renaming

This is seriously super hairy. It has to handle interrupted exports,
which may be resumed with the same or a different tree. It also has to
recover from export conflicts, which could cause the wrong content
to be renamed to a file.

I think this works, or is close to working. See the update to the design
for how it works.

This is definitely not optimal, in that it does more renames than are
necessary. It would probably be worth finding the keys that are really
renamed and only renaming those. But let's get the "simple" approach to
work first..

This commit was supported by the NSF-funded DataLad project.

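
As an illustration only, the "simple" two-pass approach that the design update in this commit describes might be sketched in Python as follows. This is not git-annex's implementation (git-annex is written in Haskell). The remote is modelled as a plain dict, and `temp_name`, `export_tree`, and the `.export-tmp-` prefix are hypothetical names chosen for the sketch.

```python
def temp_name(key):
    """Hypothetical temp name derived from a key (the naming scheme is assumed)."""
    return ".export-tmp-" + key


def export_tree(remote, old_tree, new_tree, can_rename=True):
    """Bring `remote` from old_tree to new_tree; return the paths uploaded.

    Trees map path -> key. `remote` maps name -> key of the content stored
    under that name, and is modified in place.
    """
    # Files that are deleted or modified in the diff old_tree..new_tree.
    changed_old = {p: k for p, k in old_tree.items() if new_tree.get(p) != k}

    # Pass 1: move each deleted/modified file to the temp name of its key.
    # A remote without rename support just deletes the file instead.
    for path, key in changed_old.items():
        if path in remote:
            content = remote.pop(path)
            if can_rename:
                remote[temp_name(key)] = content

    # Pass 2: rename from the temp name to the new name, uploading only
    # when no temp copy of the key is available.
    uploaded = []
    for path, key in new_tree.items():
        if old_tree.get(path) == key and path in remote:
            continue  # unchanged and already present
        tmp = temp_name(key)
        if tmp in remote:
            remote[path] = remote.pop(tmp)
        else:
            remote[path] = key  # stands in for uploading the key's content
            uploaded.append(path)

    # Finally, delete any temp names left over from pass 1.
    for key in changed_old.values():
        remote.pop(temp_name(key), None)
    return uploaded


# Cyclic rename from the design text: A -> B, B -> C, C -> A.
old = {"A": "k1", "B": "k2", "C": "k3"}
new = {"B": "k1", "C": "k2", "A": "k3"}
remote = dict(old)
print(export_tree(remote, old, new))  # [] : every file was renamed, none uploaded
```

Because pass 1 moves every changed file aside before pass 2 places anything, the sketch never has to order the renames. The cost is the extra renames that the commit message mentions.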
Diffstat (limited to 'doc')
-rw-r--r-- | doc/design/exporting_trees_to_special_remotes.mdwn | 44 |
-rw-r--r-- | doc/git-annex-export.mdwn | 19 |
2 files changed, 56 insertions, 7 deletions
diff --git a/doc/design/exporting_trees_to_special_remotes.mdwn b/doc/design/exporting_trees_to_special_remotes.mdwn
index 0469a4fcc..a8247d2b9 100644
--- a/doc/design/exporting_trees_to_special_remotes.mdwn
+++ b/doc/design/exporting_trees_to_special_remotes.mdwn
@@ -205,7 +205,7 @@ a tree that resolves the conflict as they desire (it could be the same as
 one of the exported trees, or some merge of them or an entirely new tree).
 The UI to do this can just be another `git annex export $tree --to remote`.
 To resolve, diff each exported tree in turn against the resolving tree
-and delete all files that differ.
+and delete all files that differ. Then, upload all missing files.
 
 ## when to update export.log for efficient resuming of exports
 
@@ -256,18 +256,48 @@ tree, so no state needs to be maintained to clean it up.
 Also, using the key in the name simplifies calculation of complicated
 renames (eg, renaming A to B, B to C, C to A)
 
-Export can first try to rename the temp name of all keys
-whose files are added in the diff. Followed by deleting the temp name
-of all keys whose files are removed in the diff. That is more renames and
+Export can first try to rename all files that are deleted/modified
+to their key's temp name (falling back to deleting since not all
+special remotes support rename), and then, in a second pass, rename
+from the temp name to the new name. Followed by deleting the temp name
+of all keys whose files are deleted in the diff. That is more renames and
 deletes than strictly necessary, but it will statelessly clean up an
 interruped export as long as it's run again with the same new tree.
 
 But, an export of tree B should clean up after an interrupted export of tree
 A. Some state is needed to handle this. Before starting the export of tree
 A, record it somewhere. Then when
-resuming, diff A..B, and rename/delete the temp names of the keys in the
-diff. As well as diffing from the last fully exported tree to B and doing
-the same rename/delete.
+resuming, diff A..B, and delete the temp names of the keys in the
+diff. (Can't rename here, because we don't know what was the content
+of a file when an export was interrupted.)
 
 So, before an export does anything, need to record the tree that's about
 to be exported to export.log, not as an exported tree, but as a goal.
+
+## renames and export conflicts
+
+What is there's an export conflict going on at the same time that a file
+in the export gets renamed?
+
+Suppose that there are two git repos A and B, each exporting to the same
+remote. A and B are not currently communicating. A exports T1 which
+contains F. B exports T2, which has a different content for F.
+
+Then A exports T3, which renames F to G. If that rename is done
+on the remote, then A will think it's successfully exported T3,
+but G will have F's content from T2, not from T1.
+
+When A and B reconnect, the export conflict will be detected.
+To resolve the export conflict, it says above to:
+
+> To resolve, diff each exported tree in turn against the resolving tree
+> and delete all files that differ. Then, upload all missing files.
+
+Assume that the resolving tree is T3. So B's export of T2 is diffed against
+T3. F differs and is deleted (no change). G differs and is deleted,
+which fixes up the problem that the wrong content was renamed to G.
+G is missing so gets uploaded.
+
+So, this works, as long as "delete all files that differ" means it
+deletes both old and new files. And as long as conflict resolution does not
+itself stash away files in the temp name for later renaming.
diff --git a/doc/git-annex-export.mdwn b/doc/git-annex-export.mdwn
index c8d8eac9a..e3cbcbd7a 100644
--- a/doc/git-annex-export.mdwn
+++ b/doc/git-annex-export.mdwn
@@ -31,6 +31,25 @@ verification of content downloaded from an export. Some types of keys,
 that are not based on checksums, cannot be downloaded from an export.
 And, git-annex will never trust an export to retain the content of a key.
 
+# EXPORT CONFLICTS
+
+If two different git-annex repositories are both exporting different trees
+to the same special remote, it's possible for an export conflict to occur.
+This leaves the special remote with some files from one tree, and some
+files from the other. Files in the special remote may have entirely the
+wrong content as well.
+
+It's not possible for git-annex to detect when making an export will result
+in an export conflict. The best way to avoid export conflicts is to either
+only ever export to a special remote from a single repository, or to have a
+rule about the tree that you export to the special remote. For example, if
+you always export origin/master after pushing to origin, then an export
+conflict can't happen.
+
+An export conflict can only be detected after the two git repositories
+that produced it get back in sync. Then the next time you run `git annex
+export`, it will detect the export conflict, and resolve it.
+
 # SEE ALSO
 
 [[git-annex]](1)
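
The conflict resolution rule quoted in the design document ("diff each exported tree in turn against the resolving tree and delete all files that differ. Then, upload all missing files.") can be illustrated with a small Python model of the T1/T2/T3 scenario. This is only a sketch under stated assumptions, not git-annex code: trees and the remote are plain dicts from path to a content identifier, and `differing_paths` and `resolve_conflict` are hypothetical names.

```python
def differing_paths(tree, resolving):
    """Paths whose content differs between the two trees, counting both
    old names (only in `tree`) and new names (only in `resolving`)."""
    return {p for p in tree.keys() | resolving.keys()
            if tree.get(p) != resolving.get(p)}


def resolve_conflict(remote, exported_trees, resolving):
    """Resolve an export conflict in place; return the paths uploaded.

    `remote` maps path -> identifier of the content actually stored there.
    """
    # Diff each exported tree in turn against the resolving tree and
    # delete every file that differs, under both its old and new name.
    for tree in exported_trees:
        for path in differing_paths(tree, resolving):
            remote.pop(path, None)

    # Then upload all files of the resolving tree that are now missing.
    uploaded = []
    for path, content in sorted(resolving.items()):
        if path not in remote:
            remote[path] = content  # stands in for a real upload
            uploaded.append(path)
    return uploaded


# Scenario from the design text: A exported T1, B exported T2 with a
# different F, then A exported T3 by renaming F to G on the remote.
t1 = {"F": "c1"}
t2 = {"F": "c2"}
t3 = {"G": "c1"}
remote = {"G": "c2"}  # G wrongly holds B's content after the rename

print(resolve_conflict(remote, [t1, t2], t3))  # ['G']
print(remote)                                  # {'G': 'c1'}
```

The model only reaches the right result because `differing_paths` includes the new name G as well as the old name F, which is the condition the design text places on "delete all files that differ".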