author Joey Hess <joeyh@joeyh.name> 2017-09-06 13:04:09 -0400
committer Joey Hess <joeyh@joeyh.name> 2017-09-06 13:04:09 -0400
commit f1b255623bc026d1480d44808cfc30507537cda1
tree b0c9a52f93138726a417356e260e660a81772316
parent e678f8e94e31662ea138f3ebe29b59b49a11143a
thoughts on handling renames efficiently
This gets complicated, but I think this design will work! This commit was supported by the NSF-funded DataLad project.
Diffstat (limited to 'doc')
-rw-r--r-- doc/design/exporting_trees_to_special_remotes.mdwn 42
-rw-r--r-- doc/todo/export.mdwn 6
2 files changed, 39 insertions, 9 deletions
diff --git a/doc/design/exporting_trees_to_special_remotes.mdwn b/doc/design/exporting_trees_to_special_remotes.mdwn
index 7ff1df870..0469a4fcc 100644
--- a/doc/design/exporting_trees_to_special_remotes.mdwn
+++ b/doc/design/exporting_trees_to_special_remotes.mdwn
@@ -237,11 +237,37 @@ for the current treeish. (Unless a conflicting export was made from
elsewhere, but in that case, the conflict resolution will have to fix up
later.)
-Efficient resuming can then first check if the location log says the
-export contains the content. (If not, transfer a copy.) If the location
-log says the export contains the content, use CHECKPRESENTEXPORT to see if
-the file exists, and if not transfer a copy. The CHECKPRESENTEXPORT check
-deals with the case where the treeish has two files with the same content.
-If we have a key-to-files map for the export, then we can skip the
-CHECKPRESENTEXPORT check when there's only one file using a key. So,
-resuming can be quite efficient.
+## handling renames efficiently
+
+To handle two files that swap names, a temp name is required.
+
+Difficulty with a temp name is picking a name that won't ever be used by
+any exported file.
+
+Interrupted exports also complicate this. While a name could be picked that
+is in neither the old nor the new tree, an export could be interrupted,
+leaving the file at the temp name. There needs to be something to clean
+that up when the export is resumed, even if it's resumed with a different
+tree.
+
+Could use something like ".git-annex-tmp-content-$key" as the temp name.
+This hides it from casual view, which is good, and it's not dependent on the
+tree, so no state needs to be maintained to clean it up. Also, using the
+key in the name simplifies calculation of complicated renames (eg, renaming
+A to B, B to C, C to A).
+
+Export can first try to rename the temp name of all keys
+whose files are added in the diff, followed by deleting the temp name
+of all keys whose files are removed in the diff. That is more renames and
+deletes than strictly necessary, but it will statelessly clean up
+an interrupted export as long as it's run again with the same new tree.
+
+But, an export of tree B should clean up after
+an interrupted export of tree A. Some state is needed to handle this.
+Before starting the export of tree A, record it somewhere. Then when
+resuming, diff A..B, and rename/delete the temp names of the keys in the
+diff, as well as diffing from the last fully exported tree to B and doing
+the same rename/delete.
+
+So, before an export does anything, need to record the tree that's about
+to be exported to export.log, not as an exported tree, but as a goal.
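The rename handling described in the hunk above can be sketched roughly as follows. This is a minimal model under stated assumptions, not git-annex's implementation: trees are plain dicts mapping path to key, the remote is a dict mapping name to key, and every function name here (`temp_name`, `diff_trees`, `plan_export`, `cleanup_keys`, `apply_steps`) is hypothetical. Only the `.git-annex-tmp-content-` prefix comes from the design text.

```python
TEMP_PREFIX = ".git-annex-tmp-content-"  # prefix quoted from the design text


def temp_name(key):
    """Temp name depends only on the key, never on the tree."""
    return TEMP_PREFIX + key


def diff_trees(old, new):
    """Return (removed, added) as path -> key dicts; a changed path is in both."""
    removed = {p: k for p, k in old.items() if new.get(p) != k}
    added = {p: k for p, k in new.items() if old.get(p) != k}
    return removed, added


def plan_export(old, new):
    """Plan steps turning an export of `old` into an export of `new`."""
    removed, added = diff_trees(old, new)
    wanted = set(added.values())
    stashed = set()
    steps = []
    # Phase 1: clear every removed path, stashing content that is still needed.
    for path, key in sorted(removed.items()):
        if key in wanted and key not in stashed:
            steps.append(("rename", path, temp_name(key)))
            stashed.add(key)
        else:
            steps.append(("delete", path))
    # Phase 2: fill every added path from a stash, or upload a fresh copy.
    for path, key in sorted(added.items()):
        if key in stashed:
            steps.append(("rename", temp_name(key), path))
            stashed.discard(key)
        else:
            steps.append(("upload", key, path))
    return steps


def cleanup_keys(new, *earlier_trees):
    """Keys whose temp names may be left over when resuming with `new`."""
    keys = set()
    for tree in earlier_trees:
        removed, added = diff_trees(tree, new)
        keys |= set(removed.values()) | set(added.values())
    return keys


def apply_steps(remote, steps):
    """Run a plan against a dict standing in for the remote (name -> key)."""
    for step in steps:
        if step[0] == "rename":
            remote[step[2]] = remote.pop(step[1])
        elif step[0] == "delete":
            del remote[step[1]]
        else:  # upload
            remote[step[2]] = step[1]
    return remote
```

Because each file goes through a temp name derived from its key, the A to B, B to C, C to A case needs no ordering analysis: all three files are moved aside first, then moved into place. `cleanup_keys` would be called with the recorded goal tree and the last fully exported tree, matching the two diffs the notes describe.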
diff --git a/doc/todo/export.mdwn b/doc/todo/export.mdwn
index 5813cd869..f345534e8 100644
--- a/doc/todo/export.mdwn
+++ b/doc/todo/export.mdwn
@@ -19,7 +19,11 @@ Work is in progress. Todo list:
* `git annex get --from export` works in the repo that exported to it,
but in another repo, the export db won't be populated, so it won't work.
- Maybe just show a useful error message in this case?
+ Maybe just show a useful error message in this case?
+ However, exporting from one repository and then trying to update the
+ export from another repository also doesn't work right, because the
+ export database is not populated. So, seems that the export database needs
+ to get populated based on the export log in these cases.
* Efficient handling of renames.
* Support export to additional special remotes (S3 etc)
* Support export to external special remotes.