author Joey Hess <joeyh@joeyh.name> 2015-03-26 11:44:20 -0400
committer Joey Hess <joeyh@joeyh.name> 2015-03-26 11:44:20 -0400
commit 4267f36f058eb88b3d18613acee34d8eb4bb4e4f
tree 0e0c416cb0ad7456dbe5f1818bca8182e8460fad
parent 71d824d16cd928693d149950a8eabf2652243350
improve import duplicate docs
-rw-r--r-- doc/git-annex-import.mdwn 28
-rw-r--r-- doc/todo/inject_on_import/comment_2_205ecbc7401f99fc83719acbf5da174e._comment 26
2 files changed, 43 insertions, 11 deletions
diff --git a/doc/git-annex-import.mdwn b/doc/git-annex-import.mdwn
index 4d2c05547..43e619607 100644
--- a/doc/git-annex-import.mdwn
+++ b/doc/git-annex-import.mdwn
@@ -13,11 +13,18 @@ the annex. Individual files to import can be specified.
If a directory is specified, the entire directory is imported.
git annex import /media/camera/DCIM/*
-
-By default, importing two files with the same contents from two different
-locations will result in both files being added to the repository.
-(With all checksumming backends, including the default SHA256E,
-only one copy of the data will be stored.)
+
+When importing files, there's a possibility of importing a duplicate
+of a file that is already known to git-annex -- its content is either
+present in the local repository already, or git-annex knows of another
+repository that contains it.
+
+By default, importing a duplicate of a known file will result in
+a new filename being added to the repository, so the duplicate file
+is present in the repository twice. (With all checksumming backends,
+including the default SHA256E, only one copy of the data will be stored.)
+
+Several options can be used to adjust handling of duplicate files.
# OPTIONS
@@ -32,19 +39,18 @@ only one copy of the data will be stored.)
* `--deduplicate`
- Only import files whose content has not been seen before by git-annex.
-
- Duplicate files will be deleted from the import location.
+ Only import files that are not duplicates;
+ duplicate files will be deleted from the import location.
* `--skip-duplicates`
- Only import files whose content has not been seen before by git-annex,
- but avoid deleting duplicate files.
+ Only import files that are not duplicates; and avoid deleting
+ duplicate files from the import location.
* `--clean-duplicates`
Does not import any files, but any files found in the import location
- that are duplicates of content in the annex are deleted.
+ that are duplicates are deleted.
* file matching options
diff --git a/doc/todo/inject_on_import/comment_2_205ecbc7401f99fc83719acbf5da174e._comment b/doc/todo/inject_on_import/comment_2_205ecbc7401f99fc83719acbf5da174e._comment
new file mode 100644
index 000000000..acd661feb
--- /dev/null
+++ b/doc/todo/inject_on_import/comment_2_205ecbc7401f99fc83719acbf5da174e._comment
@@ -0,0 +1,26 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 2"""
+ date="2015-03-26T15:28:45Z"
+ content="""
+Well, you've found an edge case here.
+
+It behaves as documented as long as the file being imported is located in some
+repository known to git-annex. The file content does not have to be present in
+the local repository for it to behave as documented.
+
+In your case, the file being imported has a symlink in the git repo, but
+git-annex knows about 0 annexed copies of the file, so it's treated as
+if it's a new file and not a duplicate.
+
+Since import is working at the key level, there's not a good way to look up
+that there are some symlinks in the git repo even though the content is
+gone. And even if there was, I think I'd be uncomfortable with it deleting
+the file as "duplicate" when its content is not available in any known
+repository. The only behavior improvement might be to import the content
+but not make a redundant symlink in this case.
+
+I think it's best to change the documentation. I've added a new
+paragraph that more exactly and clearly explains what duplicate files
+are for the purposes of importing.
+"""]]
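
The duplicate handling described in this commit can be read as a small decision table. The Python sketch below is only an illustrative model of the documented behavior, not git-annex code (git-annex itself is written in Haskell). The names `Mode`, `Action`, `is_duplicate`, and `plan` are hypothetical, and `known_copies` stands in for the number of repositories that git-annex knows to contain the file's content. The sketch records only whether a file is imported, deleted from the import location, or left there; it says nothing about how a successful import treats the source file.

```python
from enum import Enum


class Mode(Enum):
    """Duplicate-handling modes named in the documentation above."""
    DEFAULT = "default"
    DEDUPLICATE = "--deduplicate"
    SKIP_DUPLICATES = "--skip-duplicates"
    CLEAN_DUPLICATES = "--clean-duplicates"


class Action(Enum):
    """Hypothetical labels for what happens to one candidate file."""
    IMPORT = "import"
    DELETE = "delete from import location"
    LEAVE = "leave in import location"


def is_duplicate(known_copies: int) -> bool:
    # A file counts as a duplicate only if its content is in the local
    # repository or in another repository known to git-annex. A symlink
    # left in the git repo with zero known copies does not count.
    return known_copies > 0


def plan(mode: Mode, known_copies: int) -> Action:
    """Return the documented outcome for one file under the given mode."""
    dup = is_duplicate(known_copies)
    if mode is Mode.DEFAULT:
        # Duplicates are imported too, under a new filename.
        return Action.IMPORT
    if mode is Mode.CLEAN_DUPLICATES:
        # Nothing is imported; only duplicates are deleted.
        return Action.DELETE if dup else Action.LEAVE
    if not dup:
        # --deduplicate and --skip-duplicates both import new content.
        return Action.IMPORT
    return Action.DELETE if mode is Mode.DEDUPLICATE else Action.LEAVE
```

Under this model, the edge case from the comment corresponds to `plan(Mode.DEDUPLICATE, 0)`: with zero known copies, the file is treated as new and imported rather than deleted as a duplicate.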