summaryrefslogtreecommitdiff
path: root/doc/todo/wishlist:_Provide_a___34__git_annex__34___command_that_will_skip_duplicates
diff options
context:
space:
mode:
authorGravatar Joey Hess <joey@kitenet.net>2011-04-09 16:12:32 -0400
committerGravatar Joey Hess <joey@kitenet.net>2011-04-09 16:12:32 -0400
commitad7f87880e0aca651162e8960b398f6c8d52e7a9 (patch)
treeaa6feda38b830730368f605ca33dad6814324276 /doc/todo/wishlist:_Provide_a___34__git_annex__34___command_that_will_skip_duplicates
parentde14252f780fb0259dbd9e6f1fa71c1f092a2135 (diff)
move wishlists to todo
Diffstat (limited to 'doc/todo/wishlist:_Provide_a___34__git_annex__34___command_that_will_skip_duplicates')
-rw-r--r--doc/todo/wishlist:_Provide_a___34__git_annex__34___command_that_will_skip_duplicates/comment_1_fd213310ee548d8726791d2b02237fde._comment29
-rw-r--r--doc/todo/wishlist:_Provide_a___34__git_annex__34___command_that_will_skip_duplicates/comment_2_4394bde1c6fd44acae649baffe802775._comment18
2 files changed, 47 insertions, 0 deletions
diff --git a/doc/todo/wishlist:_Provide_a___34__git_annex__34___command_that_will_skip_duplicates/comment_1_fd213310ee548d8726791d2b02237fde._comment b/doc/todo/wishlist:_Provide_a___34__git_annex__34___command_that_will_skip_duplicates/comment_1_fd213310ee548d8726791d2b02237fde._comment
new file mode 100644
index 000000000..094e4526e
--- /dev/null
+++ b/doc/todo/wishlist:_Provide_a___34__git_annex__34___command_that_will_skip_duplicates/comment_1_fd213310ee548d8726791d2b02237fde._comment
@@ -0,0 +1,29 @@
+[[!comment format=mdwn
+ username="http://joey.kitenet.net/"
+ nickname="joey"
+ subject="comment 1"
+ date="2011-01-27T18:29:44Z"
+ content="""
+Hey Asheesh, I'm happy you're finding git-annex useful.
+
+So, there are two forms of duplication going on here. There's duplication of the content, and duplication of the filenames
+pointing at that content.
+
+Duplication of the filenames is probably not a concern, although it's what I thought you were talking about at first. It's probably info worth recording that backup-2010/some_dir/foo and backup-2009/other_dir/foo are two names you've used for the same content in the past. If you really wanted to remove backup-2009/foo, you could do it by writing a script that looks at the basenames of the symlink targets and removes files that point to the same content as other files.
+
+Using SHA1 ensures that the same key is used for identical files, so generally avoids duplication of content. But if you have 2 disks with an identical file on each, and make them both into annexes, then git-annex will happily retain both
+copies of the content, one per disk. It generally considers keeping copies of content a good thing. :)
+
+So, what if you want to remove the unnecessary copies? Well, there's a really simple way:
+
+<pre>
+cd /media/usb-1
+git remote add other-disk /media/usb-0
+git annex add
+git annex drop
+</pre>
+
+This asks git-annex to add everything to the annex, but then remove any file contents that it can safely remove. What can it safely remove? Well, anything that it can verify is on another repository such as \"other-disk\"! So, this will happily drop any duplicated file contents, while leaving all the rest alone.
+
+In practice, you might not want to have all your old backup disks mounted at the same time and configured as remotes. Look into configuring [[trust]] to avoid needing do to that. If usb-0 is already a trusted disk, all you need is a simple \"git annex drop\" on usb-1.
+"""]]
diff --git a/doc/todo/wishlist:_Provide_a___34__git_annex__34___command_that_will_skip_duplicates/comment_2_4394bde1c6fd44acae649baffe802775._comment b/doc/todo/wishlist:_Provide_a___34__git_annex__34___command_that_will_skip_duplicates/comment_2_4394bde1c6fd44acae649baffe802775._comment
new file mode 100644
index 000000000..04d58a459
--- /dev/null
+++ b/doc/todo/wishlist:_Provide_a___34__git_annex__34___command_that_will_skip_duplicates/comment_2_4394bde1c6fd44acae649baffe802775._comment
@@ -0,0 +1,18 @@
+[[!comment format=mdwn
+ username="https://www.google.com/accounts/o8/id?id=AItOawkjvjLHW9Omza7x1VEzIFQ8Z5honhRB90I"
+ nickname="Asheesh"
+ subject="I actually *do* want to avoid duplication of filenames"
+ date="2011-01-28T07:30:05Z"
+ content="""
+I really do want just one filename per file, at least for some cases.
+
+For my photos, there's no benefit to having a few filenames point to the same file. As I'm putting them all into the git-annex, that is a good time to remove the pure duplicates so that I don't e.g. see them twice when browsing the directory as a gallery. Also, I am uploading my photos to the web, and I want to avoid uploading the same photo (by content) twice.
+
+I hope that makes things clearer!
+
+For now I'm just doing this:
+
+* paulproteus@renaissance:/mnt/backups-terabyte/paulproteus/sd-card-from-2011-01-06/sd-cards/DCIM/100CANON $ for file in *; do hash=$(sha1sum \"$file\"); if ls /home/paulproteus/Photos/in-flickr/.git-annex | grep -q \"$hash\"; then echo already annexed ; else flickr_upload \"$file\" && mv \"$file\" \"/home/paulproteus/Photos/in-flickr/2011-01-28/from-some-nested-sd-card-bk\" && (cd /home/paulproteus/Photos/in-flickr/2011-01-28/from-some-nested-sd-card-bk && git annex add . && git commit -m ...) ; fi; done
+
+(Yeah, Flickr for my photos for now. I feel sad about betraying the principle of autonomo.us-ness.)
+"""]]