-rw-r--r-- doc/todo/wishlist:_Provide_a___34__git_annex__34___command_that_will_skip_duplicates/comment_4_f120d1e83c1a447f2ecce302fc69cf74._comment 35
1 files changed, 35 insertions, 0 deletions
diff --git a/doc/todo/wishlist:_Provide_a___34__git_annex__34___command_that_will_skip_duplicates/comment_4_f120d1e83c1a447f2ecce302fc69cf74._comment b/doc/todo/wishlist:_Provide_a___34__git_annex__34___command_that_will_skip_duplicates/comment_4_f120d1e83c1a447f2ecce302fc69cf74._comment
new file mode 100644
index 000000000..a218ee3d5
--- /dev/null
+++ b/doc/todo/wishlist:_Provide_a___34__git_annex__34___command_that_will_skip_duplicates/comment_4_f120d1e83c1a447f2ecce302fc69cf74._comment
@@ -0,0 +1,35 @@
+[[!comment format=mdwn
+ username="http://adamspiers.myopenid.com/"
+ nickname="Adam"
+ subject="List the duplicate filenames, then let the user decide what to do"
+ date="2011-12-22T12:31:29Z"
+ content="""
+I have the same use case as Asheesh, but I want to see which filenames point to the same objects and then decide for myself which of the duplicates to drop. I think
+
+    git annex drop --by-contents
+
+would be the wrong approach, because git-annex has no way of knowing which of the duplicates to drop. There's too much potential for error.
+
+Instead it would be great to have something like
+
+    git annex finddups
+
+While it's easy enough to knock up a bit of shell or Perl to achieve this, that relies on knowledge of the annex symlink structure, so I think it really belongs inside git-annex.
+
+If this command gave output similar to the excellent `fastdup` utility:
+
+    Scanning for files... 672 files in 10.439 seconds
+    Comparing 2 sets of files...
+
+    2 files (70.71 MB/ea)
+    /home/adam/media/flat/tour/flat-tour.3gp
+    /home/adam/videos/tour.3gp
+
+    Found 1 duplicate of 1 file (70.71 MB wasted)
+    Scanned 672 files (1.96 GB) in 11.415 seconds
+
+then you could do stuff like
+
+    git annex finddups | grep /home/adam/media/flat | xargs rm
+
+"""]]
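
The comment mentions that a short script could already find duplicates by relying on the annex symlink structure. A minimal sketch of that idea follows. It assumes annexed files are symlinks whose targets contain `.git/annex/objects/` and whose final path component is the key, so two filenames with the same key share the same content. The function name `find_annex_duplicates` is hypothetical and is not part of git-annex; the proposed `git annex finddups` command does not exist in the source either.

```python
import os
import sys
from collections import defaultdict


def find_annex_duplicates(root):
    """Group annexed symlinks under root by the key they point to.

    Assumption: an annexed file is a symlink whose target path contains
    ".git/annex/objects/" and ends in the key name.
    """
    by_key = defaultdict(list)
    for dirpath, dirnames, filenames in os.walk(root):
        # Do not descend into git's own metadata.
        if ".git" in dirnames:
            dirnames.remove(".git")
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                continue
            target = os.readlink(path)
            if ".git/annex/objects/" not in target.replace(os.sep, "/"):
                continue
            # The last component of the target is taken to be the key.
            by_key[os.path.basename(target)].append(path)
    # Keep only keys referenced by more than one filename.
    return {k: sorted(v) for k, v in by_key.items() if len(v) > 1}


if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    for key, paths in sorted(find_annex_duplicates(root).items()):
        print(f"{len(paths)} files ({key})")
        for path in paths:
            print(path)
        print()
```

This works on symlinks only, so it also reports duplicates whose content is not present locally. It is exactly the kind of script that breaks if the symlink layout changes, which is the argument for doing the job inside git-annex.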