From 97bef4af733ac8c42adfe6809ba6ae7269530473 Mon Sep 17 00:00:00 2001
From: "http://adamspiers.myopenid.com/"
Date: Thu, 22 Dec 2011 12:31:36 +0000
Subject: Added a comment: List the duplicate filenames, then let the user decide what to do

---
 ...ent_4_f120d1e83c1a447f2ecce302fc69cf74._comment | 35 ++++++++++++++++++++++
 1 file changed, 35 insertions(+)
 create mode 100644 doc/todo/wishlist:_Provide_a___34__git_annex__34___command_that_will_skip_duplicates/comment_4_f120d1e83c1a447f2ecce302fc69cf74._comment

(limited to 'doc')

diff --git a/doc/todo/wishlist:_Provide_a___34__git_annex__34___command_that_will_skip_duplicates/comment_4_f120d1e83c1a447f2ecce302fc69cf74._comment b/doc/todo/wishlist:_Provide_a___34__git_annex__34___command_that_will_skip_duplicates/comment_4_f120d1e83c1a447f2ecce302fc69cf74._comment
new file mode 100644
index 000000000..a218ee3d5
--- /dev/null
+++ b/doc/todo/wishlist:_Provide_a___34__git_annex__34___command_that_will_skip_duplicates/comment_4_f120d1e83c1a447f2ecce302fc69cf74._comment
@@ -0,0 +1,35 @@
+[[!comment format=mdwn
+ username="http://adamspiers.myopenid.com/"
+ nickname="Adam"
+ subject="List the duplicate filenames, then let the user decide what to do"
+ date="2011-12-22T12:31:29Z"
+ content="""
+I have the same use case as Asheesh but I want to be able to see which filenames point to the same objects and then decide which of the duplicates to drop myself. I think
+
+    git annex drop --by-contents
+
+would be the wrong approach because how does git-annex know which ones to drop? There's too much potential for error.
+
+Instead it would be great to have something like
+
+    git annex finddups
+
+While it's easy enough to knock up a bit of shell or Perl to achieve this, that relies on knowledge of the annex symlink structure, so I think really it belongs inside git-annex.
+
+If this command gave output similar to the excellent `fastdup` utility:
+
+    Scanning for files... 672 files in 10.439 seconds
+    Comparing 2 sets of files...
+
+    2 files (70.71 MB/ea)
+    /home/adam/media/flat/tour/flat-tour.3gp
+    /home/adam/videos/tour.3gp
+
+    Found 1 duplicate of 1 file (70.71 MB wasted)
+    Scanned 672 files (1.96 GB) in 11.415 seconds
+
+then you could do stuff like
+
+    git annex finddups | grep /home/adam/media/flat | xargs rm
+
+"""]]
-- 
cgit v1.2.3