author     Joey Hess <joey@kitenet.net>  2014-05-29 15:23:05 -0400
committer  Joey Hess <joey@kitenet.net>  2014-05-29 15:23:05 -0400
commit     1f6cfecc972b121fa42ea80383183bbaccc2195a (patch)
tree       0a450c4226f5e05c2a3597a9f520376de281fffe /doc/todo/wishlist:_Provide_a___34__git_annex__34___command_that_will_skip_duplicates/comment_10_d78d79fb2f3713aa69f45d2691cf8dfe._comment
parent     a95fb731cd117f35a6e0fce90d9eb35d0941e26e (diff)
remove old closed bugs and todo items to speed up wiki updates and reduce size
Remove closed bugs and todos that were least edited before 2014. Command line used: for f in $(grep -l '\[\[done\]\]' *.mdwn); do if [ -z $(git log --since=2014 --pretty=oneline "$f") ]; then git rm $f; git rm -rf $(echo "$f" | sed 's/.mdwn$//'); fi; done
Diffstat (limited to 'doc/todo/wishlist:_Provide_a___34__git_annex__34___command_that_will_skip_duplicates/comment_10_d78d79fb2f3713aa69f45d2691cf8dfe._comment')
-rw-r--r--  doc/todo/wishlist:_Provide_a___34__git_annex__34___command_that_will_skip_duplicates/comment_10_d78d79fb2f3713aa69f45d2691cf8dfe._comment  68
1 file changed, 0 insertions, 68 deletions
diff --git a/doc/todo/wishlist:_Provide_a___34__git_annex__34___command_that_will_skip_duplicates/comment_10_d78d79fb2f3713aa69f45d2691cf8dfe._comment b/doc/todo/wishlist:_Provide_a___34__git_annex__34___command_that_will_skip_duplicates/comment_10_d78d79fb2f3713aa69f45d2691cf8dfe._comment
deleted file mode 100644
index 5dbb66cf6..000000000
--- a/doc/todo/wishlist:_Provide_a___34__git_annex__34___command_that_will_skip_duplicates/comment_10_d78d79fb2f3713aa69f45d2691cf8dfe._comment
+++ /dev/null
@@ -1,68 +0,0 @@
-[[!comment format=mdwn
- username="http://adamspiers.myopenid.com/"
- nickname="Adam"
- subject="comment 10"
- date="2011-12-23T17:22:11Z"
- content="""
-> Your perl script is not O(n). Inserting into perl hash tables has
-> overhead of minimum O(n log n).
-
-What's your source for this assertion? I would expect an amortized
-average of `O(1)` per insertion, i.e. `O(n)` for full population.
-
-> Not counting the overhead of resizing hash tables,
-> the grievous slowdown if the bucket size is overcome by data (it
-> probably falls back to a linked list or something then), and the
-> overhead of traversing the hash tables to get data out.
-
-None of which necessarily changes the algorithmic complexity. However,
-real benchmarks are far more useful here than complexity analysis, and
-[the dangers of premature optimization](http://c2.com/cgi/wiki?PrematureOptimization)
-should not be forgotten.
-
-> Your memory size calculations ignore the overhead of a hash table or
-> other data structure to store the data in, which will tend to be
-> more than the actual data size it's storing. I estimate your 50
-> million number is off by at least one order of magnitude, and more
-> likely two;
-
-Sure, I was aware of that, but my point still stands. Even 500k keys
-per 1GB of RAM does not sound expensive to me.
-
-> in any case I don't want git-annex to use 1 gb of ram.
-
-Why not? What's the maximum it should use? 512MB? 256MB?
-32MB? I don't see the sense in the author of a program
-dictating thresholds which are entirely dependent on the context
-in which the program is *run*, not the context in which it's *written*.
-That's why systems have files such as `/etc/security/limits.conf`.
-
-You said you want git-annex to scale to enormous repositories. If you
-impose an arbitrary memory restriction such as the above, that means
-avoiding implementing *any* kind of functionality which requires `O(n)`
-memory or worse. Isn't it reasonable to assume that many users use
-git-annex on repositories which are *not* enormous? Even when they do
-work with enormous repositories, just like with any other program,
-they would naturally expect certain operations to take longer or
-become impractical without sufficient RAM. That's why I say that this
-restriction amounts to throwing out the baby with the bathwater.
-It just means that those who need the functionality would have to
-reimplement it themselves, assuming they are able, which is likely
-to result in more wheel reinventions. I've already shared
-[my implementation](https://github.com/aspiers/git-config/blob/master/bin/git-annex-finddups)
-but how many people are likely to find it, let alone get it working?
-
-> Little known fact: sort(1) will use a temp file as a buffer if too
-> much memory is needed to hold the data to sort.
-
-Interesting. Presumably you are referring to some undocumented
-behaviour, rather than `--batch-size` which only applies when merging
-multiple files, and not when only sorting STDIN.
-
-> It's also written in the most efficient language possible and has
-> been ruthlessly optimised for 30 years, so I would be very surprised
-> if it was not the best choice.
-
-It's the best choice for sorting. But sorting purely to detect
-duplicates is a dismally bad choice.
-"""]]