author    | 2014-05-29 15:23:05 -0400
committer | 2014-05-29 15:23:05 -0400
commit    | 1f6cfecc972b121fa42ea80383183bbaccc2195a (patch)
tree      | 0a450c4226f5e05c2a3597a9f520376de281fffe /doc/todo/wishlist:_Provide_a___34__git_annex__34___command_that_will_skip_duplicates/comment_10_d78d79fb2f3713aa69f45d2691cf8dfe._comment
parent    | a95fb731cd117f35a6e0fce90d9eb35d0941e26e (diff)
remove old closed bugs and todo items to speed up wiki updates and reduce size
Remove closed bugs and todos that were last edited before 2014.
Command line used:
for f in $(grep -l '\[\[done\]\]' *.mdwn); do if [ -z $(git log --since=2014 --pretty=oneline "$f") ]; then git rm $f; git rm -rf $(echo "$f" | sed 's/.mdwn$//'); fi; done
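A more defensive variant of the same loop (a sketch of the same idea, not the command that was actually run) quotes the command substitution and filenames, so the `-z` test behaves predictably even when `git log` produces output:

    grep -l '\[\[done\]\]' -- *.mdwn | while read -r f; do
        if [ -z "$(git log --since=2014 --pretty=oneline -- "$f")" ]; then
            git rm -- "$f"              # drop the closed bug/todo page
            git rm -rf -- "${f%.mdwn}"  # and its comments directory, if present
        fi
    done

The behaviour is the same: any page marked [[done]] with no commits touching it since 2014 is removed, along with its discussion subdirectory.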
Diffstat (limited to 'doc/todo/wishlist:_Provide_a___34__git_annex__34___command_that_will_skip_duplicates/comment_10_d78d79fb2f3713aa69f45d2691cf8dfe._comment')
-rw-r--r-- | doc/todo/wishlist:_Provide_a___34__git_annex__34___command_that_will_skip_duplicates/comment_10_d78d79fb2f3713aa69f45d2691cf8dfe._comment | 68
1 file changed, 0 insertions, 68 deletions
diff --git a/doc/todo/wishlist:_Provide_a___34__git_annex__34___command_that_will_skip_duplicates/comment_10_d78d79fb2f3713aa69f45d2691cf8dfe._comment b/doc/todo/wishlist:_Provide_a___34__git_annex__34___command_that_will_skip_duplicates/comment_10_d78d79fb2f3713aa69f45d2691cf8dfe._comment
deleted file mode 100644
index 5dbb66cf6..000000000
--- a/doc/todo/wishlist:_Provide_a___34__git_annex__34___command_that_will_skip_duplicates/comment_10_d78d79fb2f3713aa69f45d2691cf8dfe._comment
+++ /dev/null
@@ -1,68 +0,0 @@
-[[!comment format=mdwn
- username="http://adamspiers.myopenid.com/"
- nickname="Adam"
- subject="comment 10"
- date="2011-12-23T17:22:11Z"
- content="""
-> Your perl script is not O(n). Inserting into perl hash tables has
-> overhead of minimum O(n log n).
-
-What's your source for this assertion? I would expect an amortized
-average of `O(1)` per insertion, i.e. `O(n)` for full population.
-
-> Not counting the overhead of resizing hash tables,
-> the grevious slowdown if the bucket size is overcome by data (it
-> probably falls back to a linked list or something then), and the
-> overhead of traversing the hash tables to get data out.
-
-None of which necessarily change the algorithmic complexity. However
-real benchmarks are far more useful here than complexity analysis, and
-[the dangers of premature optimization](http://c2.com/cgi/wiki?PrematureOptimization)
-should not be forgotten.
-
-> Your memory size calculations ignore the overhead of a hash table or
-> other data structure to store the data in, which will tend to be
-> more than the actual data size it's storing. I estimate your 50
-> million number is off by at least one order of magnitude, and more
-> likely two;
-
-Sure, I was aware of that, but my point still stands. Even 500k keys
-per 1GB of RAM does not sound expensive to me.
-
-> in any case I don't want git-annex to use 1 gb of ram.
-
-Why not? What's the maximum it should use? 512MB? 256MB?
-32MB? I don't see the sense in the author of a program
-dictating thresholds which are entirely dependent on the context
-in which the program is *run*, not the context in which it's *written*.
-That's why systems have files such as `/etc/security/limits.conf`.
-
-You said you want git-annex to scale to enormous repositories. If you
-impose an arbitrary memory restriction such as the above, that means
-avoiding implementing *any* kind of functionality which requires `O(n)`
-memory or worse. Isn't it reasonable to assume that many users use
-git-annex on repositories which are *not* enormous? Even when they do
-work with enormous repositories, just like with any other program,
-they would naturally expect certain operations to take longer or
-become impractical without sufficient RAM. That's why I say that this
-restriction amounts to throwing out the baby with the bathwater.
-It just means that those who need the functionality would have to
-reimplement it themselves, assuming they are able, which is likely
-to result in more wheel reinventions. I've already shared
-[my implementation](https://github.com/aspiers/git-config/blob/master/bin/git-annex-finddups)
-but how many people are likely to find it, let alone get it working?
-
-> Little known fact: sort(1) will use a temp file as a buffer if too
-> much memory is needed to hold the data to sort.
-
-Interesting. Presumably you are referring to some undocumented
-behaviour, rather than `--batch-size` which only applies when merging
-multiple files, and not when only sorting STDIN.
-
-> It's also written in the most efficient language possible and has
-> been ruthlessly optimised for 30 years, so I would be very surprised
-> if it was not the best choice.
-
-It's the best choice for sorting. But sorting purely to detect
-duplicates is a dismally bad choice.
-"""]]
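The trade-off argued over in the deleted comment can be sketched with standard tools. Assuming a hypothetical file `keys.txt` holding one annex key per line (not something produced by anything above), a hash-based pass and a sort-based pass would look like:

    # Hash-based: one pass, roughly O(n) time; memory grows with the number
    # of distinct keys. Prints a key each time it is seen after the first.
    awk 'seen[$0]++' keys.txt

    # Sort-based: O(n log n), but sort(1) spills to temporary files, so
    # memory stays bounded even for inputs far larger than RAM. Prints each
    # duplicated key once.
    sort keys.txt | uniq -d

Neither snippet is what git-annex itself does; the sketch only illustrates why the comment treats a hash as the natural fit for duplicate detection, while sort(1) remains attractive when memory rather than time is the constraint.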