[[!comment format=mdwn username="zardoz" ip="78.48.163.229" subject="comment 5" date="2014-08-18T20:54:10Z" content="""
> True of a single filesystem, but not of a set of connected git repositories.

That’s a good point. It might depend on the use case, though.

> The probabilities of these seem low, but perhaps not as low as the probability that there will be two differing files with the same name+size+mtime in the first place.

This one I’m not completely sure about. For example, I have an annex with web pages mirrored from the web. Due to the crawler implementation, there are lots of «index.html» and «favicon.ico» files with the same mtime (in particular when mtime is read with one-second precision). Favicons are often bitmaps of the same resolution and therefore often have the same size. Because there are file formats where both size and basename are semantically predetermined, these sources alone contribute zero entropy (cf. also «readme.txt»). The entropy of mtime alone is not very large either, I suppose, and in some use cases it will also approach zero (think «initializing a repo by cp -r on a fast disk without preserving mtimes»). The relative path could make a huge difference there. I believe this argument is actually what worried me the most. Does it seem valid?

Apart from entropy, there’s the non-probabilistic advantage we discussed (admittedly, with some limiting constraints which one has to verify for oneself). One might argue that a hash would be the better way, but that is not practical in every setup.
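To make the entropy worry concrete, here is a minimal sketch (in Python; the tree path and the choice of SHA-256 are illustrative assumptions, not what git-annex actually uses) that scans a directory and reports how many weak (basename, size, mtime) keys are shared by files with differing contents:

    #!/usr/bin/env python3
    # Minimal sketch: group files by the weak (basename, size, mtime) key
    # and check, via a content hash, how many keys cover differing contents.
    # Assumptions: POSIX stat() semantics; the tree to scan is argv[1].
    import hashlib
    import os
    import sys
    from collections import defaultdict

    def sha256(path, bufsize=1 << 16):
        h = hashlib.sha256()
        with open(path, 'rb') as f:
            for block in iter(lambda: f.read(bufsize), b''):
                h.update(block)
        return h.hexdigest()

    def main(root):
        groups = defaultdict(set)  # weak key -> set of content hashes
        for dirpath, _dirs, files in os.walk(root):
            for name in files:
                path = os.path.join(dirpath, name)
                if os.path.islink(path) or not os.path.isfile(path):
                    continue
                st = os.stat(path)
                # int() mimics reading mtime with one-second precision.
                key = (name, st.st_size, int(st.st_mtime))
                groups[key].add(sha256(path))
        collisions = {k: v for k, v in groups.items() if len(v) > 1}
        print('weak keys:', len(groups))
        print('keys shared by differing contents:', len(collisions))

    if __name__ == '__main__':
        main(sys.argv[1])

On a crawler mirror of the kind described above, I would expect the second number to be non-zero; keying on the relative path instead of the bare basename should shrink it considerably.
"""]]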