Diffstat (limited to 'doc/bugs/git-annex_doesn__39__t_work_on_lustre__58___waitToSetLock__58___unsupported_operation___40__Function_not_implemented__41__')
14 files changed, 277 insertions, 0 deletions
diff --git a/doc/bugs/git-annex_doesn__39__t_work_on_lustre__58___waitToSetLock__58___unsupported_operation___40__Function_not_implemented__41__/comment_10_cdbd35ab0ba9c9157fd6530bebbc4650._comment b/doc/bugs/git-annex_doesn__39__t_work_on_lustre__58___waitToSetLock__58___unsupported_operation___40__Function_not_implemented__41__/comment_10_cdbd35ab0ba9c9157fd6530bebbc4650._comment new file mode 100644 index 000000000..0b91fe818 --- /dev/null +++ b/doc/bugs/git-annex_doesn__39__t_work_on_lustre__58___waitToSetLock__58___unsupported_operation___40__Function_not_implemented__41__/comment_10_cdbd35ab0ba9c9157fd6530bebbc4650._comment @@ -0,0 +1,10 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 10""" + date="2015-11-10T18:47:36Z" + content=""" +I think what yoh meant above was that -o flock (not `sync`) has a heavy +performance impact. Or at least a perceived one. And thus, lustre clusters +are not using it even though the feature is available and would probably be +fine for git-annex's use if enabled. 
+"""]] diff --git a/doc/bugs/git-annex_doesn__39__t_work_on_lustre__58___waitToSetLock__58___unsupported_operation___40__Function_not_implemented__41__/comment_11_9425cfd2739eca4a21c27d490192c31a._comment b/doc/bugs/git-annex_doesn__39__t_work_on_lustre__58___waitToSetLock__58___unsupported_operation___40__Function_not_implemented__41__/comment_11_9425cfd2739eca4a21c27d490192c31a._comment new file mode 100644 index 000000000..5608afa30 --- /dev/null +++ b/doc/bugs/git-annex_doesn__39__t_work_on_lustre__58___waitToSetLock__58___unsupported_operation___40__Function_not_implemented__41__/comment_11_9425cfd2739eca4a21c27d490192c31a._comment @@ -0,0 +1,37 @@ +[[!comment format=mdwn + username="joey" + subject="""design""" + date="2015-11-11T17:54:24Z" + content=""" +* annex.pidlock config setting +* init can test if regular locks work and if not set annex.pidlock +* Use .git/annex/lock as the lock file; create with `O_EXCL` and write pid + and program name to it. (Should be able to check for stale pid and break + old lock.) +* Adapt Utility.LockPool to use that lock file and lock method when + annex.pidlock is set. (How? It's a generic library..) +* Note that for sanity, whenever Utility.LockPool would create a + fine-grained lock file, that should still happen when using + annex.pidlock. Just avoid locking it and use the + global lock. This prevents any bugs along the lines of some code + depending on the fine-grained lock file having been created + (in order to delete it etc). +* (We could possibly assume that, if a lock file is being created, + it could be used as a pid lock file, and so use that instead of the + single top-level lock file. This assumption might hold, but I don't + really want to risk it. 
If some other code path uses the same lock file + but does not allow it to be created, it would not be able to write the + pid to it (because it might be eg an annex object file), then the two + code paths would end up using different lock files for the same lock, + which would be bad.) + +This will always be an exclusive lock, and a single lock at that, unlike +git-annex's usual fine-grained, often shared locks. But, the LockPool +builds all that stuff at the thread level using STM anyway, so multiple +threads of the same process can still cooperate with shared locks etc. + +Commands that don't need to take any lock (eg, query commands) will +interoperate as before. But, many commands that can normally run +concurrently won't be able to when using annex.pidlock, and will +have to either loop-wait on the lock file, or error out. +"""]] diff --git a/doc/bugs/git-annex_doesn__39__t_work_on_lustre__58___waitToSetLock__58___unsupported_operation___40__Function_not_implemented__41__/comment_12_93e7e29ebcbffe8913a0dc216636ae47._comment b/doc/bugs/git-annex_doesn__39__t_work_on_lustre__58___waitToSetLock__58___unsupported_operation___40__Function_not_implemented__41__/comment_12_93e7e29ebcbffe8913a0dc216636ae47._comment new file mode 100644 index 000000000..930f99d3c --- /dev/null +++ b/doc/bugs/git-annex_doesn__39__t_work_on_lustre__58___waitToSetLock__58___unsupported_operation___40__Function_not_implemented__41__/comment_12_93e7e29ebcbffe8913a0dc216636ae47._comment @@ -0,0 +1,13 @@ +[[!comment format=mdwn + username="https://me.yahoo.com/a/EbvxpTI_xP9Aod7Mg4cwGhgjrCrdM5s-#7c0f4" + subject="comment 12" + date="2015-11-11T19:28:41Z" + content=""" +sounds good to me ;) + +\"write pid and program name to it\" -- may be also a hostname so this could be safe in shared/networked environments... I believe emacs does similar, e.g. 
for 1.txt it creates a symlink + +.#1.txt -> yoh\@head1.hydra.dartmouth.edu.28757:1441910040 + + +"""]] diff --git a/doc/bugs/git-annex_doesn__39__t_work_on_lustre__58___waitToSetLock__58___unsupported_operation___40__Function_not_implemented__41__/comment_13_5a1191042b32a85b4299cff3004d29de._comment b/doc/bugs/git-annex_doesn__39__t_work_on_lustre__58___waitToSetLock__58___unsupported_operation___40__Function_not_implemented__41__/comment_13_5a1191042b32a85b4299cff3004d29de._comment new file mode 100644 index 000000000..ae9691cf9 --- /dev/null +++ b/doc/bugs/git-annex_doesn__39__t_work_on_lustre__58___waitToSetLock__58___unsupported_operation___40__Function_not_implemented__41__/comment_13_5a1191042b32a85b4299cff3004d29de._comment @@ -0,0 +1,41 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 13""" + date="2015-11-13T18:23:37Z" + content=""" +While testing a git-annex that used pid locks, on the Lustre +system I've been given access to, I observed something most +strange: + + link(".git/annex/locktmp12011", ".git/annex/pidlock") = 0 + lstat64(".git/annex/locktmp12011", {st_mode=S_IFREG|0444, st_size=70, ...}) = 0 + lstat64(".git/annex/pidlock", {st_mode=S_IFREG|0444, st_size=70, ...}) = 0 + ... + unlink(".git/annex/pidlock") = 0 + +Seeing that strace, it would make sense that the pidlock file didn't exist, +since a hard link was successfully made by that name, and link() never, +ever overwrites an existing file. The stats of the 2 files are of course +identical since they're hard links. And, since the pidlock is unlinked at +the end, we'd expect the file to be gone then. + +But, none of that has anything to do with the reality. Which was: +The pidlock file already existed, with size=72, and had existed for some +hours at the point the strace begins. The link didn't replace it +at all, and the unlink didn't delete it. When the program exited, +the pidlock file still existed, with contents unaltered. 
+ +All I can guess is happening is that different processes on a Lustre +filesystem, running on the same host, somehow see inconsistent realities. + +I do think that, despite this being completely insane, the locking will +actually work ok, when all git-annex processes in a given repo on Lustre +are running *on the same computer*. That's because git-annex actually will +drop a proper lock into a proper filesystem (/dev/shm), and so avoid all +this Lustre nonsense. + +But in general, I can make no warranty express or implied as to the +suitability of Lustre as a platform to use git-annex. If it's this +inconsistent, and modifications made to files are somehow silently rolled +back, anything could happen. +"""]] diff --git a/doc/bugs/git-annex_doesn__39__t_work_on_lustre__58___waitToSetLock__58___unsupported_operation___40__Function_not_implemented__41__/comment_14_4dea6eac389bbf5235a3d5d3378e6d04._comment b/doc/bugs/git-annex_doesn__39__t_work_on_lustre__58___waitToSetLock__58___unsupported_operation___40__Function_not_implemented__41__/comment_14_4dea6eac389bbf5235a3d5d3378e6d04._comment new file mode 100644 index 000000000..bb58cbeeb --- /dev/null +++ b/doc/bugs/git-annex_doesn__39__t_work_on_lustre__58___waitToSetLock__58___unsupported_operation___40__Function_not_implemented__41__/comment_14_4dea6eac389bbf5235a3d5d3378e6d04._comment @@ -0,0 +1,38 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 14""" + date="2015-11-13T20:00:48Z" + content=""" +Adding to the crazy Lustre fun, check this out: + + $ ls -l .git/annex/ + total 56 + -rw-rw-r-- 1 hess root 18387 Nov 13 14:35 index + -rw-rw-r-- 1 hess root 41 Nov 13 14:35 index.lck + drwxrwsr-x 2 hess root 12288 Nov 13 14:35 journal + -rw-rw-r-- 1 hess root 0 Nov 13 11:48 journal.lck + drwxrwsr-x 2 hess root 4096 Nov 13 14:35 misctmp + drwxrwsr-x 88 hess root 4096 Nov 13 14:35 objects + -r--r--r-- 1 hess root 70 Nov 13 14:35 pidlock + -r--r--r-- 1 hess root 70 Nov 13 14:35 pidlock + -rw-rw-r-- 1 hess 
root 0 Nov 13 11:48 sentinal + -rw-rw-r-- 1 hess root 23 Nov 13 11:48 sentinal.cache + +There are 2 pidlock files in that directory listing. 2 files with the same name. +I deleted one of them, and with no other changes, ls shows only 1 now. + + -r--r--r-- 1 hess root 74 Nov 13 14:35 pidlock + +Notice that the file stat has changed too. + +So, Lustre has clearly thrown POSIX out the window, and then defenestrated +sanity for good measure. + +On the plus side, this may show how I can detect when rename() fails to +preserve POSIX semantics.. + +Update: Indeed, I was able to get git-annex to detect the doubled file +and so know that it can't take the lock. + +I can't guarantee anything, but this is enough to close this bug. +"""]] diff --git a/doc/bugs/git-annex_doesn__39__t_work_on_lustre__58___waitToSetLock__58___unsupported_operation___40__Function_not_implemented__41__/comment_1_5dc6b520381a7b26563c641fcc284b31._comment b/doc/bugs/git-annex_doesn__39__t_work_on_lustre__58___waitToSetLock__58___unsupported_operation___40__Function_not_implemented__41__/comment_1_5dc6b520381a7b26563c641fcc284b31._comment new file mode 100644 index 000000000..d5031602a --- /dev/null +++ b/doc/bugs/git-annex_doesn__39__t_work_on_lustre__58___waitToSetLock__58___unsupported_operation___40__Function_not_implemented__41__/comment_1_5dc6b520381a7b26563c641fcc284b31._comment @@ -0,0 +1,8 @@ +[[!comment format=mdwn + username="https://me.yahoo.com/a/EbvxpTI_xP9Aod7Mg4cwGhgjrCrdM5s-#7c0f4" + subject="FWIW: possibly useful" + date="2015-09-04T13:26:19Z" + content=""" +https://github.com/marcindulak/vagrant-lustre-tutorial +to get env with lustre deployment. 
yet to figure out user management(see [https://github.com/marcindulak/vagrant-lustre-tutorial/issues/2]) since issue didn't replicate under root, so I guess it is a question of some permissions +"""]] diff --git a/doc/bugs/git-annex_doesn__39__t_work_on_lustre__58___waitToSetLock__58___unsupported_operation___40__Function_not_implemented__41__/comment_2_8c8d7ad99de78d282d202c541323a299._comment b/doc/bugs/git-annex_doesn__39__t_work_on_lustre__58___waitToSetLock__58___unsupported_operation___40__Function_not_implemented__41__/comment_2_8c8d7ad99de78d282d202c541323a299._comment new file mode 100644 index 000000000..71da557b1 --- /dev/null +++ b/doc/bugs/git-annex_doesn__39__t_work_on_lustre__58___waitToSetLock__58___unsupported_operation___40__Function_not_implemented__41__/comment_2_8c8d7ad99de78d282d202c541323a299._comment @@ -0,0 +1,12 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 2""" + date="2015-09-09T16:19:06Z" + content=""" +This is a POSIX fcntl lock failing on that filesystem. + +git-annex really needs these locks for safe concurrency, including guarding +against situations where data could be lost. + +I wonder if flock locks would be more portable? 
+"""]] diff --git a/doc/bugs/git-annex_doesn__39__t_work_on_lustre__58___waitToSetLock__58___unsupported_operation___40__Function_not_implemented__41__/comment_3_08d950812832acd5aa54287a54fed207._comment b/doc/bugs/git-annex_doesn__39__t_work_on_lustre__58___waitToSetLock__58___unsupported_operation___40__Function_not_implemented__41__/comment_3_08d950812832acd5aa54287a54fed207._comment new file mode 100644 index 000000000..b984c6788 --- /dev/null +++ b/doc/bugs/git-annex_doesn__39__t_work_on_lustre__58___waitToSetLock__58___unsupported_operation___40__Function_not_implemented__41__/comment_3_08d950812832acd5aa54287a54fed207._comment @@ -0,0 +1,7 @@ +[[!comment format=mdwn + username="https://me.yahoo.com/a/EbvxpTI_xP9Aod7Mg4cwGhgjrCrdM5s-#7c0f4" + subject="comment 3" + date="2015-09-10T18:44:25Z" + content=""" +so far I have failed to replicate this issue on a lustre under virtualbox following aforementioned instructions (if you would like, there is a screen available under datalad@smaug to which I believe you should have access). I guess I will wait for issues associated with standalone builds to get resolved (ref: http://git-annex.branchable.com/bugs/fails_to_addurl_to_file:__47____47____47___in_the_most_recent_snapshot_build/#comment-424388b7369d9f4889afaa56381e4e38) before attempting more tests there. 
Meanwhile I will seek more information on the problematic lustre setup (versions etc) +"""]] diff --git a/doc/bugs/git-annex_doesn__39__t_work_on_lustre__58___waitToSetLock__58___unsupported_operation___40__Function_not_implemented__41__/comment_4_b8c8fac1dc7bd72cfa8a01495c4a5096._comment b/doc/bugs/git-annex_doesn__39__t_work_on_lustre__58___waitToSetLock__58___unsupported_operation___40__Function_not_implemented__41__/comment_4_b8c8fac1dc7bd72cfa8a01495c4a5096._comment new file mode 100644 index 000000000..0b766149f --- /dev/null +++ b/doc/bugs/git-annex_doesn__39__t_work_on_lustre__58___waitToSetLock__58___unsupported_operation___40__Function_not_implemented__41__/comment_4_b8c8fac1dc7bd72cfa8a01495c4a5096._comment @@ -0,0 +1,7 @@ +[[!comment format=mdwn + username="https://me.yahoo.com/a/EbvxpTI_xP9Aod7Mg4cwGhgjrCrdM5s-#7c0f4" + subject="comment 4" + date="2015-09-24T20:33:51Z" + content=""" +please let me know if you would need an access on the lustre box to troubleshoot this issue (I have failed to replicate in virtualbox) +"""]] diff --git a/doc/bugs/git-annex_doesn__39__t_work_on_lustre__58___waitToSetLock__58___unsupported_operation___40__Function_not_implemented__41__/comment_5_e873c82ebc62e0af5051cf36ca084e0a._comment b/doc/bugs/git-annex_doesn__39__t_work_on_lustre__58___waitToSetLock__58___unsupported_operation___40__Function_not_implemented__41__/comment_5_e873c82ebc62e0af5051cf36ca084e0a._comment new file mode 100644 index 000000000..c8ec76575 --- /dev/null +++ b/doc/bugs/git-annex_doesn__39__t_work_on_lustre__58___waitToSetLock__58___unsupported_operation___40__Function_not_implemented__41__/comment_5_e873c82ebc62e0af5051cf36ca084e0a._comment @@ -0,0 +1,11 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 5""" + date="2015-09-29T16:38:17Z" + content=""" +AFAICS, you have not managed to reproduce the problem. 
If there's a way to +get access to a box that does have the problem, my ssh key is `ssh-rsa +AAAAB3NzaC1yc2EAAAADAQABAAABAQC1YoyHxZwG5Eg0yiMTJLSWJ/+dMM6zZkZiR4JJ0iUfP+tT2bm/lxYompbSqBeiCq+PYcSC67mALxp1vfmdOV//LWlbXfotpxtyxbdTcQbHhdz4num9rJQz1tjsOsxTEheX5jKirFNC5OiKhqwIuNydKWDS9qHGqsKcZQ8p+n1g9Lr3nJVGY7eRRXzw/HopTpwmGmAmb9IXY6DC2k91KReRZAlOrk0287LaK3eCe1z0bu7LYzqqS+w99iXZ/Qs0m9OqAPnHZjWQQ0fN4xn5JQpZSJ7sqO38TBAimM+IHPmy2FTNVVn9zGM+vN1O2xr3l796QmaUG1+XLL0shfR/OZbb` + -- or, there are some simple things could be run on that box to check if + eg, fcntl locks work at all. +"""]] diff --git a/doc/bugs/git-annex_doesn__39__t_work_on_lustre__58___waitToSetLock__58___unsupported_operation___40__Function_not_implemented__41__/comment_6_f439c7d9491036035d95a4c0abc99123._comment b/doc/bugs/git-annex_doesn__39__t_work_on_lustre__58___waitToSetLock__58___unsupported_operation___40__Function_not_implemented__41__/comment_6_f439c7d9491036035d95a4c0abc99123._comment new file mode 100644 index 000000000..032c7245b --- /dev/null +++ b/doc/bugs/git-annex_doesn__39__t_work_on_lustre__58___waitToSetLock__58___unsupported_operation___40__Function_not_implemented__41__/comment_6_f439c7d9491036035d95a4c0abc99123._comment @@ -0,0 +1,7 @@ +[[!comment format=mdwn + username="https://me.yahoo.com/a/EbvxpTI_xP9Aod7Mg4cwGhgjrCrdM5s-#7c0f4" + subject="account information was emailed" + date="2015-10-05T23:13:29Z" + content=""" +please let me know if you didn't get it +"""]] diff --git a/doc/bugs/git-annex_doesn__39__t_work_on_lustre__58___waitToSetLock__58___unsupported_operation___40__Function_not_implemented__41__/comment_7_1c0352c37ff07a8478d13c12aa72b484._comment b/doc/bugs/git-annex_doesn__39__t_work_on_lustre__58___waitToSetLock__58___unsupported_operation___40__Function_not_implemented__41__/comment_7_1c0352c37ff07a8478d13c12aa72b484._comment new file mode 100644 index 000000000..e0f73a82f --- /dev/null +++ 
b/doc/bugs/git-annex_doesn__39__t_work_on_lustre__58___waitToSetLock__58___unsupported_operation___40__Function_not_implemented__41__/comment_7_1c0352c37ff07a8478d13c12aa72b484._comment @@ -0,0 +1,65 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 7""" + date="2015-10-06T16:34:18Z" + content=""" +I have an account on the system now. + +As expected, it's an fcntl lock failing: + + fcntl64(7, 0xe /* F_??? */, 0xf7680510) = -1 ENOSYS (Function not implemented) + +git-annex init also uses such a lock, so also fails. A standalone C program +that I built on the system used fcntl(), rather than fcntl64() for locking, +and also failed. + + fcntl(3, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start=0, len=10}) = -1 ENOSYS (Function not implemented) + +flock() locking also fails on this filesystem as does lockf(), +all with ENOSYS. So, I think there's no usable file locking at all. + +I notice this system has an old kernel (2.6.32), and lustre 1.8.9. + +<http://comments.gmane.org/gmane.comp.file-systems.lustre.user/3429> +This thread says that fcntl locking works back to lustre 1.2 or earlier, +but is not enabled by default and needs a -o flock mount option. + +So, I think that would be the first thing to try! + +(Some thoughts on other options below.) + +---- + +The only option on the git-annex side would be to add an option to totally +disable use of locks, which would make it rather unsafe to use. +Or to use only dotlocks (file existence level locks). + +Dotlocks are problematic. Some of the uses git-annex makes of locking, +like using both shared and exclusive locks on a file to let multiple +concurrent readers, would be very hard to emulate with dotlocks. Also, +dotlocks go stale when processes die, and git-annex uses lots of different +locks in different places, which would be problematic to clean up in such +a situation. 
+ +I think that it might make sense, if git-annex has to fall back to dotlocks, +to keep the use down to a single top-level dotlock, and only let one git-annex +process run at a time. Instead of trying to replicate the full suite of +fcntl lock file uses with dotlocks. + +(Note that this approach would not allow using the assistant, as it +execs helper git-annex processes to transfer files etc. Otherwise, git-annex +should basically work. Even git annex get -JN would work ok, since git-annex +uses inter-thread locking which will work fine here.) + +---- + +Of course, this assumes that a distributed filesystem, like lustre, +is consistent enough to support the atomic operations needed to use +even simple dotlocks. It might be that git-annex on 2 nodes could +race and both think they successfully took the dotlock, and there +are situations where git-annex could then lose data. + +According to <https://en.wikipedia.org/wiki/Lustre_%28file_system%29#Locking>, +"Access and modification of a Lustre file is completely cache coherent among all of the clients", +so I guess it'd work well enough. 
+"""]] diff --git a/doc/bugs/git-annex_doesn__39__t_work_on_lustre__58___waitToSetLock__58___unsupported_operation___40__Function_not_implemented__41__/comment_8_7bdcfb72d5b9998402317ae6c9fd6046._comment b/doc/bugs/git-annex_doesn__39__t_work_on_lustre__58___waitToSetLock__58___unsupported_operation___40__Function_not_implemented__41__/comment_8_7bdcfb72d5b9998402317ae6c9fd6046._comment new file mode 100644 index 000000000..82012ec6a --- /dev/null +++ b/doc/bugs/git-annex_doesn__39__t_work_on_lustre__58___waitToSetLock__58___unsupported_operation___40__Function_not_implemented__41__/comment_8_7bdcfb72d5b9998402317ae6c9fd6046._comment @@ -0,0 +1,9 @@ +[[!comment format=mdwn + username="https://me.yahoo.com/a/EbvxpTI_xP9Aod7Mg4cwGhgjrCrdM5s-#7c0f4" + subject="FWIW" + date="2015-10-30T17:51:09Z" + content=""" +found another user of lustre who wouldn't be able to use annex there -- 'sync' is not an option for them since it has heavy performance hit. + +What does git do? +"""]] diff --git a/doc/bugs/git-annex_doesn__39__t_work_on_lustre__58___waitToSetLock__58___unsupported_operation___40__Function_not_implemented__41__/comment_9_9f6b04e9f155d289e8330b444585444f._comment b/doc/bugs/git-annex_doesn__39__t_work_on_lustre__58___waitToSetLock__58___unsupported_operation___40__Function_not_implemented__41__/comment_9_9f6b04e9f155d289e8330b444585444f._comment new file mode 100644 index 000000000..d7c331bee --- /dev/null +++ b/doc/bugs/git-annex_doesn__39__t_work_on_lustre__58___waitToSetLock__58___unsupported_operation___40__Function_not_implemented__41__/comment_9_9f6b04e9f155d289e8330b444585444f._comment @@ -0,0 +1,12 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 9""" + date="2015-11-10T15:25:58Z" + content=""" +git uses dot-locks (.git/index.lck). + +I need to understand why multiple Lustre users seem to have it configured +in a way that doesn't allow using the locking capabilities that are built +into it. 
Maybe there's a good reason they do that; but adding ugly top-level +dot locking to git-annex without a good reason could be a bad mistake. +"""]] |