From 4930c4c4c29d61e259a8dd5c519d4f9e78664bc8 Mon Sep 17 00:00:00 2001
From: Joey Hess <joeyh@joeyh.name>
Date: Tue, 6 Oct 2015 13:30:58 -0400
Subject: comment

---
 ...ent_7_1c0352c37ff07a8478d13c12aa72b484._comment | 65 ++++++++++++++++++++++
 1 file changed, 65 insertions(+)
 create mode 100644 doc/bugs/git-annex_doesn__39__t_work_on_lustre:_waitToSetLock:_unsupported_operation___40__Function_not_implemented__41__/comment_7_1c0352c37ff07a8478d13c12aa72b484._comment

(limited to 'doc')

diff --git a/doc/bugs/git-annex_doesn__39__t_work_on_lustre:_waitToSetLock:_unsupported_operation___40__Function_not_implemented__41__/comment_7_1c0352c37ff07a8478d13c12aa72b484._comment b/doc/bugs/git-annex_doesn__39__t_work_on_lustre:_waitToSetLock:_unsupported_operation___40__Function_not_implemented__41__/comment_7_1c0352c37ff07a8478d13c12aa72b484._comment
new file mode 100644
index 000000000..e0f73a82f
--- /dev/null
+++ b/doc/bugs/git-annex_doesn__39__t_work_on_lustre:_waitToSetLock:_unsupported_operation___40__Function_not_implemented__41__/comment_7_1c0352c37ff07a8478d13c12aa72b484._comment
@@ -0,0 +1,65 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 7"""
+ date="2015-10-06T16:34:18Z"
+ content="""
+I have an account on the system now.
+
+As expected, it's an fcntl lock failing:
+
+	fcntl64(7, 0xe /* F_??? */, 0xf7680510) = -1 ENOSYS (Function not implemented)
+
+git-annex init also uses such a lock, so it fails too. A standalone C program
+that I built on the system used fcntl() rather than fcntl64() for locking,
+and it also failed:
+
+	fcntl(3, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start=0, len=10}) = -1 ENOSYS (Function not implemented)
+
+flock() locking also fails on this filesystem as does lockf(),
+all with ENOSYS. So, I think there's no usable file locking at all.
+
+I notice this system has an old kernel (2.6.32), and lustre 1.8.9.
+
+<http://comments.gmane.org/gmane.comp.file-systems.lustre.user/3429>  
+This thread says that fcntl locking works back to lustre 1.2 or earlier,
+but is not enabled by default and needs a -o flock mount option.
+
+So, I think that would be the first thing to try!
+
+(Some thoughts on other options below.)
+
+----
+
+The only option on the git-annex side would be to add an option to totally
+disable use of locks, which would make it rather unsafe to use.
+Or to use only dotlocks (file existence level locks).
+
+Dotlocks are problematic. Some of the uses git-annex makes of locking,
+like using both shared and exclusive locks on a file to allow multiple
+concurrent readers, would be very hard to emulate with dotlocks. Also,
+dotlocks go stale when processes die, and git-annex uses lots of different
+locks in different places, which would be problematic to clean up in such
+a situation.
+
+I think that it might make sense, if git-annex has to fall back to dotlocks,
+to keep the use down to a single top-level dotlock, and only let one git-annex
+process run at a time, instead of trying to replicate the full suite of
+fcntl lock file uses with dotlocks.
+
+(Note that this approach would not allow using the assistant, as it
+execs helper git-annex processes to transfer files etc. Otherwise, git-annex
+should basically work. Even git annex get -JN would work ok, since git-annex
+uses inter-thread locking which will work fine here.)
+
+----
+
+Of course, this assumes that a distributed filesystem, like lustre,
+is consistent enough to support the atomic operations needed to use
+even simple dotlocks. It might be that git-annex on 2 nodes could
+race and both think they successfully took the dotlock, and there
+are situations where git-annex could then lose data.
+
+According to <https://en.wikipedia.org/wiki/Lustre_%28file_system%29#Locking>,
+"Access and modification of a Lustre file is completely cache coherent among all of the clients",
+so I guess it'd work well enough.
+"""]]
-- 
cgit v1.2.3