summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
authorGravatar Joey Hess <joeyh@joeyh.name>2016-10-26 13:16:41 -0400
committerGravatar Joey Hess <joeyh@joeyh.name>2016-10-26 13:16:41 -0400
commit1a9da6bbe06818822e90018c9c5c3ba3146c25f9 (patch)
tree303a39a4d65204a2e0aa3813c6d4c63e6bb71ee2
parent3b4b4e5043e6d2575971577be50893d3f9a9549b (diff)
Improve ssh socket cleanup code to skip over the cruft that NFS sometimes puts in a directory when a file is being deleted.
-rw-r--r--Annex/Ssh.hs13
-rw-r--r--CHANGELOG2
-rw-r--r--doc/bugs/git-annex__58___content_is_locked__while_trying_to_move_under_NFS_and_pidlock/comment_13_05687dca2462eb79872879bd7695d665._comment51
3 files changed, 64 insertions, 2 deletions
diff --git a/Annex/Ssh.hs b/Annex/Ssh.hs
index 10efa9f9e..4f879436b 100644
--- a/Annex/Ssh.hs
+++ b/Annex/Ssh.hs
@@ -136,11 +136,20 @@ prepSocket socketfile = do
liftIO $ createDirectoryIfMissing True $ parentDir socketfile
lockFileCached $ socket2lock socketfile
+{- Find ssh socket files.
+ -
+ - The check that the lock file exists makes only socket files
+ - that were set up by prepSocket be found. On some NFS systems,
+ - a deleted socket file may linger for a while under another filename;
+ - and this check makes such files be skipped since the corresponding lock
+ - file won't exist.
+ -}
enumSocketFiles :: Annex [FilePath]
-enumSocketFiles = go =<< sshCacheDir
+enumSocketFiles = liftIO . go =<< sshCacheDir
where
go Nothing = return []
- go (Just dir) = liftIO $ filter (not . isLock)
+ go (Just dir) = filterM (doesFileExist . socket2lock)
+ =<< filter (not . isLock)
<$> catchDefaultIO [] (dirContents dir)
{- Stop any unused ssh connection caching processes. -}
diff --git a/CHANGELOG b/CHANGELOG
index 4fe08d537..d91b79c63 100644
--- a/CHANGELOG
+++ b/CHANGELOG
@@ -10,6 +10,8 @@ git-annex (6.20161013) UNRELEASED; urgency=medium
* Assistant, repair: Improved filtering out of git fsck lines about
duplicate file entries in tree objects.
* test: Deal with gpg-agent behavior change that broke the test suite.
+ * Improve ssh socket cleanup code to skip over the cruft that
+ NFS sometimes puts in a directory when a file is being deleted.
-- Joey Hess <id@joeyh.name> Mon, 17 Oct 2016 12:46:54 -0400
diff --git a/doc/bugs/git-annex__58___content_is_locked__while_trying_to_move_under_NFS_and_pidlock/comment_13_05687dca2462eb79872879bd7695d665._comment b/doc/bugs/git-annex__58___content_is_locked__while_trying_to_move_under_NFS_and_pidlock/comment_13_05687dca2462eb79872879bd7695d665._comment
new file mode 100644
index 000000000..64df7bc57
--- /dev/null
+++ b/doc/bugs/git-annex__58___content_is_locked__while_trying_to_move_under_NFS_and_pidlock/comment_13_05687dca2462eb79872879bd7695d665._comment
@@ -0,0 +1,51 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 13"""
+ date="2016-10-26T16:34:38Z"
+ content="""
+Looking at the strace finally...
+
+Here's git-annex creating the dummy posix lock file, soon after
+the ssh socket is removed. (I think that's ssh removing the socket before,
+but the jumble of strace output is hard to follow here.)
+
+ 2456732 unlink(".git/annex/ssh/smaug" <unfinished ...>
+ ...
+ 2456732 open(".git/annex/ssh/.nfs00000000099d85d000000002.lock", O_RDONLY|O_CREAT, 0666 <unfinished ...>
+
+And later ssh is told to stop using a related socket:
+
+ 2456815 execve("/mnt/btrfs/scrap/datalad/test_nfs/git-annex.linux/bin/ssh", ["ssh", "-O", "stop", "-S", ".nfs00000000099d85d000000002", "-o", "ControlMaster=auto", "-o", "ControlPersist=yes", "localhost"], [/* 98 vars */] <unfinished ...>
+
+Which it seems does not exist by then:
+
+ 2456815 connect(3, {sa_family=AF_LOCAL, sun_path=".nfs00000000099d85d000000002"}, 31) = -1 ENOENT (No such file or directory)
+ 2456732 unlink(".git/annex/ssh/.nfs00000000099d85d000000002") = -1 ENOENT (No such file or directory)
+
+Seems like this would have to be `sshCleanup` running, and `enumSocketFiles`
+seeing ".nfs00000000099d85d000000002" existing at that point due
+probably to the ssh/smaug socket being in the process of being
+removed.
+
+So, it assumes it's a socket file and tries to clean it up, creating
+the dummy posix lock file. And posix lock files don't get deleted
+(it's impossible to do so in a race-free way), so this results in more
+and more `.nfs*.lock` files.
+
+Need to find a way to prevent `enumSocketFiles` from listing these
+files. Seems that ".nfs00000000099d85d000000002" is probably the deleted
+form of the "smaug" socket, so it really is a socket file, so checking
+if a file is a socket won't help. Of course it could
+ explicitly filter out ".nfs" prefixed files. That could be a case of
+kicking the can down the road, since NFS could always choose to use
+another name for these files.
+
+Hmm.. Before a ssh socket file gets created, git-annex always locks the
+associated lock file. And as noted, the posix lock file is never
+removed. (On Windows the lock file is also created and never deleted.)
+So, if $file does not have an associated $file.lock
+then $file must not be a ssh socket, and `enumSocketFiles` can skip it.
+
+I've added that check. Unable to test it because the NFS mount has
+been lost in the intervening time. @yoh can you test this fixed it?
+"""]]