diff options
author | Joey Hess <joeyh@joeyh.name> | 2017-10-17 14:50:48 -0400 |
---|---|---|
committer | Joey Hess <joeyh@joeyh.name> | 2017-10-17 17:10:50 -0400 |
commit | f31dbb13cad2e8e1b29180fff755026256eabd57 (patch) | |
tree | 24fb7615a56bd81b56facbb14f2faa98e0e687b3 /Annex/Transfer.hs | |
parent | 461d559ee32033266c253f9d8e004664258efed1 (diff) |
Improve behavior when -J transfers multiple files that point to the same key
After a false start, I found a fairly non-intrusive way to deal with it.
Although it only handles transfers -- there may be issues with eg
concurrent dropping of the same key, or other operations.
There is no added overhead when -J is not used, other than an added
inAnnex check. When -J is used, it has to maintain and check a small
Set, which should be negligible overhead.
It could output some message saying that the transfer is being done by
another thread. Or it could even display the same progress info for both
files that are being downloaded since they have the same content. But I
opted to keep it simple, since this is rather an edge case, so it just
doesn't say anything about the transfer of the file until the other
thread finishes.
Since the deferred transfer action still runs, actions that do more than
transfer content will still get a chance to do their other work. (An
example of something that needs to do such other work is P2P.Annex,
where the download always needs to receive the content from the peer.)
And, if the first thread fails to complete a transfer, the second thread
can resume it.
But, this unfortunately means that there's a risk of redundant work
being done to transfer a key that just got transferred.
That's not ideal, but should never cause breakage; the same
thing can occur when running two separate git-annex processes.
The get/move/copy/mirror --from commands had extra inAnnex checks added,
inside the download actions. Without those checks, the first thread
downloaded the content, and then the second thread woke up and
downloaded the same content redundantly.
move/copy/mirror --to is left doing redundant uploads for now. It
would need a second checkPresent of the remote inside the upload
to avoid them, which would be expensive. A better way to avoid
redundant work needs to be found..
This commit was supported by the NSF-funded DataLad project.
Diffstat (limited to 'Annex/Transfer.hs')
-rw-r--r-- | Annex/Transfer.hs | 56 |
1 files changed, 38 insertions, 18 deletions
diff --git a/Annex/Transfer.hs b/Annex/Transfer.hs index 3fcf1a1b9..35294ba2b 100644 --- a/Annex/Transfer.hs +++ b/Annex/Transfer.hs @@ -32,7 +32,9 @@ import qualified Types.Remote as Remote import Types.Concurrency import Control.Concurrent +import Control.Concurrent.STM import qualified Data.Map.Strict as M +import qualified Data.Set as S import Data.Ord class Observable a where @@ -89,22 +91,23 @@ alwaysRunTransfer :: Observable v => Transfer -> AssociatedFile -> RetryDecider alwaysRunTransfer = runTransfer' True runTransfer' :: Observable v => Bool -> Transfer -> AssociatedFile -> RetryDecider -> (MeterUpdate -> Annex v) -> Annex v -runTransfer' ignorelock t afile shouldretry transferaction = checkSecureHashes t $ do - info <- liftIO $ startTransferInfo afile - (meter, tfile, metervar) <- mkProgressUpdater t info - mode <- annexFileMode - (lck, inprogress) <- prep tfile mode info - if inprogress && not ignorelock - then do - showNote "transfer already in progress, or unable to take transfer lock" - return observeFailure - else do - v <- retry info metervar $ transferaction meter - liftIO $ cleanup tfile lck - if observeBool v - then removeFailedTransfer t - else recordFailedTransfer t info - return v +runTransfer' ignorelock t afile shouldretry transferaction = + checkSecureHashes t $ currentProcessTransfer t $ do + info <- liftIO $ startTransferInfo afile + (meter, tfile, metervar) <- mkProgressUpdater t info + mode <- annexFileMode + (lck, inprogress) <- prep tfile mode info + if inprogress && not ignorelock + then do + showNote "transfer already in progress, or unable to take transfer lock" + return observeFailure + else do + v <- handleretry info metervar $ transferaction meter + liftIO $ cleanup tfile lck + if observeBool v + then removeFailedTransfer t + else recordFailedTransfer t info + return v where #ifndef mingw32_HOST_OS prep tfile mode info = catchPermissionDenied (const prepfailed) $ do @@ -153,7 +156,7 @@ runTransfer' ignorelock t afile shouldretry transferaction = checkSecureHashes t dropLock lockhandle void $ tryIO $ removeFile lck #endif - retry oldinfo metervar run = do + handleretry oldinfo metervar run = do v <- tryNonAsync run case v of Right b -> return b @@ -162,7 +165,7 @@ runTransfer' ignorelock t afile shouldretry transferaction = checkSecureHashes t b <- getbytescomplete metervar let newinfo = oldinfo { bytesComplete = Just b } if shouldretry oldinfo newinfo - then retry newinfo metervar run + then handleretry newinfo metervar run else return observeFailure getbytescomplete metervar | transferDirection t == Upload = @@ -256,3 +259,20 @@ lessActiveFirst :: M.Map Remote Integer -> Remote -> Remote -> Ordering lessActiveFirst active a b | Remote.cost a == Remote.cost b = comparing (`M.lookup` active) a b | otherwise = compare a b + +{- Runs a transfer action. Only one thread can run for a given Transfer + - at a time; other threads will block. -} +currentProcessTransfer :: Transfer -> Annex a -> Annex a +currentProcessTransfer t a = go =<< Annex.getState Annex.concurrency + where + go NonConcurrent = a + go (Concurrent _) = do + tv <- Annex.getState Annex.currentprocesstransfers + bracket_ (setup tv) (cleanup tv) a + setup tv = liftIO $ atomically $ do + s <- readTVar tv + if S.member t s + then retry + else writeTVar tv $! S.insert t s + cleanup tv = liftIO $ atomically $ + modifyTVar' tv $ S.delete t |