author Joey Hess <joeyh@joeyh.name> 2017-03-06 13:32:47 -0400
committer Joey Hess <joeyh@joeyh.name> 2017-03-06 13:32:47 -0400
commit ac0dcdeeeb9f970419b9cee571e6438bc93b0785
tree fccef7d2ce120bc57cae1c2cfac4c59a4f41ce88
parent b810f6903b7941f82b180f6399025b4266ee1c57
assistant: Add 1/200th second delay between checking each file in the full transfer scan, to avoid using too much CPU.
The slowdown is not going to be large in typical small-ish repos. And it does not seem to matter if the assistant reacts a little bit slower in situations involving the expensive scan, since:

a) Those situations typically involve getting back in sync after something has changed on a remote, often after a disconnect of some duration. So taking a few seconds more is not noticeable.

b) If the scan finds things that it needs to do, it will start blocking anyway after 10 transfers are queued (due to use of queueTransferWhenSmall). So only the speed of finding the first 10 transfers will be impacted by this change.

This commit was sponsored by Jochen Bartl on Patreon.
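Point b) relies on the scanner blocking once 10 transfers are queued. The implementation of queueTransferWhenSmall is not part of this commit, so what follows is only a minimal sketch of that kind of back-pressure, built from base's QSem and MVar. BoundedQueue, enqueueWhenSmall, dequeue and queueLength are hypothetical names, not git-annex's TransferQueue API.

```haskell
import Control.Concurrent.MVar (MVar, modifyMVar, modifyMVar_, newMVar, readMVar)
import Control.Concurrent.QSem (QSem, newQSem, signalQSem, waitQSem)
import Control.Exception (mask_)
import System.Timeout (timeout)

-- Hypothetical bounded queue (NOT git-annex's TransferQueue). The QSem
-- counts free slots, so a producer blocks once 'capacity' items are queued.
data BoundedQueue a = BoundedQueue
  { freeSlots :: QSem
  , queued    :: MVar [a]  -- oldest item first
  }

newBoundedQueue :: Int -> IO (BoundedQueue a)
newBoundedQueue capacity = BoundedQueue <$> newQSem capacity <*> newMVar []

-- Waits for a free slot, then appends. A scanner that has already queued
-- 'capacity' items stops here until a consumer frees a slot.
enqueueWhenSmall :: BoundedQueue a -> a -> IO ()
enqueueWhenSmall q x = mask_ $ do
  waitQSem (freeSlots q)  -- interruptible, so a timeout can still cancel it
  modifyMVar_ (queued q) (pure . (++ [x]))

-- Takes the oldest item, if any, and releases its slot.
dequeue :: BoundedQueue a -> IO (Maybe a)
dequeue q = mask_ $ do
  mx <- modifyMVar (queued q) $ \xs -> pure $ case xs of
    []       -> ([], Nothing)
    (y : ys) -> (ys, Just y)
  case mx of
    Just _  -> signalQSem (freeSlots q)
    Nothing -> pure ()
  pure mx

queueLength :: BoundedQueue a -> IO Int
queueLength q = length <$> readMVar (queued q)

main :: IO ()
main = do
  q <- newBoundedQueue 10
  mapM_ (enqueueWhenSmall q) [1 .. 10 :: Int]
  queueLength q >>= print                      -- 10
  r <- timeout 100000 (enqueueWhenSmall q 11)  -- give up after 0.1 s
  print (r == Nothing)                         -- True: the 11th enqueue blocked
  _ <- dequeue q                               -- a consumer takes one item
  enqueueWhenSmall q 11                        -- a slot is free, so this returns
  queueLength q >>= print                      -- 10
```

With a capacity of 10, the first ten enqueues return immediately and the eleventh waits for a dequeue. Once the queue is full, the producer's own speed no longer determines throughput, which is the property point b) depends on.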
-rw-r--r-- Assistant/Threads/TransferScanner.hs 6
-rw-r--r-- CHANGELOG 2
-rw-r--r-- doc/forum/__34__Scanning_for_files_to_transfer__34__/comment_1_4fa6d0d5264707886e1e9b5184090386._comment 36
3 files changed, 44 insertions, 0 deletions
diff --git a/Assistant/Threads/TransferScanner.hs b/Assistant/Threads/TransferScanner.hs
index a55a3496e..2128ce969 100644
--- a/Assistant/Threads/TransferScanner.hs
+++ b/Assistant/Threads/TransferScanner.hs
@@ -25,6 +25,7 @@ import qualified Types.Remote as Remote
import Utility.ThreadScheduler
import Utility.NotificationBroadcaster
import Utility.Batch
+import Utility.ThreadScheduler
import qualified Git.LsFiles as LsFiles
import Annex.WorkTree
import Annex.Content
@@ -32,6 +33,7 @@ import Annex.Wanted
import CmdLine.Action
import qualified Data.Set as S
+import Control.Concurrent
{- This thread waits until a remote needs to be scanned, to find transfers
- that need to be made, to keep data in sync.
@@ -145,6 +147,10 @@ expensiveScan urlrenderer rs = batch <~> do
(findtransfers f unwanted)
=<< liftAnnex (lookupFile f)
mapM_ (enqueue f) ts
+
+ {- Delay for a short time to avoid using too much CPU. -}
+ liftIO $ threadDelay $ fromIntegral $ oneSecond `div` 200
+
scan unwanted' fs
enqueue f (r, t) =
diff --git a/CHANGELOG b/CHANGELOG
index 524ad53c2..d6297bb79 100644
--- a/CHANGELOG
+++ b/CHANGELOG
@@ -5,6 +5,8 @@ git-annex (6.20170301.2) UNRELEASED; urgency=medium
* status: Propagate nonzero exit code from git status.
* Linux standalone builds put the bundled ssh last in PATH,
so any system ssh will be preferred over it.
+ * assistant: Add 1/200th second delay between checking each file
+ in the full transfer scan, to avoid using too much CPU.
-- Joey Hess <id@joeyh.name> Thu, 02 Mar 2017 12:51:40 -0400
diff --git a/doc/forum/__34__Scanning_for_files_to_transfer__34__/comment_1_4fa6d0d5264707886e1e9b5184090386._comment b/doc/forum/__34__Scanning_for_files_to_transfer__34__/comment_1_4fa6d0d5264707886e1e9b5184090386._comment
new file mode 100644
index 000000000..cab618d63
--- /dev/null
+++ b/doc/forum/__34__Scanning_for_files_to_transfer__34__/comment_1_4fa6d0d5264707886e1e9b5184090386._comment
@@ -0,0 +1,36 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 1"""
+ date="2017-03-06T17:03:12Z"
+ content="""
+The scan that is skipped is a scan of the files on disk, done in order to find
+changes that were made while the assistant was not running.
+
+What you are seeing is the full transfer scan. While annex.startupscan
+could be made to also skip that scan, a full transfer scan is not only run
+at startup, but after merging git-annex branch changes from a remote. So
+disabling it only at startup does not seem very useful.
+
+There could be an option to disable the full transfer scan ever running.
+However, this would make the assistant not notice certain transfers/drops
+that you would normally want it to do. For example, if a remote got a bunch
+of files in an archive/ directory from somewhere else, and the local
+repository contains those files, the full transfer scan is needed to notice
+that the archived files can now be removed from the local repository.
+In other situations, the local repository would not get files that it
+ought to contain.
+
+So, I think it might be better to make the expensive transfer scan run a
+little bit slower so it doesn't peg your CPU. I've added a 1/200th second
+delay after each file it checks.
+
+That will make it use something like
+5-10% of the CPU, instead of 100%. At the same time it doesn't slow down the
+total scan very much. In a repository with 5k files, it makes the scan 25
+seconds slower, which makes the assistant react that much slower -- but
+the expensive scan is only needed to make sure things turn out consistent,
+so its overall speed is not super important.
+
+Check it out, let me know if it's still using too much CPU. We could always
+make that 1/200th second tunable, or find a better value for it.
+"""]]