author | Joey Hess <joey@kitenet.net> | 2012-08-24 12:17:24 -0400 |
---|---|---|
committer | Joey Hess <joey@kitenet.net> | 2012-08-24 12:17:24 -0400 |
commit | bc6eaa4ebb61a3278201d0a3de939b00dfb48453 (patch) | |
tree | eb29d6d2d61c502a7d178d59357a3e6902ac4081 | |
parent | b985e0b7ec0e5ba23e0d36c4bcab0d1760e25676 (diff) | |
parent | 199cedf978c999201d3541003d3cffd4b0a30b77 (diff) |
Merge remote-tracking branch 'origin/master'
5 files changed, 115 insertions, 3 deletions
diff --git a/doc/design/assistant/blog/day_61__network_connection_detection.mdwn b/doc/design/assistant/blog/day_61__network_connection_detection.mdwn
new file mode 100644
index 000000000..8ab40f516
--- /dev/null
+++ b/doc/design/assistant/blog/day_61__network_connection_detection.mdwn
@@ -0,0 +1,36 @@
+Today, added a thread that deals with recovering when there's been a loss
+of network connectivity. When the network's down, the normal immediate
+syncing of changes of course doesn't work. So this thread detects when the
+network comes back up, and does a pull+push to network remotes, and
+triggers scanning for file content that needs to be transferred.
+
+I used dbus again, to detect events generated by both network-manager and
+wicd when they've sucessfully brought an interface up. Or, if they're not
+available, it polls every 30 minutes.
+
+When the network comes up, in addition to the git pull+push, it also
+currently does a full scan of the repo to find files whose contents
+need to be transferred to get fully back into sync.
+
+I think it'll be ok for some git pulls and pushes to happen when
+moving to a new network, or resuming a laptop (or every 30 minutes when
+resorting to polling). But the transfer scan is currently really too heavy
+to be appropriate to do every time in those situations. I have an idea for
+avoiding that scan when the remote's git-annex branch has not changed. But
+I need to refine it, to handle cases like this:
+
+1. a new remote is added
+2. file contents start being transferred to (or from it)
+3. the network is taken down
+4. all the queued transfers fail
+5. the network comes back up
+6. the transfer scan needs to know the remote was not all in sync
+   before #3, and so should do a full scan despite the git-annex branch
+   not having changed
+
+---
+
+Doubled the ram in my netbook, which I use for all development. Yesod needs
+rather a lot of ram to compile and link, and this should make me quite a
+lot more productive. I was struggling with OOM killing bits of chromium
+during my last week of development.
diff --git a/doc/design/assistant/blog/day_61__network_connection_detection/comment_1_09b58f41a8d48f218619711ee19511ac._comment b/doc/design/assistant/blog/day_61__network_connection_detection/comment_1_09b58f41a8d48f218619711ee19511ac._comment
new file mode 100644
index 000000000..029aec783
--- /dev/null
+++ b/doc/design/assistant/blog/day_61__network_connection_detection/comment_1_09b58f41a8d48f218619711ee19511ac._comment
@@ -0,0 +1,8 @@
+[[!comment format=mdwn
+ username="https://www.google.com/accounts/o8/id?id=AItOawmubB1Sj2rwFoVdZYvGV0ACaQUJQyiJXJI"
+ nickname="Paul"
+ subject="Amazon Glacier"
+ date="2012-08-23T06:32:24Z"
+ content="""
+Do you think git-annex could support [Amazon Glacier](http://aws.amazon.com/glacier/) as a backend?
+"""]]
diff --git a/doc/design/assistant/blog/day_62__smarter_syncing.mdwn b/doc/design/assistant/blog/day_62__smarter_syncing.mdwn
new file mode 100644
index 000000000..28fa892d3
--- /dev/null
+++ b/doc/design/assistant/blog/day_62__smarter_syncing.mdwn
@@ -0,0 +1,21 @@
+Woke up this morning with most of the design for a smarter approach to
+[[syncing]] in my head. (This is why I sometimes slip up and tell people I
+work on this project 12 hours a day..)
+
+To keep the current `assistant` branch working while I make changes
+that break use cases that are working, I've started
+developing in a new branch, `assistant-wip`.
+
+In it, I've started getting rid of unnecessary expensive transfer scans.
+
+First optimisation I've done is to detect when a remote that was
+disconnected has diverged its `git-annex` branch from the local branch.
+Only when that's the case does a new transfer scan need to be done, to find
+out what new stuff might be available on that remote, to have caused the
+change to its branch, while it was disconnected.
+
+That broke a lot of stuff. I have a plan to fix it written down in
+[[syncing]]. It'll involve keeping track of whether a transfer scan has
+ever been done (if not, one should be run), and recording logs when
+transfers failed, so those failed transfers can be retried when the
+remote gets reconnected.
diff --git a/doc/design/assistant/syncing.mdwn b/doc/design/assistant/syncing.mdwn
index 3aeb76afc..83c5e9d22 100644
--- a/doc/design/assistant/syncing.mdwn
+++ b/doc/design/assistant/syncing.mdwn
@@ -3,9 +3,42 @@ all the other git clones, at both the git level and the key/value level.
 
 ## immediate action items
 
-* At startup, and possibly periodically, or when the network connection
-  changes, or some heuristic suggests that a remote was disconnected from
-  us for a while, queue remotes for processing by the TransferScanner.
+* Optimisations in 5c3e14649ee7c404f86a1b82b648d896762cbbc2 temporarily
+  broke content syncing in some situations, which need to be added back.
+
+  Now syncing a disconnected remote only starts a transfer scan if the
+  remote's git-annex branch has diverged, which indicates it probably has
+  new files. But that leaves open the cases where the local repo has
+  new files; and where the two repos git branches are in sync, but the
+  content transfers are lagging behind; and where the transfer scan has
+  never been run.
+
+  Need to track locally whether we're believed to be in sync with a remote.
+  This includes:
+  * All local content has been transferred to it successfully.
+  * The remote has been scanned once for data to transfer from it, and all
+    transfers initiated by that scan succeeded.
+
+  Note the complication that, if it's initiated a transfer, our queued
+  transfer will be thrown out as unnecessary. But if its transfer then
+  fails, that needs to be noticed.
+
+  If we're going to track failed transfers, we could just set a flag,
+  and use that flag later to initiate a new transfer scan. We need a flag
+  in any case, to ensure that a transfer scan is run for each new remote.
+  The flag could be `.git/annex/transfer/scanned/uuid`.
+
+  But, if failed transfers are tracked, we could also record them, in
+  order to retry them later, without the scan. I'm thinking about a
+  directory like `.git/annex/transfer/failed/{upload,download}/uuid/`,
+  which failed transfer log files could be moved to.
+
+  Note that a remote may lose content it had before, so when requeuing
+  a failed download, should check the location log to see if it still has
+  the content, and if not, queue a download from elsewhere. (And, a remote
+  may get content we were uploading from elsewhere, so check the location
+  log when queuing a failed Upload too.)
+
 * Ensure that when a remote receives content, and updates its location log,
   it syncs that update back out. Prerequisite for:
 * After git sync, identify new content that we don't have that is now available
@@ -43,6 +76,10 @@ all the other git clones, at both the git level and the key/value level.
   that need to be done to sync with a remote. Currently it walks the git
   working copy and checks each file.
 
+## misc todo
+
+* --debug will show often unnecessary work being done. Optimise.
+
 ## data syncing
 
 There are two parts to data syncing. First, map the network and second,
@@ -157,3 +194,5 @@ redone to check it.
   finishes. **done**
 * Test MountWatcher on KDE, and add whatever dbus events KDE emits when
   drives are mounted. **done**
+* It would be nice if, when a USB drive is connected,
+  syncing starts automatically. Use dbus on Linux? **done**
diff --git a/doc/forum/How_to_define_an_alternative_remote_url_for_a_git_remote_repository__63__/comment_3_48c3a80c14a85f27d742482b2ccbe628._comment b/doc/forum/How_to_define_an_alternative_remote_url_for_a_git_remote_repository__63__/comment_3_48c3a80c14a85f27d742482b2ccbe628._comment
new file mode 100644
index 000000000..7a0054c49
--- /dev/null
+++ b/doc/forum/How_to_define_an_alternative_remote_url_for_a_git_remote_repository__63__/comment_3_48c3a80c14a85f27d742482b2ccbe628._comment
@@ -0,0 +1,8 @@
+[[!comment format=mdwn
+ username="https://me.yahoo.com/speredenn#aaf38"
+ nickname="Jean-Baptiste Carré"
+ subject="comment 3"
+ date="2012-08-21T18:15:48Z"
+ content="""
+You're totally right: The UUIDs are the same. So it shouldn't matter if there are many repositories pointing to the same folder, as you state it. Thanks a lot!
+"""]]
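
Reviewer's sketch (not part of the commit): the bookkeeping proposed in the `syncing.mdwn` hunk could look roughly like the Python below. It is only an illustration under stated assumptions, not git-annex's implementation (which is written in Haskell). The two path layouts come from the design note; every function name, the one-empty-file-per-key representation of a failed transfer, and the `location_log` dictionary are hypothetical stand-ins.

```python
from pathlib import Path
from typing import Dict, List, Optional, Set, Tuple


def scanned_flag(annex_dir: Path, uuid: str) -> Path:
    # Layout from the design note: .git/annex/transfer/scanned/uuid
    return annex_dir / "transfer" / "scanned" / uuid


def failed_dir(annex_dir: Path, direction: str, uuid: str) -> Path:
    # Layout from the design note:
    # .git/annex/transfer/failed/{upload,download}/uuid/
    return annex_dir / "transfer" / "failed" / direction / uuid


def needs_transfer_scan(annex_dir: Path, uuid: str, branch_diverged: bool) -> bool:
    # Scan when the remote's git-annex branch has diverged, or when this
    # remote has never been scanned (no flag file exists yet).
    return branch_diverged or not scanned_flag(annex_dir, uuid).exists()


def mark_scanned(annex_dir: Path, uuid: str) -> None:
    flag = scanned_flag(annex_dir, uuid)
    flag.parent.mkdir(parents=True, exist_ok=True)
    flag.touch()


def record_failed(annex_dir: Path, direction: str, uuid: str, key: str) -> None:
    # Hypothetical representation: one empty file per failed key.
    d = failed_dir(annex_dir, direction, uuid)
    d.mkdir(parents=True, exist_ok=True)
    (d / key).touch()


def requeue_failed_downloads(
    annex_dir: Path, uuid: str, location_log: Dict[str, Set[str]]
) -> List[Tuple[str, Optional[str]]]:
    # For each failed download, consult the (stand-in) location log: retry
    # from the same remote if it still has the content, otherwise pick
    # another remote that has it, or None if no remote is known to.
    d = failed_dir(annex_dir, "download", uuid)
    if not d.is_dir():
        return []
    plan: List[Tuple[str, Optional[str]]] = []
    for entry in sorted(d.iterdir()):
        holders = location_log.get(entry.name, set())
        if uuid in holders:
            source: Optional[str] = uuid
        else:
            others = sorted(holders)
            source = others[0] if others else None
        plan.append((entry.name, source))
    return plan
```

The flag file covers the "never scanned" case that the branch-divergence check alone misses, while the failed-transfer directory lets retries skip the expensive scan entirely.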