author     Joey Hess <joey@kitenet.net>    2012-08-24 12:17:44 -0400
committer  Joey Hess <joey@kitenet.net>    2012-08-24 12:17:44 -0400
commit     13fa141cd3a47659b8fd5759f86bc538b893d17b (patch)
tree       d6ad9a9b49027f12d40a82eec5b421433542f56a
parent     bc6eaa4ebb61a3278201d0a3de939b00dfb48453 (diff)
parent     d25f407e6767c8ce9214fcc7c503178cfa3fa9f5 (diff)
Merge branch 'master' of ssh://git-annex.branchable.com
5 files changed, 91 insertions, 40 deletions
diff --git a/doc/bugs/fsck_thinks_file_content_is_bad_when_it_isn__39__t.mdwn b/doc/bugs/fsck_thinks_file_content_is_bad_when_it_isn__39__t.mdwn
index e15529c64..883c53d36 100644
--- a/doc/bugs/fsck_thinks_file_content_is_bad_when_it_isn__39__t.mdwn
+++ b/doc/bugs/fsck_thinks_file_content_is_bad_when_it_isn__39__t.mdwn
@@ -22,3 +22,9 @@ The original file also has sha512 ead9db1f34739014a216239d9624bce74d92fe723de065
 >> And what sha512 does the file in .git/annex/bad have **now**? (fsck
 >> preserves the original filename; this says nothing about what the
 >> current checksum is, if the file has been corrupted). --[[Joey]]
+
+The same, as it's the file I was trying to inject:
+
+ead9db1f34739014a216239d9624bce74d92fe723de06505f9b94cb4c063142ba42b04546f11d3d33869b736e40ded2ff779cb32b26aa10482f09407df0f3c8d .git/annex/bad/SHA512E-s94402560--ead9db1f34739014a216239d9624bce74d92fe723de06505f9b94cb4c063142ba42b04546f11d3d33869b736e40ded2ff779cb32b26aa10482f09407df0f3c8d.Moon.avi
+
+That's what puzzles me, it is the same file, but for some weird reason git annex thinks it's not.
diff --git a/doc/design/assistant/blog/day_63__transfer_retries.mdwn b/doc/design/assistant/blog/day_63__transfer_retries.mdwn
new file mode 100644
index 000000000..d668f507b
--- /dev/null
+++ b/doc/design/assistant/blog/day_63__transfer_retries.mdwn
@@ -0,0 +1,26 @@
+Implemented everything I planned out yesterday: Expensive scans are only
+done once per remote (unless the remote changed while it was disconnected),
+and failed transfers are logged so they can be retried later.
+
+Changed the TransferScanner to prefer to scan low cost remotes first,
+as a crude form of scheduling lower-cost transfers first.
+
+A whole bunch of interesting syncing scenarios should work now. I have not
+tested them all in detail, but to the best of my knowledge, all these
+should work:
+
+* Connect to the network. It starts syncing with a networked remote.
+  Disconnect the network. Reconnect, and it resumes where it left off.
+* Migrate between networks (ie, home to cafe to work). Any transfers
+  that can only happen on one LAN are retried on each new network you
+  visit, until they succeed.
+
+One that is not working, but is soooo close:
+
+* Plug in a removable drive. Some transfers start. Yank the plug.
+  Plug it back in. All necessary transfers resume, and it ends up
+  fully in sync, no matter how many times you yank that cable.
+
+That's not working because of an infelicity in the MountWatcher.
+It doesn't notice when the drive gets unmounted, so it ignores
+the new mount event.
diff --git a/doc/design/assistant/blog/day_63__transfer_retries/comment_1_990d4eb6066c4e2b9ddb3cabef32e4b9._comment b/doc/design/assistant/blog/day_63__transfer_retries/comment_1_990d4eb6066c4e2b9ddb3cabef32e4b9._comment
new file mode 100644
index 000000000..119aee2c9
--- /dev/null
+++ b/doc/design/assistant/blog/day_63__transfer_retries/comment_1_990d4eb6066c4e2b9ddb3cabef32e4b9._comment
@@ -0,0 +1,10 @@
+[[!comment format=mdwn
+ username="https://www.google.com/accounts/o8/id?id=AItOawmBUR4O9mofxVbpb8JV9mEbVfIYv670uJo"
+ nickname="Justin"
+ subject="comment 1"
+ date="2012-08-23T21:25:48Z"
+ content="""
+Do encrypted rsync remotes resume quickly as well?
+
+One thing I noticed was that if a copy --to an encrypted rsync remote gets interrupted it will remove the tmp file and re-encrypt the whole file before resuming rsync.
+"""]]
diff --git a/doc/design/assistant/syncing.mdwn b/doc/design/assistant/syncing.mdwn
index 83c5e9d22..071ea2730 100644
--- a/doc/design/assistant/syncing.mdwn
+++ b/doc/design/assistant/syncing.mdwn
@@ -3,42 +3,12 @@ all the other git clones, at both the git level and the key/value level.
 
 ## immediate action items
 
-* Optimisations in 5c3e14649ee7c404f86a1b82b648d896762cbbc2 temporarily
-  broke content syncing in some situations, which need to be added back.
-
-  Now syncing a disconnected remote only starts a transfer scan if the
-  remote's git-annex branch has diverged, which indicates it probably has
-  new files. But that leaves open the cases where the local repo has
-  new files; and where the two repos git branches are in sync, but the
-  content transfers are lagging behind; and where the transfer scan has
-  never been run.
-
-  Need to track locally whether we're believed to be in sync with a remote.
-  This includes:
-  * All local content has been transferred to it successfully.
-  * The remote has been scanned once for data to transfer from it, and all
-    transfers initiated by that scan succeeded.
-
-  Note the complication that, if it's initiated a transfer, our queued
-  transfer will be thrown out as unnecessary. But if its transfer then
-  fails, that needs to be noticed.
-
-  If we're going to track failed transfers, we could just set a flag,
-  and use that flag later to initiate a new transfer scan. We need a flag
-  in any case, to ensure that a transfer scan is run for each new remote.
-  The flag could be `.git/annex/transfer/scanned/uuid`.
-
-  But, if failed transfers are tracked, we could also record them, in
-  order to retry them later, without the scan. I'm thinking about a
-  directory like `.git/annex/transfer/failed/{upload,download}/uuid/`,
-  which failed transfer log files could be moved to.
-
-  Note that a remote may lose content it had before, so when requeuing
-  a failed download, should check the location log to see if it still has
+* Fix MountWatcher to notice umounts and remounts of drives.
+* A remote may lose content it had before, so when requeuing
+  a failed download, check the location log to see if the remote still has
   the content, and if not, queue a download from elsewhere.
   (And, a remote may get content we were uploading from elsewhere,
   so check the location log when queuing a failed Upload too.)
-
 * Ensure that when a remote receives content, and updates its location
   log, it syncs that update back out. Prerequisite for:
   * After git sync, identify new content that we don't have that is now available
@@ -67,18 +37,17 @@ all the other git clones, at both the git level and the key/value level.
   files in some directories and not others. See for use cases:
   [[forum/Wishlist:_options_for_syncing_meta-data_and_data]]
 * speed up git syncing by using the cached ssh connection for it too
-  (will need to use `GIT_SSH`, which needs to point to a command to run,
-  not a shell command line)
+  Will need to use `GIT_SSH`, which needs to point to a command to run,
+  not a shell command line. Beware that the network connection may have
+  bounced and the cached ssh connection not be usable.
 * Map the network of git repos, and use that map to calculate optimal
   transfers to keep the data in sync. Currently a naive flood fill is
   done instead.
 * Find a more efficient way for the TransferScanner to find the transfers
   that need to be done to sync with a remote. Currently it walks the git
-  working copy and checks each file.
-
-## misc todo
-
-* --debug will show often unnecessary work being done. Optimise.
+  working copy and checks each file. That probably needs to be done once,
+  but further calls to the TransferScanner could eg, look at the delta
+  between the last scan and the current one in the git-annex branch.
 
 ## data syncing
 
@@ -196,3 +165,33 @@ redone to check it.
   drives are mounted. **done**
 * It would be nice if, when a USB drive is connected, syncing starts
   automatically. Use dbus on Linux? **done**
+* Optimisations in 5c3e14649ee7c404f86a1b82b648d896762cbbc2 temporarily
+  broke content syncing in some situations, which need to be added back.
+  **done**
+
+  Now syncing a disconnected remote only starts a transfer scan if the
+  remote's git-annex branch has diverged, which indicates it probably has
+  new files. But that leaves open the cases where the local repo has
+  new files; and where the two repos git branches are in sync, but the
+  content transfers are lagging behind; and where the transfer scan has
+  never been run.
+
+  Need to track locally whether we're believed to be in sync with a remote.
+  This includes:
+  * All local content has been transferred to it successfully.
+  * The remote has been scanned once for data to transfer from it, and all
+    transfers initiated by that scan succeeded.
+
+  Note the complication that, if it's initiated a transfer, our queued
+  transfer will be thrown out as unnecessary. But if its transfer then
+  fails, that needs to be noticed.
+
+  If we're going to track failed transfers, we could just set a flag,
+  and use that flag later to initiate a new transfer scan. We need a flag
+  in any case, to ensure that a transfer scan is run for each new remote.
+  The flag could be `.git/annex/transfer/scanned/uuid`.
+
+  But, if failed transfers are tracked, we could also record them, in
+  order to retry them later, without the scan. I'm thinking about a
+  directory like `.git/annex/transfer/failed/{upload,download}/uuid/`,
+  which failed transfer log files could be moved to.
diff --git a/doc/special_remotes/S3/comment_6_78da9e233882ec0908962882ea8c4056._comment b/doc/special_remotes/S3/comment_6_78da9e233882ec0908962882ea8c4056._comment
new file mode 100644
index 000000000..742dbedc2
--- /dev/null
+++ b/doc/special_remotes/S3/comment_6_78da9e233882ec0908962882ea8c4056._comment
@@ -0,0 +1,10 @@
+[[!comment format=mdwn
+ username="https://www.google.com/accounts/o8/id?id=AItOawnY9ObrNrQuRp8Xs0XvdtJJssm5cp4NMZA"
+ nickname="alan"
+ subject="Rackspace Cloud Files support?"
+ date="2012-08-23T21:00:11Z"
+ content="""
+Any chance I could bribe you to setup Rackspace Cloud Files support? We are using them and would hate to have a S3 bucket only for this.
+
+https://github.com/rackspace/python-cloudfiles
+"""]]
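
A note on the `GIT_SSH` item in the syncing.mdwn hunk above: git runs the program named by `GIT_SSH` directly, not through a shell, so any extra ssh options for connection caching have to live in a small wrapper program rather than on a shell command line. Below is a minimal sketch of such a wrapper in Python; the ControlPath socket location and option choices are assumptions for illustration, not git-annex's actual implementation.

    #!/usr/bin/env python
    # Hypothetical GIT_SSH wrapper: git invokes this program with the same
    # arguments it would pass to ssh (host, then the remote command), so it
    # can splice in connection-sharing options before exec'ing the real ssh.
    import os
    import sys

    control_path = os.path.expanduser("~/.ssh/annex-%r@%h:%p")  # assumed socket path

    args = [
        "ssh",
        "-o", "ControlMaster=auto",      # reuse a master connection if one exists
        "-o", "ControlPersist=yes",      # keep the master open in the background
        "-o", "ControlPath=" + control_path,
    ] + sys.argv[1:]                      # host and remote command supplied by git

    # Replace this process with ssh so git sees ssh's exit status directly.
    os.execvp("ssh", args)

Pointing `GIT_SSH` at a wrapper like this (e.g. `GIT_SSH=/path/to/wrapper git fetch origin`) lets git's ssh invocations piggyback on a cached connection; the caveat from the notes still applies, since a bounced network can leave the cached master connection stale.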
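The failed-transfer layout floated in the same hunk (`.git/annex/transfer/failed/{upload,download}/uuid/`) pairs naturally with the rule about rechecking the location log before requeuing a download. The sketch below only illustrates that flow; the directory layout is the one proposed in the notes, while `remote_still_has` and `queue_transfer` are hypothetical callbacks standing in for the location-log lookup and the assistant's transfer queue.

    import os

    def requeue_failed_downloads(git_dir, remote_uuid, remote_still_has, queue_transfer):
        """Requeue downloads recorded under the proposed failed-transfer directory.

        remote_still_has(uuid, key) and queue_transfer(direction, uuid, key) are
        placeholders; a real implementation would consult the location log and
        its own transfer queue.
        """
        failed_dir = os.path.join(git_dir, "annex", "transfer", "failed",
                                  "download", remote_uuid)
        if not os.path.isdir(failed_dir):
            return
        for logname in os.listdir(failed_dir):
            key = logname  # assume each log file is named after the key it concerns
            if remote_still_has(remote_uuid, key):
                queue_transfer("download", remote_uuid, key)
            else:
                # The remote dropped the content since the failure was logged;
                # fall back to requesting it from any other remote that has it.
                queue_transfer("download", None, key)
            os.unlink(os.path.join(failed_dir, logname))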