author Joey Hess <joey@kitenet.net> 2012-07-01 21:00:43 -0400
committer Joey Hess <joey@kitenet.net> 2012-07-01 21:00:43 -0400
commit 7625319c2c18c1d75a4ba5e4c2819fb0a31641ed
tree c275d60dcf2493a03ed7ad77e7aeb747633fd2c0 /doc/design/assistant/syncing.mdwn
parent 397117429c8824bad7e994454a1d9b8e6f4b3b96
parent 2d2bfe9809f8d8d5862bc12fbe40c2e25b2405a3
Merge branch 'master' into assistant
Diffstat (limited to 'doc/design/assistant/syncing.mdwn')
-rw-r--r-- doc/design/assistant/syncing.mdwn 71
1 files changed, 71 insertions, 0 deletions
diff --git a/doc/design/assistant/syncing.mdwn b/doc/design/assistant/syncing.mdwn
index 50e6fb4f1..5476b56f1 100644
--- a/doc/design/assistant/syncing.mdwn
+++ b/doc/design/assistant/syncing.mdwn
@@ -1,6 +1,37 @@
Once files are added (or removed or moved), need to send those changes to
all the other git clones, at both the git level and the key/value level.
+## action items
+
+* on-disk transfers in progress information files (read/write/enumerate)
+ **done**
+* locking for the files, so redundant transfer races can be detected,
+ and failed transfers noticed **done**
+* transfer info for git-annex-shell (problem: how to add a switch
+ with the necessary info w/o breaking backwards compatibility?)
+* update files as transfers proceed. See [[progressbars]]
+ (updating for downloads is easy; for uploads is hard)
+* add Transfer queue TChan
+* enqueue Transfers (Uploads) as new files are added to the annex by
+ Watcher.
+* enqueue Transfers (Downloads) as new dangling symlinks are noticed by
+ Watcher.
+* add TransferInfo Map to DaemonStatus for tracking transfers in progress.
+* Poll transfer in progress info files for changes (use inotify again!
+ wow! hammer, meet nail..), and update the TransferInfo Map
+* Write basic Transfer handling thread. Multiple such threads need to be
+ able to be run at once. Each will need its own independent copy of the
+ Annex state monad.
+* Write transfer control thread, which decides when to launch transfers.
+* At startup, and possibly periodically, look for files we have that
+ location tracking indicates remotes do not, and enqueue Uploads for
+ them. Also, enqueue Downloads for any files we're missing.
+* Find a way to probe available outgoing bandwidth, to throttle so
+ we don't bufferbloat the network to death.
+* git-annex needs a simple speed control knob, which can be plumbed
+ through to, at least, rsync. A good job for an hour in an
+ airport somewhere.
+
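Reviewer note: the startup scan item above (enqueue Uploads for files that location tracking says remotes lack, and Downloads for files we are missing) can be sketched as a pure function. This is only an illustration under stated assumptions: `Key`, `Remote`, the shape of `Transfer`, and `scanTransfers` itself are hypothetical names, and the real location tracking data in git-annex is not a plain `Map`.

```haskell
import qualified Data.Map as M
import qualified Data.Set as S

-- Hypothetical stand-ins for the real git-annex types.
type Key = String
type Remote = String

data Direction = Upload | Download
    deriving (Eq, Ord, Show)

data Transfer = Transfer Direction Remote Key
    deriving (Eq, Ord, Show)

-- Given the keys present locally, the keys the working tree refers to,
-- and (assumed) location tracking info per remote, list the transfers
-- a startup scan would enqueue.
scanTransfers
    :: S.Set Key                -- keys whose content we have
    -> S.Set Key                -- keys we want to have
    -> M.Map Remote (S.Set Key) -- keys each remote is believed to have
    -> [Transfer]
scanTransfers present wanted located = uploads ++ downloads
  where
    -- Flood: upload every key we have to every remote that lacks it.
    uploads =
        [ Transfer Upload r k
        | (r, ks) <- M.toList located
        , k <- S.toList (present `S.difference` ks)
        ]
    -- Download each missing key from the first remote believed to have it.
    downloads =
        [ Transfer Download r k
        | k <- S.toList (wanted `S.difference` present)
        , r <- take 1 [ r' | (r', ks) <- M.toList located, k `S.member` ks ]
        ]
```

Keeping the scan pure means the same function could serve both the startup pass and a periodic pass, with the caller deciding how to enqueue the result.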
## git syncing
1. Can use `git annex sync`, which already handles bidirectional syncing.
@@ -45,6 +76,46 @@ and with appropriate rate limiting and control facilities.
This probably will need lots of refinements to get working well.
+### first pass: flood syncing
+
+Before mapping the network, the best we can do is flood all files out to every
+reachable remote. This is worth doing first, since it's the simplest way to
+get the basic functionality of the assistant to work. And we'll need this
+anyway.
+
+### transfer tracking
+
+* Upload added to queue by the watcher thread when it adds content.
+* Download added to queue by the watcher thread when it sees new symlinks
+ that lack content.
+* Transfer threads started/stopped as necessary to move data.
+ (May sometimes want multiple threads downloading, or uploading, or even both.)
+
+ type TransferQueue = TChan [Transfer]
+ -- add (M.Map Transfer TransferInfo) to DaemonStatus
+
+ startTransfer :: Transfer -> Annex TransferID
+
+ stopTransfer :: TransferID -> IO ()
+
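Reviewer note: a minimal sketch of how the `TransferQueue` declared above might be created, filled, and drained using the `stm` package. The `Transfer` fields and the helper names `newTransferQueue`, `queueTransfers`, and `nextTransfers` are assumptions for illustration; the design only fixes the type `TChan [Transfer]`.

```haskell
import Control.Concurrent.STM

-- Hypothetical stand-ins; the design does not spell out Transfer.
type Key = String
type Remote = String

data Direction = Upload | Download
    deriving (Eq, Ord, Show)

data Transfer = Transfer Direction Remote Key
    deriving (Eq, Ord, Show)

-- As in the design: each queue entry is a batch of transfers.
type TransferQueue = TChan [Transfer]

newTransferQueue :: IO TransferQueue
newTransferQueue = newTChanIO

-- Called by the watcher thread; empty batches are not queued.
queueTransfers :: TransferQueue -> [Transfer] -> IO ()
queueTransfers _ [] = return ()
queueTransfers q ts = atomically $ writeTChan q ts

-- Called by a transfer thread; blocks until a batch is available.
nextTransfers :: TransferQueue -> IO [Transfer]
nextTransfers q = atomically $ readTChan q
```

Because `readTChan` retries inside STM until data arrives, several transfer threads could call `nextTransfers` on the same queue and each batch would be delivered to exactly one of them.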
+The assistant needs to find out when `git-annex-shell` is receiving or
+sending (triggered by another remote), so it can add data for those too.
+This is important to avoid uploading content to a remote that is already
+downloading it from us, or vice versa, and, in future, to let the web
+app manage transfers as the user desires.
+
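Reviewer note: one way to picture the redundancy check described above is a lookup in the `M.Map Transfer TransferInfo` that the design proposes adding to DaemonStatus. The sketch below assumes that a transfer started by `git-annex-shell` on behalf of a remote is recorded from the local point of view (a remote downloading from us is an Upload to that remote). `TransferInfo`'s single field and the names `isRedundant` and `noteProgress` are hypothetical.

```haskell
import qualified Data.Map as M

-- Hypothetical stand-ins for the real git-annex types.
type Key = String
type Remote = String

data Direction = Upload | Download
    deriving (Eq, Ord, Show)

data Transfer = Transfer Direction Remote Key
    deriving (Eq, Ord, Show)

-- Assumed field; the design does not list what TransferInfo holds.
newtype TransferInfo = TransferInfo
    { bytesComplete :: Maybe Integer }
    deriving (Eq, Show)

type TransferMap = M.Map Transfer TransferInfo

-- A transfer is redundant when the same content is already moving
-- in the same direction with the same remote.
isRedundant :: TransferMap -> Transfer -> Bool
isRedundant m t = M.member t m

-- Record progress for a transfer, adding it to the map if it is new.
noteProgress :: Transfer -> Integer -> TransferMap -> TransferMap
noteProgress t n = M.insert t (TransferInfo (Just n))
```

A transfer control thread could consult `isRedundant` before launching anything, whether the existing entry came from a local transfer thread or from a transfer triggered by a remote.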
+For files being received, it can see the temp file, but other than lsof
+there's no good way to find the pid (and I'd rather not kill blindly).
+
+For files being sent, there's no filesystem indication. So git-annex-shell
+(and other git-annex transfer processes) should write a status file to disk.
+
+Can use file locking on these status files to claim upload/download rights,
+which will avoid races.
+
+This status file can also be updated periodically to show amount of transfer
+complete (necessary for tracking uploads).
+
## other considerations
This assumes the network is connected. It's often not, so the