From 1f09ae686ef35f8fd2d973754f8e1efd99161f4a Mon Sep 17 00:00:00 2001
From: Joey Hess
Date: Wed, 27 Jun 2012 21:11:39 -0400
Subject: update

---
 doc/design/assistant/syncing.mdwn | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'doc/design/assistant/syncing.mdwn')

diff --git a/doc/design/assistant/syncing.mdwn b/doc/design/assistant/syncing.mdwn
index 3e90e6b10..8b681ac10 100644
--- a/doc/design/assistant/syncing.mdwn
+++ b/doc/design/assistant/syncing.mdwn
@@ -17,7 +17,7 @@ all the other git clones, at both the git level and the key/value level.
 1. Also, detect if a push failed due to not being up-to-date, pull,
    and repush. **done**
 2. Use a git merge driver that adds both conflicting files,
-   so conflicts never break a sync.
+   so conflicts never break a sync. **done**
 3. Investigate the XMPP approach like dvcs-autosync does, or other ways of
    signaling a change out of band.
 4. Add a hook, so when there's a change to sync, a program can be run
-- 
cgit v1.2.3

From e7182ad1191b42d3431f14ced24d0a87ab91495e Mon Sep 17 00:00:00 2001
From: Joey Hess
Date: Fri, 29 Jun 2012 11:59:25 -0400
Subject: further design

---
 doc/design/assistant/syncing.mdwn | 40 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 40 insertions(+)

(limited to 'doc/design/assistant/syncing.mdwn')

diff --git a/doc/design/assistant/syncing.mdwn b/doc/design/assistant/syncing.mdwn
index 8b681ac10..99474928c 100644
--- a/doc/design/assistant/syncing.mdwn
+++ b/doc/design/assistant/syncing.mdwn
@@ -39,6 +39,46 @@ and with appropriate rate limiting and control facilities.
 
 This probably will need lots of refinements to get working well.
 
+### first pass: flood syncing
+
+Before mapping the network, the best we can do is flood all files out to every
+reachable remote. This is worth doing first, since it's the simplest way to
+get the basic functionality of the assistant to work. And we'll need this
+anyway.
+
+	data ToTransfer = ToUpload Key | ToDownload Key
+	type ToTransferChan = TChan [ToTransfer]
+
+* ToUpload added by the watcher thread when it adds content.
+* ToDownload added by the watcher thread when it seens new symlinks
+  that lack content.
+
+Transfer threads started/stopped as necessary to move data.
+May sometimes want multiple threads downloading, or uploading, or even both.
+
+	data TransferID = TransferThread ThreadID | TransferProcess Pid
+	data Direction = Uploading | Downloading
+	data Transfer = Transfer Direction Key TransferID EpochTime Integer
+	-- add [Transfer] to DaemonStatus
+
+The assistant needs to find out when `git-annex-shell` is receiving or
+sending (triggered by another remote), so it can add data for those too.
+This is important to avoid uploading content to a remote that is already
+downloading it from us, or vice versa, as well as to in future let the web
+app manage transfers as user desires.
+
+For files being received, it can see the temp file, but other than lsof
+there's no good way to find the pid (and I'd rather not kill blindly).
+
+For files being sent, there's no filesystem indication. So git-annex-shell
+(and other git-annex transfer processes) should write a status file to disk.
+
+Can use file locking on these status files to claim upload/download rights,
+which will avoid races.
+
+This status file can also be updated periodically to show amount of transfer
+complete (necessary for tracking uploads).
+
 ## other considerations
 
 It would be nice if, when a USB drive is connected,
-- 
cgit v1.2.3

From 0ed7db5f3ac87405f56eb27adb9fdaf42bc49125 Mon Sep 17 00:00:00 2001
From: Joey Hess
Date: Fri, 29 Jun 2012 14:03:37 -0400
Subject: add news item for git-annex 3.20120629

---
 doc/design/assistant/syncing.mdwn | 2 ++
 doc/news/version_3.20120605.mdwn | 11 -----------
 doc/news/version_3.20120629.mdwn | 12 ++++++++++++
 3 files changed, 14 insertions(+), 11 deletions(-)
 delete mode 100644 doc/news/version_3.20120605.mdwn
 create mode 100644 doc/news/version_3.20120629.mdwn

(limited to 'doc/design/assistant/syncing.mdwn')

diff --git a/doc/design/assistant/syncing.mdwn b/doc/design/assistant/syncing.mdwn
index 99474928c..7c6ef16d3 100644
--- a/doc/design/assistant/syncing.mdwn
+++ b/doc/design/assistant/syncing.mdwn
@@ -46,6 +46,8 @@ reachable remote. This is worth doing first, since it's the simplest way to
 get the basic functionality of the assistant to work. And we'll need this
 anyway.
 
+### transfer tracking
+
 	data ToTransfer = ToUpload Key | ToDownload Key
 	type ToTransferChan = TChan [ToTransfer]
 
diff --git a/doc/news/version_3.20120605.mdwn b/doc/news/version_3.20120605.mdwn
deleted file mode 100644
index ed0a09177..000000000
--- a/doc/news/version_3.20120605.mdwn
+++ /dev/null
@@ -1,11 +0,0 @@
-git-annex 3.20120605 released with [[!toggle text="these changes"]]
-[[!toggleable text="""
-   * sync: Show a nicer message if a user tries to sync to a special remote.
-   * lock: Reset unlocked file to index, rather than to branch head.
-   * import: New subcommand, pulls files from a directory outside the annex
-     and adds them.
-   * Fix display of warning message when encountering a file that uses an
-     unsupported backend.
-   * Require that the SHA256 backend can be used when building, since it's the
-     default.
-   * Preserve parent environment when running hooks of the hook special remote."""]]
\ No newline at end of file
diff --git a/doc/news/version_3.20120629.mdwn b/doc/news/version_3.20120629.mdwn
new file mode 100644
index 000000000..e6b98ae99
--- /dev/null
+++ b/doc/news/version_3.20120629.mdwn
@@ -0,0 +1,12 @@
+git-annex 3.20120629 released with [[!toggle text="these changes"]]
+[[!toggleable text="""
+   * cabal: Only try to use inotify on Linux.
+   * Version build dependency on STM, and allow building without it,
+     which disables the watch command.
+   * Avoid ugly failure mode when moving content from a local repository
+     that is not available.
+   * Got rid of the last place that did utf8 decoding.
+   * Accept arbitrarily encoded repository filepaths etc when reading
+     git config output. This fixes support for remotes with unusual characters
+     in their names.
+   * sync: Automatically resolves merge conflicts."""]]
\ No newline at end of file
-- 
cgit v1.2.3

From c79625290a9e17e8c9f6f0ed93a0e23a5ef0126c Mon Sep 17 00:00:00 2001
From: Joey Hess
Date: Fri, 29 Jun 2012 14:12:16 -0400
Subject: improving transfer data types and design

---
 doc/design/assistant/syncing.mdwn | 26 +++++++++++++++-----------
 1 file changed, 15 insertions(+), 11 deletions(-)

(limited to 'doc/design/assistant/syncing.mdwn')

diff --git a/doc/design/assistant/syncing.mdwn b/doc/design/assistant/syncing.mdwn
index 7c6ef16d3..02811f07e 100644
--- a/doc/design/assistant/syncing.mdwn
+++ b/doc/design/assistant/syncing.mdwn
@@ -48,20 +48,24 @@ anyway.
 
 ### transfer tracking
 
-	data ToTransfer = ToUpload Key | ToDownload Key
-	type ToTransferChan = TChan [ToTransfer]
-
-* ToUpload added by the watcher thread when it adds content.
-* ToDownload added by the watcher thread when it seens new symlinks
+* Upload added to queue by the watcher thread when it adds content.
+* Download added to queue by the watcher thread when it seens new symlinks
   that lack content.
-
-Transfer threads started/stopped as necessary to move data.
-May sometimes want multiple threads downloading, or uploading, or even both.
+* Transfer threads started/stopped as necessary to move data.
+  (May sometimes want multiple threads downloading, or uploading, or even both.)
+
+	type TransferQueue = TChan [Transfer]
+	data Transfer = Upload Key Remote | Download Key Remote
 
 	data TransferID = TransferThread ThreadID | TransferProcess Pid
-	data Direction = Uploading | Downloading
-	data Transfer = Transfer Direction Key TransferID EpochTime Integer
-	-- add [Transfer] to DaemonStatus
+	type AmountComplete = Integer
+	type StartedTime = EpochTime
+	data TransferInfo = TransferInfo TransferID StartedTime AmountComplete
+	-- add (M.Map Transfer TransferInfo) to DaemonStatus
+
+	startTransfer :: Transfer -> Annex TransferID
+
+	stopTransfer :: TransferID -> IO ()
 
 The assistant needs to find out when `git-annex-shell` is receiving or
 sending (triggered by another remote), so it can add data for those too.
-- 
cgit v1.2.3

From 660f81d2b2d8393577771c5f51e9da5f0ba00e22 Mon Sep 17 00:00:00 2001
From: Joey Hess
Date: Fri, 29 Jun 2012 15:44:14 -0400
Subject: blog for the day

---
 .../blog/day_20__data_transfer_design.mdwn | 51 ++++++++++++++++++++++
 doc/design/assistant/progressbars.mdwn | 2 +-
 doc/design/assistant/syncing.mdwn | 4 +-
 3 files changed, 54 insertions(+), 3 deletions(-)
 create mode 100644 doc/design/assistant/blog/day_20__data_transfer_design.mdwn

(limited to 'doc/design/assistant/syncing.mdwn')

diff --git a/doc/design/assistant/blog/day_20__data_transfer_design.mdwn b/doc/design/assistant/blog/day_20__data_transfer_design.mdwn
new file mode 100644
index 000000000..2733f09bc
--- /dev/null
+++ b/doc/design/assistant/blog/day_20__data_transfer_design.mdwn
@@ -0,0 +1,51 @@
+Today is a planning day. I have only a few days left before I'm off to
+Nicaragua for [DebConf](http://debconf12.debconf.org/), where I'll only
+have smaller chunks of time without interruptions. So it's important to get
+some well-defined smallish chunks designed that I can work on later. See
+bulleted action items below. Each should be around 1-2 hours unless it
+turns out to be 8 hours... :)
+
+First, worked on writing down a design, and some data types, for data transfer
+tracking (see [[syncing]] page). Found that writing down these simple data
+types before I started slinging code has clarified things a lot for me.
+
+Most importantly, I realized that I will need to modify `git-annex-shell`
+to record on disk what transfers it's doing, so the assistant can get that
+information and use it to both avoid redundant transfers (potentially a big
+problem!), and later to allow the user to control them using the web app.
+
+So these will be the first steps as I move toward implementing data
+transfer tracking and naive flood fill transferring.
+
+* on-disk transfers in progress information files (read/write/enumerate)
+* locking for the files, so redundant transfer races can be detected,
+  and failed transfers noticed
+* update files as transfers proceed. See [[progressbars]]
+  (updating for downloads is easy; for uploads is hard)
+* add Transfer queue TChan
+* enqueue Transfers (Uploads) as new files are added to the annex by
+  Watcher.
+* enqueue Tranferrs (Downloads) as new dangling symlinks are noticed by
+  Watcher.
+* add TransferInfo Map to DaemonStatus for tracking transfers in progress.
+* Poll transfer in progress info files for changes (use inotify again!
+  wow! hammer, meet nail..), and update the TransferInfo Map
+* Write basic Transfer handling thread. Multiple such threads need to be
+  able to be run at once. Each will need its own independant copy of the
+  Annex state monad.
+* Write transfer control thread, which decides when to launch transfers.
+* At startup, and possibly periodically, look for files we have that
+  location tracking indicates remotes do not, and enqueue Uploads for
+  them. Also, enqueue Downloads for any files we're missing.
+
+While eventually the user will be able to use the web app to prioritize
+transfers, stop and start, throttle, etc, it's important to get the default
+behavior right. So I'm thinking about things like how to prioritize uploads
+vs downloads, when it's appropriate to have multiple downloads running at
+once, etc.
+
+* Find a way to probe available outgoing bandwidth, to throttle so
+  we don't bufferbloat the network to death.
+* git-annex needs a simple speed control knob, which can be plumbed
+  through to, at least, rsync. A good job for an hour in an
+  airport somewhere.
diff --git a/doc/design/assistant/progressbars.mdwn b/doc/design/assistant/progressbars.mdwn
index 2ade05aa5..ee7384274 100644
--- a/doc/design/assistant/progressbars.mdwn
+++ b/doc/design/assistant/progressbars.mdwn
@@ -9,6 +9,6 @@ To get this info for downloads, git-annex can watch the file as it arrives
 and use its size.
 
 TODO: What about uploads? Will i have to parse rsync's progresss output?
-Feed it via a named pipe? Ugh.
+Feed it via a named pipe? Ugh. Check into librsync.
 
 This is one of those potentially hidden but time consuming problems.
diff --git a/doc/design/assistant/syncing.mdwn b/doc/design/assistant/syncing.mdwn
index 02811f07e..ce7f9673b 100644
--- a/doc/design/assistant/syncing.mdwn
+++ b/doc/design/assistant/syncing.mdwn
@@ -58,9 +58,9 @@ anyway.
 	data Transfer = Upload Key Remote | Download Key Remote
 
 	data TransferID = TransferThread ThreadID | TransferProcess Pid
-	type AmountComplete = Integer
+	type BytesComplete = Integer
 	type StartedTime = EpochTime
-	data TransferInfo = TransferInfo TransferID StartedTime AmountComplete
+	data TransferInfo = TransferInfo TransferID StartedTime BytesComplete
 	-- add (M.Map Transfer TransferInfo) to DaemonStatus
 
 	startTransfer :: Transfer -> Annex TransferID
-- 
cgit v1.2.3

From 2d2bfe9809f8d8d5862bc12fbe40c2e25b2405a3 Mon Sep 17 00:00:00 2001
From: Joey Hess
Date: Sun, 1 Jul 2012 20:55:20 -0400
Subject: reorg

---
 .../blog/day_20__data_transfer_design.mdwn | 33 ++-----------------
 doc/design/assistant/syncing.mdwn | 37 ++++++++++++++++++----
 2 files changed, 33 insertions(+), 37 deletions(-)

(limited to 'doc/design/assistant/syncing.mdwn')

diff --git a/doc/design/assistant/blog/day_20__data_transfer_design.mdwn b/doc/design/assistant/blog/day_20__data_transfer_design.mdwn
index 2733f09bc..4f47ae63c 100644
--- a/doc/design/assistant/blog/day_20__data_transfer_design.mdwn
+++ b/doc/design/assistant/blog/day_20__data_transfer_design.mdwn
@@ -2,8 +2,8 @@ Today is a planning day. I have only a few days left before I'm off to
 Nicaragua for [DebConf](http://debconf12.debconf.org/), where I'll only
 have smaller chunks of time without interruptions. So it's important to get
 some well-defined smallish chunks designed that I can work on later. See
-bulleted action items below. Each should be around 1-2 hours unless it
-turns out to be 8 hours... :)
+bulleted action items below (now moved to [[syncing]]. Each
+should be around 1-2 hours unless it turns out to be 8 hours... :)
 
 First, worked on writing down a design, and some data types, for data transfer
 tracking (see [[syncing]] page). Found that writing down these simple data
@@ -14,38 +14,9 @@ to record on disk what transfers it's doing, so the assistant can get that
 information and use it to both avoid redundant transfers (potentially a big
 problem!), and later to allow the user to control them using the web app.
 
-So these will be the first steps as I move toward implementing data
-transfer tracking and naive flood fill transferring.
-
-* on-disk transfers in progress information files (read/write/enumerate)
-* locking for the files, so redundant transfer races can be detected,
-  and failed transfers noticed
-* update files as transfers proceed. See [[progressbars]]
-  (updating for downloads is easy; for uploads is hard)
-* add Transfer queue TChan
-* enqueue Transfers (Uploads) as new files are added to the annex by
-  Watcher.
-* enqueue Tranferrs (Downloads) as new dangling symlinks are noticed by
-  Watcher.
-* add TransferInfo Map to DaemonStatus for tracking transfers in progress.
-* Poll transfer in progress info files for changes (use inotify again!
-  wow! hammer, meet nail..), and update the TransferInfo Map
-* Write basic Transfer handling thread. Multiple such threads need to be
-  able to be run at once. Each will need its own independant copy of the
-  Annex state monad.
-* Write transfer control thread, which decides when to launch transfers.
-* At startup, and possibly periodically, look for files we have that
-  location tracking indicates remotes do not, and enqueue Uploads for
-  them. Also, enqueue Downloads for any files we're missing.
-
 While eventually the user will be able to use the web app to prioritize
 transfers, stop and start, throttle, etc, it's important to get the default
 behavior right. So I'm thinking about things like how to prioritize uploads
 vs downloads, when it's appropriate to have multiple downloads running at
 once, etc.
 
-* Find a way to probe available outgoing bandwidth, to throttle so
-  we don't bufferbloat the network to death.
-* git-annex needs a simple speed control knob, which can be plumbed
-  through to, at least, rsync. A good job for an hour in an
-  airport somewhere.
diff --git a/doc/design/assistant/syncing.mdwn b/doc/design/assistant/syncing.mdwn
index ce7f9673b..c18badb53 100644
--- a/doc/design/assistant/syncing.mdwn
+++ b/doc/design/assistant/syncing.mdwn
@@ -1,6 +1,37 @@
 Once files are added (or removed or moved), need to send those changes to
 all the other git clones, at both the git level and the key/value level.
 
+## action items
+
+* on-disk transfers in progress information files (read/write/enumerate)
+  **done**
+* locking for the files, so redundant transfer races can be detected,
+  and failed transfers noticed **done**
+* transfer info for git-annex-shell (problem: how to add a switch
+  with the necessary info w/o breaking backwards compatability?)
+* update files as transfers proceed. See [[progressbars]]
+  (updating for downloads is easy; for uploads is hard)
+* add Transfer queue TChan
+* enqueue Transfers (Uploads) as new files are added to the annex by
+  Watcher.
+* enqueue Tranferrs (Downloads) as new dangling symlinks are noticed by
+  Watcher.
+* add TransferInfo Map to DaemonStatus for tracking transfers in progress.
+* Poll transfer in progress info files for changes (use inotify again!
+  wow! hammer, meet nail..), and update the TransferInfo Map
+* Write basic Transfer handling thread. Multiple such threads need to be
+  able to be run at once. Each will need its own independant copy of the
+  Annex state monad.
+* Write transfer control thread, which decides when to launch transfers.
+* At startup, and possibly periodically, look for files we have that
+  location tracking indicates remotes do not, and enqueue Uploads for
+  them. Also, enqueue Downloads for any files we're missing.
+* Find a way to probe available outgoing bandwidth, to throttle so
+  we don't bufferbloat the network to death.
+* git-annex needs a simple speed control knob, which can be plumbed
+  through to, at least, rsync. A good job for an hour in an
+  airport somewhere.
+
 ## git syncing
 
 1. Can use `git annex sync`, which already handles bidirectional syncing.
@@ -55,12 +86,6 @@ anyway.
   (May sometimes want multiple threads downloading, or uploading, or even both.)
 
 	type TransferQueue = TChan [Transfer]
-	data Transfer = Upload Key Remote | Download Key Remote
-
-	data TransferID = TransferThread ThreadID | TransferProcess Pid
-	type BytesComplete = Integer
-	type StartedTime = EpochTime
-	data TransferInfo = TransferInfo TransferID StartedTime BytesComplete
 	-- add (M.Map Transfer TransferInfo) to DaemonStatus
 
 	startTransfer :: Transfer -> Annex TransferID
-- 
cgit v1.2.3
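
Note on the series: the transfer-tracking types, as last shown in the patches above, can be exercised in a small standalone program. This is only a sketch under stated assumptions, not the code git-annex ended up with. `Key` and `Remote` are placeholder newtypes, `StartedTime` is a plain `Integer` instead of `EpochTime`, the `TransferProcess Pid` constructor is omitted, `ThreadId` is GHC's spelling of the `ThreadID` used in the notes, and `TransferMap`, `enqueue`, and `claim` are hypothetical names that do not appear in the patches. It needs the `stm` and `containers` packages.

```haskell
-- Sketch only: Key and Remote are stand-in newtypes (assumptions), and the
-- status map is a bare TVar rather than a field of DaemonStatus.
import Control.Concurrent (ThreadId, myThreadId)
import Control.Concurrent.STM
import qualified Data.Map as M

newtype Key = Key String deriving (Eq, Ord, Show)
newtype Remote = Remote String deriving (Eq, Ord, Show)

data Transfer = Upload Key Remote | Download Key Remote
    deriving (Eq, Ord, Show)

-- The design also lists a TransferProcess Pid constructor; it is left out
-- here so the sketch has no dependency on the unix package.
newtype TransferID = TransferThread ThreadId
    deriving (Eq, Show)

type BytesComplete = Integer
type StartedTime = Integer -- seconds since the epoch (EpochTime in the design)

data TransferInfo = TransferInfo TransferID StartedTime BytesComplete
    deriving (Eq, Show)

type TransferQueue = TChan [Transfer]

-- Hypothetical stand-in for "add (M.Map Transfer TransferInfo) to DaemonStatus".
type TransferMap = TVar (M.Map Transfer TransferInfo)

-- Watcher side: queue a batch of transfers.
enqueue :: TransferQueue -> [Transfer] -> IO ()
enqueue q = atomically . writeTChan q

-- Transfer side: record a transfer as started unless the map already has an
-- entry for it. Returns False for a redundant request.
claim :: TransferMap -> Transfer -> TransferInfo -> STM Bool
claim m t info = do
    current <- readTVar m
    if M.member t current
        then return False
        else do
            writeTVar m (M.insert t info current)
            return True

main :: IO ()
main = do
    q <- newTChanIO
    m <- newTVarIO M.empty
    tid <- TransferThread <$> myThreadId
    let t = Upload (Key "SHA256-example") (Remote "usbdrive")
    -- Queue the same upload twice to show the redundancy check.
    enqueue q [t, t]
    batch <- atomically (readTChan q)
    results <- mapM (\x -> atomically (claim m x (TransferInfo tid 0 0))) batch
    print results
```

Run with `runghc`, this should print `[True,False]`: the duplicate upload is refused because the map already holds an entry for it. The in-memory map only covers threads inside one assistant process; the design notes propose lock files on disk to get the same guarantee across `git-annex-shell` and other transfer processes.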