Once files are added (or removed or moved), those changes need to be sent to all the other git clones, at both the git level and the key/value level.

## immediate action items

* At startup, and possibly periodically, look for files we have that location tracking indicates remotes do not, and enqueue Uploads for them. Also, enqueue Downloads for any files we're missing.
* After git sync, identify content that we don't have that is now available on remotes, and transfer it. But first, need to ensure that when a remote receives content and updates its location log, it syncs that update out.
* When MountWatcher detects a newly mounted drive, rescan git remotes in order to get ones on the drive, and do a git sync and file transfers to sync any repositories on it.

## longer-term TODO

* Test MountWatcher on Gnome (should work ok) and LXDE (dunno).
* git-annex needs a simple speed control knob, which can be plumbed through to, at least, rsync. A good job for an hour in an airport somewhere.
* Find a way to probe available outgoing bandwidth, to throttle so we don't bufferbloat the network to death.
* Investigate the XMPP approach like dvcs-autosync does, or other ways of signaling a change out of band.
* Add a hook, so when there's a change to sync, a program can be run and do its own signaling.
* `--debug` will often show unnecessary work being done. Optimise.
* This assumes the network is connected. It's often not, so the [[cloud]] needs to be used to bridge between LANs.
* Configurability, including only enabling git syncing but not data transfer; only uploading new files but not downloading; and only downloading files in some directories and not others. See [[forum/Wishlist:_options_for_syncing_meta-data_and_data]] for use cases.
* Speed up git syncing by using the cached ssh connection for it too (will need to use `GIT_SSH`, which needs to point to a command to run, not a shell command line).

## data syncing

There are two parts to data syncing: first, map the network; second, decide what to sync when.

Mapping the network can reuse code in `git annex map`. Once the map is built, we want to find paths through the network that reach all nodes eventually, with the least cost. This is a minimum spanning tree problem, except with a directed graph, so really an arborescence problem.

With the map, we can determine which nodes to push new content to. Then we need to control those data transfers, sending to the cheapest nodes first, and with appropriate rate limiting and control facilities. This will probably need lots of refinement to get working well.

### first pass: flood syncing

Before mapping the network, the best we can do is flood all files out to every reachable remote. This is worth doing first, since it's the simplest way to get the basic functionality of the assistant working. And we'll need this anyway.
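A minimal sketch of what flood syncing could look like: when a new key appears, an upload to every known remote is simply enqueued, cheapest remotes first. The names here (`Remote`, `Key`, `Transfer`, `queueFlood`) are illustrative assumptions for this page, not the actual git-annex API.

```haskell
import Control.Concurrent.STM
import Data.List (sortOn)

-- Illustrative stand-ins for the real git-annex types.
data Direction = Upload | Download deriving (Show, Eq)
data Remote = Remote { remoteName :: String, remoteCost :: Int } deriving (Show)
newtype Key = Key String deriving (Show, Eq)

data Transfer = Transfer
    { transferDirection :: Direction
    , transferRemote    :: Remote
    , transferKey       :: Key
    } deriving (Show)

type TransferQueue = TChan Transfer

-- Flood syncing: with no network map yet, just enqueue an upload of the
-- new key to every reachable remote, cheapest first. The transfer
-- control thread drains the queue and decides when each transfer runs.
queueFlood :: TransferQueue -> [Remote] -> Key -> IO ()
queueFlood q remotes key = atomically $
    mapM_ (\r -> writeTChan q (Transfer Upload r key))
          (sortOn remoteCost remotes)
```

Flooding wastes bandwidth once a network map exists, but it keeps the first pass simple and exercises the same transfer queue that smarter scheduling will use later.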
## done

1. Can use `git annex sync`, which already handles bidirectional syncing. When a change is committed, launch the part of `git annex sync` that pushes out changes. **done**; changes are pushed out to all remotes in parallel
2. Watch `.git/refs/remotes/` for changes (which would be pushed in from another node via `git annex sync`), and run the part of `git annex sync` that merges in received changes, followed by the part that pushes out changes (sending them to any other remotes). [The watching can be done with the existing inotify code! This avoids needing any special mechanism to notify a remote that it's been synced to.] **done**
3. Periodically retry pushes that failed. **done** (every half an hour)
4. Also, detect if a push failed due to not being up-to-date; pull, and repush. **done**
5. Use a git merge driver that adds both conflicting files, so conflicts never break a sync. **done**

* on-disk transfer-in-progress information files (read/write/enumerate) **done**
* locking for those files, so redundant transfer races can be detected, and failed transfers noticed **done**
* transfer info for git-annex-shell **done**
* update the files as transfers proceed. See [[progressbars]] (updating for downloads is easy; for uploads is hard)
* add Transfer queue TChan **done**
* add TransferInfo Map to DaemonStatus for tracking transfers in progress (both data structures are sketched after this list). **done**
* Poll transfer-in-progress info files for changes (use inotify again! wow! hammer, meet nail...), and update the TransferInfo Map **done**
* enqueue Transfers (Uploads) as new files are added to the annex by the Watcher. **done**
* enqueue Transfers (Downloads) as new dangling symlinks are noticed by the Watcher. **done** (Note: needs the git-annex branch to be merged before the tree is merged, so it knows where to download from. Checked, and this is the case.)
* Write a basic Transfer handling thread. Multiple such threads need to be able to run at once. Each will need its own independent copy of the Annex state monad. **done**
* Write a transfer control thread, which decides when to launch transfers. **done**
* Transfer watching has a race on kqueue systems, which makes fast transfers that finish quickly go unnoticed by the TransferWatcher. That in turn prevents the transfer slot from being freed, blocking any further transfers. So this approach is too fragile to rely on for maintaining the TransferSlots. Instead, need [[todo/assistant_threaded_runtime]], which allows running something for sure when a transfer thread finishes. **done**
* Test MountWatcher on KDE, and add whatever dbus events KDE emits when drives are mounted. **done**
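To make the transfer-tracking items above concrete, here is a rough sketch of the bookkeeping involved, using simplified stand-in types; the real Transfer, TransferInfo, and DaemonStatus in git-annex likely carry more fields than shown here, so treat this as an assumption-laden illustration rather than the actual implementation.

```haskell
import Control.Concurrent.STM
import qualified Data.Map as M

-- Simplified stand-ins for the real git-annex types.
newtype Key = Key String deriving (Show, Eq, Ord)
newtype UUID = UUID String deriving (Show, Eq, Ord)
data Direction = Upload | Download deriving (Show, Eq, Ord)

-- A transfer is identified by its direction, the remote involved
-- (by UUID), and the key being transferred.
data Transfer = Transfer
    { transferDirection :: Direction
    , transferUUID      :: UUID
    , transferKey       :: Key
    } deriving (Show, Eq, Ord)

-- Progress information, as read from the on-disk transfer info file.
data TransferInfo = TransferInfo
    { bytesComplete :: Maybe Integer   -- Nothing until progress is known
    } deriving (Show)

-- DaemonStatus holds the map of transfers currently in progress.
data DaemonStatus = DaemonStatus
    { currentTransfers :: M.Map Transfer TransferInfo
    }

-- Called when the TransferWatcher notices a transfer info file appear.
addTransfer :: TVar DaemonStatus -> Transfer -> TransferInfo -> STM ()
addTransfer dstatus t info = modifyTVar' dstatus $ \s ->
    s { currentTransfers = M.insert t info (currentTransfers s) }

-- Called when the info file goes away, freeing up a transfer slot.
finishTransfer :: TVar DaemonStatus -> Transfer -> STM ()
finishTransfer dstatus t = modifyTVar' dstatus $ \s ->
    s { currentTransfers = M.delete t (currentTransfers s) }
```

Keeping the map behind STM is one way to let the Watcher, the TransferWatcher, and the transfer control thread all update it without races, even with many transfer threads running at once.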