summaryrefslogtreecommitdiff
path: root/doc/design/assistant/desymlink.mdwn
blob: ced1e1587f6e988c5b332d7b6eb0f6b4f575d4cf (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
While dropbox allows modifying files in the folder, git-annex freezes
them upon creation, using symlinks.

This is a core design simplification of git-annex.
But it is problematic too:

* To allow directly editing files in its folder, something like [[todo/smudge]] is
  needed, to get rid of the symlinks that stand in for the files.
* OSX seems to have a [[lot_of_problems|bugs/OSX_alias_permissions_and_versions_problem]]
  with stupid programs that follow symlinks and present the git-annex
  hash filename to the user.
* FAT sucks and doesn't support symlinks at all, so [[Android]] can't
  have regular repos on it.

One approach for this would be to hide the git repo away somewhere,
and have the git-annex assistant watch a regular directory, with
regular files.

There would have to be a mapping from files to git-annex objects.
And some intelligent way to determine when a file has been changed
and no longer corresponds to its object. (Not expensive hashing every time,
plz.)

Since this leaves every file open to modification, any such repository
probably needs to be considered untrusted by git-annex. So it doesn't
leave its only copy of a file in such a repository, but instead
syncs it to a proper git-annex repository.

The assistant would make git commits still, of symlinks. It can already do
that with without actual symlinks existing on disk. More difficult is
handling merging; git merge wants a real repository with files it can
really operate on. The assistant would need to calculate merges on its own,
and update the regular directory to reflect changes made in the merge.

Another massive problem with this idea is that it doesn't allow for
[[partial_content]]. The symlinks that everyone loves to hate on are what
make it possible for the contents of some files to not be present on
disk, while the files are still in git and can be retreived as desired.
With real files, some other standin for a missing file would be needed.
Perhaps a 0 length, unreadable, unwritable file? On systems that
support symlinks it could be a broken symlink like is used now, that
is converted to a real file when it becomes present.

## concrete design

* Enable with annex.nosymlink or such config option.
* Use .git/ for the git repo, but `.git/annex/objects` won't be used.
* `git status` and similar will show all files as type changed, and
  `git commit` would be a very bad idea. Just don't support users running
  git commands that affect the repository in this mode.
* However, `git status` and similar also will show deleted and new files,
  which will be helpful for the assistant to use when starting up.
* Cache the mtime, size etc of files, and use this to detect when they've been
  modified when the assistant was not running. This would only need to be
  checked at startup, probably.
* Use dangling symlinks for standins for missing content, as now.
  This allows just cloning one of these repositories normally, and then
  as the files are synced in, they become real files.
* Maintain a local mapping from keys to files in the tree. This is needed
  when sending/receiving keys to know what file to access. Note that a key
  can map to multiple files. And that when a file is deleted or moved, the
  mapping needs to be updated.
* May need a reverse mapping, from files in the tree to keys? TBD
* The existing watch code detects when a file gets closed, and in this
  mode, it could be a new file, or a modified file, or an unchanged file.
  For a modified file, can compare mtime, size, etc, to see if it needs
  to be re-added.
* The inotify/kqueue interface does not tell us when a file is renamed.
  So a rename has to be treated as a delete and an add, so can have a lot
  of overhead, to re-hash the file.
* Note that this could be used without the assistant, as a git remote
  that content is transferred to and from. Without the assistant, changes
  to files in this remote would not be noticed and committed, unless
  a git-annex command were added to do so.
  Getting it basically working as a remote would be a good 1st step.

## TODO

* Deal with files changing as they're being transferred from a direct mode
  repository to another git repository. The remote repo currently will 
  accept the bad data and update the location log to say it has the key.