author Joey Hess <joey@kitenet.net> 2013-05-28 18:25:47 -0400
committer Joey Hess <joey@kitenet.net> 2013-05-28 18:25:47 -0400
commit 778f887d22cef3099cb71a4405b5eb49feebde85
tree 7ef66d65a5d609de099494906ff1884dee73084e
parent f9aef09f97a864a9bc2663af80736df1a3f0a816
parent 2133d00a826ac778ee17f5264804469870e6b3ed
Merge branch 'master' of ssh://git-annex.branchable.com
-rw-r--r-- doc/bugs/Can__39__t_add_a_git_repo_to_git_annex:___34__Invalid_path_repo__47__.git__47__X__34___for_many_X/comment_4_cafcc24e98a89f10adaed5e09f75b659._comment 19
-rw-r--r-- doc/bugs/Direct_mode_repositories_end_up_with_unstaged_changes.mdwn 45
-rw-r--r-- doc/bugs/Direct_mode_repositories_still_use_symlinks_sometimes.mdwn 32
-rw-r--r-- doc/bugs/git_annex_sync_in_direct_mode_does_not_honor_skip-worktree.mdwn 35
-rw-r--r-- doc/design/assistant/blog/day_276__fuzzing_continues.mdwn 8
-rw-r--r-- doc/design/assistant/blog/day_276__fuzzing_continues/comment_1_f5dd0658511a1063c2eb025b0fe98426._comment 14
-rw-r--r-- doc/forum/Help_with_syncing_file_contents.mdwn 68
-rw-r--r-- doc/forum/Help_with_syncing_file_contents/comment_1_7ec34de3140983739080115c82966bf5._comment 18
-rw-r--r-- doc/forum/Help_with_syncing_file_contents/comment_2_7dba58d3c62d6f64a270298e4e4329a4._comment 10
-rw-r--r-- doc/forum/Securing_a_shared_ssh_server/comment_4_67533d08e1b8706b844262e9c483d982._comment 15
-rw-r--r-- doc/install/Debian/comment_7_1bccc7bf7a4ef61a9b30024b9b22ba7d._comment 12
11 files changed, 272 insertions, 4 deletions
diff --git a/doc/bugs/Can__39__t_add_a_git_repo_to_git_annex:___34__Invalid_path_repo__47__.git__47__X__34___for_many_X/comment_4_cafcc24e98a89f10adaed5e09f75b659._comment b/doc/bugs/Can__39__t_add_a_git_repo_to_git_annex:___34__Invalid_path_repo__47__.git__47__X__34___for_many_X/comment_4_cafcc24e98a89f10adaed5e09f75b659._comment
new file mode 100644
index 000000000..6b7c9f33b
--- /dev/null
+++ b/doc/bugs/Can__39__t_add_a_git_repo_to_git_annex:___34__Invalid_path_repo__47__.git__47__X__34___for_many_X/comment_4_cafcc24e98a89f10adaed5e09f75b659._comment
@@ -0,0 +1,19 @@
+[[!comment format=mdwn
+ username="https://www.google.com/accounts/o8/id?id=AItOawkkyBDsfOB7JZvPZ4a8F3rwv0wk6Nb9n48"
+ nickname="Abdó"
+ subject="comment 4"
+ date="2013-05-28T19:21:57Z"
+ content="""
+This \"git is afraid of .git\" issue is the main blocker for finally getting rid of unison. My use case is as follows. Among other things, I have a `~/work` directory infested with little projects versioned by git. I want to sync it between my 3 machines and a cloud server. My current setup involves star-shaped unison syncs to the server. That's not bad, but it has its problems:
+
+ * unison keeps a file index for every pair of machines (laptop-server, office-server, etc). This means that I end up with 3 identical indexes on the server, indexing the same data. Every time I sync a pair, the server rechecks what has changed to update the corresponding index.
+ * also, every time I add a machine, or my disk explodes and I have to set up a new unison sync from scratch, the server has to reindex everything, which is slow.
+ * unison does not know about the entire history, only the current state of the replicas. This may lead to data loss if I delete something I shouldn't delete and propagate. Only in special cases, for instance when I delete everything in one replica, unison asks before throwing it all out the window.
+ * I sometimes want to sync laptop and desktop through the local network, instead of going through the server. Then I have to be very careful in which order I do the syncs + it adds a couple of new redundant indices.
+
+Now, git annex is not a sync tool. But as a side effect of its `git annex sync` feature, it happens to solve those issues in an elegant way, making it an extremely flexible sync tool, far superior to unison in many aspects!
+
+Still, my `~/work` directory is infested with little git repos, so I can't use git annex on `~/work`. Also, I treat my little git projects as things carrying their own history around. Sometimes I move them, etc. I don't want to use mr in them, nor keep remotes for all my machines on all my little projects. That removes a lot of the flexibility I'd gain by moving to git annex.
+
+The thing is, I don't understand why this git limitation is fairly fundamental. I've been playing around nesting git repos. When I change the inner `.git` directory to `.bar`, the outer git swallows it all right, and after some playing around with commits and checkouts on the inner and outer repos, the internal repo survived the process. Also, I don't think versioning content inside `.git` may disorient git in any way. Every git call knows on which `.git` directory it operates, just go up through the path looking for the first `.git` dir which is NOT a part of the actual path. Is there anything else I am missing? Would it be feasible to patch git adding a config option that makes it treat `.git` dirs as regular dirs? I'd be willing to mess with git's source when I get the time to do it.
+"""]]
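
The lookup rule described in the comment above (walk up from a path and take the first `.git` directory that is not itself a component of that path) can be sketched in shell. This is only an illustration of the commenter's proposal, not how git itself discovers repositories; the function name `find_enclosing_repo` is hypothetical.

```sh
# Hypothetical helper: print the nearest ancestor of directory $1 that holds
# a .git directory, ignoring any .git that is a component of $1 itself.
find_enclosing_repo() {
    start=$(cd "$1" && pwd) || return 1
    dir=$start
    while :; do
        if [ -d "$dir/.git" ]; then
            case "$start/" in
                # $1 lies inside this .git, so it does not count as the owner.
                "$dir/.git/"*) ;;
                *) printf '%s\n' "$dir"; return 0 ;;
            esac
        fi
        # Reached the filesystem root without finding an owning repository.
        [ "$dir" = "/" ] && return 1
        dir=$(dirname "$dir")
    done
}
```

Under this rule a path such as `outer/inner/.git/objects` would be attributed to `outer` rather than `inner`, which is the behaviour the comment asks for when an outer repository versions the contents of a nested `.git`.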
diff --git a/doc/bugs/Direct_mode_repositories_end_up_with_unstaged_changes.mdwn b/doc/bugs/Direct_mode_repositories_end_up_with_unstaged_changes.mdwn
new file mode 100644
index 000000000..2b91bf913
--- /dev/null
+++ b/doc/bugs/Direct_mode_repositories_end_up_with_unstaged_changes.mdwn
@@ -0,0 +1,45 @@
+### Please describe the problem.
+
+After running two repositories syncing with one another in direct mode "git status" shows unstaged changes in both.
+
+### What steps will reproduce the problem?
+
+1. Create two direct mode repositories with each other as ssh remotes
+2. Run "git annex assistant" on each
+3. Create files on each and they get synced
+4. Run "git status"
+
+In my current repository the output is:
+
+[[!format sh """
+$ git status
+# On branch master
+# Changes not staged for commit:
+# (use "git add <file>..." to update what will be committed)
+# (use "git checkout -- <file>..." to discard changes in working directory)
+#
+# typechange: fromgolias
+# typechange: fromwintermute
+#
+no changes added to commit (use "git add" and/or "git commit -a")
+"""]]
+
+### What version of git-annex are you using? On what operating system?
+
+[[!format sh """
+$ git annex version
+git-annex version: 4.20130516.1
+build flags: Assistant Webapp Pairing Testsuite S3 WebDAV Inotify DBus XMPP
+local repository version: 4
+default repository version: 3
+supported repository versions: 3 4
+upgrade supported from repository versions: 0 1 2
+
+$ lsb_release -a
+No LSB modules are available.
+Distributor ID: Ubuntu
+Description: Ubuntu 12.04.2 LTS
+Release: 12.04
+Codename: precise
+"""]]
+
diff --git a/doc/bugs/Direct_mode_repositories_still_use_symlinks_sometimes.mdwn b/doc/bugs/Direct_mode_repositories_still_use_symlinks_sometimes.mdwn
new file mode 100644
index 000000000..782b80e5e
--- /dev/null
+++ b/doc/bugs/Direct_mode_repositories_still_use_symlinks_sometimes.mdwn
@@ -0,0 +1,32 @@
+### Please describe the problem.
+
+When a repository is set in direct mode it will still replace files with symlinks when it becomes aware of a change but still hasn't been able to sync the file contents. This can create repositories that are temporarily unusable with files replaced with broken symlinks.
+
+### What steps will reproduce the problem?
+
+1. Create two repositories with each other as remotes
+2. Run the assistant on both
+3. Create some file changes in one and watch the directory in another.
+4. For a brief (or sometimes long) time the destination repository will have its old version of the file replaced by a broken symlink
+
+This is particularly noticeable when using XMPP as it can often be the case that the two repositories can't connect to each other directly but can talk through XMPP. This breaks using git-annex in direct mode for things like having a synced config directory across machines. Something like having "~/.bashrc" linked into "~/annex-repository/bashrc", doesn't work as there will be times when a machine is broken because .bashrc is linked to a broken symlink while it fetches a new version.
+
+The desired behavior would be to have git-annex in direct mode only replace older versions of files with newer versions of files.
+
+### What version of git-annex are you using? On what operating system?
+
+[[!format sh """
+$ git annex version
+git-annex version: 4.20130516.1
+build flags: Assistant Webapp Pairing Testsuite S3 WebDAV Inotify DBus XMPP
+local repository version: 4
+default repository version: 3
+supported repository versions: 3 4
+upgrade supported from repository versions: 0 1 2
+$ lsb_release -a
+No LSB modules are available.
+Distributor ID: Ubuntu
+Description: Ubuntu 12.04.2 LTS
+Release: 12.04
+Codename: precise
+"""]]
diff --git a/doc/bugs/git_annex_sync_in_direct_mode_does_not_honor_skip-worktree.mdwn b/doc/bugs/git_annex_sync_in_direct_mode_does_not_honor_skip-worktree.mdwn
new file mode 100644
index 000000000..4d60c96c8
--- /dev/null
+++ b/doc/bugs/git_annex_sync_in_direct_mode_does_not_honor_skip-worktree.mdwn
@@ -0,0 +1,35 @@
+### Please describe the problem.
+
+In a direct mode repo (crippled/uncrippled filesystem does not matter), when a symlink is marked using `git update-index --skip-worktree <FILE>` and removed, git annex sync still `git rm`s the symlink. This does not happen in indirect mode (git annex sync leaves the symlink in git intact).
+
+### What steps will reproduce the problem?
+
+[[!format sh """
+mkdir test-repo; cd test-repo
+git init
+git annex init
+echo file1 >file1
+git annex add
+git commit -m"update"
+cd ..
+git clone test-repo test-repo2; cd test-repo2
+git annex init
+git annex direct
+git update-index --skip-worktree file1
+rm file1
+git annex sync
+"""]]
+
+Output of `git annex sync` indicates file has been removed from git. Repeating these steps without the `git annex direct` above to set the second repo to direct mode will succeed in retaining the symlink in git.
+
+### What version of git-annex are you using? On what operating system?
+
+4.20130521 using git-annex-standalone AUR build (uses Linux executable tarball) on Arch Linux
+
+### Please provide any additional information below.
+
+I'd like to use the skip-worktree scheme in order to be able to rm the symlink files (from the filesystem, not git), specifically for my Android devices. Syncing my music annex creates .mp3 symlinks that aren't actually MP3s, which gives the stock apps some fits. This would only be for clearing out symlinks; I fully understand that trying to do this for downloaded content in a direct repo would be a Class A no-no. :-)
+
+I did a little digging in the code, and it looks like the source of this is the stageDirect step done specifically by `git annex sync` in direct repos (which makes sense, since indirect repos work). It does `git ls-files --others --exclude-standard --stage`. This list includes files marked skip-worktree, which means skip-worktree files are treated like normal files and deleted from git because they are no longer in the work tree. There is an additional `-t` argument that could be added to ls-files that would provide the tag field to indicate if a file was marked skip-worktree, and such files could be filtered out of processing.
+
+I wonder if this would have side effects, or if there are other places in the code where skip-worktree files would need to be handled, though. I'm particularly motivated to solve this, so let me know if it doesn't look like it would get looked at right away, and I'll have an excuse to get a Haskell dev environment set up again and shake the rust off.
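
The filtering proposed in the report above can be sketched in shell. This is a minimal illustration, assuming the `S` status tag that `git ls-files -t` prints in front of skip-worktree entries; the helper name `drop_skip_worktree` is hypothetical, and this is not how git-annex's stageDirect is implemented.

```sh
# Hypothetical helper: read `git ls-files -t` style lines ("<tag> <path>")
# on stdin and print only the paths whose tag is not "S" (skip-worktree).
drop_skip_worktree() {
    # Skip lines tagged "S"; for the rest, strip the leading tag and print.
    sed -n '/^S /!s/^[^ ]* //p'
}

# Assumed usage inside a repository:
#   git ls-files -t | drop_skip_worktree
```

The sketch ignores the quoting git applies to unusual path names; a real fix would more likely parse NUL-separated output.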
diff --git a/doc/design/assistant/blog/day_276__fuzzing_continues.mdwn b/doc/design/assistant/blog/day_276__fuzzing_continues.mdwn
index 57f26bc35..d6fc88b05 100644
--- a/doc/design/assistant/blog/day_276__fuzzing_continues.mdwn
+++ b/doc/design/assistant/blog/day_276__fuzzing_continues.mdwn
@@ -1,12 +1,12 @@
The fuzz testing found a file descriptor leak in the XMPP git push code.
The assistant seems to hold up under fuzzing for quite a while now.
-Have started trying to workaround some versions of Android not letting
-the `am` command be used by regular users to open a web browser on an url.
+Have started trying to work around some versions of Android not letting
+the `am` command be used by regular users to open a web browser on an URL.
Here is my current crazy plan: Hack the terminal emulator's title setting
-code, to get a new escape sequence that requests an url be opened. This
+code, to get a new escape sequence that requests an URL be opened. This
assumes I can just use `startActivity()` from inside the app and it will
work. This may sound a little weird, but it avoids me needing to set up a
-new communications channel from the assistant to the java app. Best of all,
+new communications channel from the assistant to the Java app. Best of all,
I have to write very little Java code. I last wrote Java code in 1995, so
writing much more is probably a good thing to avoid.
diff --git a/doc/design/assistant/blog/day_276__fuzzing_continues/comment_1_f5dd0658511a1063c2eb025b0fe98426._comment b/doc/design/assistant/blog/day_276__fuzzing_continues/comment_1_f5dd0658511a1063c2eb025b0fe98426._comment
new file mode 100644
index 000000000..4a54fa188
--- /dev/null
+++ b/doc/design/assistant/blog/day_276__fuzzing_continues/comment_1_f5dd0658511a1063c2eb025b0fe98426._comment
@@ -0,0 +1,14 @@
+[[!comment format=mdwn
+ username="https://www.google.com/accounts/o8/id?id=AItOawkGCmVc5qIJaQQgG82Hc5zzBdAVdhe2JEM"
+ nickname="Bruno"
+ subject="comment 1"
+ date="2013-05-28T03:23:50Z"
+ content="""
+Does the Android application use a [WakeLock](https://developer.android.com/reference/android/os/PowerManager.WakeLock.html) to ensure syncing happens even if the device screen goes off?
+
+I'm under the impression that it doesn't since it took a very long time to sync the first time (before I used the terminal's options to prevent the phone from sleeping).
+
+I think that option shouldn't be left on since it would waste the battery and I think the application should block sleep mode only when syncing.
+
+I might be wrong. I never had to use a WakeLock on Android yet.
+"""]]
diff --git a/doc/forum/Help_with_syncing_file_contents.mdwn b/doc/forum/Help_with_syncing_file_contents.mdwn
new file mode 100644
index 000000000..28ac22b2e
--- /dev/null
+++ b/doc/forum/Help_with_syncing_file_contents.mdwn
@@ -0,0 +1,68 @@
+Hi everyone,
+
+every day I understand more and more how git and git-annex work but I need help with this one.
+I guess I have two questions but let me describe the scenario first:
+
+I have one local repository of mp3s (assume just one file: file1.mp3).
+I clone that repository into a remote git-annex repository over ssh and "git annex get" the file contents for file1.mp3.
+
+I unlock some mp3's locally and modify some mp3 tags.
+Then I notify git-annex of these changes
+with "git annex add *"
+and commit them with "git commit -m 'mp3 tags changed'".
+[git annex locks them again and changes the symlinks to point to the changed file in the annex, git commits the changed symlink]
+
+At this point in time there are two objects in my git annex repository:
+hash(file1.mp3)
+hash(file1.mp3|with modified tags)
+The symlink points now to hash(file1.mp3|with modified tags)
+
+At this point the remote still does not know of this commit and of the new file contents.
+Thus I do "git push" to send the changes to the remote.
+The remote now has a BROKEN symlink because it already points to hash(file1.mp3|with modified tags)
+but the remote annex's object directory only contains hash(file1.mp3).
+
+Then I want my remote repository to also have the updated mp3 tags.
+The only way I see (without scripting) to have the updated mp3 tags in the remote repository is to do a "git annex get file1.mp3" on the remote repository or a "git annex copy --to remote file1.mp3" at my local repository. However, although the binary differences between both files
+file1.mp3 and file1.mp3|with modified tags are small the latter is transferred completely from the local repository to the remote repository.
+
+This is not a problem when just changing one file, but it is a problem when I have 10 GB of files and it takes 2 days to upload them to the remote because of low bandwidth.
+
+First question: Did I miss something? Does git-annex already provide means to only transmit the diff between the two objects?
+
+Second question regarding disk space.
+I now have a complete history of all changes to file1.mp3 in my git-annex repository. I have the objects that represent every state of file1.mp3 and I can go back to these states when I checkout the respective commits and thereby the symlinks that link to these "old" objects. This history can take up a lot of space. What is the clean way to forget the past? AFAIK "git drop unused" only drops file contents that are not referenced in any commit?
+
+If one wanted to preserve the entire history but save disk space one could also only store the current content and the patches that allow to reconstruct older versions from the current one. I understand that applying several patches consecutively takes more cpu time but for me going back to an older commit with my binary files only happens rarely.
+
+This is the algorithm I have in mind for an optimized "git annex get file1":
+
+On that repository where the file is missing:
+
+1. Find the newest object that represented the contents of file1 in file1's commit history.
+
+2. Transmit this object identification hash(object) to the remote that has the current version (the one I am getting from).
+
+At that remote:
+
+If the history contains full versions of file contents:
+
+Create a binary diff between the object identified by hash(object) and the current content of file1.
+
+If the history contains only the current version and patches to older versions:
+Collect all patches that represent the change from hash(object) to the current content of file1.
+
+If the list of patches is bigger than the current content of file 1 transmit the current content of file1. Otherwise transmit the patch(es)
+
+On that repository where the file is missing:
+
+Apply the patch(es) to the latest object to obtain the current object.
+
+What's your opinion on this?
+
+Marek
+
+
+
+
+
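
The size check in the proposed algorithm above (send the full content only when the collected patches are bigger than it) can be sketched in shell. This is an illustration of the poster's idea under stated assumptions, not an existing git-annex feature; the function name `choose_transfer` and its two file arguments are hypothetical.

```sh
# Hypothetical helper: $1 is a file holding the collected patch(es),
# $2 is the full current content. Prints which of the two to transmit.
choose_transfer() {
    # $(( ... )) normalises any whitespace that wc may print around the count.
    patch_size=$(( $(wc -c < "$1") ))
    full_size=$(( $(wc -c < "$2") ))
    if [ "$patch_size" -gt "$full_size" ]; then
        echo full
    else
        echo patch
    fi
}
```

How the patch file would be produced (for example by a binary diff tool) is left open here, as it is in the post.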
diff --git a/doc/forum/Help_with_syncing_file_contents/comment_1_7ec34de3140983739080115c82966bf5._comment b/doc/forum/Help_with_syncing_file_contents/comment_1_7ec34de3140983739080115c82966bf5._comment
new file mode 100644
index 000000000..87ef6d130
--- /dev/null
+++ b/doc/forum/Help_with_syncing_file_contents/comment_1_7ec34de3140983739080115c82966bf5._comment
@@ -0,0 +1,18 @@
+[[!comment format=mdwn
+ username="https://www.google.com/accounts/o8/id?id=AItOawmp1ThsNNAbSn46ju-gwFELfStlhl8usJo"
+ nickname="donkeyicydragon"
+ subject="Possible quick solution with rsync"
+ date="2013-05-28T10:52:30Z"
+ content="""
+One could achieve the effect of only transmitting the changes of file contents using rsync.
+
+On the repository that lacks the current version:
+
+cp latestversionavailable currentversion
+rsync remoterepository/currentversion ./currentversion
+
+
+
+
+
+"""]]
diff --git a/doc/forum/Help_with_syncing_file_contents/comment_2_7dba58d3c62d6f64a270298e4e4329a4._comment b/doc/forum/Help_with_syncing_file_contents/comment_2_7dba58d3c62d6f64a270298e4e4329a4._comment
new file mode 100644
index 000000000..2d427cccb
--- /dev/null
+++ b/doc/forum/Help_with_syncing_file_contents/comment_2_7dba58d3c62d6f64a270298e4e4329a4._comment
@@ -0,0 +1,10 @@
+[[!comment format=mdwn
+ username="https://www.google.com/accounts/o8/id?id=AItOawmp1ThsNNAbSn46ju-gwFELfStlhl8usJo"
+ nickname="donkeyicydragon"
+ subject="I should have read git annex unused first"
+ date="2013-05-28T22:14:46Z"
+ content="""
+Apparently \"git annex unused\" finds previous versions of the object that are not symlinked anymore. However, if a file is renamed then the oldest version before the rename is not marked as unused.
+
+
+"""]]
diff --git a/doc/forum/Securing_a_shared_ssh_server/comment_4_67533d08e1b8706b844262e9c483d982._comment b/doc/forum/Securing_a_shared_ssh_server/comment_4_67533d08e1b8706b844262e9c483d982._comment
new file mode 100644
index 000000000..107c7b3cc
--- /dev/null
+++ b/doc/forum/Securing_a_shared_ssh_server/comment_4_67533d08e1b8706b844262e9c483d982._comment
@@ -0,0 +1,15 @@
+[[!comment format=mdwn
+ username="https://www.google.com/accounts/o8/id?id=AItOawkwjBDXkP9HAQKhjTgThGOxUa1B99y_WRA"
+ nickname="Franck"
+ subject="comment 4"
+ date="2013-05-28T06:34:04Z"
+ content="""
+Thanks, but my server is a synology nas and as you know from another thread of comments, having git-annex work on it is not that simple. ;-)
+Moreover, I'd like to be able to use ssh accounts where I don't have a root access and not necessarily git. So, a general method to restrict ssh would interest me.
+
+But your answer seems to suggest that almost arbitrary rsync commands may be given. If so, I agree that there is little hope of building a secure jail around this... But if really only a limited subset of commands is used, I think it should be possible to check them securely.
+
+From now on I'm focused on having git-annex work because this looks like the most promising way. But I'll have another question regarding it: I noticed that we can restrict access to a specific repository using an appropriate environment variable. But is it possible to provide a list of repositories instead of just one? My collaborators will typically have access to several shares but not to all of them.
+
+Thanks for your responsiveness, after trying tens of candidates git-annex appears to be the only serious solution to replace Dropbox and I'm really glad that you actively help your users!
+"""]]
diff --git a/doc/install/Debian/comment_7_1bccc7bf7a4ef61a9b30024b9b22ba7d._comment b/doc/install/Debian/comment_7_1bccc7bf7a4ef61a9b30024b9b22ba7d._comment
new file mode 100644
index 000000000..6d8cae2f0
--- /dev/null
+++ b/doc/install/Debian/comment_7_1bccc7bf7a4ef61a9b30024b9b22ba7d._comment
@@ -0,0 +1,12 @@
+[[!comment format=mdwn
+ username="http://mey.vn/"
+ ip="46.65.14.106"
+ subject="libc6 dep version on amd64"
+ date="2013-05-28T15:28:47Z"
+ content="""
+hi Joey,
+
+i see from the release notes of the 4.20130521 release that the Debian package should now be built with libc6 2.13, which appears to be the case except for the amd64 arch (hence the amd64 package won't install as is on Wheezy on amd64) - is this a build glitch or is 2.14 needed on amd64 (i imagine as a dependency of one of git-annex's deps on that arch)?
+
+thanks!
+"""]]