author Joey Hess <joey@kitenet.net> 2011-03-16 00:09:35 -0400
committer Joey Hess <joey@kitenet.net> 2011-03-16 00:09:35 -0400
commit 539083b847a9551c276e67eb364fc80955dcdff1
tree 171fadfdcc5badcea7063f7acf23460a764392a9
parent 09a7689bc30faaf938a0b32a417d38ac093a6f7a
parent fd2f04694f8ba52d9b67e35c95f1bddc33bbe292
Merge remote-tracking branch 'origin/master' into reorg
-rw-r--r--  doc/bugs/free_space_checking/comment_1_a868e805be43c5a7c19c41f1af8e41e6._comment  10
-rw-r--r--  doc/bugs/free_space_checking/comment_2_8a65f6d3dcf5baa3f7f2dbe1346e2615._comment  8
-rw-r--r--  doc/bugs/git_rename_detection_on_file_move/comment_2_7101d07400ad5935f880dc00d89bf90e._comment  27
-rw-r--r--  doc/bugs/git_rename_detection_on_file_move/comment_3_57010bcaca42089b451ad8659a1e018e._comment  8
-rw-r--r--  doc/comments.mdwn  9
-rw-r--r--  doc/forum/can_git-annex_replace_ddm__63__/comment_2_008554306dd082d7f543baf283510e92._comment  19
-rw-r--r--  doc/forum/can_git-annex_replace_ddm__63__/comment_3_4c69097fe2ee81359655e59a03a9bb8d._comment  12
-rw-r--r--  doc/forum/hashing_objects_directories/comment_2_504c96959c779176f991f4125ea22009._comment  14
-rw-r--r--  doc/forum/hashing_objects_directories/comment_3_9134bde0a13aac0b6a4e5ebabd7f22e8._comment  12
-rw-r--r--  doc/index.mdwn  1
-rw-r--r--  doc/todo/object_dir_reorg_v2.mdwn  2
-rw-r--r--  doc/todo/object_dir_reorg_v2/comment_1_ba03333dc76ff49eccaba375e68cb525._comment  8
-rw-r--r--  doc/todo/object_dir_reorg_v2/comment_2_81276ac309959dc741bc90101c213ab7._comment  8
-rw-r--r--  doc/todo/object_dir_reorg_v2/comment_3_79bdf9c51dec9f52372ce95b53233bb2._comment  12
-rw-r--r--  doc/todo/object_dir_reorg_v2/comment_4_93aada9b1680fed56cc6f0f7c3aca5e5._comment  12
15 files changed, 162 insertions, 0 deletions
diff --git a/doc/bugs/free_space_checking/comment_1_a868e805be43c5a7c19c41f1af8e41e6._comment b/doc/bugs/free_space_checking/comment_1_a868e805be43c5a7c19c41f1af8e41e6._comment
new file mode 100644
index 000000000..954433deb
--- /dev/null
+++ b/doc/bugs/free_space_checking/comment_1_a868e805be43c5a7c19c41f1af8e41e6._comment
@@ -0,0 +1,10 @@
+[[!comment format=mdwn
+ username="https://www.google.com/accounts/o8/id?id=AItOawl9sYlePmv1xK-VvjBdN-5doOa_Xw-jH4U"
+ nickname="Richard"
+ subject="comment 1"
+ date="2011-03-15T14:11:27Z"
+ content="""
+Keep in mind that lots of small files may have significant overhead, so a warning that it's not possible to make sure there's enough space would make sense for certain corner cases. Actually finding out the exact overhead is beyond git-annex's scope and, given transparent compression etc., its ability, but a warning, optionally with a \"do you want to continue\" prompt, can't hurt.
+
+-- RichiH
+"""]]
diff --git a/doc/bugs/free_space_checking/comment_2_8a65f6d3dcf5baa3f7f2dbe1346e2615._comment b/doc/bugs/free_space_checking/comment_2_8a65f6d3dcf5baa3f7f2dbe1346e2615._comment
new file mode 100644
index 000000000..9a43fe3f2
--- /dev/null
+++ b/doc/bugs/free_space_checking/comment_2_8a65f6d3dcf5baa3f7f2dbe1346e2615._comment
@@ -0,0 +1,8 @@
+[[!comment format=mdwn
+ username="http://joey.kitenet.net/"
+ nickname="joey"
+ subject="comment 2"
+ date="2011-03-16T03:04:50Z"
+ content="""
+Right. You probably don't want git-annex to fill up your entire drive anyway, so if it tries to reserve 10 MB or 1% or whatever (probably configurable) for overhead, that should be good enough.
+"""]]
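
The reserve idea in the comment above can be sketched as follows. This is only an illustration, not git-annex code: the function names, the default values, and the choice to take the larger of the fixed amount and the percentage are all assumptions made for the example.

```python
import shutil

MIB = 1024 * 1024


def reserve_bytes(total_bytes, fixed_reserve=10 * MIB, percent=1):
    """Bytes to keep free: the larger of a fixed amount and a percentage of
    the disk. Taking the maximum is an assumption of this sketch."""
    return max(fixed_reserve, total_bytes * percent // 100)


def has_room(path, needed_bytes, fixed_reserve=10 * MIB, percent=1):
    """True if needed_bytes fit on the filesystem holding path while still
    leaving the reserve untouched."""
    usage = shutil.disk_usage(path)
    reserve = reserve_bytes(usage.total, fixed_reserve, percent)
    return usage.free - reserve >= needed_bytes
```

A caller could check `has_room(".", size_of_incoming_file)` before a transfer and print a warning, or prompt, when it returns false.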
diff --git a/doc/bugs/git_rename_detection_on_file_move/comment_2_7101d07400ad5935f880dc00d89bf90e._comment b/doc/bugs/git_rename_detection_on_file_move/comment_2_7101d07400ad5935f880dc00d89bf90e._comment
new file mode 100644
index 000000000..7d50c58d1
--- /dev/null
+++ b/doc/bugs/git_rename_detection_on_file_move/comment_2_7101d07400ad5935f880dc00d89bf90e._comment
@@ -0,0 +1,27 @@
+[[!comment format=mdwn
+ username="praet"
+ ip="81.240.159.215"
+ subject="Use variable symlinks, relative to the repo's root ?"
+ date="2011-03-10T16:50:28Z"
+ content="""
+It all boils down to the fact that the path to a relative symlink's target is determined relative to the symlink itself.
+
+Now, if we define the symlink's target relative to the git repo's root (eg. using the $GIT_DIR environment variable, which can be a relative or absolute path itself), this unfortunately results in an absolute symlink, which would -for obvious reasons- only be usable locally:
+
+ user@host:~$ mkdir -p tmp/{.git/annex,somefolder}
+ user@host:~$ export GIT_DIR=~/tmp
+ user@host:~$ touch $GIT_DIR/.git/annex/realfile
+ user@host:~$ ln -s $GIT_DIR/.git/annex/realfile $GIT_DIR/somefolder/file
+ user@host:~$ ls -al $GIT_DIR/somefolder/
+ total 12
+ drwxr-x--- 2 user group 4096 2011-03-10 16:54 .
+ drwxr-x--- 4 user group 4096 2011-03-10 16:53 ..
+ lrwxrwxrwx 1 user group 33 2011-03-10 16:54 file -> /home/user/tmp/.git/annex/realfile
+ user@host:~$
+
+So, what we need is the ability to record the actual variable name (instead of its value) in our symlinks.
+
+It *is* possible, using [variable/variant symlinks](http://en.wikipedia.org/wiki/Symbolic_link#Variable_symbolic_links), yet I'm unsure as to whether or not this is available on Linux systems, and even if it is, it would introduce compatibility issues in multi-OS environments.
+
+Thoughts on this?
+"""]]
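
The first point of the comment above, that a relative symlink's target is resolved from the symlink's own location, can be illustrated with a small sketch. The helper name `relative_target` is hypothetical, and the paths reuse the `somefolder`/`.git/annex/realfile` layout from the shell session; `somefolder/sub` is an invented second location.

```python
import posixpath


def relative_target(link_path, target_path):
    """Return the target string that a relative symlink located at link_path
    must contain in order to reach target_path (both relative to the repo root)."""
    return posixpath.relpath(target_path, start=posixpath.dirname(link_path))


target = ".git/annex/realfile"
print(relative_target("somefolder/file", target))      # ../.git/annex/realfile
print(relative_target("somefolder/sub/file", target))  # ../../.git/annex/realfile
```

The same content needs a different link text once the link sits at a different depth, so moving the link means rewriting it.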
diff --git a/doc/bugs/git_rename_detection_on_file_move/comment_3_57010bcaca42089b451ad8659a1e018e._comment b/doc/bugs/git_rename_detection_on_file_move/comment_3_57010bcaca42089b451ad8659a1e018e._comment
new file mode 100644
index 000000000..534723254
--- /dev/null
+++ b/doc/bugs/git_rename_detection_on_file_move/comment_3_57010bcaca42089b451ad8659a1e018e._comment
@@ -0,0 +1,8 @@
+[[!comment format=mdwn
+ username="http://joey.kitenet.net/"
+ nickname="joey"
+ subject="comment 3"
+ date="2011-03-16T03:03:19Z"
+ content="""
+Interesting, I had not heard of variable symlinks before. AFAIK Linux does not have them.
+"""]]
diff --git a/doc/comments.mdwn b/doc/comments.mdwn
new file mode 100644
index 000000000..e19962b92
--- /dev/null
+++ b/doc/comments.mdwn
@@ -0,0 +1,9 @@
+[[!sidebar content="""
+[[!inline pages="comment_pending(*)" feedfile=pendingmoderation
+description="comments pending moderation" show=-1]]
+Comments in the [[!commentmoderation desc="moderation queue"]]:
+[[!pagecount pages="comment_pending(*)"]]
+"""]]
+
+Recent comments posted to this site:
+[[!inline pages="comment(*)" template="comment"]]
diff --git a/doc/forum/can_git-annex_replace_ddm__63__/comment_2_008554306dd082d7f543baf283510e92._comment b/doc/forum/can_git-annex_replace_ddm__63__/comment_2_008554306dd082d7f543baf283510e92._comment
new file mode 100644
index 000000000..ab114bb1c
--- /dev/null
+++ b/doc/forum/can_git-annex_replace_ddm__63__/comment_2_008554306dd082d7f543baf283510e92._comment
@@ -0,0 +1,19 @@
+[[!comment format=mdwn
+ username="http://dieter-be.myopenid.com/"
+ nickname="dieter"
+ subject="comment 2"
+ date="2011-02-16T21:32:04Z"
+ content="""
+thanks Joey,
+
+is it possible to run some git annex command that tells me, for a specific directory, which files are available in another remote? (and which remote, and which filenames?)
+I guess I could run that, do my own policy thingie, and run `git annex get` for the files I want.
+
+For your podcast use case (and some of my use cases) don't you think git [annex] might actually be overkill? For example your podcasts use case, what value does git annex give over a simple rsync/rm script?
+such a script wouldn't even need a data store to store its state, unlike git. it seems simpler and cleaner to me.
+
+for the mpd thing, check http://alip.github.com/mpdcron/ (bad project name, it's a plugin based \"event handler\")
+you should be able to write a simple plugin for mpdcron that does what you want (or even interface with mpd yourself from perl/python/.. to use its idle mode to get events)
+
+Dieter
+"""]]
diff --git a/doc/forum/can_git-annex_replace_ddm__63__/comment_3_4c69097fe2ee81359655e59a03a9bb8d._comment b/doc/forum/can_git-annex_replace_ddm__63__/comment_3_4c69097fe2ee81359655e59a03a9bb8d._comment
new file mode 100644
index 000000000..5cdd6aa0c
--- /dev/null
+++ b/doc/forum/can_git-annex_replace_ddm__63__/comment_3_4c69097fe2ee81359655e59a03a9bb8d._comment
@@ -0,0 +1,12 @@
+[[!comment format=mdwn
+ username="http://joey.kitenet.net/"
+ nickname="joey"
+ subject="comment 3"
+ date="2011-03-16T03:01:17Z"
+ content="""
+Whups, the comment above got stuck in moderation queue for 27 days. I will try to check that more frequently.
+
+In the meantime, I've implemented \"git annex whereis\" -- enjoy!
+
+I find keeping my podcasts in the annex useful because it allows me to download individual episodes or podcasts easily when low bandwidth is available (i.e., dialup), or over sneakernet. And generally keeps everything organised.
+"""]]
diff --git a/doc/forum/hashing_objects_directories/comment_2_504c96959c779176f991f4125ea22009._comment b/doc/forum/hashing_objects_directories/comment_2_504c96959c779176f991f4125ea22009._comment
new file mode 100644
index 000000000..64f1e16b5
--- /dev/null
+++ b/doc/forum/hashing_objects_directories/comment_2_504c96959c779176f991f4125ea22009._comment
@@ -0,0 +1,14 @@
+[[!comment format=mdwn
+ username="https://www.google.com/accounts/o8/id?id=AItOawl9sYlePmv1xK-VvjBdN-5doOa_Xw-jH4U"
+ nickname="Richard"
+ subject="comment 2"
+ date="2011-03-15T13:52:16Z"
+ content="""
+Can't you just use an underscore instead of a colon?
+
+Would it be feasible to split directories dynamically? I.e. start with SHA1_123456789abcdef0123456789abcdef012345678/SHA1_123456789abcdef0123456789abcdef012345678 and, at a certain cut-off point, switch to shorter directory names? This could even be done per subdirectory and based purely on a locally-configured number. Different annexes on different file systems or with different file subsets might even have different thresholds. This would ensure scale while not forcing you to segment from the start. Also, while segmenting with longer directory names means a flatter tree, segments longer than four characters might not make too much sense. Segmenting too often could lead to some directories becoming too populated, bringing us back to the dynamic segmentation.
+
+All of the above would make merging annexes by hand a _lot_ harder, but I don't know if this is a valid use case. And if all else fails, one could merge everything with the unsegmented directory names and start again from there.
+
+-- RichiH
+"""]]
diff --git a/doc/forum/hashing_objects_directories/comment_3_9134bde0a13aac0b6a4e5ebabd7f22e8._comment b/doc/forum/hashing_objects_directories/comment_3_9134bde0a13aac0b6a4e5ebabd7f22e8._comment
new file mode 100644
index 000000000..51deb2f95
--- /dev/null
+++ b/doc/forum/hashing_objects_directories/comment_3_9134bde0a13aac0b6a4e5ebabd7f22e8._comment
@@ -0,0 +1,12 @@
+[[!comment format=mdwn
+ username="http://joey.kitenet.net/"
+ nickname="joey"
+ subject="comment 3"
+ date="2011-03-16T03:13:39Z"
+ content="""
+It is unfortunately not possible to do system-dependent hashing, so long as git-annex stores symlinks to the content in git.
+
+It might be possible to start without hashing, and add hashing for new files after a cutoff point. It would add complexity.
+
+I'm currently looking at a 2 character hash directory segment, based on an md5sum of the key, which splits it into 1024 buckets. git uses just 256 buckets for its object directory, but then its objects tend to get packed away. I sorta hope that one level is enough, but guess I could go to 2 levels (objects/ab/cd/key), which would provide 1048576 buckets, probably plenty, as if you are storing more than a million files, you are probably using a modern enough system to have a filesystem that doesn't need hashing.
+"""]]
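
The bucket arithmetic in the comment above (1024 buckets for one two-character level, 1048576 for two levels) works out if each character is drawn from a 32-symbol alphabet. The sketch below shows one way that could look. It is not git-annex's actual scheme: the alphabet, the way bits are taken from the MD5 digest, and the names `segment`, `hash_dirs` and `object_path` are assumptions made for illustration.

```python
import hashlib

# Hypothetical 32-symbol alphabet: 32 * 32 = 1024 two-character segments.
ALPHABET = "0123456789abcdefghijklmnopqrstuv"


def segment(value):
    """Encode a 10-bit value (0..1023) as two characters from ALPHABET."""
    return ALPHABET[(value >> 5) & 31] + ALPHABET[value & 31]


def hash_dirs(key, levels=1):
    """Derive `levels` two-character directory names from the MD5 of the key,
    consuming 10 bits of the digest per level."""
    digest = int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")
    return [segment((digest >> (10 * i)) & 1023) for i in range(levels)]


def object_path(key, levels=1):
    """Build a path such as objects/ab/key or objects/ab/cd/key."""
    return "/".join(["objects"] + hash_dirs(key, levels) + [key])
```

Because the directory names depend only on the key, every clone computes the same path, which fits the point that the hashing cannot vary per system while symlinks to the content are stored in git.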
diff --git a/doc/index.mdwn b/doc/index.mdwn
index 47682349f..4f967b71f 100644
--- a/doc/index.mdwn
+++ b/doc/index.mdwn
@@ -10,6 +10,7 @@ To get a feel for it, see the [[walkthrough]].
* [[bugs]]
* [[todo]]
* [[forum]]
+* [[comments]]
* [[contact]]
[[News]]:
diff --git a/doc/todo/object_dir_reorg_v2.mdwn b/doc/todo/object_dir_reorg_v2.mdwn
index db1885699..1c2d2f21b 100644
--- a/doc/todo/object_dir_reorg_v2.mdwn
+++ b/doc/todo/object_dir_reorg_v2.mdwn
@@ -6,6 +6,8 @@ all users, so this should be the *last* reorg in the forseeable future.
2. Add hashing, since some filesystems do suck (like er, fat at least :)
[[forum/hashing_objects_directories]]
+ (Also, may as well hash .git-annex/* while at it -- that's what
+ really gets big.)
3. Add filesize metadata for [[bugs/free_space_checking]]. (Currently only
present in WORM, and in an ad-hoc way.)
diff --git a/doc/todo/object_dir_reorg_v2/comment_1_ba03333dc76ff49eccaba375e68cb525._comment b/doc/todo/object_dir_reorg_v2/comment_1_ba03333dc76ff49eccaba375e68cb525._comment
new file mode 100644
index 000000000..261c2a51f
--- /dev/null
+++ b/doc/todo/object_dir_reorg_v2/comment_1_ba03333dc76ff49eccaba375e68cb525._comment
@@ -0,0 +1,8 @@
+[[!comment format=mdwn
+ username="https://www.google.com/accounts/o8/id?id=AItOawl9sYlePmv1xK-VvjBdN-5doOa_Xw-jH4U"
+ nickname="Richard"
+ subject="comment 1"
+ date="2011-03-16T01:16:48Z"
+ content="""
+If you support generic meta-data, keep in mind that you will need to do conflict resolution. Timestamps may not be synched across all systems, so keeping a log of old metadata could be used, sorting by history and using the latest. Which leaves the situation of two incompatible changes. This would probably mean manual conflict resolution. You will probably have thought of this already, but I still wanted to make sure this is recorded. -- RichiH
+"""]]
diff --git a/doc/todo/object_dir_reorg_v2/comment_2_81276ac309959dc741bc90101c213ab7._comment b/doc/todo/object_dir_reorg_v2/comment_2_81276ac309959dc741bc90101c213ab7._comment
new file mode 100644
index 000000000..9785f1989
--- /dev/null
+++ b/doc/todo/object_dir_reorg_v2/comment_2_81276ac309959dc741bc90101c213ab7._comment
@@ -0,0 +1,8 @@
+[[!comment format=mdwn
+ username="https://www.google.com/accounts/o8/id?id=AItOawl9sYlePmv1xK-VvjBdN-5doOa_Xw-jH4U"
+ nickname="Richard"
+ subject="comment 2"
+ date="2011-03-16T01:19:25Z"
+ content="""
+Hmm, I added quite a few comments at work, but they are stuck in moderation. Maybe I forgot to log in before adding them. I am surprised this one appeared immediately. -- RichiH
+"""]]
diff --git a/doc/todo/object_dir_reorg_v2/comment_3_79bdf9c51dec9f52372ce95b53233bb2._comment b/doc/todo/object_dir_reorg_v2/comment_3_79bdf9c51dec9f52372ce95b53233bb2._comment
new file mode 100644
index 000000000..886941be7
--- /dev/null
+++ b/doc/todo/object_dir_reorg_v2/comment_3_79bdf9c51dec9f52372ce95b53233bb2._comment
@@ -0,0 +1,12 @@
+[[!comment format=mdwn
+ username="https://www.google.com/accounts/o8/id?id=AItOawl9sYlePmv1xK-VvjBdN-5doOa_Xw-jH4U"
+ nickname="Richard"
+ subject="comment 1"
+ date="2011-03-15T14:08:41Z"
+ content="""
+What is the potential time-frame for this change? As I am not using git-annex for production yet, I can see myself waiting to avoid any potential hassle.
+
+Supporting generic metadata seems like a great idea. Though if you are going this path, wouldn't it make sense to avoid metastore for mtime etc and support this natively without outside dependencies?
+
+-- RichiH
+"""]]
diff --git a/doc/todo/object_dir_reorg_v2/comment_4_93aada9b1680fed56cc6f0f7c3aca5e5._comment b/doc/todo/object_dir_reorg_v2/comment_4_93aada9b1680fed56cc6f0f7c3aca5e5._comment
new file mode 100644
index 000000000..475359abb
--- /dev/null
+++ b/doc/todo/object_dir_reorg_v2/comment_4_93aada9b1680fed56cc6f0f7c3aca5e5._comment
@@ -0,0 +1,12 @@
+[[!comment format=mdwn
+ username="http://joey.kitenet.net/"
+ nickname="joey"
+ subject="comment 4"
+ date="2011-03-16T03:22:45Z"
+ content="""
+Well, I spent a few hours playing this evening in the 'reorg' branch in git. It seems to be shaping up pretty well; type-based refactoring in Haskell makes these kinds of big systematic changes a matter of editing until it compiles. And it compiles and the test suite passes. But, so far I've only covered 1, 3, and 4 on the list, and have yet to deal with upgrades.
+
+I'd recommend you not wait before using git-annex. I am committed to providing upgradability between annexes created with all versions of git-annex, going forward. This is important because we can have offline archival drives that sit unused for years. Git-annex will upgrade a repository to the current standard the first time it sees it, and I hope the upgrade will be pretty smooth. It was not bad for the annex.version 0 to 1 upgrade earlier. The only annoyance with upgrades is that it will result in some big commits to git, as every symlink in the repo gets changed, and log files get moved to new names.
+
+(The metadata being stored with keys is data that a particular backend can use, and is static to a given key, so there are no merge issues (and it won't be used to preserve mtimes, etc).)
+"""]]