author: Joey Hess <joeyh@joeyh.name> 2015-06-16 18:56:20 -0400
committer: Joey Hess <joeyh@joeyh.name> 2015-06-16 18:56:20 -0400
commit: bf609dc9bf422363cd1b880619b83b69ce7b4734
tree: 9c7622118f45119057e64046403d64c81516928b
parent: eeee3a66e6711171e71624e4f761b8936cbd5ff1
parent: 0e1c2100529da0643050a2119f27aba14008570b
Merge branch 'master' of ssh://git-annex.branchable.com
-rw-r--r-- doc/forum/Handling_a_large_number_of_files.mdwn | 3
-rw-r--r-- doc/todo/S3_fsck_support/comment_2_7a1ce64d362b8f75adf22709771a7787._comment | 11
-rw-r--r-- doc/todo/git-hook_to_sanity-check_git-annex_branch_pushes.mdwn | 9
3 files changed, 21 insertions, 2 deletions
diff --git a/doc/forum/Handling_a_large_number_of_files.mdwn b/doc/forum/Handling_a_large_number_of_files.mdwn
new file mode 100644
index 000000000..ccf8360a5
--- /dev/null
+++ b/doc/forum/Handling_a_large_number_of_files.mdwn
@@ -0,0 +1,3 @@
+I have noticed performance getting really slow when adding files (`git annex add .`) to a directory that already contains several hundred thousand files. When using git annex, is it recommended to split large numbers of files across multiple directories containing fewer files each? Is there a particular recommended way of handling large numbers of files (say, getting into the millions) in git annex?
+
+Thanks
diff --git a/doc/todo/S3_fsck_support/comment_2_7a1ce64d362b8f75adf22709771a7787._comment b/doc/todo/S3_fsck_support/comment_2_7a1ce64d362b8f75adf22709771a7787._comment
new file mode 100644
index 000000000..a27ed8e56
--- /dev/null
+++ b/doc/todo/S3_fsck_support/comment_2_7a1ce64d362b8f75adf22709771a7787._comment
@@ -0,0 +1,11 @@
+[[!comment format=mdwn
+ username="anarcat"
+ subject="comment 2"
+ date="2015-06-16T20:10:50Z"
+ content="""
+understood: i thought `-f` was `--from`... hence my confusion.
+
+as for `remoteFsck`, i guess what i am saying is exactly that: there *does* seem to be a way to do a remote checksum of the file *without* downloading it. it seems to be a critical advantage over having to download the whole repository to check it... maybe `--fast` could use that technique and `non--fast` would download?
+
+as for the on-wire MD5 stuff, that does seem to be overkill...
+"""]]
diff --git a/doc/todo/git-hook_to_sanity-check_git-annex_branch_pushes.mdwn b/doc/todo/git-hook_to_sanity-check_git-annex_branch_pushes.mdwn
index 2297c4aca..7eb02c3ff 100644
--- a/doc/todo/git-hook_to_sanity-check_git-annex_branch_pushes.mdwn
+++ b/doc/todo/git-hook_to_sanity-check_git-annex_branch_pushes.mdwn
@@ -8,11 +8,11 @@ hook to do this. --[[Joey]]
There are two levels of checking it seems such a command could do:
-1. Only allow certian files to be changed. For example, maye clients are only
+1. Only allow certain files to be changed. For example, maybe clients are only
expected to change location tracking files, and the activity.log
file, but not others like trust.log.
-2. Only allow moidiciations of data about a specific UUID. The UUID
+2. Only allow modifications of data about a specific UUID. The UUID
would be provided to the command (and could be determined based on a
per-client ssh key or etc).
@@ -34,3 +34,8 @@ This might be too limiting for some situations:
changes to remote.log, which the first level of checking would not allow.
And, it would add another UUID, which the second level of checking would
need to be configured to allow.
+
+Python implementation
+---------------------
+
+I started doing an implementation of this in Python. For technical reasons the git repo is not publicly available, but here's a [dump](http://paste.debian.net/232563/) of the code. I went through what seems to be a rather convoluted process with libgit there because I wanted to have some proper unit tests, and generating git commands by hand in a shell script is rather painful. Also, it currently adopts a "blocking" approach, i.e. it blocks known problems, but maybe it should be based on an "allow" approach, that is: only allow certain things to go through. So far it only forbids removals and changes to trust.log. A bunch of stuff is still missing, like parameters (to allow changing the list of protected files) and checking the log tracking info. Feedback welcome. --[[anarcat]]
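
The "blocking" and "allow" approaches described in the last hunk can be sketched roughly as follows. This is not the code from the linked dump. It is a minimal illustration that assumes the changed paths of a push have already been extracted as `(status, path)` pairs, with status letters in the style of `git diff --name-status`. The names `PROTECTED_FILES`, `find_violations` and `find_disallowed` are hypothetical, and the only protected file listed is the one the post mentions, trust.log.

```python
from typing import Callable, Iterable, List, Tuple

# (status, path) pairs; status letters follow `git diff --name-status`
# ("A" added, "M" modified, "D" deleted). How the pairs are obtained from
# the pushed commits is left out and is an assumption of this sketch.
Change = Tuple[str, str]

# Hypothetical deny-list: the post only names trust.log as protected.
PROTECTED_FILES = frozenset({"trust.log"})


def find_violations(changes: Iterable[Change],
                    protected: frozenset = PROTECTED_FILES) -> List[str]:
    """Blocking approach: reject removals and changes to protected files."""
    problems = []
    for status, path in changes:
        if status == "D":
            problems.append(f"removal not allowed: {path}")
        elif path in protected:
            problems.append(f"change to protected file not allowed: {path}")
    return problems


def find_disallowed(changes: Iterable[Change],
                    is_allowed: Callable[[str], bool]) -> List[str]:
    """Allow approach: reject every path the predicate does not accept."""
    return [path for _status, path in changes if not is_allowed(path)]


if __name__ == "__main__":
    pushed = [("M", "activity.log"), ("M", "trust.log"), ("D", "remote.log")]
    for problem in find_violations(pushed):
        print(problem)
    print(find_disallowed(pushed, lambda p: p == "activity.log"))
```

Passing the protected set and the allow predicate as parameters is one way to cover the missing "parameters" mentioned in the post; a real hook would still need to map its findings to a non-zero exit status to refuse the push.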