author Joey Hess <joey@kitenet.net> 2011-03-16 00:08:02 -0400
committer Joey Hess <joey@kitenet.net> 2011-03-16 00:08:02 -0400
commit 09a7689bc30faaf938a0b32a417d38ac093a6f7a
tree d102f43eba76579e2e1366b76d8757ecc3a4c8d0
parent dd5448eb075c3774aa173cb9f2e4344ce62b3e13
update and bug closures for v2 layout
-rw-r--r-- debian/changelog 7
-rw-r--r-- doc/bugs/fat_support.mdwn 3
-rw-r--r-- doc/forum/hashing_objects_directories.mdwn 8
-rw-r--r-- doc/internals.mdwn 10
4 files changed, 25 insertions, 3 deletions
diff --git a/debian/changelog b/debian/changelog
index 0d1832374..ac7c854ff 100644
--- a/debian/changelog
+++ b/debian/changelog
@@ -1,6 +1,13 @@
git-annex (0.24) UNRELEASED; urgency=low
* Reorganized annexed object store. annex.version=2
+ * Colons are now avoided in filenames, so bare clones of git repos
+ can be put on USB thumb drives formatted with vFAT or similar
+ filesystems.
+ * Added two levels of hashing to object directory and .git-annex logs,
+ to improve scalability with enormous numbers of annexed
+ objects. (With one hundred million annexed objects, each
+ directory would contain fewer than 1024 files.)
* The setkey, fromkey, and dropkey subcommands have changed how
the key is specified. --backend is no longer used with these.
* Add Suggests on graphviz. Closes: #618039
diff --git a/doc/bugs/fat_support.mdwn b/doc/bugs/fat_support.mdwn
index 2c6c97385..60633c29b 100644
--- a/doc/bugs/fat_support.mdwn
+++ b/doc/bugs/fat_support.mdwn
@@ -10,3 +10,6 @@ be VFAT formatted:
[[!tag wishlist]]
+[[Done]]; in annex.version 2 repos, colons are entirely avoided in
+filenames. So a bare git clone can be put on VFAT, and git-annex
+used to move stuff --to and --from it, for sneakernet.
diff --git a/doc/forum/hashing_objects_directories.mdwn b/doc/forum/hashing_objects_directories.mdwn
index 715e972ca..5b7708fb5 100644
--- a/doc/forum/hashing_objects_directories.mdwn
+++ b/doc/forum/hashing_objects_directories.mdwn
@@ -17,3 +17,11 @@ or anything in between to a paranoid
Also the use of a colon specifically breaks FAT32 ([[bugs/fat_support]]), must it be a colon or could an extra directory be used? i.e. `.git/annex/objects/SHA1/*/...`
`git annex init` could also create all but the last level directory on initialization. I'm thinking `SHA1/1/1, SHA1/1/2, ..., SHA256/f/f, ..., URL/f/f, ..., WORM/f/f`
+
+> This is done now with a 2-level hash. It also hashes .git-annex/ log
+> files which were the worse problem really. Scales to hundreds of millions
+> of files with each dir having 1024 or fewer contents. Example:
+>
+> `me -> .git/annex/objects/71/9t/WORM-s3-m1300247299--me/WORM-s3-m1300247299--me`
+>
+> --[[Joey]]
diff --git a/doc/internals.mdwn b/doc/internals.mdwn
index 3f680dd8f..a133320b4 100644
--- a/doc/internals.mdwn
+++ b/doc/internals.mdwn
@@ -2,12 +2,15 @@ In the world of git, we're not scared about internal implementation
details, and sometimes we like to dive in and tweak things by hand. Here's
some documentation to that end.
-## `.git/annex/objects/*/*`
+## `.git/annex/objects/aa/bb/*/*`
This is where locally available file contents are actually stored.
Files added to the annex get a symlink checked into git that points
to the file content.
+First there are two levels of directories used for hashing, to prevent
+too many things ending up in any one directory.
+
Each subdirectory has the name of a key in one of the
[[key-value_backends|backends]]. The file inside also has the name of the key.
This two-level structure is used because it allows the write bit to be removed
@@ -41,10 +44,11 @@ Example:
e605dca6-446a-11e0-8b2a-002170d25c55 1
26339d22-446b-11e0-9101-002170d25c55 ?
-## `.git-annex/*.log`
+## `.git-annex/aa/bb/*.log`
The remainder of the log files record [[location_tracking]] information
-for file contents. The name of the key is the filename, and the content
+for file contents. Again these are placed in two levels of subdirectories
+for hashing. The name of the key is the filename, and the content
consists of a timestamp, either 1 (present) or 0 (not present), and
the UUID of the repository that has or lacks the file content.
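
As a rough illustration of the two-level layout this commit documents, the sketch below builds a path of the form `.git/annex/objects/aa/bb/KEY/KEY` and works through the directory-size arithmetic. The function names and the use of MD5 hex digits for the `aa/bb` names are assumptions made for illustration only: the diff does not show how git-annex derives the two directory names, and the `71/9t` example in the forum reply suggests it uses a different encoding. The capacity figure assumes up to 1024 entries per directory at every level, taken from the numbers quoted in the changelog and forum reply.

```python
import hashlib
import posixpath


def hashed_object_path(key: str, chars_per_level: int = 2) -> str:
    """Build a hypothetical two-level hashed path: objects/aa/bb/KEY/KEY.

    Assumption: the directory names are taken from the MD5 hex digest of
    the key. This is a stand-in, not git-annex's actual scheme.
    """
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    level1 = digest[:chars_per_level]
    level2 = digest[chars_per_level:2 * chars_per_level]
    # The key appears twice: once as a directory, once as the file inside it.
    return posixpath.join(".git/annex/objects", level1, level2, key, key)


def capacity(fanout: int, levels: int, per_leaf: int) -> int:
    """Maximum objects when each hash level has `fanout` subdirectories
    and each leaf directory holds at most `per_leaf` entries."""
    return fanout ** levels * per_leaf


if __name__ == "__main__":
    print(hashed_object_path("WORM-s3-m1300247299--me"))
    # Assumed fanout of 1024 per level, 1024 entries per leaf directory.
    print(capacity(1024, 2, 1024))
    # Average entries per leaf directory with one hundred million objects.
    print(100_000_000 / 1024 ** 2)
```

Under those assumptions, one hundred million objects spread evenly over 1024 × 1024 leaf directories come to roughly 95 entries per directory, comfortably below the 1024 limit quoted above.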