Diffstat (limited to 'doc')
-rw-r--r-- doc/bugs/fat_support.mdwn 3
-rw-r--r-- doc/bugs/free_space_checking.mdwn 10
-rw-r--r-- doc/forum/hashing_objects_directories.mdwn 8
-rw-r--r-- doc/git-annex.mdwn 25
-rw-r--r-- doc/internals.mdwn 10
-rw-r--r-- doc/news/version_0.20.mdwn 12
-rw-r--r-- doc/news/version_0.20110316.mdwn 24
-rw-r--r-- doc/news/version_0.24.mdwn 6
-rw-r--r-- doc/todo/object_dir_reorg_v2.mdwn 4
-rw-r--r-- doc/upgrades.mdwn 67
-rw-r--r-- doc/walkthrough/modifying_annexed_files.mdwn 2
-rw-r--r-- doc/walkthrough/moving_file_content_between_repositories.mdwn 2
-rw-r--r-- doc/walkthrough/unused_data.mdwn 4
-rw-r--r-- doc/walkthrough/using_ssh_remotes.mdwn 2
-rw-r--r-- doc/walkthrough/using_the_URL_backend.mdwn 2
15 files changed, 146 insertions, 35 deletions
diff --git a/doc/bugs/fat_support.mdwn b/doc/bugs/fat_support.mdwn
index 2c6c97385..60633c29b 100644
--- a/doc/bugs/fat_support.mdwn
+++ b/doc/bugs/fat_support.mdwn
@@ -10,3 +10,6 @@ be VFAT formatted:
[[!tag wishlist]]
+[[Done]]; in annex.version 2 repos, colons are entirely avoided in
+filenames. So a bare git clone can be put on VFAT, and git-annex
+used to move stuff --to and --from it, for sneakernet.
diff --git a/doc/bugs/free_space_checking.mdwn b/doc/bugs/free_space_checking.mdwn
index 34528a7b3..eaa3294d6 100644
--- a/doc/bugs/free_space_checking.mdwn
+++ b/doc/bugs/free_space_checking.mdwn
@@ -6,3 +6,13 @@ file around.
* And, need a way to tell the size of a file before copying it from
a remote, to check local disk space.
+
+ As of annex.version 2, this metadata can be available for any type
+ of backend. Newly added files will always have file size metadata,
+ while files that used a SHA backend and were added before the upgrade
+ won't.
+
+ So, need a migration process from eg SHA1 to SHA1+filesize. It will
+ find files that lack size info, and rename their keys to add the size
+ info. Users with old repos can run this on them, to get the missing
+ info recorded.
diff --git a/doc/forum/hashing_objects_directories.mdwn b/doc/forum/hashing_objects_directories.mdwn
index 715e972ca..5b7708fb5 100644
--- a/doc/forum/hashing_objects_directories.mdwn
+++ b/doc/forum/hashing_objects_directories.mdwn
@@ -17,3 +17,11 @@ or anything in between to a paranoid
Also the use of a colon specifically breaks FAT32 ([[bugs/fat_support]]), must it be a colon or could an extra directory be used? i.e. `.git/annex/objects/SHA1/*/...`
`git annex init` could also create all but the last level directory on initialization. I'm thinking `SHA1/1/1, SHA1/1/2, ..., SHA256/f/f, ..., URL/f/f, ..., WORM/f/f`
+
+> This is done now with a 2-level hash. It also hashes .git-annex/ log
+> files which were the worse problem really. Scales to hundreds of millions
+> of files with each dir having 1024 or fewer contents. Example:
+>
+> `me -> .git/annex/objects/71/9t/WORM-s3-m1300247299--me/WORM-s3-m1300247299--me`
+>
+> --[[Joey]]
diff --git a/doc/git-annex.mdwn b/doc/git-annex.mdwn
index 4998a6491..ee4019068 100644
--- a/doc/git-annex.mdwn
+++ b/doc/git-annex.mdwn
@@ -234,11 +234,11 @@ Many git-annex commands will stage changes for later `git commit` by you.
This can be used to manually set up a file to link to a specified key
in the key-value backend. How you determine an existing key in the backend
- varies. For the URL backend, the key is just a URL to the content.
+ varies. For the URL backend, the key is based on a URL to the content.
Example:
- git annex fromkey --backend=URL --key=http://www.archive.org/somefile somefile
+ git annex fromkey --key=URL--http://www.archive.org/somefile somefile
* dropkey [key ...]
@@ -248,24 +248,24 @@ Many git-annex commands will stage changes for later `git commit` by you.
This can be used to drop content for arbitrary keys, which do not need
to have a file in the git repository pointing at them.
- A backend will typically need to be specified with --backend. If none
- is specified, the first configured backend is used.
-
Example:
- git annex dropkey --backend=SHA1 7da006579dd64330eb2456001fd01948430572f2
+ git annex dropkey SHA1-s10-7da006579dd64330eb2456001fd01948430572f2
* setkey file
This plumbing-level command sets the annexed data for a key to the content of
the specified file, and then removes the file.
- A backend will typically need to be specified with --backend. If none
- is specified, the first configured backend is used.
-
Example:
- git annex setkey --backend=WORM --key=1287765018:3 /tmp/file
+ git annex setkey --key=WORM-s3-m1287765018--file /tmp/file
+
+* upgrade
+
+ Upgrades the repository to the current layout. Upgrades are done automatically
+ whenever a newer git annex encounters an old repository; this command
+ allows explicitly starting an upgrade.
# OPTIONS
@@ -302,7 +302,10 @@ Many git-annex commands will stage changes for later `git commit` by you.
* --backend=name
- Specifies which key-value backend to use.
+ Specifies which key-value backend to use. This can be used when
+ adding a file to the annex, or migrating a file. Once files
+ are in the annex, their backend is known and this option is not
+ necessary.
* --key=name
diff --git a/doc/internals.mdwn b/doc/internals.mdwn
index 3f680dd8f..a133320b4 100644
--- a/doc/internals.mdwn
+++ b/doc/internals.mdwn
@@ -2,12 +2,15 @@ In the world of git, we're not scared about internal implementation
details, and sometimes we like to dive in and tweak things by hand. Here's
some documentation to that end.
-## `.git/annex/objects/*/*`
+## `.git/annex/objects/aa/bb/*/*`
This is where locally available file contents are actually stored.
Files added to the annex get a symlink checked into git that points
to the file content.
+First there are two levels of directories used for hashing, to prevent
+too many things ending up in any one directory.
+
Each subdirectory has the name of a key in one of the
[[key-value_backends|backends]]. The file inside also has the name of the key.
This two-level structure is used because it allows the write bit to be removed
@@ -41,10 +44,11 @@ Example:
e605dca6-446a-11e0-8b2a-002170d25c55 1
26339d22-446b-11e0-9101-002170d25c55 ?
-## `.git-annex/*.log`
+## `.git-annex/aa/bb/*.log`
The remainder of the log files record [[location_tracking]] information
-for file contents. The name of the key is the filename, and the content
+for file contents. Again these are placed in two levels of subdirectories
+for hashing. The name of the key is the filename, and the content
consists of a timestamp, either 1 (present) or 0 (not present), and
the UUID of the repository that has or lacks the file content.
diff --git a/doc/news/version_0.20.mdwn b/doc/news/version_0.20.mdwn
deleted file mode 100644
index 9b95b652e..000000000
--- a/doc/news/version_0.20.mdwn
+++ /dev/null
@@ -1,12 +0,0 @@
-git-annex 0.20 released with [[!toggle text="these changes"]]
-[[!toggleable text="""
- * Preserve specified file ordering when instructed to act on multiple
- files or directories. For example, "git annex get a b" will now always
- get "a" before "b". Previously it could operate in either order.
- * unannex: Commit staged changes at end, to avoid some confusing behavior
- with the pre-commit hook, which would see some types of commits after
- an unannex as checking in of an unlocked file.
- * map: New subcommand that uses graphviz to display a nice map of
- the git repository network.
- * Deal with the mtl/monads-fd conflict.
- * configure: Check for sha1sum."""]] \ No newline at end of file
diff --git a/doc/news/version_0.20110316.mdwn b/doc/news/version_0.20110316.mdwn
new file mode 100644
index 000000000..5654c15bc
--- /dev/null
+++ b/doc/news/version_0.20110316.mdwn
@@ -0,0 +1,24 @@
+News for git-annex 0.20110316:
+
+ This version reorganises the layout of git-annex's files in your repository.
+ There is an upgrade process to convert a repository from the old git-annex
+ to this version. While git-annex will attempt to transparently handle
+ upgrades, you may want to drive the upgrade process by hand.
+ See <http://git-annex.branchable.com/upgrades/> or
+ /usr/share/doc/git-annex/html/upgrades.html
+
+git-annex 0.20110316 released with [[!toggle text="these changes"]]
+[[!toggleable text="""
+ * New repository format, annex.version=2.
+ * The first time git-annex is run in an old format repository, it
+ will automatically upgrade it to the new format, staging all
+ necessary changes to git. Also added a "git annex upgrade" command.
+ * Colons are now avoided in filenames, so bare clones of git repos
+ can be put on USB thumb drives formatted with vFAT or similar
+ filesystems.
+ * Added two levels of hashing to object directory and .git-annex logs,
+ to improve scalability with enormous numbers of annexed
+ objects. (With one hundred million annexed objects, each
+ directory would contain fewer than 1024 files.)
+ * The setkey, fromkey, and dropkey subcommands have changed how
+ the key is specified. --backend is no longer used with these."""]] \ No newline at end of file
diff --git a/doc/news/version_0.24.mdwn b/doc/news/version_0.24.mdwn
index 2d94a0e9b..81b013a26 100644
--- a/doc/news/version_0.24.mdwn
+++ b/doc/news/version_0.24.mdwn
@@ -1,6 +1,6 @@
-Branched the 0.24 series, which will be maintained for a while to
-support v1 git-annex repos, while main development moves to the 0.2011
-series, with v2 git-annex repos.
+Branched the 0.24 series, which will be maintained for a while (in the
+stable branch in git) to support v1 git-annex repos, while main development
+moves to the 0.2011 series, with v2 git-annex repos.
git-annex 0.24 released with [[!toggle text="these changes"]]
[[!toggleable text="""
diff --git a/doc/todo/object_dir_reorg_v2.mdwn b/doc/todo/object_dir_reorg_v2.mdwn
index 1c2d2f21b..49666ddc7 100644
--- a/doc/todo/object_dir_reorg_v2.mdwn
+++ b/doc/todo/object_dir_reorg_v2.mdwn
@@ -19,3 +19,7 @@ all users, so this should be the *last* reorg in the forseeable future.
(Probably everything after ",k" should be part of the key, even if it
contains the "," separator character. Otherwise an escaping mechanism
would be needed.)
+
+[[done]] now!
+
+Although [[bugs/free_space_checking]] is not quite there --[[Joey]]
diff --git a/doc/upgrades.mdwn b/doc/upgrades.mdwn
new file mode 100644
index 000000000..1371dc033
--- /dev/null
+++ b/doc/upgrades.mdwn
@@ -0,0 +1,67 @@
+Occasionally improvements are made to how git-annex stores its data
+that require an upgrade process to convert repositories made with an older
+version so they can be used by a newer version. It's annoying, it should happen
+rarely, but sometimes, it's worth it.
+
+There's a commitment that git-annex will always support upgrades from all
+past versions. After all, you may have offline drives from an earlier
+git-annex, and might want to use them with a newer git-annex.
+
+## Upgrade process
+
+git-annex will automatically notice if it is run in a repository that
+needs an upgrade, and perform the upgrade before running whatever it
+was asked to do. Or you can use the "git annex upgrade" command to
+explicitly do an upgrade. The upgrade can take a while if you have
+a lot of files.
+
+Each clone of a repository should be individually upgraded.
+Until a repository's remotes have been upgraded, git-annex
+may refuse to communicate with them.
+
+Generally, start by upgrading one repository, and then you can commit
+the changes git-annex staged during upgrade, and push them out to other
+repositories. And then upgrade those other repositories. Doing it this
+way avoids git-annex doing some duplicate work during the upgrade.
+
+The upgrade process is guaranteed to be conflict-free, unless you
+already have git conflicts in your repository or between repositories.
+Upgrading a repository with conflicts is not recommended; resolve the
+conflicts first before upgrading git-annex.
+
+Example upgrade process:
+
+ cd localrepo
+ git pull
+ git annex upgrade
+ (Upgrading object directory layout v1 to v2...)
+ git commit -m "upgrade v1 to v2"
+ git push
+
+ ssh remote
+ cd remoterepo
+ git pull
+ git annex upgrade
+ ...
+
+## Upgrade events, so far
+
+### v1 -> v2 (git-annex version 0.23 to version 0.20110316)
+
+Involved adding hashing to .git/annex/ and changing the names of all keys.
+Symlinks changed.
+
+Also, hashing was added to location log files in .git-annex/.
+And .gitattributes needed to have another line added to it.
+
+Handled transparently.
+
+### v0 -> v1 (git-annex version 0.03 to version 0.04)
+
+Involved a reorganisation of the layout of .git/annex/. Symlinks changed.
+
+Handled more or less transparently, although git-annex was just 2 weeks
+old at the time, and had few users other than Joey.
+
+This upgrade is believed to still be supported, but has not been tested
+lately.
diff --git a/doc/walkthrough/modifying_annexed_files.mdwn b/doc/walkthrough/modifying_annexed_files.mdwn
index 3ad4e82ea..f75b73a24 100644
--- a/doc/walkthrough/modifying_annexed_files.mdwn
+++ b/doc/walkthrough/modifying_annexed_files.mdwn
@@ -27,7 +27,7 @@ and this symlink is what gets committed to git in the end.
add my_cool_big_file ok
[master 64cda67] changed an annexed file
2 files changed, 2 insertions(+), 1 deletions(-)
- create mode 100644 .git-annex/WORM:1289672605:30:file.log
+ create mode 100644 .git-annex/WORM-s30-m1289672605--file.log
There is one problem with using `git commit` like this: Git wants to first
stage the entire contents of the file in its index. That can be slow for
diff --git a/doc/walkthrough/moving_file_content_between_repositories.mdwn b/doc/walkthrough/moving_file_content_between_repositories.mdwn
index d7150f109..6b3e3f4e8 100644
--- a/doc/walkthrough/moving_file_content_between_repositories.mdwn
+++ b/doc/walkthrough/moving_file_content_between_repositories.mdwn
@@ -9,5 +9,5 @@ makes it very easy.
move my_cool_big_file (moving to usbdrive...) ok
# git annex move video/hackity_hack_and_kaxxt.mov --from fileserver
move video/hackity_hack_and_kaxxt.mov (moving from fileserver...)
- WORM:1274316523:86050597:hackity_hack_and_kax 100% 82MB 199.1KB/s 07:02
+ WORM-s86050597-m1274316523--hackity_hack_and_kax 100% 82MB 199.1KB/s 07:02
ok
diff --git a/doc/walkthrough/unused_data.mdwn b/doc/walkthrough/unused_data.mdwn
index 69a581fe1..9be32577c 100644
--- a/doc/walkthrough/unused_data.mdwn
+++ b/doc/walkthrough/unused_data.mdwn
@@ -12,8 +12,8 @@ eliminate it to save space.
unused (checking for unused data...)
Some annexed data is no longer pointed to by any files in the repository.
NUMBER KEY
- 1 WORM:1289672605:3:file
- 2 WORM:1289672605:14:file
+ 1 WORM-s3-m1289672605--file
+ 2 WORM-s14-m1289672605--file
(To see where data was previously used, try: git log --stat -S'KEY')
(To remove unwanted data: git-annex dropunused NUMBER)
ok
diff --git a/doc/walkthrough/using_ssh_remotes.mdwn b/doc/walkthrough/using_ssh_remotes.mdwn
index 6af9e1f47..4c2f830de 100644
--- a/doc/walkthrough/using_ssh_remotes.mdwn
+++ b/doc/walkthrough/using_ssh_remotes.mdwn
@@ -13,7 +13,7 @@ Now you can get files and they will be transferred (using `rsync` via `ssh`):
# git annex get my_cool_big_file
get my_cool_big_file (getting UUID for origin...) (copying from origin...)
- WORM:1285650548:2159:my_cool_big_file 100% 2159 2.1KB/s 00:00
+ WORM-s2159-m1285650548--my_cool_big_file 100% 2159 2.1KB/s 00:00
ok
When you drop files, git-annex will ssh over to the remote and make
diff --git a/doc/walkthrough/using_the_URL_backend.mdwn b/doc/walkthrough/using_the_URL_backend.mdwn
index fe79a6be2..585fd0668 100644
--- a/doc/walkthrough/using_the_URL_backend.mdwn
+++ b/doc/walkthrough/using_the_URL_backend.mdwn
@@ -5,7 +5,7 @@ Another handy backend is the URL backend, which can fetch file's content
from remote URLs. Here's how to set up some files in your repository
that use this backend:
- # git annex fromkey --backend=URL --key=http://www.archive.org/somefile somefile
+ # git annex fromkey --key=URL--http://www.archive.org/somefile somefile
fromkey somefile ok
# git commit -m "added a file from the Internet Archive"
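
Reviewer note: the walkthrough hunks above all apply the same key rename, from the v1 form `WORM:mtime:size:name` to the v2 form `WORM-ssize-mmtime--name`. The sketch below is a minimal illustration of that mapping, inferred only from the before/after examples in this diff; it is not git-annex's actual upgrade code. The function names `worm_v1_to_v2` and `parse_v2_key` are hypothetical, and the assumption that fields before `--` are limited to `s` (size) and `m` (mtime) comes from the examples shown here, not from a specification.

```python
import re

# v1 WORM keys, as they appear in the removed lines of this diff:
#   WORM:<mtime>:<size>:<name>
V1_WORM = re.compile(r"^WORM:(?P<mtime>\d+):(?P<size>\d+):(?P<name>.+)$")


def worm_v1_to_v2(key: str) -> str:
    """Rewrite a v1 WORM key into the v2 layout seen in the added lines.

    Hypothetical helper: the layout is inferred from the diff's examples.
    """
    m = V1_WORM.match(key)
    if m is None:
        raise ValueError(f"not a v1 WORM key: {key!r}")
    # v2 puts size first, then mtime, and separates the name with "--".
    return f"WORM-s{m['size']}-m{m['mtime']}--{m['name']}"


def parse_v2_key(key: str) -> dict:
    """Split a v2 key into backend, optional size/mtime fields, and name.

    Assumes (from the examples only) that everything after the first "--"
    is the name, and that the fields before it are "-s<digits>" and
    "-m<digits>".
    """
    head, sep, name = key.partition("--")
    if not sep:
        raise ValueError(f"no '--' separator in key: {key!r}")
    backend, *fields = head.split("-")
    parsed = {"backend": backend, "size": None, "mtime": None, "name": name}
    for field in fields:
        if field[:1] == "s" and field[1:].isdigit():
            parsed["size"] = int(field[1:])
        elif field[:1] == "m" and field[1:].isdigit():
            parsed["mtime"] = int(field[1:])
        else:
            raise ValueError(f"unrecognised field {field!r} in key {key!r}")
    return parsed


if __name__ == "__main__":
    print(worm_v1_to_v2("WORM:1289672605:30:file"))
    print(parse_v2_key("URL--http://www.archive.org/somefile"))
```

Because the v2 form contains no colons, the resulting names are usable on VFAT, which is the motivation given in the `fat_support` hunk.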