Diffstat (limited to 'doc')
-rw-r--r-- | doc/bugs/fat_support.mdwn | 3
-rw-r--r-- | doc/bugs/free_space_checking.mdwn | 10
-rw-r--r-- | doc/forum/hashing_objects_directories.mdwn | 8
-rw-r--r-- | doc/git-annex.mdwn | 25
-rw-r--r-- | doc/internals.mdwn | 10
-rw-r--r-- | doc/news/version_0.20.mdwn | 12
-rw-r--r-- | doc/news/version_0.20110316.mdwn | 24
-rw-r--r-- | doc/news/version_0.24.mdwn | 6
-rw-r--r-- | doc/todo/object_dir_reorg_v2.mdwn | 4
-rw-r--r-- | doc/upgrades.mdwn | 67
-rw-r--r-- | doc/walkthrough/modifying_annexed_files.mdwn | 2
-rw-r--r-- | doc/walkthrough/moving_file_content_between_repositories.mdwn | 2
-rw-r--r-- | doc/walkthrough/unused_data.mdwn | 4
-rw-r--r-- | doc/walkthrough/using_ssh_remotes.mdwn | 2
-rw-r--r-- | doc/walkthrough/using_the_URL_backend.mdwn | 2
15 files changed, 146 insertions, 35 deletions
diff --git a/doc/bugs/fat_support.mdwn b/doc/bugs/fat_support.mdwn
index 2c6c97385..60633c29b 100644
--- a/doc/bugs/fat_support.mdwn
+++ b/doc/bugs/fat_support.mdwn
@@ -10,3 +10,6 @@ be VFAT formatted:
 
 [[!tag wishlist]]
 
+[[Done]]; in annex.version 2 repos, colons are entirely avoided in
+filenames. So a bare git clone can be put on VFAT, and git-annex
+used to move stuff --to and --from it, for sneakernet.
diff --git a/doc/bugs/free_space_checking.mdwn b/doc/bugs/free_space_checking.mdwn
index 34528a7b3..eaa3294d6 100644
--- a/doc/bugs/free_space_checking.mdwn
+++ b/doc/bugs/free_space_checking.mdwn
@@ -6,3 +6,13 @@ file around.
 
 * And, need a way to tell the size of a file before copying it from
   a remote, to check local disk space.
+
+  As of annex.version 2, this metadata can be available for any type
+  of backend. Newly added files will always have file size metadata,
+  while files that used a SHA backend and were added before the upgrade
+  won't.
+
+  So, need a migration process from eg SHA1 to SHA1+filesize. It will
+  find files that lack size info, and rename their keys to add the size
+  info. Users with old repos can run this on them, to get the missing
+  info recorded.
diff --git a/doc/forum/hashing_objects_directories.mdwn b/doc/forum/hashing_objects_directories.mdwn
index 715e972ca..5b7708fb5 100644
--- a/doc/forum/hashing_objects_directories.mdwn
+++ b/doc/forum/hashing_objects_directories.mdwn
@@ -17,3 +17,11 @@ or anything in between to a paranoid
 Also the use of a colon specifically breaks FAT32 ([[bugs/fat_support]]), must it be a colon or could an extra directory be used? i.e. `.git/annex/objects/SHA1/*/...`
 
 `git annex init` could also create all but the last level directory on initialization. I'm thinking `SHA1/1/1, SHA1/1/2, ..., SHA256/f/f, ..., URL/f/f, ..., WORM/f/f`
+
+> This is done now with a 2-level hash. It also hashes .git-annex/ log
+> files which were the worse problem really. Scales to hundreds of millions
+> of files with each dir having 1024 or fewer contents. Example:
+>
+> `me -> .git/annex/objects/71/9t/WORM-s3-m1300247299--me/WORM-s3-m1300247299--me`
+>
+> --[[Joey]]
diff --git a/doc/git-annex.mdwn b/doc/git-annex.mdwn
index 4998a6491..ee4019068 100644
--- a/doc/git-annex.mdwn
+++ b/doc/git-annex.mdwn
@@ -234,11 +234,11 @@ Many git-annex commands will stage changes for later `git commit` by you.
 
   This can be used to manually set up a file to link to a specified key
   in the key-value backend. How you determine an existing key in the backend
-  varies. For the URL backend, the key is just a URL to the content.
+  varies. For the URL backend, the key is based on a URL to the content.
 
   Example:
 
-	git annex fromkey --backend=URL --key=http://www.archive.org/somefile somefile
+	git annex fromkey --key=URL--http://www.archive.org/somefile somefile
 
 * dropkey [key ...]
 
@@ -248,24 +248,24 @@ Many git-annex commands will stage changes for later `git commit` by you.
   This can be used to drop content for arbitrary keys, which do not need
   to have a file in the git repository pointing at them.
 
-  A backend will typically need to be specified with --backend. If none
-  is specified, the first configured backend is used.
-
   Example:
 
-	git annex dropkey --backend=SHA1 7da006579dd64330eb2456001fd01948430572f2
+	git annex dropkey SHA1-s10-7da006579dd64330eb2456001fd01948430572f2
 
 * setkey file
 
   This plumbing-level command sets the annexed data for a key to the content
   of the specified file, and then removes the file.
 
-  A backend will typically need to be specified with --backend. If none
-  is specified, the first configured backend is used.
-
   Example:
 
-	git annex setkey --backend=WORM --key=1287765018:3 /tmp/file
+	git annex setkey --key=WORM-s3-m1287765018--file /tmp/file
+
+* upgrade
+
+  Upgrades the repository to current layout. Upgrades are done automatically
+  whenever a newer git annex encounters an old repository; this command
+  allows explicitly starting an upgrade.
 
 # OPTIONS
 
@@ -302,7 +302,10 @@ Many git-annex commands will stage changes for later `git commit` by you.
 
 * --backend=name
 
-  Specifies which key-value backend to use.
+  Specifies which key-value backend to use. This can be used when
+  adding a file to the annex, or migrating a file. Once files
+  are in the annex, their backend is known and this option is not
+  necessary.
 
 * --key=name
 
diff --git a/doc/internals.mdwn b/doc/internals.mdwn
index 3f680dd8f..a133320b4 100644
--- a/doc/internals.mdwn
+++ b/doc/internals.mdwn
@@ -2,12 +2,15 @@ In the world of git, we're not scared about internal implementation
 details, and sometimes we like to dive in and tweak things by hand. Here's
 some documentation to that end.
 
-## `.git/annex/objects/*/*`
+## `.git/annex/objects/aa/bb/*/*`
 
 This is where locally available file contents are actually stored.
 Files added to the annex get a symlink checked into git that points
 to the file content.
 
+First there are two levels of directories used for hashing, to prevent
+too many things ending up in any one directory.
+
 Each subdirectory has the name of a key in one of the
 [[key-value_backends|backends]]. The file inside also has the name of the key.
 This two-level structure is used because it allows the write bit to be removed
@@ -41,10 +44,11 @@ Example:
 	e605dca6-446a-11e0-8b2a-002170d25c55 1
 	26339d22-446b-11e0-9101-002170d25c55 ?
 
-## `.git-annex/*.log`
+## `.git-annex/aa/bb/*.log`
 
 The remainder of the log files record [[location_tracking]] information
-for file contents. The name of the key is the filename, and the content
+for file contents. Again these are placed in two levels of subdirectories
+for hashing. The name of the key is the filename, and the content
 consists of a timestamp, either 1 (present) or 0 (not present), and the
 UUID of the repository that has or lacks the file content.
 
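The internals.mdwn change above puts two hashing directories (`aa/bb`) in front of each key directory. The Python sketch below shows how a layout like that can spread keys out. It is an illustration only and is not git-annex's real hashing function. The use of MD5, the 32-symbol alphabet and the helper names `hash_dirs` and `object_path` are all assumptions. Because of that, the directory names it produces will not match real repositories, such as the `71/9t` in the forum example.

```python
import hashlib

# Assumed 32-symbol alphabet: each character encodes 5 bits, so a
# two-character directory name has 32 * 32 = 1024 possible values.
ALPHABET = "0123456789abcdefghijklmnopqrstuv"


def hash_dirs(key: str) -> str:
    """Return an 'aa/bb' prefix derived from the MD5 digest of a key.

    Illustrative only: this is not git-annex's real hashing function.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    value = int.from_bytes(digest[:4], "big")  # first 32 bits of the digest
    # Pull out four 5-bit groups and map each one to an alphabet symbol.
    chars = [ALPHABET[(value >> (5 * i)) & 31] for i in range(4)]
    return chars[0] + chars[1] + "/" + chars[2] + chars[3]


def object_path(key: str) -> str:
    """Hash dirs, then a directory named after the key, then a file
    also named after the key (the layout internals.mdwn describes)."""
    return f".git/annex/objects/{hash_dirs(key)}/{key}/{key}"


if __name__ == "__main__":
    print(object_path("WORM-s3-m1300247299--me"))
    # 1024 first-level dirs * 1024 second-level dirs = 1048576 leaf dirs;
    # 100 million keys spread evenly gives roughly 95 keys per leaf dir.
    print(100_000_000 // (1024 * 1024))
```

The same arithmetic explains the news entry's claim. With about a million leaf directories, even one hundred million objects leave each directory well under 1024 entries, provided the hash distributes keys evenly.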
diff --git a/doc/news/version_0.20.mdwn b/doc/news/version_0.20.mdwn
deleted file mode 100644
index 9b95b652e..000000000
--- a/doc/news/version_0.20.mdwn
+++ /dev/null
@@ -1,12 +0,0 @@
-git-annex 0.20 released with [[!toggle text="these changes"]]
-[[!toggleable text="""
-   * Preserve specified file ordering when instructed to act on multiple
-     files or directories. For example, "git annex get a b" will now always
-     get "a" before "b". Previously it could operate in either order.
-   * unannex: Commit staged changes at end, to avoid some confusing behavior
-     with the pre-commit hook, which would see some types of commits after
-     an unannex as checking in of an unlocked file.
-   * map: New subcommand that uses graphviz to display a nice map of
-     the git repository network.
-   * Deal with the mtl/monads-fd conflict.
-   * configure: Check for sha1sum."""]]
\ No newline at end of file
diff --git a/doc/news/version_0.20110316.mdwn b/doc/news/version_0.20110316.mdwn
new file mode 100644
index 000000000..5654c15bc
--- /dev/null
+++ b/doc/news/version_0.20110316.mdwn
@@ -0,0 +1,24 @@
+News for git-annex 0.20110316:
+
+   This version reorganises the layout of git-annex's files in your repository.
+   There is an upgrade process to convert a repository from the old git-annex
+   to this version. While git-annex will attempt to transparently handle
+   upgrades, you may want to drive the upgrade process by hand.
+   See <http://git-annex.branchable.com/upgrades/> or
+   /usr/share/doc/git-annex/html/upgrades.html
+
+git-annex 0.20110316 released with [[!toggle text="these changes"]]
+[[!toggleable text="""
+   * New repository format, annex.version=2.
+   * The first time git-annex is run in an old format repository, it
+     will automatically upgrade it to the new format, staging all
+     necessary changes to git. Also added a "git annex upgrade" command.
+   * Colons are now avoided in filenames, so bare clones of git repos
+     can be put on USB thumb drives formatted with vFAT or similar
+     filesystems.
+   * Added two levels of hashing to object directory and .git-annex logs,
+     to improve scalability with enormous numbers of annexed
+     objects. (With one hundred million annexed objects, each
+     directory would contain fewer than 1024 files.)
+   * The setkey, fromkey, and dropkey subcommands have changed how
+     the key is specified. --backend is no longer used with these."""]]
\ No newline at end of file
diff --git a/doc/news/version_0.24.mdwn b/doc/news/version_0.24.mdwn
index 2d94a0e9b..81b013a26 100644
--- a/doc/news/version_0.24.mdwn
+++ b/doc/news/version_0.24.mdwn
@@ -1,6 +1,6 @@
-Branched the 0.24 series, which will be maintained for a while to
-support v1 git-annex repos, while main development moves to the 0.2011
-series, with v2 git-annex repos.
+Branched the 0.24 series, which will be maintained for a while (in the
+stable branch in git) to support v1 git-annex repos, while main development
+moves to the 0.2011 series, with v2 git-annex repos.
 
 git-annex 0.24 released with [[!toggle text="these changes"]]
 [[!toggleable text="""
diff --git a/doc/todo/object_dir_reorg_v2.mdwn b/doc/todo/object_dir_reorg_v2.mdwn
index 1c2d2f21b..49666ddc7 100644
--- a/doc/todo/object_dir_reorg_v2.mdwn
+++ b/doc/todo/object_dir_reorg_v2.mdwn
@@ -19,3 +19,7 @@ all users, so this should be the *last* reorg in the forseeable future.
 (Probably everything after ",k" should be part of the key, even if it
 contains the "," separator character. Otherwise an escaping mechanism
 would be needed.)
+
+[[done]] now!
+
+Although [[bugs/free_space_checking]] is not quite there --[[Joey]]
diff --git a/doc/upgrades.mdwn b/doc/upgrades.mdwn
new file mode 100644
index 000000000..1371dc033
--- /dev/null
+++ b/doc/upgrades.mdwn
@@ -0,0 +1,67 @@
+Occasionally improvements are made to how git-annex stores its data,
+that require an upgrade process to convert repositories made with an older
+version to be used by a newer version. It's annoying, it should happen
+rarely, but sometimes, it's worth it.
+
+There's a commitment that git-annex will always support upgrades from all
+past versions. After all, you may have offline drives from an earlier
+git-annex, and might want to use them with a newer git-annex.
+
+## Upgrade process
+
+git-annex will automatically notice if it is run in a repository that
+needs an upgrade, and perform the upgrade before running whatever it
+was asked to do. Or you can use the "git annex upgrade" command to
+explicitly do an upgrade. The upgrade can take a while,
+if you have a lot of files.
+
+Each clone of a repository should be individually upgraded.
+Until a repository's remotes have been upgraded, git-annex
+may refuse to communicate with them.
+
+Generally, start by upgrading one repository, and then you can commit
+the changes git-annex staged during upgrade, and push them out to other
+repositories. And then upgrade those other repositories. Doing it this
+way avoids git-annex doing some duplicate work during the upgrade.
+
+The upgrade process is guaranteed to be conflict-free, unless you
+already have git conflicts in your repository or between repositories.
+Upgrading a repository with conflicts is not recommended; resolve the
+conflicts first before upgrading git-annex.
+
+Example upgrade process:
+
+	cd localrepo
+	git pull
+	git annex upgrade
+	(Upgrading object directory layout v1 to v2...)
+	git commit -m "upgrade v1 to v2"
+	git push
+
+	ssh remote
+	cd remoterepo
+	git pull
+	git annex upgrade
+	...
+
+## Upgrade events, so far
+
+### v1 -> v2 (git-annex version 0.23 to version 0.20110316)
+
+Involved adding hashing to .git/annex/ and changing the names of all keys.
+Symlinks changed.
+
+Also, hashing was added to location log files in .git-annex/.
+And .gitattributes needed to have another line added to it.
+
+Handled transparently.
+
+### v0 -> v1 (git-annex version 0.03 to version 0.04)
+
+Involved a reorganisation of the layout of .git/annex/. Symlinks changed.
+
+Handled more or less transparently, although git-annex was just 2 weeks
+old at the time, and had few users other than Joey.
+
+This upgrade is believed to still be supported, but has not been tested
+lately.
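The upgrade notes say the v1 -> v2 upgrade changed the names of all keys. The walkthrough diffs in this commit show old and new WORM key names side by side. The Python sketch below performs that renaming for WORM keys only. It assumes the v1 layout is `WORM:MTIME:SIZE:NAME`, a layout inferred from those examples. It is not git-annex's upgrade code, and `worm_v1_to_v2` is a hypothetical name. A real upgrade also moves content into hashed directories and rewrites symlinks and logs, and the sketch does none of that.

```python
def worm_v1_to_v2(old_key: str) -> str:
    """Rename a v1 WORM key ('WORM:MTIME:SIZE:NAME') to the v2 form
    ('WORM-sSIZE-mMTIME--NAME').

    Layout inferred from the before/after examples in this commit; the
    name is copied verbatim and no escaping is attempted.
    """
    parts = old_key.split(":", 3)  # maxsplit keeps any ':' inside NAME
    if len(parts) != 4 or parts[0] != "WORM":
        raise ValueError(f"not a v1 WORM key: {old_key!r}")
    backend, mtime, size, name = parts
    if not (mtime.isdigit() and size.isdigit()):
        raise ValueError(f"mtime and size must be integers: {old_key!r}")
    return f"{backend}-s{size}-m{mtime}--{name}"


# Old and new names taken from the walkthrough changes in this diff.
EXAMPLES = {
    "WORM:1289672605:30:file": "WORM-s30-m1289672605--file",
    "WORM:1289672605:3:file": "WORM-s3-m1289672605--file",
    "WORM:1289672605:14:file": "WORM-s14-m1289672605--file",
    "WORM:1285650548:2159:my_cool_big_file": "WORM-s2159-m1285650548--my_cool_big_file",
    "WORM:1274316523:86050597:hackity_hack_and_kax": "WORM-s86050597-m1274316523--hackity_hack_and_kax",
}

if __name__ == "__main__":
    for old, new in EXAMPLES.items():
        assert worm_v1_to_v2(old) == new
        print(f"{old} -> {new}")
```

Note that the size and mtime swap places in the new format, and both gain a one-letter field prefix. The separators are now `-` and `--` instead of `:`, which is what keeps colons out of v2 filenames.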
diff --git a/doc/walkthrough/modifying_annexed_files.mdwn b/doc/walkthrough/modifying_annexed_files.mdwn
index 3ad4e82ea..f75b73a24 100644
--- a/doc/walkthrough/modifying_annexed_files.mdwn
+++ b/doc/walkthrough/modifying_annexed_files.mdwn
@@ -27,7 +27,7 @@ and this symlink is what gets committed to git in the end.
 	add my_cool_big_file ok
 	[master 64cda67] changed an annexed file
 	 2 files changed, 2 insertions(+), 1 deletions(-)
-	 create mode 100644 .git-annex/WORM:1289672605:30:file.log
+	 create mode 100644 .git-annex/WORM-s30-m1289672605--file.log
 
 There is one problem with using `git commit` like this: Git wants to first
 stage the entire contents of the file in its index. That can be slow for
diff --git a/doc/walkthrough/moving_file_content_between_repositories.mdwn b/doc/walkthrough/moving_file_content_between_repositories.mdwn
index d7150f109..6b3e3f4e8 100644
--- a/doc/walkthrough/moving_file_content_between_repositories.mdwn
+++ b/doc/walkthrough/moving_file_content_between_repositories.mdwn
@@ -9,5 +9,5 @@ makes it very easy.
 	move my_cool_big_file (moving to usbdrive...) ok
 	# git annex move video/hackity_hack_and_kaxxt.mov --from fileserver
 	move video/hackity_hack_and_kaxxt.mov (moving from fileserver...)
-	WORM:1274316523:86050597:hackity_hack_and_kax 100% 82MB 199.1KB/s 07:02
+	WORM-s86050597-m1274316523--hackity_hack_and_kax 100% 82MB 199.1KB/s 07:02
 	ok
diff --git a/doc/walkthrough/unused_data.mdwn b/doc/walkthrough/unused_data.mdwn
index 69a581fe1..9be32577c 100644
--- a/doc/walkthrough/unused_data.mdwn
+++ b/doc/walkthrough/unused_data.mdwn
@@ -12,8 +12,8 @@ eliminate it to save space.
 	unused (checking for unused data...)
 	  Some annexed data is no longer pointed to by any files in the repository.
 	    NUMBER KEY
-	    1 WORM:1289672605:3:file
-	    2 WORM:1289672605:14:file
+	    1 WORM-s3-m1289672605--file
+	    2 WORM-s14-m1289672605--file
 	  (To see where data was previously used, try: git log --stat -S'KEY')
 	  (To remove unwanted data: git-annex dropunused NUMBER)
 	ok
diff --git a/doc/walkthrough/using_ssh_remotes.mdwn b/doc/walkthrough/using_ssh_remotes.mdwn
index 6af9e1f47..4c2f830de 100644
--- a/doc/walkthrough/using_ssh_remotes.mdwn
+++ b/doc/walkthrough/using_ssh_remotes.mdwn
@@ -13,7 +13,7 @@ Now you can get files and they will be transferred (using `rsync` via `ssh`):
 
 	# git annex get my_cool_big_file
 	get my_cool_big_file (getting UUID for origin...) (copying from origin...)
-	WORM:1285650548:2159:my_cool_big_file 100% 2159 2.1KB/s 00:00
+	WORM-s2159-m1285650548--my_cool_big_file 100% 2159 2.1KB/s 00:00
 	ok
 
 When you drop files, git-annex will ssh over to the remote and make
diff --git a/doc/walkthrough/using_the_URL_backend.mdwn b/doc/walkthrough/using_the_URL_backend.mdwn
index fe79a6be2..585fd0668 100644
--- a/doc/walkthrough/using_the_URL_backend.mdwn
+++ b/doc/walkthrough/using_the_URL_backend.mdwn
@@ -5,7 +5,7 @@ Another handy backend is the URL backend, which can fetch file's content
 from remote URLs. Here's how to set up some files in your repository
 that use this backend:
 
-	# git annex fromkey --backend=URL --key=http://www.archive.org/somefile somefile
+	# git annex fromkey --key=URL--http://www.archive.org/somefile somefile
 	fromkey somefile ok
 	# git commit -m "added a file from the Internet Archive"
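The free_space_checking entry notes that, as of annex.version 2, keys can carry the file size. A size in the key is what makes a disk-space check before copying possible. The Python sketch below assumes the key layout seen in this diff: `BACKEND-sSIZE-mMTIME--NAME`, with the `-s` and `-m` fields optional, as in `URL--http://...`. The helpers `parse_key`, `with_size` and `has_room_for` are hypothetical and are not git-annex functions. `with_size` only imitates the proposed SHA1 to SHA1+filesize migration and is not a real command. The sketch requires a `--` before the name. It therefore rejects the `dropkey` example `SHA1-s10-7da006...`, which uses a single dash there.

```python
import shutil
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Key:
    backend: str
    fields: Dict[str, str]
    name: str

    @property
    def size(self) -> Optional[int]:
        """Size in bytes from the 's' field, or None if the key lacks one."""
        value = self.fields.get("s")
        return int(value) if value is not None else None


def parse_key(key: str) -> Key:
    """Parse 'BACKEND-sSIZE-mMTIME--NAME'; the s/m fields are optional."""
    if "--" not in key:
        raise ValueError(f"no '--' before the key name: {key!r}")
    prefix, name = key.split("--", 1)
    backend, *extra = prefix.split("-")
    fields: Dict[str, str] = {}
    for item in extra:
        if not item:
            raise ValueError(f"empty field in key: {key!r}")
        fields[item[0]] = item[1:]  # the first letter names the field
    return Key(backend, fields, name)


def with_size(key: str, size: int) -> str:
    """Hypothetical migration step: add a size field to a key lacking one."""
    if parse_key(key).size is not None:
        return key
    prefix, name = key.split("--", 1)
    backend, *extra = prefix.split("-")
    return "-".join([backend, f"s{size}", *extra]) + "--" + name


def has_room_for(key: str, directory: str = ".", reserve: int = 0) -> Optional[bool]:
    """True/False when the key records a size; None when it does not."""
    size = parse_key(key).size
    if size is None:
        return None  # e.g. an old SHA1 key that was never migrated
    free = shutil.disk_usage(directory).free
    return free - reserve >= size
```

For example, `has_room_for("WORM-s86050597-m1274316523--hackity_hack_and_kax", ".")` compares the 86050597 bytes recorded in the key against the free space of the current filesystem. A `None` result means the key carries no size, so the caller has to fall back to copying without a check.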