author: Joey Hess <joeyh@joeyh.name> 2015-12-27 15:59:59 -0400
committer: Joey Hess <joeyh@joeyh.name> 2015-12-27 15:59:59 -0400
commit: 60c88820987596809091ee010e6be2a083888bc8
tree: dc2540c6deadfcf3efee1fd95948bcbd6f219db5 /doc
parent: 17490f3685aee698e10555c5dc3e915a317c2250
annex.thin
Decided it's too scary to make v6 unlocked files have 1 copy by default,
but that should be available to those who need it. This is consistent with
git-annex not dropping unused content without --force, etc.

* Added annex.thin setting, which makes unlocked files in v6 repositories
  be hard linked to their content, instead of a copy. This saves disk
  space but means any modification of an unlocked file will lose the local
  (and possibly only) copy of the old version.
* Enable annex.thin by default on upgrade from direct mode to v6, since
  direct mode made the same tradeoff.
* fix: Adjusts unlocked files as configured by annex.thin.
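
The tradeoff described in the commit message can be illustrated with a small, self-contained sketch. This is not git-annex's implementation. The `unlock` and `old_version_survives` helpers and the file names are hypothetical, and a temporary directory stands in for `.git/annex/objects` and the work tree. It only shows why an in-place write through a hard link changes the stored object, while a copy leaves it intact. It assumes a filesystem that supports hard links.

```python
import os
import shutil
import tempfile


def unlock(annex_object: str, work_file: str, thin: bool) -> None:
    """Populate work_file from annex_object (hypothetical helper).

    thin=True mimics annex.thin (hard link); thin=False mimics the default (copy).
    """
    if thin:
        os.link(annex_object, work_file)  # both names share the same data
    else:
        shutil.copyfile(annex_object, work_file)  # independent second copy


def old_version_survives(thin: bool) -> bool:
    """Append to the unlocked file in place; report whether the object is unchanged."""
    with tempfile.TemporaryDirectory() as tmp:
        annex_object = os.path.join(tmp, "object")  # stand-in for the annexed content
        work_file = os.path.join(tmp, "my_cool_big_file")
        with open(annex_object, "w") as f:
            f.write("version 1\n")
        unlock(annex_object, work_file, thin)
        # In-place append, similar to `echo more stuff >> my_cool_big_file`.
        with open(work_file, "a") as f:
            f.write("more stuff\n")
        with open(annex_object) as f:
            return f.read() == "version 1\n"


if __name__ == "__main__":
    print("copy keeps old version:", old_version_survives(thin=False))
    print("hard link keeps old version:", old_version_survives(thin=True))
```

With a copy, the stand-in object still holds the old content after the append. With a hard link, both names refer to the same data, so the old version is gone unless another repository has it.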
Diffstat (limited to 'doc')
-rw-r--r--  doc/git-annex-fix.mdwn         9
-rw-r--r--  doc/git-annex-unlock.mdwn     10
-rw-r--r--  doc/git-annex.mdwn             8
-rw-r--r--  doc/tips/unlocked_files.mdwn  83
-rw-r--r--  doc/todo/smudge.mdwn          32
-rw-r--r--  doc/upgrades.mdwn              4
6 files changed, 88 insertions, 58 deletions
diff --git a/doc/git-annex-fix.mdwn b/doc/git-annex-fix.mdwn
index bd6653550..e505ea406 100644
--- a/doc/git-annex-fix.mdwn
+++ b/doc/git-annex-fix.mdwn
@@ -1,6 +1,6 @@
# NAME
-git-annex fix - fix up symlinks to point to annexed content
+git-annex fix - fix up links to annexed content
# SYNOPSIS
@@ -11,8 +11,11 @@ git annex fix `[path ...]`
Fixes up symlinks that have become broken to again point to annexed
content.
-This is useful to run if you have been moving the symlinks around,
-but is done automatically when committing a change with git too.
+This is useful to run manually when you have been moving the symlinks
+around, but is done automatically when committing a change with git too.
+
+Also, adjusts unlocked files to be copies or hard links as
+configured by annex.thin.
# OPTIONS
diff --git a/doc/git-annex-unlock.mdwn b/doc/git-annex-unlock.mdwn
index 123146836..4b2b809fd 100644
--- a/doc/git-annex-unlock.mdwn
+++ b/doc/git-annex-unlock.mdwn
@@ -10,7 +10,7 @@ git annex unlock `[path ...]`
Normally, the content of annexed files is protected from being changed.
Unlocking an annexed file allows it to be modified. This replaces the
-symlink for each specified file with a copy of the file's content.
+symlink for each specified file with the file's content.
You can then modify it and `git annex add` (or `git commit`) to save your
changes.
@@ -22,6 +22,14 @@ can use `git add` to add a file to the annex in unlocked form. This allows
workflows where a file starts out unlocked, is modified as necessary, and
is locked once it reaches its final version.
+Normally, unlocking a file requires a copy to be made of its content,
+so that its original content is preserved, while the copy can be modified.
+To use less space, annex.thin can be set to true; this makes a hard link
+to the content be made instead of a copy. (Only when supported by the file
+system, and only in repository version 6.) While this can save considerable
+disk space, any modification made to a file will cause the old version of the
+file to be lost from the local repository. So, enable annex.thin with care.
+
# OPTIONS
* file matching options
diff --git a/doc/git-annex.mdwn b/doc/git-annex.mdwn
index 1a2fd6e67..299428d1e 100644
--- a/doc/git-annex.mdwn
+++ b/doc/git-annex.mdwn
@@ -904,6 +904,14 @@ Here are all the supported configuration settings.
will automatically set annex.hardlink and mark the repository as
untrusted.
+* `annex.thin`
+
+ Set this to `true` to make unlocked files be a hard link to their content
+ in the annex, rather than a second copy. (Only when supported by the file
+ system, and only in repository version 6.) This can save considerable
+ disk space, but modification made to a file will lose the local (and
+ possibly only) copy of the old version. So, enable with care.
+
* `annex.delayadd`
Makes the watch and assistant commands delay for the specified number of
diff --git a/doc/tips/unlocked_files.mdwn b/doc/tips/unlocked_files.mdwn
index 220c46e51..fc43dada8 100644
--- a/doc/tips/unlocked_files.mdwn
+++ b/doc/tips/unlocked_files.mdwn
@@ -49,9 +49,11 @@ Or, you can init a new repository in v6 mode.
# git init
# git annex init --version=6
+## using it
+
Using a v6 repository is easy! Just use regular git commands to add
-and commit files. Under the hood, git will use git-annex to store the file
-contents.
+and commit files. git will use git-annex to store the file contents,
+and the files will be left unlocked.
[[!template id=note text="""
Want `git add` to add some file contents to the annex, but store the contents of
@@ -70,8 +72,8 @@ smaller files in git itself? Configure annex.largefiles to match the former.
# git annex find
my_cool_big_file
-You can make whatever changes you like to committed files, and commit your
-changes.
+You can make whatever modifications you want to unlocked files, and commit
+your changes.
# echo more stuff >> my_cool_big_file
# git mv my_cool_big_file my_cool_bigger_file
@@ -81,47 +83,62 @@ changes.
delete mode 100644 my_cool_big_file
create mode 100644 my_cool_bigger_file
-Under the hood, this uses git's [[todo/smudge]] filter interface,
-and git-annex converts between the content of the big file and a pointer file,
+Under the hood, this uses git's [[todo/smudge]] filter interface, and
+git-annex converts between the content of the big file and a pointer file,
which is what gets committed to git.
-A v6 repository can have both locked and unlocked files. You can switch
+A v6 repository can contain both locked and unlocked files. You can switch
a file back and forth using the `git annex lock` and `git annex unlock`
commands. This changes what's stored in git between a git-annex symlink
-(locked) and a git-annex pointer file (unlocked).
+(locked) and a git-annex pointer file (unlocked). To add a file to
+the repository in locked mode, use `git annex add`; to add a file in
+unlocked mode, use `git add`.
+
+## using less disk space
+
+Unlocked files are handy, but they have one significant disadvantage
+compared with locked files: They use more disk space.
+While only one copy of a locked file has to be stored, normally,
+two copies of an unlocked file are stored on disk. One copy is in
+the git work tree, where you can use and modify it,
+and the other is stashed away in `.git/annex/objects` (see [[internals]]).
+
+The reason for that second copy is to preserve the old version of the file,
+if you modify the unlocked file in the work tree. Being able to access
+old versions of files is an important part of git after all.
-## danger will robinson
+That's a good safe default. But there are ways to use git-annex that
+make the second copy not be worth keeping:
[[!template id=note text="""
-Double the disk space is used on systems like Windows that don't support
-hard links.
+When a [[direct_mode]] repository is upgraded, annex.thin is automatically
+set, because direct mode made the same single-copy tradeoff.
"""]]
-In contrast with locked files, which are quite safe, using unlocked files is a
-little bit dangerous. git-annex tries to avoid storing a duplicate copy of an
-unlocked file in your local repository, in order to not use double the disk
-space. But this means that an unlocked file can be the only copy of that
-version of the file's content. Modify it, and oops, you lost the old version!
+* When you're using git-annex to sync the current version of files across
+ devices, and don't care much about previous versions.
+* When you have set up a backup repository, and use git-annex to copy
+ your files to the backup.
+
+In situations like these, you may want to avoid the overhead of the second
+local copy of unlocked files. There's a config setting for that.
+
+ git config annex.thin true
+
+After changing annex.thin, you'll want to fix up the work tree to
+match the new setting:
-In fact, that happened in the examples above, and you probably didn't notice
-until now.
+ git annex fix
- # git checkout HEAD^
- HEAD is now at 92f2725 added my_cool_big_file to the annex
- # cat my_cool_big_file
- /annex/objects/SHA256E-s30--e7aaf46f227886c10c98f8f76cae681afd0521438c78f958fc27114674b391a4
+Note that setting annex.thin only has any effect on systems that support
+hard links. That is, not Windows, and not FAT filesystems.
-Woah, what's all that?! Well, it's the pointer file that gets checked into
-git. You'd see the same thing if you had used `git annex drop` to drop
-the content of the file from your repository.
+## tradeoffs
-In the example above, the content wasn't explicitly dropped, but it was
-modified while it was unlocked... and so the old version of the content
-was lost.
+Setting annex.thin can save a lot of disk space, but it's a tradeoff
+between disk usage and safety.
-If this is worrying -- and it should be -- you'll want to keep files locked
-most of the time, or set up a remote and have git-annex copy the content of
-files to the remote as a backup.
+Keeping files locked is safer and also avoids using unnecessary
+disk space, but trades off easy modification of files.
-By the way, don't worry about deleting an unlocked file. That *won't* lose
-its content.
+Pick the tradeoff that's right for you.
diff --git a/doc/todo/smudge.mdwn b/doc/todo/smudge.mdwn
index 1e61a945f..03e253952 100644
--- a/doc/todo/smudge.mdwn
+++ b/doc/todo/smudge.mdwn
@@ -13,10 +13,11 @@ git-annex should use smudge/clean filters.
# because it doesn't know it has that name
# git commit clears up this mess
* Interaction with shared clones. Should avoid hard linking from/to a
- object in a shared clone if either repository has the object unlocked.
- (And should avoid unlocking an object if it's hard linked to a shared clone,
- but that's already accomplished because it avoids unlocking an object if
- it's hard linked at all)
+ object in a shared clone if either repository has the object unlocked
+ with a hard link in place.
+ (And should avoid unlocking an object with a hard link if it's hard
+ linked to a shared clone, but that's already accomplished because it
+ avoids unlocking an object if it's hard linked at all)
* Make automatic merge conflict resolution work for pointer files.
- Should probably automatically handle merge conflicts between annex
symlinks and pointer files too. Maybe by always resulting in a pointer
@@ -46,7 +47,7 @@ git-annex should use smudge/clean filters.
* Eventually (but not yet), make v6 the default for new repositories.
Note that the assistant forces repos into direct mode; that will need to
- be changed then.
+ be changed then, and it should enable annex.thin.
* Later still, remove support for direct mode, and enable automatic
v5 to v6 upgrades.
@@ -158,7 +159,7 @@ cannot directly write to the file or git gets unhappy.
.. Are very important, otherwise a repo can't scale past the size of the
smallest client's disk!
-It would be nice if the smudge filter could hard link or symlink a work
+It would be nice if the smudge filter could hard link a work
tree file to the annex object.
But currently, the smudge filter can't modify the work tree file on its own
@@ -184,7 +185,9 @@ smudged file in the work tree when renaming it. It instead deletes the old
file and asks the smudge filter to smudge the new filename.
So, copies need to be maintained in .git/annex/objects, though it's ok
-to use hard links to the work tree files.
+to use hard links to the work tree files. (Although somewhat unsafe
+since modification of the file will lose the old version. annex.thin
+setting can enable this.)
Even if hard links are used, smudge needs to output the content of an
annexed file, which will result in duplication when merging in renames of
@@ -241,21 +244,16 @@ git-annex clean:
Generate annex key from filename and content from stdin.
- Hard link .git/annex/objects to the file, if it doesn't already exist.
- (On platforms not supporting hardlinks, copy the file to
- .git/annex/objects.)
+ Hard link (annex.thin) or copy .git/annex/objects to the file,
+ if it doesn't already exist.
This is done to prevent losing the only copy of a file when eg
doing a git checkout of a different branch, or merging a commit that
- renames or deletes a file. But, no attempt is made to
+ renames or deletes a file. But, with annex.thin no attempt is made to
protect the object from being modified. If a user wants to
protect object contents from modification, they should use
- `git annex add`, not `git add`, or they can `git annex lock` after adding,.
-
- There could be a configuration knob to cause a copy to be made to
- .git/annex/objects -- useful for those crippled filesystems. It might
- also drop that copy once the object gets uploaded to another repo ...
- But that gets complicated quickly.
+ `git annex add`, not `git add`, or they can `git annex lock` after adding,
+ or not enable annex.thin.
Update file map.
diff --git a/doc/upgrades.mdwn b/doc/upgrades.mdwn
index 9d30c2f14..b3deab715 100644
--- a/doc/upgrades.mdwn
+++ b/doc/upgrades.mdwn
@@ -72,10 +72,6 @@ The behavior of some commands changes in an upgraded repository:
* `git annex unlock` and `git annex lock` change how the pointer to
the annexed content is stored in git.
-If a repository is only used in indirect mode, you can use git-annex
-v5 and v6 in different clones of the same indirect mode repository without
-problems.
-
On upgrade, all files in a direct mode repository will be converted to
unlocked files. The upgrade will stage changes to all annexed files in
the git repository, which you can then commit.