From 98e246b49b3c4fed319fe7bc1e900ba20ebfc9e1 Mon Sep 17 00:00:00 2001 From: Joey Hess Date: Sun, 27 Feb 2011 12:45:48 -0400 Subject: split the walkthrough and inline back together --- doc/walkthrough.mdwn | 441 +-------------------- doc/walkthrough/adding_a_remote.mdwn | 19 + doc/walkthrough/adding_files.mdwn | 11 + doc/walkthrough/backups.mdwn | 25 ++ doc/walkthrough/creating_a_repository.mdwn | 6 + doc/walkthrough/fsck:_verifying_your_data.mdwn | 16 + doc/walkthrough/fsck:_when_things_go_wrong.mdwn | 13 + doc/walkthrough/getting_file_content.mdwn | 16 + .../migrating_data_to_a_new_backend.mdwn | 16 + doc/walkthrough/modifying_annexed_files.mdwn | 43 ++ .../moving_file_content_between_repositories.mdwn | 13 + doc/walkthrough/removing_files.mdwn | 6 + .../removing_files:_When_things_go_wrong.mdwn | 24 ++ doc/walkthrough/renaming_files.mdwn | 13 + .../transferring_files:_When_things_go_wrong.mdwn | 18 + doc/walkthrough/untrusted_repositories.mdwn | 28 ++ doc/walkthrough/unused_data.mdwn | 30 ++ doc/walkthrough/using_ssh_remotes.mdwn | 33 ++ doc/walkthrough/using_the_SHA1_backend.mdwn | 11 + doc/walkthrough/using_the_URL_backend.mdwn | 24 ++ 20 files changed, 386 insertions(+), 420 deletions(-) create mode 100644 doc/walkthrough/adding_a_remote.mdwn create mode 100644 doc/walkthrough/adding_files.mdwn create mode 100644 doc/walkthrough/backups.mdwn create mode 100644 doc/walkthrough/creating_a_repository.mdwn create mode 100644 doc/walkthrough/fsck:_verifying_your_data.mdwn create mode 100644 doc/walkthrough/fsck:_when_things_go_wrong.mdwn create mode 100644 doc/walkthrough/getting_file_content.mdwn create mode 100644 doc/walkthrough/migrating_data_to_a_new_backend.mdwn create mode 100644 doc/walkthrough/modifying_annexed_files.mdwn create mode 100644 doc/walkthrough/moving_file_content_between_repositories.mdwn create mode 100644 doc/walkthrough/removing_files.mdwn create mode 100644 doc/walkthrough/removing_files:_When_things_go_wrong.mdwn create mode 100644 
doc/walkthrough/renaming_files.mdwn create mode 100644 doc/walkthrough/transferring_files:_When_things_go_wrong.mdwn create mode 100644 doc/walkthrough/untrusted_repositories.mdwn create mode 100644 doc/walkthrough/unused_data.mdwn create mode 100644 doc/walkthrough/using_ssh_remotes.mdwn create mode 100644 doc/walkthrough/using_the_SHA1_backend.mdwn create mode 100644 doc/walkthrough/using_the_URL_backend.mdwn diff --git a/doc/walkthrough.mdwn b/doc/walkthrough.mdwn index d08b247f7..896b560ec 100644 --- a/doc/walkthrough.mdwn +++ b/doc/walkthrough.mdwn @@ -2,423 +2,24 @@ A walkthrough of the basic features of git-annex. [[!toc]] -## creating a repository - -This is very straightforward. Just tell it a description of the repository. - - # mkdir ~/annex - # cd ~/annex - # git init - # git annex init "my laptop" - -## adding a remote - -Like any other git repository, git-annex repositories have remotes. -Let's start by adding a USB drive as a remote. - - # sudo mount /media/usb - # cd /media/usb - # git clone ~/annex - # cd annex - # git annex init "portable USB drive" - # git remote add laptop ~/annex - # cd ~/annex - # git remote add usbdrive /media/usb - -This is all standard ad-hoc distributed git repository setup. -The only git-annex specific part is telling it the name -of the new repository created on the USB drive. - -Notice that both repos are set up as remotes of one another. This lets -either get annexed files from the other. You'll want to do that even -if you are using git in a more centralized fashion. - -## adding files - - # cd ~/annex - # cp /tmp/big_file . - # cp /tmp/debian.iso . - # git annex add . - add big_file ok - add debian.iso ok - # git commit -a -m added - -When you add a file to the annex and commit it, only a symlink to -the annexed content is committed. The content itself is stored in -git-annex's backend. 
- -## renaming files - - # cd ~/annex - # git mv big_file my_cool_big_file - # mkdir iso - # git mv debian.iso iso/ - # git commit -m moved - -You can use any normal git operations to move files around, or even -make copies or delete them. - -Notice that, since annexed files are represented by symlinks, -the symlink will break when the file is moved into a subdirectory. -But, git-annex will fix this up for you when you commit -- -it has a pre-commit hook that watches for and corrects broken symlinks. - -## getting file content - -A repository does not always have all annexed file contents available. -When you need the content of a file, you can use "git annex get" to -make it available. - -We can use this to copy everything in the laptop's annex to the -USB drive. - - # cd /media/usb/annex - # git pull laptop master - # git annex get . - get my_cool_big_file (copying from laptop...) ok - get iso/debian.iso (copying from laptop...) ok - -Notice that you had to git pull from laptop first, this lets git-annex know -what has changed in laptop, and so it knows about the files present there and -can get them. - -## transferring files: When things go wrong - -After a while, you'll have several annexes, with different file contents. -You don't have to try to keep all that straight; git-annex does -[[location_tracking]] for you. 
If you ask it to get a file and the drive -or file server is not accessible, it will let you know what it needs to get -it: - - # git annex get video/hackity_hack_and_kaxxt.mov - get video/_why_hackity_hack_and_kaxxt.mov (not available) - Unable to access these remotes: usbdrive, server - Try making some of these repositories available: - 5863d8c0-d9a9-11df-adb2-af51e6559a49 -- my home file server - 58d84e8a-d9ae-11df-a1aa-ab9aa8c00826 -- portable USB drive - ca20064c-dbb5-11df-b2fe-002170d25c55 -- backup SATA drive - failed - # sudo mount /media/usb - # git annex get video/hackity_hack_and_kaxxt.mov - get video/hackity_hack_and_kaxxt.mov (copying from usbdrive...) ok - # git commit -a -m "got a video I want to rewatch on the plane" - -## removing files - -You can always drop files safely. Git-annex checks that some other annex -has the file before removing it. - - # git annex drop iso/debian.iso - drop iso/Debian_5.0.iso ok - # git commit -a -m "freed up space" - -## removing files: When things go wrong - -Before dropping a file, git-annex wants to be able to look at other -remotes, and verify that they still have a file. After all, it could -have been dropped from them too. If the remotes are not mounted/available, -you'll see something like this. - - # git annex drop important_file other.iso - drop important_file (unsafe) - Could only verify the existence of 0 out of 1 necessary copies - Unable to access these remotes: usbdrive - Try making some of these repositories available: - 58d84e8a-d9ae-11df-a1aa-ab9aa8c00826 -- portable USB drive - ca20064c-dbb5-11df-b2fe-002170d25c55 -- backup SATA drive - (Use --force to override this check, or adjust annex.numcopies.) - failed - drop other.iso (unsafe) - Could only verify the existence of 0 out of 1 necessary copies - No other repository is known to contain the file. - (Use --force to override this check, or adjust annex.numcopies.) 
- failed - -Here you might --force it to drop `important_file` if you [[trust]] your backup. -But `other.iso` looks to have never been copied to anywhere else, so if -it's something you want to hold onto, you'd need to transfer it to -some other repository before dropping it. - -## modifying annexed files - -Normally, the content of files in the annex is prevented from being modified. -That's a good thing, because it might be the only copy, you wouldn't -want to lose it in a fumblefingered mistake. - - # echo oops > my_cool_big_file - bash: my_cool_big_file: Permission denied - -In order to modify a file, it should first be unlocked. - - # git annex unlock my_cool_big_file - unlock my_cool_big_file (copying...) ok - -That replaces the symlink that normally points at its content with a copy -of the content. You can then modify the file like any regular file. Because -it is a regular file. - -(If you decide you don't need to modify the file after all, or want to discard -modifications, just use `git annex lock`.) - -When you `git commit`, git-annex's pre-commit hook will automatically -notice that you are committing an unlocked file, and add its new content -to the annex. The file will be replaced with a symlink to the new content, -and this symlink is what gets committed to git in the end. - - # echo "now smaller, but even cooler" > my_cool_big_file - # git commit my_cool_big_file -m "changed an annexed file" - add my_cool_big_file ok - [master 64cda67] changed an annexed file - 2 files changed, 2 insertions(+), 1 deletions(-) - create mode 100644 .git-annex/WORM:1289672605:30:file.log - -There is one problem with using `git commit` like this: Git wants to first -stage the entire contents of the file in its index. That can be slow for -big files (sorta why git-annex exists in the first place). So, the -automatic handling on commit is a nice safety feature, since it prevents -the file content being accidentally committed into git. 
But when working with -big files, it's faster to explicitly add them to the annex yourself -before committing. - - # echo "now smaller, but even cooler yet" > my_cool_big_file - # git annex add my_cool_big_file - add my_cool_big_file ok - # git commit my_cool_big_file -m "changed an annexed file" - -## using ssh remotes - -So far in this walkthrough, git-annex has been used with a remote -repository on a USB drive. But it can also be used with a git remote -that is truely remote, a host accessed by ssh. - -Say you have a desktop on the same network as your laptop and want -to clone the laptop's annex to it: - - # git clone ssh://mylaptop/home/me/annex ~/annex - # cd ~/annex - # git annex init "my desktop" - -Now you can get files and they will be transferred (using `rsync`): - - # git annex get my_cool_big_file - get my_cool_big_file (getting UUID for origin...) (copying from origin...) - WORM:1285650548:2159:my_cool_big_file 100% 2159 2.1KB/s 00:00 - ok - -When you drop files, git-annex will ssh over to the remote and make -sure the file's content is still there before removing it locally: - - # git annex drop my_cool_big_file - drop my_cool_big_file (checking origin..) ok - -Note that normally git-annex prefers to use non-ssh remotes, like -a USB drive, before ssh remotes. They are assumed to be faster/cheaper to -access, if available. There is a annex-cost setting you can configure in -`.git/config` to adjust which repositories it prefers. See -[[the_man_page|git-annex]] for details. - -Also, note that you need full shell access for this to work -- -git-annex needs to be able to ssh in and run commands. - -## moving file content between repositories - -Often you will want to move some file contents from a repository to some -other one. For example, your laptop's disk is getting full; time to move -some files to an external disk before moving another file from a file -server to your laptop. 
Doing that by hand (by using `git annex get` and -`git annex drop`) is possible, but a bit of a pain. `git annex move` -makes it very easy. - - # git annex move my_cool_big_file --to usbdrive - move my_cool_big_file (moving to usbdrive...) ok - # git annex move video/hackity_hack_and_kaxxt.mov --from fileserver - move video/hackity_hack_and_kaxxt.mov (moving from fileserver...) - WORM:1274316523:86050597:hackity_hack_and_kax 100% 82MB 199.1KB/s 07:02 - ok - -## using the URL backend - -git-annex has multiple key-value [[backends]]. So far this walkthrough has -demonstrated the default, WORM (Write Once, Read Many) backend. - -Another handy backend is the URL backend, which can fetch file's content -from remote URLs. Here's how to set up some files in your repository -that use this backend: - - # git annex fromkey --backend=URL --key=http://www.archive.org/somefile somefile - fromkey somefile ok - # git commit -m "added a file from the Internet Archive" - -Now you if you ask git-annex to get that file, it will download it, -and cache it locally. - - # git annex get somefile - get somefile (downloading) - #########################################################################100.0% - ok - -You can always drop files downloaded by the URL backend. It is assumed -that the URL is stable; no local backup is kept. - - # git annex drop somefile - drop somefile (ok) - -## using the SHA1 backend - -Another handy alternative to the default [[backend|backends]] is the -SHA1 backend. This backend provides more git-style assurance that your data -has not been damaged. And the checksum means that when you add the same -content to the annex twice, only one copy need be stored in the backend. - -The only reason it's not the default is that it needs to checksum -files when they're added to the annex, and this can slow things down -significantly for really big files. 
To make SHA1 the default, just -add something like this to `.gitattributes`: - - * annex.backend=SHA1 - -## migrating data to a new backend - -Maybe you started out using the WORM backend, and have now configured -git-annex to use SHA1. But files you added to the annex before still -use the WORM backend. There is a simple command that can migrate that -data: - - # git annex migrate my_cool_big_file - migrate my_cool_big_file (checksum...) ok - -You can only migrate files whose content is currently available. Other -files will be skipped. - -After migrating a file to a new backend, the old content in the old backend -will still be present. That is necessary because multiple files -can point to the same content. The `git annex unused` subcommand can be -used to clear up that detritus later. Note that hard links are used, -to avoid wasting disk space. - -## unused data - -It's possible for data to accumulate in the annex that no files point to -anymore. One way it can happen is if you `git rm` a file without -first calling `git annex drop`. And, when you modify an annexed file, the old -content of the file remains in the annex. Another way is when migrating -between backends. - -This might be historical data you want to preserve, so git-annex defaults to -preserving it. So from time to time, you may want to check for such data and -eliminate it to save space. - - # git annex unused - unused (checking for unused data...) - Some annexed data is no longer pointed to by any files in the repository. 
- NUMBER KEY - 1 WORM:1289672605:3:file - 2 WORM:1289672605:14:file - (To see where data was previously used, try: git log --stat -S'KEY') - (To remove unwanted data: git-annex dropunused NUMBER) - ok - -After running `git annex unused`, you can follow the instructions to examine -the history of files that used the data, and if you decide you don't need that -data anymore, you can easily remove it: - - # git annex dropunused 1 - dropunused 1 ok - -Hint: To drop a lot of unused data, use a command like this: - - # git annex dropunused `seq 1 1000` - -## fsck: verifying your data - -You can use the fsck subcommand to check for problems in your data. -What can be checked depends on the [[backend|backends]] you've used to store -the data. For example, when you use the SHA1 backend, fsck will verify that -the checksums of your files are good. Fsck also checks that the annex.numcopies -setting is satisfied for all files. - - # git annex fsck - unused (checking for unused data...) ok - fsck my_cool_big_file (checksum...) ok - ... - -You can also specify the files to check. This is particularly useful if -you're using sha1 and don't want to spend a long time checksumming everything. - - # git annex fsck my_cool_big_file - fsck my_cool_big_file (checksum...) ok - -## fsck: When things go wrong - -Fsck never deletes possibly bad data; instead it will be moved to -`.git/annex/bad/` for you to recover. Here is a sample of what fsck -might say about a badly messed up annex: - - # git annex fsck - fsck my_cool_big_file (checksum...) - git-annex: Bad file content; moved to .git/annex/bad/SHA1:7da006579dd64330eb2456001fd01948430572f2 - git-annex: ** No known copies of the file exist! - failed - fsck important_file - git-annex: Only 1 of 2 copies exist. Run git annex get somewhere else to back it up. - failed - git-annex: 2 failed - -## backups - -git-annex can be configured to require more than one copy of a file exists, -as a simple backup for your data. 
This is controlled by the "annex.numcopies" -setting, which defaults to 1 copy. Let's change that to require 2 copies, -and send a copy of every file to a USB drive. - - # echo "* annex.numcopies=2" >> .gitattributes - # git annex copy . --to usbdrive - -Now when we try to `git annex drop` a file, it will verify that it -knows of 2 other repositories that have a copy before removing its -content from the current repository. - -You can also vary the number of copies needed, depending on the file name. -So, if you want 3 copies of all your flac files, but only 1 copy of oggs: - - # echo "*.ogg annex.numcopies=1" >> .gitattributes - # echo "*.flac annex.numcopies=3" >> .gitattributes - -Or, you might want to make a directory for important stuff, and configure -it so anything put in there is backed up more thoroughly: - - # mkdir important_stuff - # echo "* annex.numcopies=3" > important_stuff/.gitattributes - -For more details about the numcopies setting, see [[copies]]. - -## untrusted repositories - -Suppose you have a USB thumb drive and are using it as a git annex -repository. You don't trust the drive, because you could lose it, or -accidentally run it through the laundry. Or, maybe you have a drive that -you know is dying, and you'd like to be warned if there are any files -on it not backed up somewhere else. Maybe the drive has already died -or been lost. - -You can let git-annex know that you don't trust a repository, and it will -adjust its behavior to avoid relying on that repositories's continued -availability. - - # git annex untrust usbdrive - untrust usbdrive ok - -Now when you do a fsck, you'll be warned appropriately: - - # git annex fsck . - fsck my_big_file - Only these untrusted locations may have copies of this file! - 05e296c4-2989-11e0-bf40-bad1535567fe -- portable USB drive - Back it up to trusted locations with git-annex copy. 
- failed - -Also, git-annex will refuse to drop a file from elsewhere just because -it can see a copy on the untrusted repository. - -It's also possible to tell git-annex that you have an unusually high -level of trust for a repository. See [[trust]] for details. +[[!inline feeds=no pagenames=""" + creating_a_repository + adding_a_remote + adding_files + renaming_files + getting_file_content + transferring_files:_When_things_go_wrong + removing_files + removing_files:_When_things_go_wrong + modifying_annexed_files + using_ssh_remotes + moving_file_content_between_repositories + using_the_URL_backend + using_the_SHA1_backend + migrating_data_to_a_new_backend + unused_data + fsck:_verifying_your_data + fsck:_when_things_go_wrong + backups + untrusted_repositories +"""]] diff --git a/doc/walkthrough/adding_a_remote.mdwn b/doc/walkthrough/adding_a_remote.mdwn new file mode 100644 index 000000000..be8e8e7fe --- /dev/null +++ b/doc/walkthrough/adding_a_remote.mdwn @@ -0,0 +1,19 @@ +Like any other git repository, git-annex repositories have remotes. +Let's start by adding a USB drive as a remote. + + # sudo mount /media/usb + # cd /media/usb + # git clone ~/annex + # cd annex + # git annex init "portable USB drive" + # git remote add laptop ~/annex + # cd ~/annex + # git remote add usbdrive /media/usb + +This is all standard ad-hoc distributed git repository setup. +The only git-annex specific part is telling it the name +of the new repository created on the USB drive. + +Notice that both repos are set up as remotes of one another. This lets +either get annexed files from the other. You'll want to do that even +if you are using git in a more centralized fashion. diff --git a/doc/walkthrough/adding_files.mdwn b/doc/walkthrough/adding_files.mdwn new file mode 100644 index 000000000..77a7fbc15 --- /dev/null +++ b/doc/walkthrough/adding_files.mdwn @@ -0,0 +1,11 @@ + # cd ~/annex + # cp /tmp/big_file . + # cp /tmp/debian.iso . + # git annex add . 
+ add big_file ok + add debian.iso ok + # git commit -a -m added + +When you add a file to the annex and commit it, only a symlink to +the annexed content is committed. The content itself is stored in +git-annex's backend. diff --git a/doc/walkthrough/backups.mdwn b/doc/walkthrough/backups.mdwn new file mode 100644 index 000000000..9723022b4 --- /dev/null +++ b/doc/walkthrough/backups.mdwn @@ -0,0 +1,25 @@ +git-annex can be configured to require more than one copy of a file exists, +as a simple backup for your data. This is controlled by the "annex.numcopies" +setting, which defaults to 1 copy. Let's change that to require 2 copies, +and send a copy of every file to a USB drive. + + # echo "* annex.numcopies=2" >> .gitattributes + # git annex copy . --to usbdrive + +Now when we try to `git annex drop` a file, it will verify that it +knows of 2 other repositories that have a copy before removing its +content from the current repository. + +You can also vary the number of copies needed, depending on the file name. +So, if you want 3 copies of all your flac files, but only 1 copy of oggs: + + # echo "*.ogg annex.numcopies=1" >> .gitattributes + # echo "*.flac annex.numcopies=3" >> .gitattributes + +Or, you might want to make a directory for important stuff, and configure +it so anything put in there is backed up more thoroughly: + + # mkdir important_stuff + # echo "* annex.numcopies=3" > important_stuff/.gitattributes + +For more details about the numcopies setting, see [[copies]]. diff --git a/doc/walkthrough/creating_a_repository.mdwn b/doc/walkthrough/creating_a_repository.mdwn new file mode 100644 index 000000000..51ff1c72b --- /dev/null +++ b/doc/walkthrough/creating_a_repository.mdwn @@ -0,0 +1,6 @@ +This is very straightforward. Just tell it a description of the repository. 
+ + # mkdir ~/annex + # cd ~/annex + # git init + # git annex init "my laptop" diff --git a/doc/walkthrough/fsck:_verifying_your_data.mdwn b/doc/walkthrough/fsck:_verifying_your_data.mdwn new file mode 100644 index 000000000..cd3a47a8a --- /dev/null +++ b/doc/walkthrough/fsck:_verifying_your_data.mdwn @@ -0,0 +1,16 @@ +You can use the fsck subcommand to check for problems in your data. +What can be checked depends on the [[backend|backends]] you've used to store +the data. For example, when you use the SHA1 backend, fsck will verify that +the checksums of your files are good. Fsck also checks that the annex.numcopies +setting is satisfied for all files. + + # git annex fsck + unused (checking for unused data...) ok + fsck my_cool_big_file (checksum...) ok + ... + +You can also specify the files to check. This is particularly useful if +you're using sha1 and don't want to spend a long time checksumming everything. + + # git annex fsck my_cool_big_file + fsck my_cool_big_file (checksum...) ok diff --git a/doc/walkthrough/fsck:_when_things_go_wrong.mdwn b/doc/walkthrough/fsck:_when_things_go_wrong.mdwn new file mode 100644 index 000000000..05b9f385c --- /dev/null +++ b/doc/walkthrough/fsck:_when_things_go_wrong.mdwn @@ -0,0 +1,13 @@ +Fsck never deletes possibly bad data; instead it will be moved to +`.git/annex/bad/` for you to recover. Here is a sample of what fsck +might say about a badly messed up annex: + + # git annex fsck + fsck my_cool_big_file (checksum...) + git-annex: Bad file content; moved to .git/annex/bad/SHA1:7da006579dd64330eb2456001fd01948430572f2 + git-annex: ** No known copies of the file exist! + failed + fsck important_file + git-annex: Only 1 of 2 copies exist. Run git annex get somewhere else to back it up. 
+ failed + git-annex: 2 failed diff --git a/doc/walkthrough/getting_file_content.mdwn b/doc/walkthrough/getting_file_content.mdwn new file mode 100644 index 000000000..a863303ce --- /dev/null +++ b/doc/walkthrough/getting_file_content.mdwn @@ -0,0 +1,16 @@ +A repository does not always have all annexed file contents available. +When you need the content of a file, you can use "git annex get" to +make it available. + +We can use this to copy everything in the laptop's annex to the +USB drive. + + # cd /media/usb/annex + # git pull laptop master + # git annex get . + get my_cool_big_file (copying from laptop...) ok + get iso/debian.iso (copying from laptop...) ok + +Notice that you had to git pull from laptop first, this lets git-annex know +what has changed in laptop, and so it knows about the files present there and +can get them. diff --git a/doc/walkthrough/migrating_data_to_a_new_backend.mdwn b/doc/walkthrough/migrating_data_to_a_new_backend.mdwn new file mode 100644 index 000000000..b9acb8bd1 --- /dev/null +++ b/doc/walkthrough/migrating_data_to_a_new_backend.mdwn @@ -0,0 +1,16 @@ +Maybe you started out using the WORM backend, and have now configured +git-annex to use SHA1. But files you added to the annex before still +use the WORM backend. There is a simple command that can migrate that +data: + + # git annex migrate my_cool_big_file + migrate my_cool_big_file (checksum...) ok + +You can only migrate files whose content is currently available. Other +files will be skipped. + +After migrating a file to a new backend, the old content in the old backend +will still be present. That is necessary because multiple files +can point to the same content. The `git annex unused` subcommand can be +used to clear up that detritus later. Note that hard links are used, +to avoid wasting disk space. 
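The note about hard links at the end of `migrating_data_to_a_new_backend.mdwn` can be illustrated without git-annex at all. The following is a minimal sketch using plain files in a scratch directory; the names `old_key` and `new_key` are hypothetical stand-ins for the same content stored under two backends, and nothing here reflects git-annex's actual on-disk layout.

```shell
#!/bin/sh
# Illustration only: plain files, not git-annex itself.
# old_key and new_key are hypothetical names for one piece of content.
set -e
tmp=$(mktemp -d)

printf 'some big content\n' > "$tmp/old_key"
ln "$tmp/old_key" "$tmp/new_key"    # hard link: a second name for the same data

# Both names refer to the same inode, so the data is stored only once.
if [ "$tmp/old_key" -ef "$tmp/new_key" ]; then same_inode=yes; else same_inode=no; fi
echo "same inode: $same_inode"

# Removing the old name leaves the content reachable through the new one,
# which is why cleaning up the old key later loses nothing.
rm "$tmp/old_key"
remaining=$(cat "$tmp/new_key")
echo "still readable: $remaining"

rm -rf "$tmp"
```

This is presumably why the leftover key costs no extra disk space until it is cleaned up with `git annex unused` and `git annex dropunused`.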
diff --git a/doc/walkthrough/modifying_annexed_files.mdwn b/doc/walkthrough/modifying_annexed_files.mdwn new file mode 100644 index 000000000..3ad4e82ea --- /dev/null +++ b/doc/walkthrough/modifying_annexed_files.mdwn @@ -0,0 +1,43 @@ +Normally, the content of files in the annex is prevented from being modified. +That's a good thing, because it might be the only copy, you wouldn't +want to lose it in a fumblefingered mistake. + + # echo oops > my_cool_big_file + bash: my_cool_big_file: Permission denied + +In order to modify a file, it should first be unlocked. + + # git annex unlock my_cool_big_file + unlock my_cool_big_file (copying...) ok + +That replaces the symlink that normally points at its content with a copy +of the content. You can then modify the file like any regular file. Because +it is a regular file. + +(If you decide you don't need to modify the file after all, or want to discard +modifications, just use `git annex lock`.) + +When you `git commit`, git-annex's pre-commit hook will automatically +notice that you are committing an unlocked file, and add its new content +to the annex. The file will be replaced with a symlink to the new content, +and this symlink is what gets committed to git in the end. + + # echo "now smaller, but even cooler" > my_cool_big_file + # git commit my_cool_big_file -m "changed an annexed file" + add my_cool_big_file ok + [master 64cda67] changed an annexed file + 2 files changed, 2 insertions(+), 1 deletions(-) + create mode 100644 .git-annex/WORM:1289672605:30:file.log + +There is one problem with using `git commit` like this: Git wants to first +stage the entire contents of the file in its index. That can be slow for +big files (sorta why git-annex exists in the first place). So, the +automatic handling on commit is a nice safety feature, since it prevents +the file content being accidentally committed into git. But when working with +big files, it's faster to explicitly add them to the annex yourself +before committing. 
+ + # echo "now smaller, but even cooler yet" > my_cool_big_file + # git annex add my_cool_big_file + add my_cool_big_file ok + # git commit my_cool_big_file -m "changed an annexed file" diff --git a/doc/walkthrough/moving_file_content_between_repositories.mdwn b/doc/walkthrough/moving_file_content_between_repositories.mdwn new file mode 100644 index 000000000..d7150f109 --- /dev/null +++ b/doc/walkthrough/moving_file_content_between_repositories.mdwn @@ -0,0 +1,13 @@ +Often you will want to move some file contents from a repository to some +other one. For example, your laptop's disk is getting full; time to move +some files to an external disk before moving another file from a file +server to your laptop. Doing that by hand (by using `git annex get` and +`git annex drop`) is possible, but a bit of a pain. `git annex move` +makes it very easy. + + # git annex move my_cool_big_file --to usbdrive + move my_cool_big_file (moving to usbdrive...) ok + # git annex move video/hackity_hack_and_kaxxt.mov --from fileserver + move video/hackity_hack_and_kaxxt.mov (moving from fileserver...) + WORM:1274316523:86050597:hackity_hack_and_kax 100% 82MB 199.1KB/s 07:02 + ok diff --git a/doc/walkthrough/removing_files.mdwn b/doc/walkthrough/removing_files.mdwn new file mode 100644 index 000000000..85a7d50a6 --- /dev/null +++ b/doc/walkthrough/removing_files.mdwn @@ -0,0 +1,6 @@ +You can always drop files safely. Git-annex checks that some other annex +has the file before removing it. + + # git annex drop iso/debian.iso + drop iso/Debian_5.0.iso ok + # git commit -a -m "freed up space" diff --git a/doc/walkthrough/removing_files:_When_things_go_wrong.mdwn b/doc/walkthrough/removing_files:_When_things_go_wrong.mdwn new file mode 100644 index 000000000..2d3c0cde0 --- /dev/null +++ b/doc/walkthrough/removing_files:_When_things_go_wrong.mdwn @@ -0,0 +1,24 @@ +Before dropping a file, git-annex wants to be able to look at other +remotes, and verify that they still have a file. 
After all, it could +have been dropped from them too. If the remotes are not mounted/available, +you'll see something like this. + + # git annex drop important_file other.iso + drop important_file (unsafe) + Could only verify the existence of 0 out of 1 necessary copies + Unable to access these remotes: usbdrive + Try making some of these repositories available: + 58d84e8a-d9ae-11df-a1aa-ab9aa8c00826 -- portable USB drive + ca20064c-dbb5-11df-b2fe-002170d25c55 -- backup SATA drive + (Use --force to override this check, or adjust annex.numcopies.) + failed + drop other.iso (unsafe) + Could only verify the existence of 0 out of 1 necessary copies + No other repository is known to contain the file. + (Use --force to override this check, or adjust annex.numcopies.) + failed + +Here you might --force it to drop `important_file` if you [[trust]] your backup. +But `other.iso` looks to have never been copied to anywhere else, so if +it's something you want to hold onto, you'd need to transfer it to +some other repository before dropping it. diff --git a/doc/walkthrough/renaming_files.mdwn b/doc/walkthrough/renaming_files.mdwn new file mode 100644 index 000000000..85964d1ea --- /dev/null +++ b/doc/walkthrough/renaming_files.mdwn @@ -0,0 +1,13 @@ + # cd ~/annex + # git mv big_file my_cool_big_file + # mkdir iso + # git mv debian.iso iso/ + # git commit -m moved + +You can use any normal git operations to move files around, or even +make copies or delete them. + +Notice that, since annexed files are represented by symlinks, +the symlink will break when the file is moved into a subdirectory. +But, git-annex will fix this up for you when you commit -- +it has a pre-commit hook that watches for and corrects broken symlinks. 
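The symlink breakage described in `renaming_files.mdwn` is ordinary filesystem behavior and can be reproduced without git-annex. The sketch below assumes a hypothetical `store/key` file standing in for the annexed content; it shows a relative symlink breaking when moved into a subdirectory and being repaired by rewriting its target. It does not show how the pre-commit hook itself is implemented.

```shell
#!/bin/sh
# Illustration only: store/key is a hypothetical stand-in for annexed content.
set -e
tmp=$(mktemp -d)
mkdir "$tmp/store"
printf 'big content\n' > "$tmp/store/key"

# A relative symlink is resolved relative to the directory that holds it.
ln -s store/key "$tmp/big_file"
if [ -e "$tmp/big_file" ]; then before=ok; else before=broken; fi

# Moving the link one directory deeper leaves its target text unchanged,
# so it now resolves to iso/store/key, which does not exist.
mkdir "$tmp/iso"
mv "$tmp/big_file" "$tmp/iso/big_file"
if [ -e "$tmp/iso/big_file" ]; then after_move=ok; else after_move=broken; fi

# Rewriting the target with an extra ../ repairs the link.
ln -sf ../store/key "$tmp/iso/big_file"
if [ -e "$tmp/iso/big_file" ]; then after_fix=ok; else after_fix=broken; fi

echo "before: $before, after move: $after_move, after fix: $after_fix"
rm -rf "$tmp"
```

With git-annex, the last step is what the walkthrough says the pre-commit hook takes care of when you commit.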
diff --git a/doc/walkthrough/transferring_files:_When_things_go_wrong.mdwn b/doc/walkthrough/transferring_files:_When_things_go_wrong.mdwn new file mode 100644 index 000000000..d8f0a19bd --- /dev/null +++ b/doc/walkthrough/transferring_files:_When_things_go_wrong.mdwn @@ -0,0 +1,18 @@ +After a while, you'll have several annexes, with different file contents. +You don't have to try to keep all that straight; git-annex does +[[location_tracking]] for you. If you ask it to get a file and the drive +or file server is not accessible, it will let you know what it needs to get +it: + + # git annex get video/hackity_hack_and_kaxxt.mov + get video/_why_hackity_hack_and_kaxxt.mov (not available) + Unable to access these remotes: usbdrive, server + Try making some of these repositories available: + 5863d8c0-d9a9-11df-adb2-af51e6559a49 -- my home file server + 58d84e8a-d9ae-11df-a1aa-ab9aa8c00826 -- portable USB drive + ca20064c-dbb5-11df-b2fe-002170d25c55 -- backup SATA drive + failed + # sudo mount /media/usb + # git annex get video/hackity_hack_and_kaxxt.mov + get video/hackity_hack_and_kaxxt.mov (copying from usbdrive...) ok + # git commit -a -m "got a video I want to rewatch on the plane" diff --git a/doc/walkthrough/untrusted_repositories.mdwn b/doc/walkthrough/untrusted_repositories.mdwn new file mode 100644 index 000000000..cdb5da7c3 --- /dev/null +++ b/doc/walkthrough/untrusted_repositories.mdwn @@ -0,0 +1,28 @@ +Suppose you have a USB thumb drive and are using it as a git annex +repository. You don't trust the drive, because you could lose it, or +accidentally run it through the laundry. Or, maybe you have a drive that +you know is dying, and you'd like to be warned if there are any files +on it not backed up somewhere else. Maybe the drive has already died +or been lost. + +You can let git-annex know that you don't trust a repository, and it will +adjust its behavior to avoid relying on that repositories's continued +availability. 
+
+    # git annex untrust usbdrive
+    untrust usbdrive ok
+
+Now when you do a fsck, you'll be warned appropriately:
+
+    # git annex fsck .
+    fsck my_big_file
+      Only these untrusted locations may have copies of this file!
+        05e296c4-2989-11e0-bf40-bad1535567fe -- portable USB drive
+      Back it up to trusted locations with git-annex copy.
+    failed
+
+Also, git-annex will refuse to drop a file from elsewhere just because
+it can see a copy on the untrusted repository.
+
+It's also possible to tell git-annex that you have an unusually high
+level of trust for a repository. See [[trust]] for details.
diff --git a/doc/walkthrough/unused_data.mdwn b/doc/walkthrough/unused_data.mdwn
new file mode 100644
index 000000000..69a581fe1
--- /dev/null
+++ b/doc/walkthrough/unused_data.mdwn
@@ -0,0 +1,30 @@
+It's possible for data to accumulate in the annex that no files point to
+anymore. One way it can happen is if you `git rm` a file without
+first calling `git annex drop`. And, when you modify an annexed file, the old
+content of the file remains in the annex. Another way is when migrating
+between backends.
+
+This might be historical data you want to preserve, so git-annex defaults to
+preserving it. So from time to time, you may want to check for such data and
+eliminate it to save space.
+
+    # git annex unused
+    unused (checking for unused data...)
+      Some annexed data is no longer pointed to by any files in the repository.
+        NUMBER  KEY
+        1       WORM:1289672605:3:file
+        2       WORM:1289672605:14:file
+      (To see where data was previously used, try: git log --stat -S'KEY')
+      (To remove unwanted data: git-annex dropunused NUMBER)
+    ok
+
+After running `git annex unused`, you can follow the instructions to examine
+the history of files that used the data, and if you decide you don't need that
+data anymore, you can easily remove it:
+
+    # git annex dropunused 1
+    dropunused 1 ok
+
+Hint: To drop a lot of unused data, use a command like this:
+
+    # git annex dropunused `seq 1 1000`
diff --git a/doc/walkthrough/using_ssh_remotes.mdwn b/doc/walkthrough/using_ssh_remotes.mdwn
new file mode 100644
index 000000000..831746ac0
--- /dev/null
+++ b/doc/walkthrough/using_ssh_remotes.mdwn
@@ -0,0 +1,33 @@
+So far in this walkthrough, git-annex has been used with a remote
+repository on a USB drive. But it can also be used with a git remote
+that is truly remote, a host accessed by ssh.
+
+Say you have a desktop on the same network as your laptop and want
+to clone the laptop's annex to it:
+
+    # git clone ssh://mylaptop/home/me/annex ~/annex
+    # cd ~/annex
+    # git annex init "my desktop"
+
+Now you can get files and they will be transferred (using `rsync`):
+
+    # git annex get my_cool_big_file
+    get my_cool_big_file (getting UUID for origin...) (copying from origin...)
+    WORM:1285650548:2159:my_cool_big_file 100% 2159 2.1KB/s 00:00
+    ok
+
+When you drop files, git-annex will ssh over to the remote and make
+sure the file's content is still there before removing it locally:
+
+    # git annex drop my_cool_big_file
+    drop my_cool_big_file (checking origin..) ok
+
+Note that normally git-annex prefers to use non-ssh remotes, like
+a USB drive, before ssh remotes. They are assumed to be faster/cheaper to
+access, if available. There is an annex-cost setting you can configure in
+`.git/config` to adjust which repositories it prefers. See
+[[the_man_page|git-annex]] for details.
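The cost-based preference described above can be pictured as a sort over remotes. This is only a sketch under stated assumptions, not git-annex's implementation. The `Remote` class and the `preferred_order` function are hypothetical, and the default costs of 100 for local and 200 for ssh remotes are an assumption that should be checked against the man page. The optional `cost` field stands in for an explicit annex-cost override.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed defaults, for illustration only; consult the man page for real values.
LOCAL_COST = 100.0
SSH_COST = 200.0


@dataclass
class Remote:
    name: str
    is_ssh: bool
    cost: Optional[float] = None  # explicit override, like an annex-cost setting

    def effective_cost(self) -> float:
        # An explicit override wins; otherwise fall back to the assumed default.
        if self.cost is not None:
            return self.cost
        return SSH_COST if self.is_ssh else LOCAL_COST


def preferred_order(remotes):
    """Return remotes sorted so that cheaper ones are tried first."""
    return sorted(remotes, key=lambda remote: remote.effective_cost())


remotes = [Remote("origin", is_ssh=True), Remote("usbdrive", is_ssh=False)]
print([remote.name for remote in preferred_order(remotes)])
```

Giving the ssh remote a lower explicit cost than the local default reverses the order, which is the effect the setting is meant to have.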
+
+Also, note that you need full shell access for this to work --
+git-annex needs to be able to ssh in and run commands. Or at least,
+your shell needs to be able to run the [[git-annex-shell]] command.
diff --git a/doc/walkthrough/using_the_SHA1_backend.mdwn b/doc/walkthrough/using_the_SHA1_backend.mdwn
new file mode 100644
index 000000000..c04729e2c
--- /dev/null
+++ b/doc/walkthrough/using_the_SHA1_backend.mdwn
@@ -0,0 +1,11 @@
+Another handy alternative to the default [[backend|backends]] is the
+SHA1 backend. This backend provides more git-style assurance that your data
+has not been damaged. And the checksum means that when you add the same
+content to the annex twice, only one copy need be stored in the backend.
+
+The only reason it's not the default is that it needs to checksum
+files when they're added to the annex, and this can slow things down
+significantly for really big files. To make SHA1 the default, just
+add something like this to `.gitattributes`:
+
+    * annex.backend=SHA1
diff --git a/doc/walkthrough/using_the_URL_backend.mdwn b/doc/walkthrough/using_the_URL_backend.mdwn
new file mode 100644
index 000000000..fe79a6be2
--- /dev/null
+++ b/doc/walkthrough/using_the_URL_backend.mdwn
@@ -0,0 +1,24 @@
+git-annex has multiple key-value [[backends]]. So far this walkthrough has
+demonstrated the default, WORM (Write Once, Read Many) backend.
+
+Another handy backend is the URL backend, which can fetch a file's content
+from remote URLs. Here's how to set up some files in your repository
+that use this backend:
+
+    # git annex fromkey --backend=URL --key=http://www.archive.org/somefile somefile
+    fromkey somefile ok
+    # git commit -m "added a file from the Internet Archive"
+
+Now if you ask git-annex to get that file, it will download it,
+and cache it locally.
+
+    # git annex get somefile
+    get somefile (downloading)
+    #########################################################################100.0%
+    ok
+
+You can always drop files downloaded by the URL backend. It is assumed
+that the URL is stable; no local backup is kept.
+
+    # git annex drop somefile
+    drop somefile (ok)
--
cgit v1.2.3