Diffstat (limited to 'doc/walkthrough.mdwn')
-rw-r--r-- | doc/walkthrough.mdwn | 441 |
1 files changed, 21 insertions, 420 deletions
diff --git a/doc/walkthrough.mdwn b/doc/walkthrough.mdwn
index d08b247f7..896b560ec 100644
--- a/doc/walkthrough.mdwn
+++ b/doc/walkthrough.mdwn
@@ -2,423 +2,24 @@ A walkthrough of the basic features of git-annex.
 
 [[!toc]]
 
-## creating a repository
-
-This is very straightforward. Just tell it a description of the repository.
-
-	# mkdir ~/annex
-	# cd ~/annex
-	# git init
-	# git annex init "my laptop"
-
-## adding a remote
-
-Like any other git repository, git-annex repositories have remotes.
-Let's start by adding a USB drive as a remote.
-
-	# sudo mount /media/usb
-	# cd /media/usb
-	# git clone ~/annex
-	# cd annex
-	# git annex init "portable USB drive"
-	# git remote add laptop ~/annex
-	# cd ~/annex
-	# git remote add usbdrive /media/usb
-
-This is all standard ad-hoc distributed git repository setup.
-The only git-annex specific part is telling it the name
-of the new repository created on the USB drive.
-
-Notice that both repos are set up as remotes of one another. This lets
-either get annexed files from the other. You'll want to do that even
-if you are using git in a more centralized fashion.
-
-## adding files
-
-	# cd ~/annex
-	# cp /tmp/big_file .
-	# cp /tmp/debian.iso .
-	# git annex add .
-	add big_file ok
-	add debian.iso ok
-	# git commit -a -m added
-
-When you add a file to the annex and commit it, only a symlink to
-the annexed content is committed. The content itself is stored in
-git-annex's backend.
-
-## renaming files
-
-	# cd ~/annex
-	# git mv big_file my_cool_big_file
-	# mkdir iso
-	# git mv debian.iso iso/
-	# git commit -m moved
-
-You can use any normal git operations to move files around, or even
-make copies or delete them.
-
-Notice that, since annexed files are represented by symlinks,
-the symlink will break when the file is moved into a subdirectory.
-But, git-annex will fix this up for you when you commit --
-it has a pre-commit hook that watches for and corrects broken symlinks.
-
-## getting file content
-
-A repository does not always have all annexed file contents available.
-When you need the content of a file, you can use "git annex get" to
-make it available.
-
-We can use this to copy everything in the laptop's annex to the
-USB drive.
-
-	# cd /media/usb/annex
-	# git pull laptop master
-	# git annex get .
-	get my_cool_big_file (copying from laptop...) ok
-	get iso/debian.iso (copying from laptop...) ok
-
-Notice that you had to git pull from laptop first, this lets git-annex know
-what has changed in laptop, and so it knows about the files present there and
-can get them.
-
-## transferring files: When things go wrong
-
-After a while, you'll have several annexes, with different file contents.
-You don't have to try to keep all that straight; git-annex does
-[[location_tracking]] for you. If you ask it to get a file and the drive
-or file server is not accessible, it will let you know what it needs to get
-it:
-
-	# git annex get video/hackity_hack_and_kaxxt.mov
-	get video/_why_hackity_hack_and_kaxxt.mov (not available)
-	Unable to access these remotes: usbdrive, server
-	Try making some of these repositories available:
-	5863d8c0-d9a9-11df-adb2-af51e6559a49 -- my home file server
-	58d84e8a-d9ae-11df-a1aa-ab9aa8c00826 -- portable USB drive
-	ca20064c-dbb5-11df-b2fe-002170d25c55 -- backup SATA drive
-	failed
-	# sudo mount /media/usb
-	# git annex get video/hackity_hack_and_kaxxt.mov
-	get video/hackity_hack_and_kaxxt.mov (copying from usbdrive...) ok
-	# git commit -a -m "got a video I want to rewatch on the plane"
-
-## removing files
-
-You can always drop files safely. Git-annex checks that some other annex
-has the file before removing it.
-
-	# git annex drop iso/debian.iso
-	drop iso/Debian_5.0.iso ok
-	# git commit -a -m "freed up space"
-
-## removing files: When things go wrong
-
-Before dropping a file, git-annex wants to be able to look at other
-remotes, and verify that they still have a file. After all, it could
-have been dropped from them too. If the remotes are not mounted/available,
-you'll see something like this.
-
-	# git annex drop important_file other.iso
-	drop important_file (unsafe)
-	Could only verify the existence of 0 out of 1 necessary copies
-	Unable to access these remotes: usbdrive
-	Try making some of these repositories available:
-	58d84e8a-d9ae-11df-a1aa-ab9aa8c00826 -- portable USB drive
-	ca20064c-dbb5-11df-b2fe-002170d25c55 -- backup SATA drive
-	(Use --force to override this check, or adjust annex.numcopies.)
-	failed
-	drop other.iso (unsafe)
-	Could only verify the existence of 0 out of 1 necessary copies
-	No other repository is known to contain the file.
-	(Use --force to override this check, or adjust annex.numcopies.)
-	failed
-
-Here you might --force it to drop `important_file` if you [[trust]] your backup.
-But `other.iso` looks to have never been copied to anywhere else, so if
-it's something you want to hold onto, you'd need to transfer it to
-some other repository before dropping it.
-
-## modifying annexed files
-
-Normally, the content of files in the annex is prevented from being modified.
-That's a good thing, because it might be the only copy, you wouldn't
-want to lose it in a fumblefingered mistake.
-
-	# echo oops > my_cool_big_file
-	bash: my_cool_big_file: Permission denied
-
-In order to modify a file, it should first be unlocked.
-
-	# git annex unlock my_cool_big_file
-	unlock my_cool_big_file (copying...) ok
-
-That replaces the symlink that normally points at its content with a copy
-of the content. You can then modify the file like any regular file. Because
-it is a regular file.
-
-(If you decide you don't need to modify the file after all, or want to discard
-modifications, just use `git annex lock`.)
-
-When you `git commit`, git-annex's pre-commit hook will automatically
-notice that you are committing an unlocked file, and add its new content
-to the annex. The file will be replaced with a symlink to the new content,
-and this symlink is what gets committed to git in the end.
-
-	# echo "now smaller, but even cooler" > my_cool_big_file
-	# git commit my_cool_big_file -m "changed an annexed file"
-	add my_cool_big_file ok
-	[master 64cda67] changed an annexed file
-	2 files changed, 2 insertions(+), 1 deletions(-)
-	create mode 100644 .git-annex/WORM:1289672605:30:file.log
-
-There is one problem with using `git commit` like this: Git wants to first
-stage the entire contents of the file in its index. That can be slow for
-big files (sorta why git-annex exists in the first place). So, the
-automatic handling on commit is a nice safety feature, since it prevents
-the file content being accidentally committed into git. But when working with
-big files, it's faster to explicitly add them to the annex yourself
-before committing.
-
-	# echo "now smaller, but even cooler yet" > my_cool_big_file
-	# git annex add my_cool_big_file
-	add my_cool_big_file ok
-	# git commit my_cool_big_file -m "changed an annexed file"
-
-## using ssh remotes
-
-So far in this walkthrough, git-annex has been used with a remote
-repository on a USB drive. But it can also be used with a git remote
-that is truely remote, a host accessed by ssh.
-
-Say you have a desktop on the same network as your laptop and want
-to clone the laptop's annex to it:
-
-	# git clone ssh://mylaptop/home/me/annex ~/annex
-	# cd ~/annex
-	# git annex init "my desktop"
-
-Now you can get files and they will be transferred (using `rsync`):
-
-	# git annex get my_cool_big_file
-	get my_cool_big_file (getting UUID for origin...) (copying from origin...)
-	WORM:1285650548:2159:my_cool_big_file 100% 2159 2.1KB/s 00:00
-	ok
-
-When you drop files, git-annex will ssh over to the remote and make
-sure the file's content is still there before removing it locally:
-
-	# git annex drop my_cool_big_file
-	drop my_cool_big_file (checking origin..) ok
-
-Note that normally git-annex prefers to use non-ssh remotes, like
-a USB drive, before ssh remotes. They are assumed to be faster/cheaper to
-access, if available. There is a annex-cost setting you can configure in
-`.git/config` to adjust which repositories it prefers. See
-[[the_man_page|git-annex]] for details.
-
-Also, note that you need full shell access for this to work --
-git-annex needs to be able to ssh in and run commands.
-
-## moving file content between repositories
-
-Often you will want to move some file contents from a repository to some
-other one. For example, your laptop's disk is getting full; time to move
-some files to an external disk before moving another file from a file
-server to your laptop. Doing that by hand (by using `git annex get` and
-`git annex drop`) is possible, but a bit of a pain. `git annex move`
-makes it very easy.
-
-	# git annex move my_cool_big_file --to usbdrive
-	move my_cool_big_file (moving to usbdrive...) ok
-	# git annex move video/hackity_hack_and_kaxxt.mov --from fileserver
-	move video/hackity_hack_and_kaxxt.mov (moving from fileserver...)
-	WORM:1274316523:86050597:hackity_hack_and_kax 100% 82MB 199.1KB/s 07:02
-	ok
-
-## using the URL backend
-
-git-annex has multiple key-value [[backends]]. So far this walkthrough has
-demonstrated the default, WORM (Write Once, Read Many) backend.
-
-Another handy backend is the URL backend, which can fetch file's content
-from remote URLs. Here's how to set up some files in your repository
-that use this backend:
-
-	# git annex fromkey --backend=URL --key=http://www.archive.org/somefile somefile
-	fromkey somefile ok
-	# git commit -m "added a file from the Internet Archive"
-
-Now you if you ask git-annex to get that file, it will download it,
-and cache it locally.
-
-	# git annex get somefile
-	get somefile (downloading)
-	#########################################################################100.0%
-	ok
-
-You can always drop files downloaded by the URL backend. It is assumed
-that the URL is stable; no local backup is kept.
-
-	# git annex drop somefile
-	drop somefile (ok)
-
-## using the SHA1 backend
-
-Another handy alternative to the default [[backend|backends]] is the
-SHA1 backend. This backend provides more git-style assurance that your data
-has not been damaged. And the checksum means that when you add the same
-content to the annex twice, only one copy need be stored in the backend.
-
-The only reason it's not the default is that it needs to checksum
-files when they're added to the annex, and this can slow things down
-significantly for really big files. To make SHA1 the default, just
-add something like this to `.gitattributes`:
-
-	* annex.backend=SHA1
-
-## migrating data to a new backend
-
-Maybe you started out using the WORM backend, and have now configured
-git-annex to use SHA1. But files you added to the annex before still
-use the WORM backend. There is a simple command that can migrate that
-data:
-
-	# git annex migrate my_cool_big_file
-	migrate my_cool_big_file (checksum...) ok
-
-You can only migrate files whose content is currently available. Other
-files will be skipped.
-
-After migrating a file to a new backend, the old content in the old backend
-will still be present. That is necessary because multiple files
-can point to the same content. The `git annex unused` subcommand can be
-used to clear up that detritus later. Note that hard links are used,
-to avoid wasting disk space.
-
-## unused data
-
-It's possible for data to accumulate in the annex that no files point to
-anymore. One way it can happen is if you `git rm` a file without
-first calling `git annex drop`. And, when you modify an annexed file, the old
-content of the file remains in the annex. Another way is when migrating
-between backends.
-
-This might be historical data you want to preserve, so git-annex defaults to
-preserving it. So from time to time, you may want to check for such data and
-eliminate it to save space.
-
-	# git annex unused
-	unused (checking for unused data...)
-	Some annexed data is no longer pointed to by any files in the repository.
-	NUMBER KEY
-	1 WORM:1289672605:3:file
-	2 WORM:1289672605:14:file
-	(To see where data was previously used, try: git log --stat -S'KEY')
-	(To remove unwanted data: git-annex dropunused NUMBER)
-	ok
-
-After running `git annex unused`, you can follow the instructions to examine
-the history of files that used the data, and if you decide you don't need that
-data anymore, you can easily remove it:
-
-	# git annex dropunused 1
-	dropunused 1 ok
-
-Hint: To drop a lot of unused data, use a command like this:
-
-	# git annex dropunused `seq 1 1000`
-
-## fsck: verifying your data
-
-You can use the fsck subcommand to check for problems in your data.
-What can be checked depends on the [[backend|backends]] you've used to store
-the data. For example, when you use the SHA1 backend, fsck will verify that
-the checksums of your files are good. Fsck also checks that the annex.numcopies
-setting is satisfied for all files.
-
-	# git annex fsck
-	unused (checking for unused data...) ok
-	fsck my_cool_big_file (checksum...) ok
-	...
-
-You can also specify the files to check. This is particularly useful if
-you're using sha1 and don't want to spend a long time checksumming everything.
-
-	# git annex fsck my_cool_big_file
-	fsck my_cool_big_file (checksum...) ok
-
-## fsck: When things go wrong
-
-Fsck never deletes possibly bad data; instead it will be moved to
-`.git/annex/bad/` for you to recover. Here is a sample of what fsck
-might say about a badly messed up annex:
-
-	# git annex fsck
-	fsck my_cool_big_file (checksum...)
-	git-annex: Bad file content; moved to .git/annex/bad/SHA1:7da006579dd64330eb2456001fd01948430572f2
-	git-annex: ** No known copies of the file exist!
-	failed
-	fsck important_file
-	git-annex: Only 1 of 2 copies exist. Run git annex get somewhere else to back it up.
-	failed
-	git-annex: 2 failed
-
-## backups
-
-git-annex can be configured to require more than one copy of a file exists,
-as a simple backup for your data. This is controlled by the "annex.numcopies"
-setting, which defaults to 1 copy. Let's change that to require 2 copies,
-and send a copy of every file to a USB drive.
-
-	# echo "* annex.numcopies=2" >> .gitattributes
-	# git annex copy . --to usbdrive
-
-Now when we try to `git annex drop` a file, it will verify that it
-knows of 2 other repositories that have a copy before removing its
-content from the current repository.
-
-You can also vary the number of copies needed, depending on the file name.
-So, if you want 3 copies of all your flac files, but only 1 copy of oggs:
-
-	# echo "*.ogg annex.numcopies=1" >> .gitattributes
-	# echo "*.flac annex.numcopies=3" >> .gitattributes
-
-Or, you might want to make a directory for important stuff, and configure
-it so anything put in there is backed up more thoroughly:
-
-	# mkdir important_stuff
-	# echo "* annex.numcopies=3" > important_stuff/.gitattributes
-
-For more details about the numcopies setting, see [[copies]].
-
-## untrusted repositories
-
-Suppose you have a USB thumb drive and are using it as a git annex
-repository. You don't trust the drive, because you could lose it, or
-accidentally run it through the laundry. Or, maybe you have a drive that
-you know is dying, and you'd like to be warned if there are any files
-on it not backed up somewhere else. Maybe the drive has already died
-or been lost.
-
-You can let git-annex know that you don't trust a repository, and it will
-adjust its behavior to avoid relying on that repositories's continued
-availability.
-
-	# git annex untrust usbdrive
-	untrust usbdrive ok
-
-Now when you do a fsck, you'll be warned appropriately:
-
-	# git annex fsck .
-	fsck my_big_file
-	Only these untrusted locations may have copies of this file!
-	05e296c4-2989-11e0-bf40-bad1535567fe -- portable USB drive
-	Back it up to trusted locations with git-annex copy.
-	failed
-
-Also, git-annex will refuse to drop a file from elsewhere just because
-it can see a copy on the untrusted repository.
-
-It's also possible to tell git-annex that you have an unusually high
-level of trust for a repository. See [[trust]] for details.
+[[!inline feeds=no pagenames="""
+	creating_a_repository
+	adding_a_remote
+	adding_files
+	renaming_files
+	getting_file_content
+	transferring_files:_When_things_go_wrong
+	removing_files
+	removing_files:_When_things_go_wrong
+	modifying_annexed_files
+	using_ssh_remotes
+	moving_file_content_between_repositories
+	using_the_URL_backend
+	using_the_SHA1_backend
+	migrating_data_to_a_new_backend
+	unused_data
+	fsck:_verifying_your_data
+	fsck:_when_things_go_wrong
+	backups
+	untrusted_repositories
+"""]]