Diffstat (limited to 'doc/tips')
-rw-r--r-- | doc/tips/Internet_Archive_via_S3.mdwn | 49
-rw-r--r-- | doc/tips/migrating_data_to_a_new_backend.mdwn | 16
-rw-r--r-- | doc/tips/powerful_file_matching.mdwn | 36
-rw-r--r-- | doc/tips/recover_data_from_lost+found.mdwn | 19
-rw-r--r-- | doc/tips/untrusted_repositories.mdwn | 28
-rw-r--r-- | doc/tips/using_Amazon_S3.mdwn | 37
-rw-r--r-- | doc/tips/using_the_SHA1_backend.mdwn | 11
-rw-r--r-- | doc/tips/using_the_web.mdwn | 32
-rw-r--r-- | doc/tips/what_to_do_when_you_lose_a_repository.mdwn | 19
9 files changed, 247 insertions, 0 deletions
diff --git a/doc/tips/Internet_Archive_via_S3.mdwn b/doc/tips/Internet_Archive_via_S3.mdwn
new file mode 100644
index 000000000..8c0f2dde7
--- /dev/null
+++ b/doc/tips/Internet_Archive_via_S3.mdwn
@@ -0,0 +1,49 @@
+[The Internet Archive](http://www.archive.org/) allows members to upload
+collections using an Amazon S3
+[compatible API](http://www.archive.org/help/abouts3.txt), and this can
+be used with git-annex's [[special_remotes/S3]] support.
+
+So, you can locally archive things with git-annex, define remotes that
+correspond to "items" at the Internet Archive, and use git-annex to upload
+your files there. Of course, your use of the Internet Archive must
+comply with their [terms of service](http://www.archive.org/about/terms.php).
+
+Sign up for an account, and get your access keys here:
+<http://www.archive.org/account/s3.php>
+
+    # export AWS_ACCESS_KEY_ID=blahblah
+    # export AWS_SECRET_ACCESS_KEY=xxxxxxx
+
+Specify `host=s3.us.archive.org` when doing `initremote` to set up
+a remote at the Archive. This will enable a special Internet Archive mode:
+Encryption is not allowed; you are required to specify a bucket name
+rather than having git-annex pick a random one; and you can optionally
+specify `x-archive-meta*` headers to add metadata as explained in their
+[documentation](http://www.archive.org/help/abouts3.txt).
+
+[[!template id=note text="""
+/!\ There seems to be a bug in either hS3 or the archive that breaks
+authentication when the bucket name contains spaces or upper-case letters,
+so use all lowercase and no spaces when making the bucket with `initremote`.
+"""]]
+
+    # git annex initremote archive-panama type=S3 \
+        host=s3.us.archive.org bucket=panama-canal-lock-blueprints \
+        x-archive-meta-mediatype=texts x-archive-meta-language=eng \
+        x-archive-meta-title="original Panama Canal lock design blueprints"
+    initremote archive-panama (Internet Archive mode) ok
+    # git annex describe archive-panama "a man, a plan, a canal: panama"
+    describe archive-panama ok
+
+Then you can annex files and copy them to the remote as usual:
+
+    # git annex add photo1.jpeg --backend=SHA1E
+    add photo1.jpeg (checksum...) ok
+    # git annex copy photo1.jpeg --fast --to archive-panama
+    copy (to archive-panama...) ok
+
+Note the use of the SHA1E [[backend|backends]]. It makes most sense
+to use the WORM or SHA1E backend for files that will be stored in
+the Internet Archive, since the key name will be exposed as the filename
+there, and since the Archive does special processing of files based on
+their extension.
diff --git a/doc/tips/migrating_data_to_a_new_backend.mdwn b/doc/tips/migrating_data_to_a_new_backend.mdwn
new file mode 100644
index 000000000..b9acb8bd1
--- /dev/null
+++ b/doc/tips/migrating_data_to_a_new_backend.mdwn
@@ -0,0 +1,16 @@
+Maybe you started out using the WORM backend, and have now configured
+git-annex to use SHA1. But files you added to the annex before still
+use the WORM backend. There is a simple command that can migrate that
+data:
+
+    # git annex migrate my_cool_big_file
+    migrate my_cool_big_file (checksum...) ok
+
+You can only migrate files whose content is currently available. Other
+files will be skipped.
+
+After migrating a file to a new backend, the old content in the old backend
+will still be present. That is necessary because multiple files
+can point to the same content. The `git annex unused` subcommand can be
+used to clear up that detritus later. Note that hard links are used,
+to avoid wasting disk space.
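Both tips above depend on how a key name is built: the Internet Archive tip relies on the extension surviving into the key (and hence the uploaded filename), and the migrate tip relies on identical content mapping to one key. The following is a minimal sketch of deriving a SHA1E-style key. It *assumes* the layout `SHA1E-s<size>--<sha1><.ext>`, which these pages do not spell out. `sha1e_key` is a hypothetical helper, not part of git-annex, and it needs GNU coreutils `sha1sum`.

```sh
# Hypothetical helper: print a SHA1E-style key for a file.
# Assumed key layout (not confirmed by these docs):
#   SHA1E-s<size in bytes>--<sha1 hex digest><.extension>
sha1e_key() {
    file=$1
    # Size in bytes; tr strips the padding some wc implementations add.
    size=$(wc -c < "$file" | tr -d ' ')
    # Hash stdin so unusual filenames cannot alter sha1sum's output format.
    hash=$(sha1sum < "$file" | cut -d ' ' -f 1)
    base=${file##*/}
    case "$base" in
        # Keep the last extension; a leading-dot name like .bashrc gets none.
        ?*.*) ext=".${base##*.}" ;;
        *)    ext="" ;;
    esac
    printf 'SHA1E-s%s--%s%s\n' "$size" "$hash" "$ext"
}
```

For a file containing `hello` plus a newline, `sha1e_key photo1.jpeg` prints `SHA1E-s6--f572d396fae9206628714fb2ce00f72e94f2258f.jpeg`. Any other `.jpeg` file with the same bytes yields the same key, which is why one stored copy can serve several files. A WORM key, by contrast, is built from file metadata rather than a checksum, so the same content can end up with different keys.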
diff --git a/doc/tips/powerful_file_matching.mdwn b/doc/tips/powerful_file_matching.mdwn
new file mode 100644
index 000000000..d5d29377c
--- /dev/null
+++ b/doc/tips/powerful_file_matching.mdwn
@@ -0,0 +1,36 @@
+git-annex has a powerful syntax for making it act on only certain files.
+
+The simplest thing is to exclude some files, using wild cards:
+
+    git annex get --exclude '*.mp3' --exclude '*.ogg'
+
+But you can also exclude files that git-annex's [[location_tracking]]
+information indicates are present in a given repository. For example,
+if you want to populate newarchive with files, but not those already
+on oldarchive, you could do it like this:
+
+    git annex copy --not --in oldarchive --to newarchive
+
+Without the --not, --in makes it act on files that *are* in the specified
+repository. So, to drop the local copies of files that are on oldarchive:
+
+    git annex drop --in oldarchive
+
+Or maybe you're curious which files have a lot of copies, and then
+also want to know which files have only one copy:
+
+    git annex find --copies 7
+    git annex find --not --copies 2
+
+The above are simple examples of specifying what files git-annex
+should act on. But you can specify anything you can dream up by combining
+the things above with --and, --or, -( and -). Those last two strange-looking
+options are parentheses, for grouping other options. You will probably
+have to escape them from your shell.
+
+Here are the mp3 files that are in either of two repositories, but have
+fewer than 3 copies:
+
+    git annex find --not --exclude '*.mp3' --and \
+        -\( --in usbdrive --or --in archive -\) --and \
+        --not --copies 3
diff --git a/doc/tips/recover_data_from_lost+found.mdwn b/doc/tips/recover_data_from_lost+found.mdwn
new file mode 100644
index 000000000..48ef2a1d7
--- /dev/null
+++ b/doc/tips/recover_data_from_lost+found.mdwn
@@ -0,0 +1,19 @@
+Suppose something goes wrong, and fsck puts all the files in lost+found.
+It's actually very easy to recover from this disaster.
+
+First, check out the git repository again. Then, in the new checkout:
+
+    $ mkdir recovered-content
+    $ sudo mv ../lost+found/* recovered-content
+    $ sudo chown -R you:you recovered-content
+    $ chmod -R u+w recovered-content
+    $ git annex add recovered-content
+    $ git rm -rf recovered-content
+    $ git commit -m "recovered some content"
+    $ git annex fsck
+
+The way that works is that when git-annex adds the same content that was in
+the repository before, all the old links to that content start working
+again. This works particularly well if the SHA* backends are used, but even
+with the default backend it will work pretty well, as long as fsck
+preserved the modification time of the files.
diff --git a/doc/tips/untrusted_repositories.mdwn b/doc/tips/untrusted_repositories.mdwn
new file mode 100644
index 000000000..cdb5da7c3
--- /dev/null
+++ b/doc/tips/untrusted_repositories.mdwn
@@ -0,0 +1,28 @@
+Suppose you have a USB thumb drive and are using it as a git-annex
+repository. You don't trust the drive, because you could lose it, or
+accidentally run it through the laundry. Or, maybe you have a drive that
+you know is dying, and you'd like to be warned if there are any files
+on it not backed up somewhere else. Maybe the drive has already died
+or been lost.
+
+You can let git-annex know that you don't trust a repository, and it will
+adjust its behavior to avoid relying on that repository's continued
+availability.
+
+    # git annex untrust usbdrive
+    untrust usbdrive ok
+
+Now when you do a fsck, you'll be warned appropriately:
+
+    # git annex fsck .
+    fsck my_big_file
+    Only these untrusted locations may have copies of this file!
+      05e296c4-2989-11e0-bf40-bad1535567fe -- portable USB drive
+    Back it up to trusted locations with git-annex copy.
+    failed
+
+Also, git-annex will refuse to drop a file from elsewhere just because
+it can see a copy on the untrusted repository.
+
+It's also possible to tell git-annex that you have an unusually high
+level of trust for a repository. See [[trust]] for details.
diff --git a/doc/tips/using_Amazon_S3.mdwn b/doc/tips/using_Amazon_S3.mdwn
new file mode 100644
index 000000000..b59ca9b4f
--- /dev/null
+++ b/doc/tips/using_Amazon_S3.mdwn
@@ -0,0 +1,37 @@
+git-annex extends git's usual remotes with some [[special_remotes]] that
+are not git repositories. This way you can set up a remote using, say,
+Amazon S3, and use git-annex to transfer files into the cloud.
+
+First, export your S3 credentials:
+
+    # export ANNEX_S3_ACCESS_KEY_ID="08TJMT99S3511WOZEP91"
+    # export ANNEX_S3_SECRET_ACCESS_KEY="s3kr1t"
+
+Now, create a gpg key, if you don't already have one. This will be used
+to encrypt everything stored in S3, for your privacy. Once you have
+a gpg key, run `gpg --list-secret-keys` to look up its key id, something
+like "2512E3C7".
+
+Next, create the S3 remote, and describe it.
+
+    # git annex initremote cloud type=S3 encryption=2512E3C7
+    initremote cloud (encryption setup with gpg key C910D9222512E3C7) (checking bucket) (creating bucket in US) (gpg) ok
+    # git annex describe cloud "at Amazon's US datacenter"
+    describe cloud ok
+
+The configuration for the S3 remote is stored in git, so making another
+repository use the same S3 remote is easy:
+
+    # cd /media/usb/annex
+    # git pull laptop
+    # git annex initremote cloud
+    initremote cloud (gpg) (checking bucket) ok
+
+Now the remote can be used like any other remote.
+
+    # git annex copy my_cool_big_file --to cloud
+    copy my_cool_big_file (gpg) (checking cloud...) (to cloud...) ok
+    # git annex move video/hackity_hack_and_kaxxt.mov --to cloud
+    move video/hackity_hack_and_kaxxt.mov (checking cloud...) (to cloud...) ok
+
+See [[special_remotes/S3]] for details.
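The untrusted-repository tip above says that a copy on an untrusted repository is not enough for git-annex to let you drop a file. The rule can be sketched as a simple counting check. This is a simplification and not git-annex's actual code: `can_drop` and its `REPO:TRUST` argument format are hypothetical, and the sketch treats every listed non-untrusted copy as verified, whereas git-annex actually checks semitrusted remotes before counting them.

```sh
# Hypothetical sketch of the numcopies/trust rule, not git-annex's code.
#
# Usage: can_drop NUMCOPIES [REPO:TRUST ...]
#   Each REPO:TRUST names another repository believed to hold the file,
#   with TRUST one of: trusted, semitrusted, untrusted.
# Prints "ok" and returns 0 if enough countable copies remain elsewhere;
# otherwise prints why and returns 1.
can_drop() {
    needed=$1
    shift
    verified=0
    for entry in "$@"; do
        trust=${entry##*:}
        # Copies on untrusted repositories never count toward numcopies.
        if [ "$trust" != untrusted ]; then
            verified=$((verified + 1))
        fi
    done
    if [ "$verified" -ge "$needed" ]; then
        echo "ok"
        return 0
    fi
    echo "unsafe: verified $verified out of $needed necessary copies"
    return 1
}
```

For example, `can_drop 1 usbdrive:untrusted` reports `unsafe: verified 0 out of 1 necessary copies`, and `can_drop 1 usbdrive:semitrusted` reports `ok`. Returning a status as well as printing a message lets the check be used directly in `if` or `||` chains.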
diff --git a/doc/tips/using_the_SHA1_backend.mdwn b/doc/tips/using_the_SHA1_backend.mdwn
new file mode 100644
index 000000000..70dc2ef75
--- /dev/null
+++ b/doc/tips/using_the_SHA1_backend.mdwn
@@ -0,0 +1,11 @@
+A handy alternative to the default [[backend|backends]] is the
+SHA1 backend. This backend provides more git-style assurance that your data
+has not been damaged. And the checksum means that when you add the same
+content to the annex twice, only one copy needs to be stored in the backend.
+
+The only reason it's not the default is that it needs to checksum
+files when they're added to the annex, and this can slow things down
+significantly for really big files. To make SHA1 the default, just
+add something like this to `.gitattributes`:
+
+    * annex.backend=SHA1
diff --git a/doc/tips/using_the_web.mdwn b/doc/tips/using_the_web.mdwn
new file mode 100644
index 000000000..8009927a4
--- /dev/null
+++ b/doc/tips/using_the_web.mdwn
@@ -0,0 +1,32 @@
+The web can be used as a [[special_remote|special_remotes]] too.
+
+    # git annex addurl http://example.com/video.mpeg
+    addurl example.com_video.mpeg (downloading http://example.com/video.mpeg)
+    ########################################################## 100.0%
+    ok
+
+Now the file is downloaded, and has been added to the annex like any other
+file. So it can be renamed, copied to other repositories, and so on.
+
+Note that git-annex assumes that, if the web site does not return a 404, the
+file is still present on the web, and this counts as one [[copy|copies]]
+of the file. So it will let you remove your last copy, trusting it can be
+downloaded again:
+
+    # git annex drop example.com_video.mpeg
+    drop example.com_video.mpeg (checking http://example.com/video.mpeg) ok
+
+If you don't [[trust]] the web to this degree, just let git-annex know:
+
+    # git annex untrust web
+    untrust web ok
+
+As a result, git-annex will hang onto files:
+
+    # git annex drop example.com_video.mpeg
+    drop example.com_video.mpeg (unsafe)
+    Could only verify the existence of 0 out of 1 necessary copies
+    Also these untrusted repositories may contain the file:
+      00000000-0000-0000-0000-000000000001 -- web
+    (Use --force to override this check, or adjust annex.numcopies.)
+    failed
diff --git a/doc/tips/what_to_do_when_you_lose_a_repository.mdwn b/doc/tips/what_to_do_when_you_lose_a_repository.mdwn
new file mode 100644
index 000000000..16a55b37b
--- /dev/null
+++ b/doc/tips/what_to_do_when_you_lose_a_repository.mdwn
@@ -0,0 +1,19 @@
+So you lost a thumb drive containing a git-annex repository. Or a hard
+drive died or some other misfortune has befallen your data.
+
+Unless you configured backups, git-annex can't get your data back. But it
+can help you deal with the loss.
+
+First, go somewhere that knows about the lost repository, and mark it as
+untrusted.
+
+    git annex untrust usbdrive
+
+To remind yourself later what happened, you can change its description, too:
+
+    git annex describe usbdrive "USB drive lost in Timbuktu. Probably gone forever."
+
+This retains the [[location_tracking]] information for the repository.
+Maybe you'll find the drive later. Maybe that's impossible. Either way,
+this lets git-annex tell you why a file is no longer accessible, and
+it keeps git-annex from relying on that drive to hold any content.
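The last example in the powerful_file_matching tip combines several options. Here is a hypothetical sketch of how that expression evaluates for a single file, written as plain shell predicates. The function names `in_repo` and `matches` are made up for illustration. The list of repositories holding each file is passed in by hand; git-annex reads this from its location tracking logs instead. The sketch also assumes that `--copies N` means "at least N copies", which is consistent with how `--not --copies 2` is described above as "only one copy".

```sh
# Hypothetical sketch: for one file, evaluate the condition
#   --not --exclude '*.mp3' --and \
#   -( --in usbdrive --or --in archive -) --and --not --copies 3
# REPOS is a space-separated list of repositories holding the file.

# --in REPO: true if REPO appears in the list of repositories.
in_repo() {
    case " $2 " in
        *" $1 "*) return 0 ;;
    esac
    return 1
}

# Usage: matches FILE "REPOS"; returns 0 if the file would be listed.
matches() {
    file=$1
    repos=$2
    # --not --exclude '*.mp3': only names matching *.mp3 survive.
    case "$file" in
        *.mp3) ;;
        *) return 1 ;;
    esac
    # -( --in usbdrive --or --in archive -)
    in_repo usbdrive "$repos" || in_repo archive "$repos" || return 1
    # --not --copies 3: fewer than 3 repositories hold the file.
    [ "$(printf '%s\n' "$repos" | wc -w | tr -d ' ')" -lt 3 ]
}
```

With this sketch, `matches song.mp3 "usbdrive laptop"` succeeds. The same file held by `usbdrive archive laptop` is rejected because it already has 3 copies, and `talk.ogg` is rejected by the name test. Each `--and` becomes an early `return 1`, and the parenthesized `--or` group becomes a single `||` chain.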