author Joey Hess <joey@kitenet.net> 2011-10-17 13:56:36 -0400
committer Joey Hess <joey@kitenet.net> 2011-10-17 13:56:36 -0400
commit 617bdc740f76e0b5cb8d73a8b122cd2b3e6fe961
tree 92c932685e19b1df6bd453810e9a4052cdf92f3e /doc/tips
parent 66fa4c947c30ca9848121912229f3e84a855a74f
reorg
Diffstat (limited to 'doc/tips')
-rw-r--r-- doc/tips/Internet_Archive_via_S3.mdwn 49
-rw-r--r-- doc/tips/migrating_data_to_a_new_backend.mdwn 16
-rw-r--r-- doc/tips/powerful_file_matching.mdwn 36
-rw-r--r-- doc/tips/recover_data_from_lost+found.mdwn 19
-rw-r--r-- doc/tips/untrusted_repositories.mdwn 28
-rw-r--r-- doc/tips/using_Amazon_S3.mdwn 37
-rw-r--r-- doc/tips/using_the_SHA1_backend.mdwn 11
-rw-r--r-- doc/tips/using_the_web.mdwn 32
-rw-r--r-- doc/tips/what_to_do_when_you_lose_a_repository.mdwn 19
9 files changed, 247 insertions, 0 deletions
diff --git a/doc/tips/Internet_Archive_via_S3.mdwn b/doc/tips/Internet_Archive_via_S3.mdwn
new file mode 100644
index 000000000..8c0f2dde7
--- /dev/null
+++ b/doc/tips/Internet_Archive_via_S3.mdwn
@@ -0,0 +1,49 @@
+[The Internet Archive](http://www.archive.org/) allows members to upload
+collections using an Amazon S3
+[compatible API](http://www.archive.org/help/abouts3.txt), and this can
+be used with git-annex's [[special_remotes/S3]] support.
+
+So, you can locally archive things with git-annex, define remotes that
+correspond to "items" at the Internet Archive, and use git-annex to upload
+your files to there. Of course, your use of the Internet Archive must
+comply with their [terms of service](http://www.archive.org/about/terms.php).
+
+Sign up for an account, and get your access keys here:
+<http://www.archive.org/account/s3.php>
+
+ # export AWS_ACCESS_KEY_ID=blahblah
+ # export AWS_SECRET_ACCESS_KEY=xxxxxxx
+
+Specify `host=s3.us.archive.org` when doing `initremote` to set up
+a remote at the Archive. This will enable a special Internet Archive mode:
+Encryption is not allowed; you are required to specify a bucket name
+rather than having git-annex pick a random one; and you can optionally
+specify `x-archive-meta*` headers to add metadata as explained in their
+[documentation](http://www.archive.org/help/abouts3.txt).
+
+[[!template id=note text="""
+/!\ There seems to be a bug in either hS3 or the archive that breaks
+authentication when the bucket name contains spaces or upper-case letters.
+Use all lowercase and no spaces when making the bucket with `initremote`.
+"""]]
+
+ # git annex initremote archive-panama type=S3 \
+ host=s3.us.archive.org bucket=panama-canal-lock-blueprints \
+ x-archive-meta-mediatype=texts x-archive-meta-language=eng \
+ x-archive-meta-title="original Panama Canal lock design blueprints"
+ initremote archive-panama (Internet Archive mode) ok
+ # git annex describe archive-panama "a man, a plan, a canal: panama"
+ describe archive-panama ok
+
+Then you can annex files and copy them to the remote as usual:
+
+ # git annex add photo1.jpeg --backend=SHA1E
+ add photo1.jpeg (checksum...) ok
+ # git annex copy photo1.jpeg --fast --to archive-panama
+ copy (to archive-panama...) ok
+
+Note the use of the SHA1E [[backend|backends]]. It makes most sense
+to use the WORM or SHA1E backend for files that will be stored in
+the Internet Archive, since the key name will be exposed as the filename
+there, and since the Archive does special processing of files based on
+their extension.
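
The note in this tip asks for a bucket name that is all lowercase with no spaces. A minimal sketch of a pre-flight check, assuming a hypothetical helper named `valid_bucket_name` (it is not part of git-annex or hS3), might look like this:

```shell
# Hypothetical helper, not part of git-annex: succeed only if the bucket
# name is non-empty and has no upper-case letters and no whitespace.
valid_bucket_name() {
    case "$1" in
        ''|*[[:upper:]]*|*[[:space:]]*) return 1 ;;
        *) return 0 ;;
    esac
}

valid_bucket_name "panama-canal-lock-blueprints" && echo "ok"   # prints "ok"
valid_bucket_name "Panama Canal" || echo "rejected"             # prints "rejected"
```

Running a check like this before `git annex initremote` only guards against the two cases the note mentions; the Archive may impose other naming rules that it does not cover.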
diff --git a/doc/tips/migrating_data_to_a_new_backend.mdwn b/doc/tips/migrating_data_to_a_new_backend.mdwn
new file mode 100644
index 000000000..b9acb8bd1
--- /dev/null
+++ b/doc/tips/migrating_data_to_a_new_backend.mdwn
@@ -0,0 +1,16 @@
+Maybe you started out using the WORM backend, and have now configured
+git-annex to use SHA1. But files you added to the annex before still
+use the WORM backend. There is a simple command that can migrate that
+data:
+
+ # git annex migrate my_cool_big_file
+ migrate my_cool_big_file (checksum...) ok
+
+You can only migrate files whose content is currently available. Other
+files will be skipped.
+
+After migrating a file to a new backend, the old content in the old backend
+will still be present. That is necessary because multiple files
+can point to the same content. The `git annex unused` subcommand can be
+used to clear up that detritus later. Note that hard links are used,
+to avoid wasting disk space.
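
The migrate tip says hard links keep the old and new keys from doubling disk usage. As a generic illustration of that idea (plain shell in a scratch directory, not git-annex's actual implementation; the file names `old_key` and `new_key` are made up for the example):

```shell
# Generic hard-link demo in a scratch directory; no git-annex involved.
dir=$(mktemp -d)
printf 'some content\n' > "$dir/old_key"

# A hard link is a second name for the same file data, not a copy.
ln "$dir/old_key" "$dir/new_key"

# -ef is true when both names refer to the same device and inode.
# (It is not in POSIX, but bash, dash and ksh all support it.)
if [ "$dir/old_key" -ef "$dir/new_key" ]; then
    same_file=yes
else
    same_file=no
fi
echo "same file: $same_file"

rm -rf "$dir"
```

Because both names point at the same data, removing one of them later leaves the other intact and frees no space until the last name is gone.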
diff --git a/doc/tips/powerful_file_matching.mdwn b/doc/tips/powerful_file_matching.mdwn
new file mode 100644
index 000000000..d5d29377c
--- /dev/null
+++ b/doc/tips/powerful_file_matching.mdwn
@@ -0,0 +1,36 @@
+git-annex has a powerful syntax for making it act on only certain files.
+
+The simplest thing is to exclude some files, using wild cards:
+
+ git annex get --exclude '*.mp3' --exclude '*.ogg'
+
+But you can also exclude files that git-annex's [[location_tracking]]
+information indicates are present in a given repository. For example,
+if you want to populate newarchive with files, but not those already
+on oldarchive, you could do it like this:
+
+ git annex copy --not --in oldarchive --to newarchive
+
+Without the --not, --in makes it act on files that *are* in the specified
+repository. So, to remove files that are on oldarchive:
+
+ git annex drop --in oldarchive
+
+Or maybe you're curious which files have a lot of copies, and then
+also want to know which files have only one copy:
+
+ git annex find --copies 7
+ git annex find --not --copies 2
+
+The above are the simple examples of specifying what files git-annex
+should act on. But you can specify anything you can dream up by combining
+the things above, with --and --or -( and -). Those last two strange-looking
+options are parentheses, for grouping other options. You will probably
+have to escape them from your shell.
+
+Here are the mp3 files that are in either of two repositories, but have
+fewer than 3 copies:
+
+ git annex find --not --exclude '*.mp3' --and \
+ -\( --in usbdrive --or --in archive -\) --and \
+ --not --copies 3
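
The `-(` and `-)` options need escaping because an unquoted parenthesis is shell syntax. A small sketch, using a hypothetical `show_args` function in place of `git annex find`, suggests that backslash-escaping and single-quoting deliver the same words to the command:

```shell
# Hypothetical stand-in for a command: print each argument on its own line.
show_args() {
    printf '%s\n' "$@"
}

# Backslash-escaped and single-quoted parentheses, same option list.
escaped=$(show_args -\( --in usbdrive --or --in archive -\))
quoted=$(show_args '-(' --in usbdrive --or --in archive '-)')

if [ "$escaped" = "$quoted" ]; then
    echo "same arguments"
fi
```

Either form should work on the git-annex command line; the choice is a matter of taste.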
diff --git a/doc/tips/recover_data_from_lost+found.mdwn b/doc/tips/recover_data_from_lost+found.mdwn
new file mode 100644
index 000000000..48ef2a1d7
--- /dev/null
+++ b/doc/tips/recover_data_from_lost+found.mdwn
@@ -0,0 +1,19 @@
+Suppose something goes wrong, and fsck puts all the files in lost+found.
+It's actually very easy to recover from this disaster.
+
+First, check out the git repository again. Then, in the new checkout:
+
+ $ mkdir recovered-content
+ $ sudo mv ../lost+found/* recovered-content
+ $ sudo chown -R you:you recovered-content
+ $ chmod -R u+w recovered-content
+ $ git annex add recovered-content
+ $ git rm recovered-content
+ $ git commit -m "recovered some content"
+ $ git annex fsck
+
+The way that works is that when git-annex adds the same content that was in
+the repository before, all the old links to that content start working
+again. This works particularly well if the SHA* backends are used, but even
+with the default backend it will work pretty well, as long as fsck
+preserved the modification time of the files.
diff --git a/doc/tips/untrusted_repositories.mdwn b/doc/tips/untrusted_repositories.mdwn
new file mode 100644
index 000000000..cdb5da7c3
--- /dev/null
+++ b/doc/tips/untrusted_repositories.mdwn
@@ -0,0 +1,28 @@
+Suppose you have a USB thumb drive and are using it as a git annex
+repository. You don't trust the drive, because you could lose it, or
+accidentally run it through the laundry. Or, maybe you have a drive that
+you know is dying, and you'd like to be warned if there are any files
+on it not backed up somewhere else. Maybe the drive has already died
+or been lost.
+
+You can let git-annex know that you don't trust a repository, and it will
+adjust its behavior to avoid relying on that repository's continued
+availability.
+
+ # git annex untrust usbdrive
+ untrust usbdrive ok
+
+Now when you do a fsck, you'll be warned appropriately:
+
+ # git annex fsck .
+ fsck my_big_file
+ Only these untrusted locations may have copies of this file!
+ 05e296c4-2989-11e0-bf40-bad1535567fe -- portable USB drive
+ Back it up to trusted locations with git-annex copy.
+ failed
+
+Also, git-annex will refuse to drop a file from elsewhere just because
+it can see a copy on the untrusted repository.
+
+It's also possible to tell git-annex that you have an unusually high
+level of trust for a repository. See [[trust]] for details.
diff --git a/doc/tips/using_Amazon_S3.mdwn b/doc/tips/using_Amazon_S3.mdwn
new file mode 100644
index 000000000..b59ca9b4f
--- /dev/null
+++ b/doc/tips/using_Amazon_S3.mdwn
@@ -0,0 +1,37 @@
+git-annex extends git's usual remotes with some [[special_remotes]] that
+are not git repositories. This way you can set up a remote using, say,
+Amazon S3, and use git-annex to transfer files into the cloud.
+
+First, export your S3 credentials:
+
+ # export ANNEX_S3_ACCESS_KEY_ID="08TJMT99S3511WOZEP91"
+ # export ANNEX_S3_SECRET_ACCESS_KEY="s3kr1t"
+
+Now, create a gpg key, if you don't already have one. This will be used
+to encrypt everything stored in S3, for your privacy. Once you have
+a gpg key, run `gpg --list-secret-keys` to look up its key id, something
+like "2512E3C7".
+
+Next, create the S3 remote, and describe it.
+
+ # git annex initremote cloud type=S3 encryption=2512E3C7
+ initremote cloud (encryption setup with gpg key C910D9222512E3C7) (checking bucket) (creating bucket in US) (gpg) ok
+ # git annex describe cloud "at Amazon's US datacenter"
+ describe cloud ok
+
+The configuration for the S3 remote is stored in git, so making another
+repository use the same S3 remote is easy:
+
+ # cd /media/usb/annex
+ # git pull laptop
+ # git annex initremote cloud
+ initremote cloud (gpg) (checking bucket) ok
+
+Now the remote can be used like any other remote.
+
+ # git annex copy my_cool_big_file --to cloud
+ copy my_cool_big_file (gpg) (checking cloud...) (to cloud...) ok
+ # git annex move video/hackity_hack_and_kaxxt.mov --to cloud
+ move video/hackity_hack_and_kaxxt.mov (checking cloud...) (to cloud...) ok
+
+See [[special_remotes/S3]] for details.
diff --git a/doc/tips/using_the_SHA1_backend.mdwn b/doc/tips/using_the_SHA1_backend.mdwn
new file mode 100644
index 000000000..70dc2ef75
--- /dev/null
+++ b/doc/tips/using_the_SHA1_backend.mdwn
@@ -0,0 +1,11 @@
+A handy alternative to the default [[backend|backends]] is the
+SHA1 backend. This backend provides more git-style assurance that your data
+has not been damaged. And the checksum means that when you add the same
+content to the annex twice, only one copy need be stored in the backend.
+
+The only reason it's not the default is that it needs to checksum
+files when they're added to the annex, and this can slow things down
+significantly for really big files. To make SHA1 the default, just
+add something like this to `.gitattributes`:
+
+ * annex.backend=SHA1
diff --git a/doc/tips/using_the_web.mdwn b/doc/tips/using_the_web.mdwn
new file mode 100644
index 000000000..8009927a4
--- /dev/null
+++ b/doc/tips/using_the_web.mdwn
@@ -0,0 +1,32 @@
+The web can be used as a [[special_remote|special_remotes]] too.
+
+ # git annex addurl http://example.com/video.mpeg
+ addurl example.com_video.mpeg (downloading http://example.com/video.mpeg)
+ ########################################################## 100.0%
+ ok
+
+Now the file is downloaded, and has been added to the annex like any other
+file. So it can be renamed, copied to other repositories, and so on.
+
+Note that git-annex assumes that, if the web site does not 404, the file is
+still present on the web, and this counts as one [[copy|copies]] of the
+file. So it will let you remove your last copy, trusting it can be
+downloaded again:
+
+ # git annex drop example.com_video.mpeg
+ drop example.com_video.mpeg (checking http://example.com/video.mpeg) ok
+
+If you don't [[trust]] the web to this degree, just let git-annex know:
+
+ # git annex untrust web
+ untrust web ok
+
+As a result, it will hang onto files:
+
+ # git annex drop example.com_video.mpeg
+ drop example.com_video.mpeg (unsafe)
+ Could only verify the existence of 0 out of 1 necessary copies
+ Also these untrusted repositories may contain the file:
+ 00000000-0000-0000-0000-000000000001 -- web
+ (Use --force to override this check, or adjust annex.numcopies.)
+ failed
diff --git a/doc/tips/what_to_do_when_you_lose_a_repository.mdwn b/doc/tips/what_to_do_when_you_lose_a_repository.mdwn
new file mode 100644
index 000000000..16a55b37b
--- /dev/null
+++ b/doc/tips/what_to_do_when_you_lose_a_repository.mdwn
@@ -0,0 +1,19 @@
+So you lost a thumb drive containing a git-annex repository. Or a hard
+drive died, or some other misfortune has befallen your data.
+
+Unless you configured backups, git-annex can't get your data back. But it
+can help you deal with the loss.
+
+First, go somewhere that knows about the lost repository, and mark it as
+untrusted.
+
+ git annex untrust usbdrive
+
+To remind yourself later what happened, you can change its description, too:
+
+ git annex describe usbdrive "USB drive lost in Timbuktu. Probably gone forever."
+
+This retains the [[location_tracking]] information for the repository.
+Maybe you'll find the drive later. Maybe that's impossible. Either way,
+this lets git-annex tell you why a file is no longer accessible, and
+keeps it from relying on that drive to hold any content.