author Joey Hess <joeyh@joeyh.name> 2017-09-08 15:41:31 -0400
committer Joey Hess <joeyh@joeyh.name> 2017-09-08 15:46:24 -0400
commit 5ef1c9b5690057e5b18dc7dcc3627776b400c544
tree f71d9ad13509977736bd55698cb4ccc18311e091 /doc/tips
parent 23f55c0efdd58f8024d9b0c9e4b02db7b8d27b61
S3 export (untested)

It opens a http connection per file exported, but then so does git annex copy --to s3.

Decided not to munge exported filenames for IA. Too large a chance of the munging having confusing results. Instead, export of files not supported by IA, eg with spaces in their name, will fail.

This commit was supported by the NSF-funded DataLad project.
Diffstat (limited to 'doc/tips')
-rw-r--r-- doc/tips/Internet_Archive_via_S3.mdwn 35
1 files changed, 9 insertions, 26 deletions
diff --git a/doc/tips/Internet_Archive_via_S3.mdwn b/doc/tips/Internet_Archive_via_S3.mdwn
index 15f241c9f..20d14bdec 100644
--- a/doc/tips/Internet_Archive_via_S3.mdwn
+++ b/doc/tips/Internet_Archive_via_S3.mdwn
@@ -55,31 +55,14 @@ from it. Also, git-annex whereis will tell you a public url for the file
on archive.org. (It may take a while for archive.org to make the file
publically visibile.)
-Note the use of the SHA256E [[backend|backends]] when adding files. That is
-the default backend used by git-annex, but even if you don't normally use
-it, it makes most sense to use the WORM or SHA256E backend for files that
-will be stored in the Internet Archive, since the key name will be exposed
-as the filename there, and since the Archive does special processing of
-files based on their extension.
+## exporting trees
-## publishing only one subdirectory
+By default, files stored in the Internet Archive will show up there named
+by their git-annex key, not the original filename. If the filenames
+are important, you can run `git annex initremote` with an additional
+parameter "exporttree=yes", and then use [[git-annex-export]] to publish
+a tree of files to the Internet Archive.
-Perhaps you have a repository with lots of files in it, and only want
-to publish some of them to a particular Internet Archive item. Of course
-you can specify which files to send manually, but it's useful to
-configure [[preferred_content]] settings so git-annex knows what content
-you want to store in the Internet Archive.
-
-One way to do this is using the "public" repository type.
-
- git annex enableremote archive-panama preferreddir=panama
- git annex wanted archive-panama standard
- git annex group archive-panama public
-
-Now anything in a "panama" directory will be sent to that remote,
-and anything else won't. You can use `git annex copy --auto` or the
-assistant and it'll do the right thing.
-
-When setting up an Internet Archive item using the webapp, this
-configuration is automatically done, using an item name that the user
-enters as the name of the subdirectory.
+Note that the Internet Archive does not support filenames containing
+whitespace and some other characters. Exporting such problem filenames will
+fail; you can rename the file and re-export.
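
Reviewer note (not part of the patch): since exports of filenames containing whitespace are expected to fail, it may help to list such names before exporting. The following is a minimal sketch, not something this commit provides. The helper name `find_whitespace_names` is hypothetical, and it only checks for whitespace, not the "some other characters" the new text mentions, because the patch does not enumerate them.

```shell
# Hypothetical helper: read one filename per line on stdin and print
# only the names that contain whitespace (space, tab, etc.).
# `|| true` keeps the exit status at 0 when nothing matches.
find_whitespace_names() {
    grep '[[:space:]]' || true
}

# Possible usage inside a git-annex repository. -z turns off git's
# filename quoting; tr converts the NUL separators back to newlines.
# (Filenames that themselves contain a newline will be split in two.)
#
#   git ls-files -z | tr '\0' '\n' | find_whitespace_names
```

Any names printed would need to be renamed before running the export, for example `git annex export master --to archive-panama`. The remote name `archive-panama` is borrowed from the text this patch removes, and the branch name `master` is an assumption.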