| field | value | date |
|---|---|---|
| author | Joey Hess <joeyh@joeyh.name> | 2017-09-08 15:41:31 -0400 |
| committer | Joey Hess <joeyh@joeyh.name> | 2017-09-08 15:46:24 -0400 |
| commit | 5ef1c9b5690057e5b18dc7dcc3627776b400c544 | |
| tree | f71d9ad13509977736bd55698cb4ccc18311e091 /doc/tips | |
| parent | 23f55c0efdd58f8024d9b0c9e4b02db7b8d27b61 | |
S3 export (untested)
It opens an HTTP connection per file exported, but then so does git
annex copy --to s3.
Decided not to munge exported filenames for the Internet Archive (IA). There is
too large a chance of the munging having confusing results. Instead, exporting
files whose names IA does not support, e.g. names containing spaces, will fail.
This commit was supported by the NSF-funded DataLad project.
Diffstat (limited to 'doc/tips')
-rw-r--r-- | doc/tips/Internet_Archive_via_S3.mdwn | 35 |
1 file changed, 9 insertions, 26 deletions
diff --git a/doc/tips/Internet_Archive_via_S3.mdwn b/doc/tips/Internet_Archive_via_S3.mdwn
index 15f241c9f..20d14bdec 100644
--- a/doc/tips/Internet_Archive_via_S3.mdwn
+++ b/doc/tips/Internet_Archive_via_S3.mdwn
@@ -55,31 +55,14 @@ from it. Also, git-annex whereis will tell you a public url for the
 file on archive.org. (It may take a while for archive.org to make the file
 publically visibile.)
 
-Note the use of the SHA256E [[backend|backends]] when adding files. That is
-the default backend used by git-annex, but even if you don't normally use
-it, it makes most sense to use the WORM or SHA256E backend for files that
-will be stored in the Internet Archive, since the key name will be exposed
-as the filename there, and since the Archive does special processing of
-files based on their extension.
+## exporting trees
 
-## publishing only one subdirectory
+By default, files stored in the Internet Archive will show up there named
+by their git-annex key, not the original filename. If the filenames
+are important, you can run `git annex initremote` with an additional
+parameter "exporttree=yes", and then use [[git-annex-export]] to publish
+a tree of files to the Internet Archive.
 
-Perhaps you have a repository with lots of files in it, and only want
-to publish some of them to a particular Internet Archive item. Of course
-you can specify which files to send manually, but it's useful to
-configure [[preferred_content]] settings so git-annex knows what content
-you want to store in the Internet Archive.
-
-One way to do this is using the "public" repository type.
-
-    git annex enableremote archive-panama preferreddir=panama
-    git annex wanted archive-panama standard
-    git annex group archive-panama public
-
-Now anything in a "panama" directory will be sent to that remote,
-and anything else won't. You can use `git annex copy --auto` or the
-assistant and it'll do the right thing.
-
-When setting up an Internet Archive item using the webapp, this
-configuration is automatically done, using an item name that the user
-enters as the name of the subdirectory.
+Note that the Internet Archive does not support filenames containing
+whitespace and some other characters. Exporting such problem filenames will
+fail; you can rename the file and re-export.
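Since this commit makes the export fail on filenames the Internet Archive rejects rather than munging them, it can help to find such names before exporting. The following is a minimal sketch, assuming bash. The function name `find_ia_unsafe_paths` is hypothetical and not part of git-annex. It only checks for whitespace, because the "other characters" the Internet Archive rejects are not enumerated here.

```bash
# Hypothetical helper, not part of git-annex.
# Reads NUL-separated paths on stdin and prints each path that contains
# whitespace (spaces, tabs, newlines), which the Internet Archive rejects.
# Other unsupported characters are NOT checked; they are not listed here.
find_ia_unsafe_paths() {
  local path
  # IFS= keeps leading/trailing blanks, -r keeps backslashes, -d '' splits on NUL.
  # The '|| [ -n "$path" ]' keeps a final entry that lacks a trailing NUL.
  while IFS= read -r -d '' path || [ -n "$path" ]; do
    case "$path" in
      *[[:space:]]*) printf '%s\n' "$path" ;;
    esac
  done
}
```

For example, `git ls-tree -r -z --name-only master | find_ia_unsafe_paths` lists paths in the `master` tree that would fail to export. After renaming them with `git mv` and committing, the tree can be exported again, e.g. `git annex export master --to archive-panama`. Here `archive-panama` stands for a remote set up with exporttree=yes. Reading NUL-separated input avoids the quoting that `git ls-tree` applies to unusual filenames without `-z`.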