| field | value | date |
|---|---|---|
| author | Joey Hess <joeyh@joeyh.name> | 2017-09-12 12:33:08 -0400 |
| committer | Joey Hess <joeyh@joeyh.name> | 2017-09-12 12:35:58 -0400 |
| commit | 96ddbf12195da0bd836f356a3b3637e449e91ca7 | |
| tree | fbcc8f5f2c95bb632601e3be2874fb410bd1a280 | |
| parent | 196aacee5e3e76d653246f501e0295006b3a7f20 | |
S3: Allow removing files from IA, but warn about derived versions potentially still existing there.
Removal works; only derived files are a potential issue, so allow removing
with a warning. This way, unexporting a file works, and behavior is
consistent for IA remotes whether or not exporttree=yes.
Also tested exporting filenames containing Unicode, spaces, and underscores.
All worked, despite the IA's FAQ saying they would not.
This commit was sponsored by Trenton Cronholm on Patreon.
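The behaviour change described in this message (warn, then remove anyway, instead of refusing) can be sketched as a small wrapper combinator. This is a self-contained approximation, not git-annex's code: `S3Info` is reduced to a hypothetical one-field record, the `Annex` monad is replaced by any `Monad` plus a caller-supplied warning function, and `removeWith` and `deleteObject` are hypothetical names standing in for the S3 `DeleteObject` request. Load it in GHCi to experiment.

```haskell
import Control.Exception (SomeException, try)
import System.IO (hPutStrLn, stderr)

-- Hypothetical stand-in for git-annex's S3Info: only the flag that
-- decides whether to warn is modelled here.
newtype S3Info = S3Info { isIA :: Bool }

-- The warning text introduced by this commit.
iaRemovalWarning :: String
iaRemovalWarning =
  "Derived versions of removed file may still be present in the Internet Archive"

-- Run a removal action, warning first when the remote is the Internet
-- Archive. Removal is never refused. Generalised over the monad and the
-- warning function (an assumption) so it runs outside the Annex monad.
warnIARemoval :: Monad m => (String -> m ()) -> S3Info -> m a -> m a
warnIARemoval warn info a
  | isIA info = warn iaRemovalWarning >> a
  | otherwise = a

-- Stand-in for git-annex's warning: write the message to stderr.
warning :: String -> IO ()
warning = hPutStrLn stderr

-- Mirrors the shape of the new `remove`: attempt the delete and report
-- success as True, failure as False, instead of propagating the exception.
-- `deleteObject` is a hypothetical placeholder for the S3 request.
-- `try` at SomeException approximates git-annex's tryNonAsync, which
-- (unlike this) lets asynchronous exceptions propagate.
removeWith :: (String -> IO ()) -> S3Info -> IO () -> IO Bool
removeWith warn info deleteObject = warnIARemoval warn info $ do
  res <- try deleteObject :: IO (Either SomeException ())
  return $ either (const False) (const True) res
```

In GHCi, `removeWith warning (S3Info True) (return ())` prints the warning to stderr and returns `True`. Wrapping the whole action, rather than adding a guard inside `remove`, is what lets the same warning be reused for exported files, as the `removeExportS3` hunk does.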
| mode | file | lines changed |
|---|---|---|
| -rw-r--r-- | CHANGELOG | 2 |
| -rw-r--r-- | Remote/S3.hs | 25 |
| -rw-r--r-- | doc/tips/Internet_Archive_via_S3.mdwn | 27 |
| -rw-r--r-- | doc/todo/export.mdwn | 2 |

4 files changed, 33 insertions, 23 deletions
@@ -9,6 +9,8 @@ git-annex (6.20170819) UNRELEASED; urgency=medium
   * Support building with feed-1.0, while still supporting older versions.
   * init: Display an additional message when it detects a filesystem that
     allows writing to files whose write bit is not set.
+  * S3: Allow removing files from IA, but warn about derived versions
+    potentially still existing there.
 
  -- Joey Hess <id@joeyh.name>  Mon, 28 Aug 2017 12:20:59 -0400
 
diff --git a/Remote/S3.hs b/Remote/S3.hs
index c7b72def5..396d2c388 100644
--- a/Remote/S3.hs
+++ b/Remote/S3.hs
@@ -278,14 +278,17 @@ retrieveCheap _ _ _ = return False
  - While it may remove the file, there are generally other files
  - derived from it that it does not remove. -}
 remove :: S3Info -> S3Handle -> Remover
-remove info h k
+remove info h k = warnIARemoval info $ do
+	res <- tryNonAsync $ sendS3Handle h $
+		S3.DeleteObject (T.pack $ bucketObject info k) (bucket info)
+	return $ either (const False) (const True) res
+
+warnIARemoval :: S3Info -> Annex a -> Annex a
+warnIARemoval info a
 	| isIA info = do
-		warning "Cannot remove content from the Internet Archive"
-		return False
-	| otherwise = do
-		res <- tryNonAsync $ sendS3Handle h $
-			S3.DeleteObject (T.pack $ bucketObject info k) (bucket info)
-		return $ either (const False) (const True) res
+		warning "Derived versions of removed file may still be present in the Internet Archive"
+		a
+	| otherwise = a
 
 checkKey :: Remote -> S3Info -> Maybe S3Handle -> CheckPresent
 checkKey r info Nothing k = case getpublicurl info of
@@ -342,7 +345,7 @@ retrieveExportS3 r info _k loc f p =
 	return True
 
 removeExportS3 :: Remote -> S3Info -> Key -> ExportLocation -> Annex Bool
-removeExportS3 r info _k loc =
+removeExportS3 r info _k loc = warnIARemoval info $
 	catchNonAsync go (\e -> warning (show e) >> return False)
   where
 	go = withS3Handle (config r) (gitconfig r) (uuid r) $ \h -> do
@@ -620,9 +623,9 @@ getBucketObject c = munge . key2file
 getBucketExportLocation :: RemoteConfig -> ExportLocation -> FilePath
 getBucketExportLocation c (ExportLocation loc) = getFilePrefix c ++ loc
 
-{- Internet Archive limits filenames to a subset of ascii,
- - with no whitespace. Other characters are xml entity
- - encoded. -}
+{- Internet Archive documentation limits filenames to a subset of ascii.
+ - While other characters seem to work now, this entity encodes everything
+ - else to avoid problems. -}
 iaMunge :: String -> String
 iaMunge = (>>= munge)
   where
diff --git a/doc/tips/Internet_Archive_via_S3.mdwn b/doc/tips/Internet_Archive_via_S3.mdwn
index 20d14bdec..be802b5b2 100644
--- a/doc/tips/Internet_Archive_via_S3.mdwn
+++ b/doc/tips/Internet_Archive_via_S3.mdwn
@@ -11,9 +11,10 @@ comply with their [terms of service](http://www.archive.org/about/terms.php).
 A nice added feature is that whenever git-annex sends a file to the Internet
 Archive, it records its url, the same as if you'd run `git annex addurl`. So
 any users who can clone your repository can download the files
-from archive.org, without needing any login or password info. This makes
-the Internet Archive a nice way to publish the large files associated with
-a public git repository.
+from archive.org, without needing any login or password info.
+The url to the content in the Internet Archive is also displayed by
+`git annex whereis`. This makes the Internet Archive a nice way to
+publish the large files associated with a public git repository.
 
 ## webapp setup
 
@@ -50,10 +51,15 @@ Then you can annex files and copy them to the remote as usual:
 	# git annex copy photo1.jpeg --fast --to archive-panama
 	copy (to archive-panama...) ok
 
-Once a file has been stored on archive.org, it cannot be (easily) removed
-from it. Also, git-annex whereis will tell you a public url for the file
-on archive.org. (It may take a while for archive.org to make the file
-publically visibile.)
+It may take a while for archive.org to make files publically visible after
+they've been uploaded.
+
+## removing files
+
+While files can be removed from the Internet Archive,
+[derived versions](https://archive.org/help/derivatives.php)
+of some files may continued to be stored there after the originals
+were removed. git-annex warns about this problem.
 
 ## exporting trees
 
@@ -63,6 +69,7 @@ are important, you can run `git annex initremote` with an additional
 parameter "exporttree=yes", and then use [[git-annex-export]] to publish
 a tree of files to the Internet Archive.
 
-Note that the Internet Archive does not support filenames containing
-whitespace and some other characters. Exporting such problem filenames will
-fail; you can rename the file and re-export.
+Note that the Internet Archive may not support certian characters
+in filenames ([see FAQ](http://archive.org/about/faqs.php#1099)).
+If exporting a filename fails due to such limitations, you would need
+to rename it in your git annex repository in order to export it.
diff --git a/doc/todo/export.mdwn b/doc/todo/export.mdwn
index ac77b3d72..43d4d0e8c 100644
--- a/doc/todo/export.mdwn
+++ b/doc/todo/export.mdwn
@@ -29,8 +29,6 @@ Work is in progress. Todo list:
   Would need git-annex sync to export to the master tree?
   This is similar to the little-used preferreddir= preferred content
   setting and the "public" repository group.
-* Test export to IA via S3. In particualar, does removing an exported file
-  work?
 
 Low priority:
 