author: Joey Hess <joeyh@joeyh.name> 2017-09-12 12:33:08 -0400
committer: Joey Hess <joeyh@joeyh.name> 2017-09-12 12:35:58 -0400
commit: 96ddbf12195da0bd836f356a3b3637e449e91ca7
tree: fbcc8f5f2c95bb632601e3be2874fb410bd1a280
parent: 196aacee5e3e76d653246f501e0295006b3a7f20
S3: Allow removing files from IA, but warn about derived versions potentially still existing there.

Removal works, only derives are a potential issue, so allow removing with a warning.

This way, unexporting a file works, and behavior is consistent with IA remotes whether or not exporttree=yes.

Also tested exporting filenames containing unicode, spaces, underscores. All worked, despite the IA's faq saying it doesn't.

This commit was sponsored by Trenton Cronholm on Patreon.

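The wrapper pattern described in the commit message can be sketched in isolation. This is a minimal illustration, not the git-annex code: it assumes plain `IO` in place of the `Annex` monad, a hypothetical `RemoteInfo` record in place of `S3Info`, and `try` from `Control.Exception` in place of git-annex's `tryNonAsync`. The names `warnRemoval` and `removeObject` are invented for the example.

```haskell
module Main where

import Control.Exception (SomeException, try)
import System.IO (hPutStrLn, stderr)

-- Hypothetical stand-in for git-annex's S3Info; only the field needed here.
newtype RemoteInfo = RemoteInfo { isInternetArchive :: Bool }

-- Wrap a removal action. For an Internet Archive remote, print a warning
-- first; in both cases the wrapped action still runs, so removal behaves
-- the same way regardless of the kind of remote.
warnRemoval :: RemoteInfo -> IO a -> IO a
warnRemoval info action
    | isInternetArchive info = do
        hPutStrLn stderr
            "Derived versions of removed file may still be present in the Internet Archive"
        action
    | otherwise = action

-- Hypothetical delete: any exception thrown by the request becomes False.
-- (Unlike tryNonAsync, plain `try` here also catches asynchronous exceptions.)
removeObject :: RemoteInfo -> IO () -> IO Bool
removeObject info request = warnRemoval info $ do
    res <- try request :: IO (Either SomeException ())
    return $ either (const False) (const True) res

main :: IO ()
main = do
    ok <- removeObject (RemoteInfo True) (return ())
    failed <- removeObject (RemoteInfo False) (ioError (userError "simulated failure"))
    print (ok, failed)
```

Because the warning lives in a wrapper rather than in a guard on the removal function itself, the same wrapper can be applied to more than one removal entry point, which is how the diff below reuses `warnIARemoval` for both `remove` and `removeExportS3`.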
-rw-r--r-- CHANGELOG 2
-rw-r--r-- Remote/S3.hs 25
-rw-r--r-- doc/tips/Internet_Archive_via_S3.mdwn 27
-rw-r--r-- doc/todo/export.mdwn 2
4 files changed, 33 insertions, 23 deletions
diff --git a/CHANGELOG b/CHANGELOG
index 137e4e970..b4a80b2aa 100644
--- a/CHANGELOG
+++ b/CHANGELOG
@@ -9,6 +9,8 @@ git-annex (6.20170819) UNRELEASED; urgency=medium
   * Support building with feed-1.0, while still supporting older versions.
   * init: Display an additional message when it detects a filesystem that
     allows writing to files whose write bit is not set.
+  * S3: Allow removing files from IA, but warn about derived versions
+    potentially still existing there.
 
  -- Joey Hess <id@joeyh.name>  Mon, 28 Aug 2017 12:20:59 -0400
diff --git a/Remote/S3.hs b/Remote/S3.hs
index c7b72def5..396d2c388 100644
--- a/Remote/S3.hs
+++ b/Remote/S3.hs
@@ -278,14 +278,17 @@ retrieveCheap _ _ _ = return False
  - While it may remove the file, there are generally other files
  - derived from it that it does not remove. -}
 remove :: S3Info -> S3Handle -> Remover
-remove info h k
+remove info h k = warnIARemoval info $ do
+    res <- tryNonAsync $ sendS3Handle h $
+        S3.DeleteObject (T.pack $ bucketObject info k) (bucket info)
+    return $ either (const False) (const True) res
+
+warnIARemoval :: S3Info -> Annex a -> Annex a
+warnIARemoval info a
     | isIA info = do
-        warning "Cannot remove content from the Internet Archive"
-        return False
-    | otherwise = do
-        res <- tryNonAsync $ sendS3Handle h $
-            S3.DeleteObject (T.pack $ bucketObject info k) (bucket info)
-        return $ either (const False) (const True) res
+        warning "Derived versions of removed file may still be present in the Internet Archive"
+        a
+    | otherwise = a
 
 checkKey :: Remote -> S3Info -> Maybe S3Handle -> CheckPresent
 checkKey r info Nothing k = case getpublicurl info of
@@ -342,7 +345,7 @@ retrieveExportS3 r info _k loc f p =
return True
removeExportS3 :: Remote -> S3Info -> Key -> ExportLocation -> Annex Bool
-removeExportS3 r info _k loc =
+removeExportS3 r info _k loc = warnIARemoval info $
     catchNonAsync go (\e -> warning (show e) >> return False)
   where
     go = withS3Handle (config r) (gitconfig r) (uuid r) $ \h -> do
@@ -620,9 +623,9 @@ getBucketObject c = munge . key2file
getBucketExportLocation :: RemoteConfig -> ExportLocation -> FilePath
getBucketExportLocation c (ExportLocation loc) = getFilePrefix c ++ loc
-{- Internet Archive limits filenames to a subset of ascii,
- - with no whitespace. Other characters are xml entity
- - encoded. -}
+{- Internet Archive documentation limits filenames to a subset of ascii.
+ - While other characters seem to work now, this entity encodes everything
+ - else to avoid problems. -}
iaMunge :: String -> String
iaMunge = (>>= munge)
where
diff --git a/doc/tips/Internet_Archive_via_S3.mdwn b/doc/tips/Internet_Archive_via_S3.mdwn
index 20d14bdec..be802b5b2 100644
--- a/doc/tips/Internet_Archive_via_S3.mdwn
+++ b/doc/tips/Internet_Archive_via_S3.mdwn
@@ -11,9 +11,10 @@ comply with their [terms of service](http://www.archive.org/about/terms.php).
A nice added feature is that whenever git-annex sends a file to the
Internet Archive, it records its url, the same as if you'd run `git annex
addurl`. So any users who can clone your repository can download the files
-from archive.org, without needing any login or password info. This makes
-the Internet Archive a nice way to publish the large files associated with
-a public git repository.
+from archive.org, without needing any login or password info.
+The url to the content in the Internet Archive is also displayed by
+`git annex whereis`. This makes the Internet Archive a nice way to
+publish the large files associated with a public git repository.
## webapp setup
@@ -50,10 +51,15 @@ Then you can annex files and copy them to the remote as usual:
# git annex copy photo1.jpeg --fast --to archive-panama
copy (to archive-panama...) ok
-Once a file has been stored on archive.org, it cannot be (easily) removed
-from it. Also, git-annex whereis will tell you a public url for the file
-on archive.org. (It may take a while for archive.org to make the file
-publically visibile.)
+It may take a while for archive.org to make files publicly visible after
+they've been uploaded.
+
+## removing files
+
+While files can be removed from the Internet Archive,
+[derived versions](https://archive.org/help/derivatives.php)
+of some files may continue to be stored there after the originals
+are removed. git-annex warns about this problem.
## exporting trees
@@ -63,6 +69,7 @@ are important, you can run `git annex initremote` with an additional
parameter "exporttree=yes", and then use [[git-annex-export]] to publish
a tree of files to the Internet Archive.
-Note that the Internet Archive does not support filenames containing
-whitespace and some other characters. Exporting such problem filenames will
-fail; you can rename the file and re-export.
+Note that the Internet Archive may not support certain characters
+in filenames ([see FAQ](http://archive.org/about/faqs.php#1099)).
+If exporting a filename fails due to such limitations, you would need
+to rename it in your git-annex repository in order to export it.
diff --git a/doc/todo/export.mdwn b/doc/todo/export.mdwn
index ac77b3d72..43d4d0e8c 100644
--- a/doc/todo/export.mdwn
+++ b/doc/todo/export.mdwn
@@ -29,8 +29,6 @@ Work is in progress. Todo list:
Would need git-annex sync to export to the master tree?
This is similar to the little-used preferreddir= preferred content
setting and the "public" repository group.
-* Test export to IA via S3. In particualar, does removing an exported file
- work?
Low priority: