aboutsummaryrefslogtreecommitdiff
diff options
context:
space:
mode:
authorGravatar Joey Hess <joeyh@joeyh.name>2017-08-31 14:24:32 -0400
committerGravatar Joey Hess <joeyh@joeyh.name>2017-08-31 14:24:32 -0400
commita7383bc94e41d94e77e67406e1a4085d34241bfc (patch)
tree4a9dc422ada9ad424bef84dcb7466812e1f51f2f
parent36b222bf699023f3e460c8bc231b2916aa27ab5c (diff)
make storeExport atomic
This avoids needing to deal with the complexity of partially transferred files in the export. We'd not be able to resume uploading to such a file anyway, so just avoid them. The implementation in Remote.Directory is not completely ideal, because it could leave the temp file hanging around in the export directory. This only happens if it's killed with -9, or there's a power failure; normally viaTmp cleans up after itself, even when interrupted. I could not see a better way to do it though, since the export directory might be the root of a filesystem. Also some design thoughts on resuming, which depend on storeExport being atomic. This commit was sponsored by Fernando Jimenez on Partreon.
-rw-r--r--Remote/Directory.hs12
-rw-r--r--Types/Remote.hs3
-rw-r--r--Utility/Tmp.hs2
-rw-r--r--doc/design/exporting_trees_to_special_remotes.mdwn74
4 files changed, 50 insertions, 41 deletions
diff --git a/Remote/Directory.hs b/Remote/Directory.hs
index abbde1ceb..f5d7f7e49 100644
--- a/Remote/Directory.hs
+++ b/Remote/Directory.hs
@@ -29,6 +29,7 @@ import qualified Remote.Directory.LegacyChunked as Legacy
import Annex.Content
import Annex.UUID
import Utility.Metered
+import Utility.Tmp
remote :: RemoteType
remote = RemoteType {
@@ -116,11 +117,6 @@ getLocation d k = do
storeDir :: FilePath -> Key -> FilePath
storeDir d k = addTrailingPathSeparator $ d </> hashDirLower def k </> keyFile k
-{- Where we store temporary data for a key, in the directory, as it's being
- - written. -}
-tmpDir :: FilePath -> Key -> FilePath
-tmpDir d k = addTrailingPathSeparator $ d </> "tmp" </> keyFile k
-
{- Check if there is enough free disk space in the remote's directory to
- store the key. Note that the unencrypted key size is checked. -}
prepareStore :: FilePath -> ChunkConfig -> Preparer Storer
@@ -148,7 +144,7 @@ store d chunkconfig k b p = liftIO $ do
finalizeStoreGeneric tmpdir destdir
return True
where
- tmpdir = tmpDir d k
+ tmpdir = addTrailingPathSeparator $ d </> "tmp" </> keyFile k
destdir = storeDir d k
{- Passed a temp directory that contains the files that should be placed
@@ -233,7 +229,9 @@ checkPresentGeneric d ps = liftIO $
storeExportDirectory :: FilePath -> FilePath -> Key -> ExportLocation -> MeterUpdate -> Annex Bool
storeExportDirectory d src _k loc p = liftIO $ catchBoolIO $ do
createDirectoryIfMissing True (takeDirectory dest)
- withMeteredFile src p (L.writeFile dest)
+ -- Write via temp file so that checkPresentGeneric will not
+ -- see it until it's fully stored.
+ viaTmp (\tmp () -> withMeteredFile src p (L.writeFile tmp)) dest ()
return True
where
dest = exportPath d loc
diff --git a/Types/Remote.hs b/Types/Remote.hs
index 6a4d2039e..6e78bf238 100644
--- a/Types/Remote.hs
+++ b/Types/Remote.hs
@@ -98,7 +98,8 @@ data RemoteA a = Remote {
checkPresentCheap :: Bool,
-- Exports content to an ExportLocation.
- -- The exported file does not need to be updated atomically.
+ -- The exported file should not appear to be present on the remote
+ -- until all of its contents have been transferred.
storeExport :: Maybe (FilePath -> Key -> ExportLocation -> MeterUpdate -> a Bool),
-- Retrieves exported content to a file.
-- (The MeterUpdate does not need to be used if it writes
diff --git a/Utility/Tmp.hs b/Utility/Tmp.hs
index 6a541cfe4..ca611e0b4 100644
--- a/Utility/Tmp.hs
+++ b/Utility/Tmp.hs
@@ -28,7 +28,7 @@ type Template = String
{- Runs an action like writeFile, writing to a temp file first and
- then moving it into place. The temp file is stored in the same
- directory as the final file to avoid cross-device renames. -}
-viaTmp :: (MonadMask m, MonadIO m) => (FilePath -> String -> m ()) -> FilePath -> String -> m ()
+viaTmp :: (MonadMask m, MonadIO m) => (FilePath -> v -> m ()) -> FilePath -> v -> m ()
viaTmp a file content = bracketIO setup cleanup use
where
(dir, base) = splitFileName file
diff --git a/doc/design/exporting_trees_to_special_remotes.mdwn b/doc/design/exporting_trees_to_special_remotes.mdwn
index c9b2b72e5..835a86640 100644
--- a/doc/design/exporting_trees_to_special_remotes.mdwn
+++ b/doc/design/exporting_trees_to_special_remotes.mdwn
@@ -69,11 +69,6 @@ To efficiently update an export, git-annex can diff the tree
that was exported with the new tree. The naive approach is to upload
new and modified files and remove deleted files.
-Note that a file may have been partially uploaded to an export, and then
-the export updated to a tree without that file. So, need to try to delete
-all removed files, even if location tracking does not say that the special
-remote contains them.
-
With rename detection, if the special remote supports moving files,
more efficient updates can be done. It gets complicated; consider two files
that swap names.
@@ -81,33 +76,6 @@ that swap names.
If the special remote supports copying files, that would also make some
updates more efficient.
-## resuming exports
-
-Resuming an interrupted export needs to work well.
-
-There are two cases here:
-
-1. Some of the files in the tree have been uploaded; others have not.
-2. A file has been partially uploaded.
-
-These two cases need to be disentangled somehow in order to handle
-them. One way is to use the location log as follows:
-
-* Before a file is uploaded, look up what key is currently exported
- using that filename. If there is one, update the location log,
- saying it's not present in the special remote.
-* Upload the file.
-* Update the location log for the newly exported key.
-
-Note that this method does not allow resuming a partial upload by appending to
-a file, because we don't know if the file actually started to be uploaded, or
-if the file instead still has the old key's content. Instead, the whole
-file needs to be re-uploaded.
-
-Alternative: Keep an index file that's the current state of the export.
-See comment #4 of [[todo/export]]. Not sure if that works? Perhaps it
-would be overkill if it's only used to support resuming partial uploads.
-
## changes to special remote interface
This needs some additional methods added to special remotes, and to
@@ -123,6 +91,9 @@ Here's the changes to the latter:
* `TRANSFEREXPORT STORE|RETRIEVE Key File`
Requests the transfer of a File on local disk to or from the previously
provided Name on the special remote.
+ Note that it's important that, while a file is being stored,
+ CHECKPRESENTEXPORT not indicate it's present until all the data has
+ been transferred.
The remote responds with either `TRANSFER-SUCCESS` or
`TRANSFER-FAILURE`, and a remote where exports do not make sense
may always fail.
@@ -241,3 +212,42 @@ re-uploads, but it's reasonably efficient.
The documentation should suggest strongly only exporting to a given special
remote from a single repository, or having some other rule that avoids
export conflicts.
+
+## when to update export.log for efficient resuming of exports
+
+When should `export.log` be updated? Possibilities:
+
+* Before performing any work, to set the goal.
+* After the export is fully successful, to record the current state.
+* After some mid-point.
+
+Lots of things could go wrong during an export. A file might fail to be
+transferred or only part of it be transferred; a file's content might not
+be present to transfer at all. The export could be interrupted part way.
+Updating the export.log at the right point in time is important to handle
+these cases efficiently.
+
+If the export.log is updated first, then it's only a goal and does not tell
+us what's been done already.
+
+If the export.log is updated only after complete success, then the common
+case of some files not having content locally present will prevent it from
+being updated. When we resume, we again don't know what's been done
+already.
+
+If the export.log is updated after deleting any files from the
+remote that are not the same in the new treeish as in the old treeish,
+and as long as TRANSFEREXPORT STORE is atomic, then when resuming we can
+trust CHECKPRESENTEXPORT to only find files that have the correct content
+for the current treeish. (Unless a conflicting export was made from
+elsewhere, but in that case, the conflict resolution will have to fix up
+later.)
+
+Efficient resuming can then first check if the location log says the
+export contains the content. (If not, transfer a copy.) If the location
+log says the export contains the content, use CHECKPRESENTEXPORT to see if
+the file exists, and if not transfer a copy. The CHECKPRESENTEXPORT check
+deals with the case where the treeish has two files with the same content.
+If we have a key-to-files map for the export, then we can skip the
+CHECKPRESENTEXPORT check when there's only one file using a key. So,
+resuming can be quite efficient.