Currently, when chunk=100MiB is used with a special remote, git-annex will
use 100 MiB of RAM when uploading a large file to it, because the current
chunk is buffered entirely in RAM.

See comment in Remote.Helper.Chunked:

	- More optimal versions of this can be written, that rely
	- on L.toChunks to split the lazy bytestring into chunks (typically
	- smaller than the ChunkSize), and eg, write those chunks to a Handle.
	- But this is the best that can be done with the storer interface that
	- writes a whole L.ByteString at a time.
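
To illustrate what that comment describes, here is a minimal, stand-alone
sketch (hPutLazily is a made-up name, and L.hPut already behaves much like
this) of writing the chunks from L.toChunks to a Handle, so that only one
small strict chunk is resident in memory at a time, provided the lazy
ByteString is itself produced lazily:

	import qualified Data.ByteString as B
	import qualified Data.ByteString.Lazy as L
	import System.IO (Handle)

	-- Write a lazy ByteString to a Handle one strict chunk at a time,
	-- so only the current chunk is held in memory.
	hPutLazily :: Handle -> L.ByteString -> IO ()
	hPutLazily h = mapM_ (B.hPut h) . L.toChunks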

The buffering is probably fine for smaller chunks, in the < 10 MiB or so
range. Reading such a chunk from gpg and converting it into an http
request will generally need to buffer it all in memory anyway. But for eg,
external special remotes, the content is going to be written to a file in
any case, so there's no point in buffering it in memory first.

So, need something like:

	prefersFileContent :: Storer -> Bool

And when storeChunks is given such a Storer, it can avoid buffering, and
instead write the chunk data to a temp file to pass to the Storer.
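
As a hedged illustration only, with made-up, simplified types (the real
Storer and storeChunks in Remote.Helper.Chunked take more parameters),
this is roughly how such a flag could let a chunk be spilled to a temp
file instead of being buffered:

	import qualified Data.ByteString.Lazy as L
	import System.Directory (getTemporaryDirectory, removeFile)
	import System.IO (hClose, openBinaryTempFile)

	-- Simplified stand-ins; not git-annex's actual types.
	data Storer
		= MemoryStorer (L.ByteString -> IO Bool) -- wants the whole chunk buffered
		| FileStorer (FilePath -> IO Bool)       -- prefers chunk data in a file

	prefersFileContent :: Storer -> Bool
	prefersFileContent (FileStorer _) = True
	prefersFileContent (MemoryStorer _) = False

	-- Store one chunk, spilling it to a temp file when the Storer prefers
	-- that, so the chunk is never held in memory all at once (assuming the
	-- lazy ByteString is itself streamed lazily).
	storeChunk :: Storer -> L.ByteString -> IO Bool
	storeChunk (MemoryStorer store) chunk = store chunk
	storeChunk (FileStorer store) chunk = do
		tmpdir <- getTemporaryDirectory
		(tmpfile, h) <- openBinaryTempFile tmpdir "chunk"
		L.hPut h chunk -- writes one strict chunk at a time
		hClose h
		r <- store tmpfile
		removeFile tmpfile
		return r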

--[[Joey]]