path: root/debian
Commit message (author, date)
* git-annex-shell: Runs hooks/annex-content after content is received or dropped. (Joey Hess, 2012-03-14)
|
* Merge branch 'master' into bloom (Joey Hess, 2012-03-12)
|\
| |   Conflicts: debian/changelog
* | finish bloom filters (Joey Hess, 2012-03-12)
| |   Add tuning, docs, etc. Not sure if status is the right place to report size; perhaps unused should report the size and also warn if it sees more keys than the bloom filter allows?
| * status: More accurate display of sizes of tmp and bad keys. (Joey Hess, 2012-03-12)
|/
|   Can't trust the key size to be accurate for tmp and bad keys, so check actual file size. In the wild I saw the old code be wrong by a factor of about 100!
|
|   If all tmp/bad keys are empty, they're not shown in status at all. Showing 0 bytes and suggesting to clean it up seemed weird.
* getKeysPresent is now fully lazy (Joey Hess, 2012-03-11)
|   .. Allowing it to be used by things in constant space!
|
|   Random statistics: git annex status has gone from taking 239 mb of memory and 26 seconds in a repo, to 8 mb and 13 seconds.
|
|   The trick here is the unsafeInterleaveIO, and the form of the function's recursion, which I cribbed heavily from System.IO.HVFS.Utils.recurseDirStat. The difference is, this one goes to a limited depth and avoids statting everything.
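The limited-depth, non-statting traversal described above can be approximated outside Haskell with a generator. This is only an analogy in Python, not the getKeysPresent implementation: the name `iter_at_depth` and the depth argument are assumptions made for illustration.

```python
import os
from typing import Iterator


def iter_at_depth(root: str, depth: int) -> Iterator[str]:
    """Lazily yield every path found exactly `depth` levels below `root`.

    Hypothetical helper: entries are produced one at a time, so a consumer
    can process them in roughly constant space. os.scandir lists directory
    entries without calling stat() on each of them.
    """
    if depth == 0:
        yield root
        return
    try:
        entries = os.scandir(root)
    except NotADirectoryError:
        # A plain file above the target depth: nothing to descend into.
        return
    with entries:
        for entry in entries:
            yield from iter_at_depth(entry.path, depth - 1)
```

Because the result is a generator, nothing is read from disk until the caller starts iterating, and no list of all paths is ever built.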
* status: Fixed to run in nearly constant space. (Joey Hess, 2012-03-11)
|   Before, it leaked space due to caching lists of keys. Now all necessary data about keys is calculated as they stream in.
|
|   The "nearly constant" is due to getKeysPresent, which builds up a lot of [] thunks as it traverses .git/annex/objects/. Will deal with it later.
* unused: Reduce memory usage significantly. (Joey Hess, 2012-03-11)
|   Much of the memory bloat turned out to be due to getKeysReferenced containing a mapM, which is strict and buffered the whole list rather than streaming it.
|
|   The other half of the bloat was due to building a temporary Set in order to call S.difference. While that is more cpu efficient, I switched to successive S.delete, since with it, I can run a whole git annex unused in less than 8 mb of memory.
|
|   The whole Set of keys with content available is still stored in memory, so running unused in a repo with a whole lot of file content will still use more memory. In a repo containing 6000 files, it needed 40 mb.
|
|   Note that the status command still uses the bloatful getKeysReferenced.
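The trade-off between one set difference and successive deletes can be sketched as follows. This is a Python analogy with a hypothetical `unused_keys` function, not the Haskell code: the set of present keys is held in memory, while the referenced keys are consumed as a stream and never collected into a second set.

```python
from typing import Iterable, Set


def unused_keys(present: Set[str], referenced: Iterable[str]) -> Set[str]:
    """Return the present keys that no referenced key matches.

    `referenced` may be any iterator; each key is discarded from the
    working set as it arrives, so only one set is ever held in memory.
    """
    remaining = set(present)
    for key in referenced:
        remaining.discard(key)  # no error if the key is not present
    return remaining
```

For example, `unused_keys({"a", "b", "c"}, iter(["b", "x"]))` leaves `{"a", "c"}`.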
* sync: Sync to lower cost remotes first. (Joey Hess, 2012-03-10)
|   This has two benefits:
|   1. When a lot of refs are going to be received, get them via a lower cost connection when possible.
|   2. Allows ctrl-c of sync after the cheaper remotes have been pulled from (or pushed to).
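A minimal sketch of the ordering, assuming each remote carries a numeric cost. The `Remote` type and the cost values used below are illustrative assumptions, not git-annex data structures.

```python
from typing import List, NamedTuple


class Remote(NamedTuple):
    """Hypothetical remote description: a name and a numeric cost."""
    name: str
    cost: int


def sync_order(remotes: List[Remote]) -> List[Remote]:
    """Return remotes cheapest first; ties keep their original order."""
    return sorted(remotes, key=lambda r: r.cost)
```

With this ordering, interrupting a sync partway through still leaves the cheaper remotes fully handled.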
* fsck: Fix up any broken links and misplaced content caused by the directory hash calculation bug fixed in the last release. (Joey Hess, 2012-03-10)
* releasing version 3.20120309 (Joey Hess, 2012-03-09)
|
* fix key directory hash calculation code (Joey Hess, 2012-03-09)
|   Fix Key directory hash calculation code to behave as it did before version 3.20120227 when a key contains non-ascii.
|
|   The hash directories for a given Key are based on its md5sum. Prior to ghc 7.4, Keys contained raw, undecoded bytes, so the md5sum was taken of each byte in turn. With the ghc 7.4 filename encoding change, keys contain decoded unicode characters (possibly with surrogates for undecodable bytes). This changes the result of the md5sum, since the md5sum used is pure haskell and supports unicode. And that won't do, as git-annex will start looking in a different hash directory for the content of a key.
|
|   The surrogates are particularly bad, since that's essentially a ghc implementation detail, so could change again at any time. Also, changing the locale changes how the bytes are decoded, which can also change the md5sum.
|
|   Symptoms would include things like:
|   * git annex fsck would complain that no copies existed of a file, despite its symlink pointing to the content that was locally present
|   * git annex fix would change the symlink to use the wrong hash directory.
|
|   Only WORM backend is likely to have been affected, since only it tends to include much filename data (SHA1E could in theory also be affected).
|
|   I have not tried to support the hash directories used by git-annex versions 3.20120227 to 3.20120308, so things added with those versions with WORM will require manual fixups. Sorry for the inconvenience!
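The locale dependence described above can be demonstrated in a few lines. This is a sketch in Python, not git-annex's hash directory scheme: the function names are hypothetical, and re-encoding the decoded characters as UTF-8 before hashing is an assumption chosen only to show that the digest follows the decoding.

```python
import hashlib


def md5_of_raw(name: bytes) -> str:
    """Digest of the undecoded bytes: independent of any locale."""
    return hashlib.md5(name).hexdigest()


def md5_of_decoded(name: bytes, locale_encoding: str) -> str:
    """Digest taken after decoding the bytes as a given locale would.

    Undecodable bytes become lone surrogates (surrogateescape), so the
    characters being hashed depend on the locale in use.
    """
    text = name.decode(locale_encoding, errors="surrogateescape")
    # Assumption for this sketch: hash a fixed encoding of the characters.
    return hashlib.md5(text.encode("utf-8", errors="surrogatepass")).hexdigest()


name = b"caf\xc3\xa9"  # "café" as UTF-8 bytes
different_locales_differ = (
    md5_of_decoded(name, "utf-8") != md5_of_decoded(name, "latin-1")
)
```

A pure-ASCII name decodes identically everywhere, which matches the note that only keys containing non-ascii were affected.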
* releasing version 3.20120230 (Joey Hess, 2012-03-05)
|
* Fix a bug in symlink calculation code that triggered in rare cases where an annexed file is in a subdirectory that nearly matches the .git/annex/objects/xx/yy subdirectories. (Joey Hess, 2012-03-05)
|   This is a straight up pure-code stinker.
|
|   The relative path calculation looked for common subdirectories in the two paths, but failed to stop after the paths diverged. When a later pair of subdirectories were the same, the resulting relative path was wrong.
|
|   Added regression test for this.
* add remote start and stop hooks (Joey Hess, 2012-03-04)
|   Locking is used, so that, if there are multiple git-annex processes using a remote concurrently, the stop hook is only run by the last process that uses it.
* Add progress bar display to the directory special remote. (Joey Hess, 2012-03-04)
|   So far I've only written progress bars for sending files, not yet receiving.
|
|   No longer uses external cp at all. ByteString IO is fast enough.
* Directory special remotes now support chunking files written to them (Joey Hess, 2012-03-03)
|   Avoiding writing files larger than a specified size is useful on certain things. For example, box.com has a file size limit of 100 mb. Could also be useful on really crappy removable media.
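Chunking of this kind can be sketched as a generator that never produces a piece larger than the limit. This is a Python illustration with a hypothetical `read_chunks` helper; how git-annex names or stores the chunks is not shown here.

```python
import io
from typing import BinaryIO, Iterator


def read_chunks(src: BinaryIO, chunk_size: int) -> Iterator[bytes]:
    """Yield successive pieces of `src`, each at most `chunk_size` bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            return  # end of input
        yield chunk


# Example: a 7 byte input split with a 3 byte limit.
pieces = list(read_chunks(io.BytesIO(b"abcdefg"), 3))
```

Each yielded piece could then be written to its own file on the remote, so no single file exceeds the size limit.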
* "here" can be used to refer to the current repository, which can read better than the old "." (which still works too). (Joey Hess, 2012-03-01)
* releasing version 3.20120229 (Joey Hess, 2012-02-29)
|
* Fix test suite to not require a unicode locale. (Joey Hess, 2012-02-29)
|   Without a unicode locale, the test suite cannot print a unicode filename to the console, and so fails.
* releasing version 3.20120227 (Joey Hess, 2012-02-27)
|
* move --from, copy --from: 10 times faster scanning remote on local disk (Joey Hess, 2012-02-26)
|   Rather than go through the location log to see which files are present on the remote, it simply looks at the disk contents directly.
|
|   I benchmarked this speeding up scanning 834 files, from an annex on my phone's SSD, from 11.39 seconds to 1.31 seconds. (No files actually moved.)
|
|   Also benchmarked 8139 files, from an annex on spinning storage, speeding up from 103.17 to 13.39 seconds.
|
|   Note that benchmarking with an encrypted annex on flash actually showed a minor slowdown with this optimisation -- from 13.93 to 14.50 seconds. Seems the overhead of doing the crypto needed to get the filenames to directly check can be higher than the overhead of looking up data in the location log. (Which says good things about how well the location log and git have been optimised!)
|
|   It *may* make sense to make encrypted local remotes not have hasKeyCheap set; further benchmarking is called for.
* version dependency on openssh-client (Joey Hess, 2012-02-25)
|   This is only to ensure that it's as new a version as it was built with, so partial upgrades work.
* configure: Check if ssh connection caching is supported by the installed version of ssh and default annex.sshcaching accordingly. (Joey Hess, 2012-02-25)
* do a cleanup commit after moving data from or to a git remote (Joey Hess, 2012-02-25)
|   Added Annex.cleanup, which is a general purpose interface for adding actions to run at the end.
|
|   Remotes with the old git-annex-shell will commit every time, and have no commit command, so hide stderr when running the commit command.
* improve alwayscommit=false mode (Joey Hess, 2012-02-25)
|   Now changes are staged into the branch's index, but not committed, which avoids growing a large journal. And sync and merge always explicitly commit, ensuring that even when they do nothing else, they commit the staged changes.
|
|   Added a flag file to indicate that the branch's journal contains uncommitted changes. (Could use git ls-files, but don't want to run that every time.)
|
|   In the future, this ability to have uncommitted changes staged in the journal might be used on remotes after a series of oneshot commands.
* add annex.alwayscommit option (Joey Hess, 2012-02-25)
|   To avoid commits of data to the git-annex branch after each command is run, set annex.alwayscommit=false. Its data will then be committed less frequently, when a merge or sync is done.
* update copyright format url (Joey Hess, 2012-02-25)
|
* Deal with NFS problem that caused a failure to remove a directory when removing content from the annex. (Joey Hess, 2012-02-24)
|   I was able to reproduce this on linux using the kernel's nfs server and mounting localhost:/. Determined that removing the directory fails when the just-deleted file in it was locked. Considered dropping the lock before removing the directory, but this would complicate parts of the code that should not need to worry about locking. So instead, ignore the failure to remove the directory in this case.
|
|   While I was at it, made it attempt to remove both levels of hash directories, in case they're empty.
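The "try to remove, ignore failure" approach can be sketched like this. It is a Python illustration, not the git-annex code: the function name and the `levels` parameter are assumptions, and any `OSError` (a directory that is not empty, or the NFS failure described above) is treated as a reason to stop quietly.

```python
import os


def remove_empty_dirs_upward(start_dir: str, levels: int) -> None:
    """Try to remove `start_dir` and then its parents, `levels` in total.

    os.rmdir only removes empty directories. A failure is ignored and
    ends the climb, since a directory that could not be removed keeps
    its parent non-empty anyway.
    """
    current = start_dir
    for _ in range(levels):
        try:
            os.rmdir(current)
        except OSError:
            return  # not empty, or not removable right now: leave it
        current = os.path.dirname(current)
```

Called on the directory that held a dropped file with `levels=3`, this would clean up that directory plus both hash directory levels when they are empty.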
* Store web special remote url info in a more efficient location. (Joey Hess, 2012-02-17)
|   Storing it in remotes/web/xx/yy/foo.log meant lots of extra directory objects in git. Now I use xx/yy/foo.log.web, which is just as unique, but more efficient since foo.log is there anyway.
|
|   Of course, it still looks in the old location too.
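The lookup across both locations can be sketched as a function that lists the candidate paths for a key's url log. This is a Python illustration using the path shapes given in the message; the function name is hypothetical, and checking the new location before the old one is an assumption.

```python
from typing import List


def web_log_candidates(key_log: str) -> List[str]:
    """Candidate locations for a key's web url log.

    `key_log` is the key's ordinary log path, e.g. "xx/yy/foo.log".
    """
    new_location = key_log + ".web"
    old_location = "remotes/web/" + key_log
    return [new_location, old_location]
```

For `"xx/yy/foo.log"` this gives `xx/yy/foo.log.web` followed by `remotes/web/xx/yy/foo.log`.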
* rekey: New plumbing level command, can be used to change the keys used for files en masse. (Joey Hess, 2012-02-16)
* reorder (Joey Hess, 2012-02-16)
|
* addurl: Add --pathdepth option. (Joey Hess, 2012-02-16)
|
* tweak wording (Joey Hess, 2012-02-15)
|
* changelog (Joey Hess, 2012-02-15)
|
* Added an annex.queuesize setting (Joey Hess, 2012-02-15)
|   Useful when adding hundreds of thousands of files on a system with plenty of memory.
|
|   git add gets quite slow in such a large repository, so if the system has more than the ~32 mb of memory the queue can use by default, it's a useful optimisation to increase the queue size, in order to decrease the number of times git add is run.
* fix memory leak when staging the journal (Joey Hess, 2012-02-14)
|   The list of files had to be retained until the end so it could be deleted. Also, a list of update-index lines was generated and only then fed into it. Now everything streams in constant space.
* Fixed a memory leak due to excessive strictness when committing journal files. (Joey Hess, 2012-02-14)
|   When hashing the files, the entire list of shas was read strictly. That was entirely unnecessary, since there's a cleanup action run after they're consumed.
* whereis: Prints the urls of files that the web special remote knows about. (Joey Hess, 2012-02-14)
|
* changelog for a964012fc36d22e4554dd12e3594579fb3190501 (Joey Hess, 2012-02-13)
|   Turns out that commit really made some serious improvements to memory use. With the lazy state monad, git-annex add in a huge tree grew seemingly without bound until it overflowed the stack. With the strict monad, it uses 42 mb max.
|
|   It's possible another change since the 3.20120123 release fixed that, but a964012fc36d22e4554dd12e3594579fb3190501 seems most likely.
* addurl --fast: Verifies that the url can be downloaded (only getting its head), and records the size in the key. (Joey Hess, 2012-02-10)
* When checking that an url has a key, verify that the Content-Length, if available, matches the size of the key. (Joey Hess, 2012-02-10)
|   If there's no Content-Length, or the key has no size, this check is not done, but it should happen most of the time, and protect against web content that has changed.
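The rule described above reduces to a small predicate. This is a sketch in Python with a hypothetical function name; how the Content-Length is obtained and how the key's size is parsed are left out.

```python
from typing import Optional


def size_check_passes(content_length: Optional[int],
                      key_size: Optional[int]) -> bool:
    """Return False only when both sizes are known and they disagree.

    If the server sent no Content-Length, or the key records no size,
    the check is skipped and the url is accepted.
    """
    if content_length is None or key_size is None:
        return True
    return content_length == key_size
```

So a key recording 1024 bytes is rejected for a url reporting 2048, but accepted when the server reports nothing.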
* Fix teardown of stale cached ssh connections. (Joey Hess, 2012-02-09)
|
* addurl: Normalize badly encoded urls. (Joey Hess, 2012-02-09)
|
* addurl: Added a --file option (Joey Hess, 2012-02-08)
|   Can be used to specify what file the url is added to. This can be used to override the default filename that is used when adding an url, which is based on the url. Or, when the file already exists, the url is recorded as another location of the file.
* S3: Fix irrefutable pattern failure when accessing encrypted S3 credentials. (Joey Hess, 2012-02-08)
|
* correction (Joey Hess, 2012-02-07)
|
* changelog (Joey Hess, 2012-02-06)
|
* note 7.4 needed (Joey Hess, 2012-02-04)
|
* remove; unused (Joey Hess, 2012-01-30)
|
* Avoid repeated location log commits when a remote is receiving files. (Joey Hess, 2012-01-28)
|   Done by adding a oneshot mode, in which location log changes are written to the journal, but not committed. This takes advantage of git-annex's existing ability to recover in this situation.
|
|   This is used by git-annex-shell and other places where changes are made to a remote's location log.