aboutsummaryrefslogtreecommitdiff
path: root/Annex.hs
Commit message (Collapse)AuthorAge
* Deal with the MonadFail proposalGravatar Benjamin Barenblat2022-01-19
| | | | | | | | | | base-4.9 split MonadFail from Monad. Strengthen some type signatures to require MonadFail instead of just Monad, derive MonadFail in relevant places, and change a partial pattern match inside STM to one that explicitly calls error. (STM is not a MonadFail; the user must explicitly specify the desired semantics if a pattern match doesn’t work out. In this case, the failing branch of the pattern should never be reached, so crashing is fine.)
* Eliminate Data.Map.insertWith'Gravatar Benjamin Barenblat2022-01-19
| | | | | containers-0.6 removed insertWith' in favor of the Data.Map.Strict API. Switch to the new API where appropriate.
* fix --json-progress --json to be same as --json --json-progressGravatar Joey Hess2018-02-19
| | | | | | | Fix behavior of --json-progress followed by --json, in which the latter option disabled the former. This commit was supported by the NSF-funded DataLad project.
* Improve startup time for commands that do not operate on remotesGravatar Joey Hess2018-01-09
| | | | | | | | | | | | | | And for tab completion, by not unnessessarily statting paths to remotes, which used to cause eg, spin-up of removable drives. Got rid of the remotes member of Git.Repo. This was a bit painful. Remote.Git modifies the list of remotes as it reads their configs, so still need a persistent list of remotes. So, put it in as Annex.gitremotes. It's only populated by getGitRemotes, so commands like examinekey that don't care about remotes won't do so. This commit was sponsored by Jake Vosloo on Patreon.
* convert importfeed to youtube-dlGravatar Joey Hess2017-11-29
| | | | | | | | | | | | | | | | | | | | | | | | Fully working, including --fast/--relaxed. Note that, while git-annex addurl --relaxed is not going to check youtube-dl, I kept git annex importfeed --relaxed checking it. Thinking is that, let's not break people's importfeed cron jobs, and importfeed does not typically have to check a large number of new items, so it's ok if it's a little bit slower when used with youtube playlist feeds. importfeed's behavior is also improved (?) when a feed has links in it to non-media files. Before, those were skipped. Now, the content of the link is downloaded. This had to be done, because trying to use youtube-dl is slow, and if those were skipped, it would have to check every time importfeed was run. While this behavior change may not be desirable for some feeds, that intersperse links to web pages with enclosures, it will be desirable for other feeds, that have non-enclosure directy links to media files. Remove old quvi modules. This commit was sponsored by Øyvind Andersen Holm.
* better dup key with -J fixGravatar Joey Hess2017-10-17
| | | | | | | | | | | | | | | This avoids all the complication about redundant work discussed in the previous try at fixing this. At the expense of needing each command that could have the problem to be patched to simply wrap the action in onlyActionOn once the key is known. But there do not seem to be many such commands. onlyActionOn' should not be used with a CommandStart (or CommandPerform), although the types do allow it. onlyActionOn handles running the whole CommandStart chain. I couldn't immediately see a way to avoid mistken use of onlyActionOn'. This commit was supported by the NSF-funded DataLad project.
* Improve behavior when -J transfers multiple files that point to the same keyGravatar Joey Hess2017-10-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After a false start, I found a fairly non-intrusive way to deal with it. Although it only handles transfers -- there may be issues with eg concurrent dropping of the same key, or other operations. There is no added overhead when -J is not used, other than an added inAnnex check. When -J is used, it has to maintain and check a small Set, which should be negligible overhead. It could output some message saying that the transfer is being done by another thread. Or it could even display the same progress info for both files that are being downloaded since they have the same content. But I opted to keep it simple, since this is rather an edge case, so it just doesn't say anything about the transfer of the file until the other thread finishes. Since the deferred transfer action still runs, actions that do more than transfer content will still get a chance to do their other work. (An example of something that needs to do such other work is P2P.Annex, where the download always needs to receive the content from the peer.) And, if the first thread fails to complete a transfer, the second thread can resume it. But, this unfortunately means that there's a risk of redundant work being done to transfer a key that just got transferred. That's not ideal, but should never cause breakage; the same thing can occur when running two separate git-annex processes. The get/move/copy/mirror --from commands had extra inAnnex checks added, inside the download actions. Without those checks, the first thread downloaded the content, and then the second thread woke up and downloaded the same content redundantly. move/copy/mirror --to is left doing redundant uploads for now. It would need a second checkPresent of the remote inside the upload to avoid them, which would be expensive. A better way to avoid redundant work needs to be found.. This commit was supported by the NSF-funded DataLad project.
* add annex-ignore-command and annex-sync-command configsGravatar Joey Hess2017-08-17
| | | | | | | | | | | | | | | | Added remote configuration settings annex-ignore-command and annex-sync-command, which are dynamic equivilants of the annex-ignore and annex-sync configurations. For this I needed a new DynamicConfig infrastructure. Its implementation should be as fast as before when there is no dynamic config, and it caches so shell commands are only run once. Note that annex-ignore-command exits nonzero when the remote should be ignored. While that may seem backwards, it allows using the same command for it as for annex-sync-command when you want to disable both. This commit was sponsored by Trenton Cronholm on Patreon.
* fix sshCleanup race using STMGravatar Joey Hess2017-05-11
|
* Ssh password prompting improved when using -JGravatar Joey Hess2017-05-11
| | | | | | | | | | | | | When ssh connection caching is enabled (and when GIT_ANNEX_USE_GIT_SSH is not set), only one ssh password prompt will be made per host, and only one ssh password prompt will be made at a time. This also fixes a race in prepSocket's stale ssh connection stopping when run with -J. It was possible for one thread to start a cached ssh connection, and another thread to immediately stop it, resulting in excess connections being made. This commit was supported by the NSF-funded DataLad project.
* annex.backend is the new name for what was annex.backendsGravatar Joey Hess2017-05-09
| | | | | | | | | It takes a single key-value backend, rather than the unncessary and confusing list. The old option still works if set. Simplified some old old code too. This commit was sponsored by Thomas Hochstein on Patreon.
* get -J: Improve distribution of jobs amoung remotes when there are more jobs ↵Gravatar Joey Hess2017-03-08
| | | | | | | | | | | | | | | | than remotes. It was distributing jobs to remotes that were not being used by any other job. But, suppose that there are only 2 remotes, and -J10. In such a case, the first 2 downloads would be distributed amoung the 2 remotes, but the other 8 would all go to remote #1. Improved by keeping a counter of how many jobs are assigned to a remote, and prefer remotes with fewer jobs. Note use of Data.Map.Strict to avoid blowing up space. I kept the bang-patterns as-is, although probably not needed with Data.Map.Strict. This commit was sponsored by Jack Hill on Patreon.
* annex.autocommit can be configured via git-annex configGravatar Joey Hess2017-02-03
| | | | | | | | | | | | | | | | | ... to control the default behavior in all clones of a repository. This includes a new Configurable data type, so the GitConfig type indicates which values can be configured this way. The implementation should be quite efficient; the config log is only read once, and only when a Configurable value has not already been set by git-config. Indeed, it would be nice in the future to extend this, so that git-config is itself only read on demand. Some commands may not need to look at the git configuration at all. This commit was sponsored by Trenton Cronholm on Patreon.
* config: New command for storing configuration in the git-annex branch.Gravatar Joey Hess2017-01-30
| | | | | | | | | | | Any config names can be set using this; git-annex commands will only look at specific ones that make sense and are worth the overhead of querying the branch. This might also be useful for storing whatever other config-type stuff the user might want to shove into the git-annex branch. This commit was sponsored by Jochen Bartl on Patreon.
* Optimisations to git-annex branch query and setting, avoiding repeated ↵Gravatar Joey Hess2016-09-29
| | | | | | | | | | | | | | | | | | | | | | copies of the environment. Speeds up commands like "git-annex find --in remote" by over 50%. Profiling showed that adjustGitEnv was 21% of the time and 37% of the allocations of that command. It copied the environment each time with getEnvironment. The only repeated use of adjustGitEnv is in withIndexFile, which tends to be run at least once per file. So, it was optimised by keeping a cache of the environment, which can be reused. There could be other better ways to optimise this. Maybe get the while environment once at startup. But, then it would have to be serialized back out each time running a child process, so I doubt that would be a net win. It might be better to cache a version of the environment that is pre-modified to use .git-annex/index. But, profiling doesn't show that modifying the enviroment is taking any significant time.
* disentangle concurrency and message typeGravatar Joey Hess2016-09-09
| | | | | | | | This makes -Jn work with --json and --quiet, where before setting -Jn disabled those options. Concurrent json output is currently a mess though since threads output chunks over top of one-another.
* get -J: Download different files from different remotes when the remotes ↵Gravatar Joey Hess2016-09-06
| | | | | | | | | | | | | | | | | have the same costs. Only done in -J mode because only if there's concurrency can downloading from two remotes be faster. Without concurrency, it's likely the case that sequential downloads from the same remote are faster than switching back and forth between two remotes. There is some hairy MVar code here, but basically it just keeps the activeremotes MVar full except when deciding which remote to assign to a thread. Also affects gets by sync --content -J This commit was sponsored by Jochen Bartl.
* git annex add in adjusted unlocked branchGravatar Joey Hess2016-03-29
| | | | | Cached the current branch lookup just because it seems unnecessary overhead to run an extra git command per add to query the current branch.
* Sped up git-annex add in direct mode and v6 by using git hash-object --batch.Gravatar Joey Hess2016-03-14
| | | | Speeds up hashSymlink and hashPointerFile.
* remove 3 build flagsGravatar Joey Hess2016-01-26
| | | | | | | | | | | | | * Removed the webapp-secure build flag, rolling it into the webapp build flag. * Removed the quvi and tahoe build flags, which only adds aeson to the core dependencies. * Removed the feed build flag, which only adds feed to the core dependencies. Build flags have cost in both code complexity and also make Setup configure have to work harder to find a usable set of build flags when some dependencies are missing.
* Bug fix: Git config settings passed to git-annex -c did not always take effect.Gravatar Joey Hess2016-01-22
| | | | | | When Config.setConfig runs, it throws away the old Repo and loads a new one. So, add an action to adjust the Repo so that -c settings will persist across that.
* flush keys db queue even on exceptionGravatar Joey Hess2015-12-23
| | | | | Also fixed a bug in makeRunner; run' leaves the mvar empty so have to refill it.
* optimise read and write for Keys database (untested)Gravatar Joey Hess2015-12-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Writes are optimised by queueing up multiple writes when possible. The queue is flushed after the Annex monad action finishes. That makes it happen on program termination, and also whenever a nested Annex monad action finishes. Reads are optimised by checking once (per AnnexState) if the database exists. If the database doesn't exist yet, all reads return mempty. Reads also cause queued writes to be flushed, so reads will always be consistent with writes (as long as they're made inside the same Annex monad). A future optimisation path would be to determine when that's not necessary, which is probably most of the time, and avoid flushing unncessarily. Design notes for this commit: - separate reads from writes - reuse a handle which is left open until program exit or until the MVar goes out of scope (and autoclosed then) - writes are queued - queue is flushed periodically - immediate queue flush before any read - auto-flush queue when database handle is garbage collected - flush queue on exit from Annex monad (Note that this may happen repeatedly for a single database connection; or a connection may be reused for multiple Annex monad actions, possibly even concurrent ones.) - if database does not exist (or is empty) the handle is not opened by reads; reads instead return empty results - writes open the handle if it was not open previously
* temporarily remove cached keys database connectionGravatar Joey Hess2015-12-16
| | | | | | | | | | | | | | | | | | | | | | The problem is that shutdown is not always called, particularly in the test suite. So, a database connection would be opened, possibly some changes queued, and then not shut down. One way this can happen is when using Annex.eval or Annex.run with a new state. A better fix might be to make both of them call Keys.shutdown (and be sure to do it even if the annex action threw an error). Complication: Sometimes they're run reusing an existing state, so shutting down a database connection could cause problems for other users of that same state. I think this would need a MVar holding the database handle, so it could be emptied once shut down, and another user of the database connection could then start up a new one if it got shut down. But, what if 2 threads were concurrently using the same database handle and one shut it down while the other was writing to it? Urgh. Might have to go that route eventually to get the database access to run fast enough. For now, a quick fix to get the test suite happier, at the expense of speed.
* add inode cache to the dbGravatar Joey Hess2015-12-09
| | | | | | | | | Renamed the db to keys, since it is various info about a Keys. Dropping a key will update its pointer files, as long as their content can be verified to be unmodified. This falls back to checksum verification, but I want it to use an InodeCache of the key, for speed. But, I have not made anything populate that cache yet.
* stash DbHandle in Annex stateGravatar Joey Hess2015-12-09
|
* arrange for regional output manager to run when -J is enabledGravatar Joey Hess2015-11-04
| | | | | | | | | | Commands that want to use it have to run their seek action inside allowConcurrentOutput. Which seems reasonable; perhaps some future command will want to support the -J flag but not use regions. The region state moved from Annex to MessageState. This makes sense organizationally, and note that some uses of onLocal use a different Annex state, but pass the MessageState into it, which is what is needed.
* add regions to concurrent outputGravatar Joey Hess2015-11-04
| | | | still no progress displays when getting files etc, but a big improvement
* fix lockKey to run callback in original Annex monad, not local remote'sGravatar Joey Hess2015-10-09
|
* converted MetaData, eliminating a global value from Annex state .. beautifulGravatar Joey Hess2015-07-12
|
* better memoize core.sharedrepository handlingGravatar Joey Hess2015-05-19
| | | | | It was memoized, but that was not used consistently. Move it to Types.GitConfig so it will auto-memoize.
* use lock pools throughout git-annexGravatar Joey Hess2015-05-19
| | | | | | | | | | | | | The one exception is in Utility.Daemon. As long as a process only daemonizes once, which seems reasonable, and as long as it avoids calling checkDaemon once it's already running as a daemon, the fcntl locking gotchas won't be a problem there. Annex.LockFile has it's own separate lock pool layer, which has been renamed to LockCache. This is a persistent cache of locks that persist until closed. This is not quite done; lockContent stil needs to be converted.
* Merge branch 'master' into concurrentprogressGravatar Joey Hess2015-05-12
|\ | | | | | | | | | | | | | | | | | | | | | | Conflicts: Command/Fsck.hs Messages.hs Remote/Directory.hs Remote/Git.hs Remote/Helper/Special.hs Types/Remote.hs debian/changelog git-annex.cabal
| * refactorGravatar Joey Hess2015-04-30
| |
* | get, move, copy, mirror: Concurrent downloads and uploads are now supported!Gravatar Joey Hess2015-04-10
|/ | | | | | | | | | | This works, and seems fairly robust. Clean get of 20 files at -J3. At -J10, there are some messages about ssh multiplexing, probably due to a race spinning up the ssh connection cacher. But, it manages to get all the files ok regardless. The progress bars are a scrambled mess though, due to bugs in ascii-progress, which I've already filed. Particularly this one: https://github.com/yamadapc/haskell-ascii-progress/issues/8
* use defGravatar Joey Hess2015-04-03
|
* --auto is no longer a global option; only get, drop, and copy accept it.Gravatar Joey Hess2015-03-25
| | | | Not a behavior change unless you were passing it to a command that ignored it.
* Submodules are now supported by git-annex!Gravatar Joey Hess2015-03-02
| | | | | | | | | | | | | | | | | | Seems to work, but still experimental until it's been tested more. When repositories are on filesystems not supporting symlinks, the .git dir symlink trick cannot be used. Since we're going to be in direct mode anyway, the .git dir symlink is not strictly needed. However, I have not fixed the code that creates new annex symlinks to handle this case -- the committed symlinks will be wrong. git annex sync happens to currently fail in a submodule using direct mode, because there's no HEAD ref. That also needs to be dealt with to get this fully working in crippled filesystems. Leaving http://github.com/datalad/datalad/issues/44 open until these issues are dealt with.
* update my email address and homepage urlGravatar Joey Hess2015-01-21
|
* remove ifdef that is always trueGravatar Joey Hess2015-01-15
| | | | cabal file has a depends on exceptions 0.6.0
* handle sync's use of setCurrentDirectory to work with relative pathsGravatar Joey Hess2015-01-06
| | | | | | I think this is the last problimatic setCurrentDirectory. I also audited for extrnal commands that git-annex might run with cwd = foo, and did not find any that were passed any FilePath that might be absolute.
* Switch to using relative paths to the git repository.Gravatar Joey Hess2015-01-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | This allows the git repository to be moved while git-annex is running in it, with fewer problems. On Windows, this avoids some of the problems with the absurdly small MAX_PATH of 260 bytes. In particular, git-annex repositories should work in deeper/longer directory structures than before. See http://git-annex.branchable.com/bugs/__34__git-annex:_direct:_1_failed__34___on_Windows/ There are several possible ways this change could break git-annex: 1. If it changes its working directory while it's running, that would be Bad News. Good news everyone! git-annex never does so. It would also break thread safety, so all such things were stomped out long ago. 2. parentDir "." -> "" which is not a valid path. I had to fix one instace of this, and I should probably wipe all calls to parentDir out of the git-annex code base; it was never a good idea. 3. Things like relPathDirToFile require absolute input paths, and code assumes that the git repo path is absolute and passes it to it as-is. In the case of relPathDirToFile, I converted it to not make this assumption. Currently, the test suite has 16 failures.
* try to avoid Windows MAX_PATH limit, by using \\?\ prefix on git repo pathGravatar Joey Hess2015-01-06
|
* Urls can now be claimed by remotes. This will allow creating, for example, a ↵Gravatar Joey Hess2014-12-08
| | | | external special remote that handles magnet: and *.torrent urls.
* When accessing a local remote, shut down git-cat-file processes afterwards, ↵Gravatar Joey Hess2014-08-20
| | | | | | | | | | | | | | | to ensure that remotes on removable media can be unmounted. Closes: #758630 This does mean that eg, copying multiple files to a local remote will become slightly slower, since it now restarts git-cat-file after each copy. Should not be significant slowdown. The reason git-cat-file is run on the remote at all is to update its location log. In order to add an item to it, it needs to get the current content of the log. Finding a way to avoid needing to do that would be a good path to avoiding this slowdown if it does become a problem somehow. This commit was sponsored by Evan Deaubl.
* unify exception handling into Utility.ExceptionGravatar Joey Hess2014-08-07
| | | | | | | | | | | | | | | | | | | | Removed old extensible-exceptions, only needed for very old ghc. Made webdav use Utility.Exception, to work after some changes in DAV's exception handling. Removed Annex.Exception. Mostly this was trivial, but note that tryAnnex is replaced with tryNonAsync and catchAnnex replaced with catchNonAsync. In theory that could be a behavior change, since the former caught all exceptions, and the latter don't catch async exceptions. However, in practice, nothing in the Annex monad uses async exceptions. Grepping for throwTo and killThread only find stuff in the assistant, which does not seem related. Command.Add.undo is changed to accept a SomeException, and things that use it for rollback now catch non-async exceptions, rather than only IOExceptions.
* fix for Windows file timestamp timezone madnessGravatar Joey Hess2014-06-12
| | | | | | | | | | | | | | | | | | | | | | On Windows, changing the time zone causes the apparent mtime of files to change. This confuses git-annex, which natually thinks this means the files have actually been modified (since THAT'S WHAT A MTIME IS FOR, BILL <sheesh>). Work around this stupidity, by using the inode sentinal file to detect if the timezone has changed, and calculate a TSDelta, which will be applied when generating InodeCaches. This should add no overhead at all on unix. Indeed, I sped up a few things slightly in the refactoring. Seems to basically work! But it has a big known problem: If the timezone changes while the assistant (or a long-running command) runs, it won't notice, since it only checks the inode cache once, and so will use the old delta for all new inode caches it generates for new files it's added. Which will result in them seeming changed the next time it runs. This commit was sponsored by Vincent Demeester.
* allow building with old versions of exceptions before MonadMask was split outGravatar Joey Hess2014-05-28
|
* Use exceptions in place of deprecated MonadCatchIO-transformersGravatar Ben Gamari2014-05-28
|
* factor out getRemoteGitConfigGravatar Joey Hess2014-05-16
|