| Commit message (Collapse) | Author | Age |
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In my git-annex repos, I found some stale transfer info files
without lock files.
Pass a mode to tryLockExclusive, so it will create the lock file if
not present, and so not fail to clean up such transfer info files.
Normally, transfer info files are accompanied by a lock file.
But, when alwaysRunTransfer is used, the locking can fail
and it will still write the transfer info file. Perhaps there are other
cases too? Note that mkProgressUpdater's meter
writes to the transfer info file too, and it might be possible for
that meter to fire after runTransfer has cleaned up.
This commit was sponsored by andrea rota.
|
|
|
|
|
|
| |
to avoid the user being surprised in cases where that behavior is not desired or expected
This commit was supported by the NSF-funded DataLad project.
|
| |
|
|
|
|
|
|
|
|
| |
It was not getting old lines removed, because the tree graft confused
the updater, so it union merged from the previous git-annex branch,
which still contained the old lines. Fixed by carefully using setIndexSha.
This commit was supported by the NSF-funded DataLad project.
|
|
|
|
|
|
|
| |
This breaks backwards compatibility, but only with unreleased versions of
git-annex, which I think is acceptable.
This commit was supported by the NSF-funded DataLad project.
|
| |
|
|
|
|
|
|
|
|
|
|
| |
This way, the temp files that might be left due to failure will be
cleaned up next time.
Also, nub the list of incomplete exports to avoid repeatedly adding the
same tree to it when running export repeatedly when it's failing.
This commit was supported by the NSF-funded DataLad project.
|
|
|
|
|
|
|
|
|
| |
Don't allow "exporttree=yes" to be set when the special remote
does not support exports. That would be confusing since the user would
set up a special remote for exports, but `git annex export` to it would
later fail.
This commit was supported by the NSF-funded DataLad project.
|
|
|
|
|
|
|
|
|
|
| |
Not yet used, but essential for resuming cleanly.
Note that, in normmal operation, only one commit is made to export.log
during an export; the incomplete version only gets to the journal and
is then overwritten.
This commit was supported by the NSF-funded DataLad project.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Straightforward enough, except for the needed belt-and-suspenders sanity
checks to avoid foot shooting due to exports not being key/value stores.
* Even when annex.verify=false, always verify from exports.
* Only get files from exports that use a backend that supports
checksum verification.
* Never trust exports, even if the user says to, because then
`git annex drop` would drop content if the export seemed to contain
a copy.
This commit was supported by the NSF-funded DataLad project.
|
|
|
|
|
|
|
|
|
|
|
| |
So it will be available later and elsewhere, even after GC.
I first though to use git update-index to do this, but feeding it a line
with a tree object seems to always cause it to generate a git subtree
merge. So, fell back to using the Git.Tree interface to maniupulate the
trees, and not involving the git-annex branch index file at all.
This commit was sponsored by Andreas Karlsson.
|
|
|
|
|
|
| |
Incremental export updates work now too.
This commit was sponsored by Anthony DeRobertis on Patreon.
|
|
|
|
| |
Removed its Show instance.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Can be used to override the default timestamps used in log files in the
git-annex branch. This is a dangerous environment variable; use with
caution.
Note that this only affects writing to the logs on the git-annex branch.
It is not used for metadata in git commits (other env vars can be set for
that).
There are many other places where timestamps are still used, that don't
get committed to git, but do touch disk. Including regular timestamps
of files, and timestamps embedded in some files in .git/annex/, including
the last fsck timestamp and timestamps in transfer log files.
A good way to find such things in git-annex is to get for getPOSIXTime and
getCurrentTime, although some of the results are of course false positives
that never hit disk (unless git-annex gets swapped out..)
So this commit does NOT necessarily make git-annex comply with some HIPPA
privacy regulations; it's up to the user to determine if they can use it in
a way compliant with such regulations.
Benchmarking: It takes 0.00114 milliseconds to call getEnv
"GIT_ANNEX_VECTOR_CLOCK" when that env var is not set. So, 100 thousand log
files can be written with an added overhead of only 0.114 seconds. That
should be by far swamped by the actual overhead of writing the log files
and making the commit containing them.
This commit was supported by the NSF-funded DataLad project.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
fsck already special-cased dead keys to make --all not report errors with
them, and it makes sense to also expand that to whereis. I think it makes
sense for dead keys to be skipped by all uses of --all, so mistakes can be
completely forgotten about and not come back to haunt us.
The speed impact of testing if the key is dead is negligible for fsck and
whereis, since they use the location log anyway and it gets cached.
This does slow down a few commands that support --all, in particular
metadata --all runs around 2x as slow. I don't think metadata
--all is often used though. It might slow down copy/move/mirror
--all and get --all.
log --all is not affected (does not use the normal --all machinery).
Dead keys will still be processed by --incomplete, --branch,
--failed, and --key. Although it would be unlikely for a dead key to
ave in incomplete or failed transfer. It seems to make perfect sense for
--branch to process keys on the branch, even if dead.
(fsck's special-casing of dead keys was left in, so if one of these options
causes a dead key to be fscked, there will be a nice message.)
This commit was supported by the NSF-funded DataLad project.
|
|
|
|
|
|
| |
classroom setting.
This commit was supported by the NSF-funded DataLad project.
|
| |
|
|
|
|
|
|
| |
To prevent any further mistakes like 1a497cefb47557f0b4788c606f9071be422b2511
This commit was sponsored by Francois Marier on Patreon.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
2f868db90c7ba16eee901b9b1472b1e1a889dd93 changed the Read instance for
Key.
I've checked all uses of that instance (by removing it and seeing what
breaks), and they're all limited to the webapp, except one.
That is GitAnnexDistribution's Read instance.
So, 2f868db90c7ba16eee901b9b1472b1e1a889dd93 would have broken upgrades
of git-annex from downloads.kitenet.net. Once the .info files there got
updated for a new release, old releases would have failed to parse them
and never upgraded.
To fix this, I found a way to make the .info files that contain
GitAnnexDistribution values be readable by the old version of git-annex.
This commit was sponsored by Ewen McNeill.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
... to control the default behavior in all clones of a repository.
This includes a new Configurable data type, so the GitConfig type indicates
which values can be configured this way.
The implementation should be quite efficient; the config log is only read
once, and only when a Configurable value has not already been set by
git-config.
Indeed, it would be nice in the future to extend this, so that git-config
is itself only read on demand. Some commands may not need to look at the
git configuration at all.
This commit was sponsored by Trenton Cronholm on Patreon.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Turns out that Data.List.Utils.split is slow and makes a lot of
allocations. Here's a much simpler single character splitter that behaves
the same (even in wacky corner cases) while running in half the time and
75% the allocations.
As well as being an optimisation, this helps move toward eliminating use of
missingh.
(Data.List.Split.splitOn is nearly as slow as Data.List.Utils.split and
allocates even more.)
I have not benchmarked the effect on git-annex, but would not be surprised
to see some parsing of eg, large streams from git commands run twice as
fast, and possibly in less memory.
This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
|
| |
|
|
|
|
|
|
|
|
|
| |
Docs say vicfg can configure everything from git-annex branch,
so it ought to configure numcopies.
Note that commenting out existing numcopies does not unset it.
This commit was sponsored by Thom May on Patreon.
|
|
|
|
|
| |
This is a big scary change. I have convinced myself it should be safe. I
hope!
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
backtraces for unexpected errors.
ghc 8 added backtraces on uncaught errors. This is great, but git-annex was
using error in many places for a error message targeted at the user, in
some known problem case. A backtrace only confuses such a message, so omit it.
Notably, commands like git annex drop that failed due to eg, numcopies,
used to use error, so had a backtrace.
This commit was sponsored by Ethan Aubin.
|
|
|
|
|
|
| |
This sped up git annex find --not --in web from 6.64s to 5.69s.
The optimised parser is probably more like 50% faster than the general one
it replaced.
|
|
|
|
| |
Avoid orphan instance warning
|
| |
|
|
|
|
|
|
|
|
|
| |
Note that get --from foo --failed will get things that a previous get --from bar
tried and failed to get, etc. I considered making --failed only retry
transfers from the same remote, but it was easier, and seems more useful,
to not have the same remote requirement.
Noisy due to some refactoring into Types/
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
terminated lines written by some versions of git-annex on Windows.
This fixes strange displays in some cases, including whereis showing
many duplicate locations, and showing more total copies than actually
exist.
It's unknown if that lead to data loss when eg, dropping. At the moment,
it seems unlikely it could, since the UUID with \r's appended is not the
same as a UUID without, and so no remote matches it.
It's also unknown if \r's can leak in on windows, perhaps when merging the
git-annex branch.
|
|
|
|
| |
avoid redundant work for repeated ForgetDeadRemotes transitions
|
|
|
|
| |
For example: git-annex reinject --known /mnt/backup/*
|
|
|
|
|
|
| |
expressions that make sense in its context.
So, not "standard" or "lackingcopies", etc.
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
| |
annex.largefiles
This makes git annex clean not look at the git-annex branch at all,
and so speeds it up by 50% or more.
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
added back with unchanged content.
Implemented with no additional overhead of compares etc.
This is safe to do for presence logs because of their locality of change;
a given repo's presence logs are only ever changed in that repo, or in a
repo that has just been actively changing the content of that repo.
So, we don't need to worry about a split-brain situation where there'd
be disagreement about the location of a key in a repo. And so, it's ok to
not update the timestamp when that's the only change that would be made
due to logging presence info.
|
| |
|
| |
|
|
|
|
|
|
|
| |
was set to the empty string and the other set to some expression, this bug caused all files to be wanted, instead of only files matching the expression.
Avoid: MAny `MOr` otherexpression
Which matches anything.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I want this as fast as possible, so it can be added to code paths without
slowing them down.
Avoid the set lookup, and rely on laziness,
drops runtime from 14.37 ns to 11.03 ns according to this criterion benchmark:
import Criterion.Main
import qualified Types.Difference as New
import qualified Types.DifferenceOld as Old
main :: IO ()
main = defaultMain
[ bgroup "hasDifference"
[ bench "new" $ whnf (New.hasDifference New.OneLevelObjectHash) new
, bench "old" $ whnf (Old.hasDifference Old.OneLevelObjectHash) old
]
]
where
s = "fromList [ObjectHashLower, OneLevelObjectHash, OneLevelBranchHash]"
new = New.readDifferences s
old = Old.readDifferences s
A little bit of added boilerplate, but I suppose it's worth it to not
need to worry about set lookup overhead. Note that adding more differences
would slow down the old implementation; the new implementation will run
the same speed.
|
| |
|
|
|
|
| |
in a bare repo. Otherwise, still reports files with lost contents, even if the content is dead.
|