git-annex-gpl - git-annex without the AGPL

	Commit message	Author	Age
*	better dup key with -J fix	Joey Hess	2017-10-17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This avoids all the complication about redundant work discussed in the previous try at fixing this. At the expense of needing each command that could have the problem to be patched to simply wrap the action in onlyActionOn once the key is known. But there do not seem to be many such commands. onlyActionOn' should not be used with a CommandStart (or CommandPerform), although the types do allow it. onlyActionOn handles running the whole CommandStart chain. I couldn't immediately see a way to avoid mistaken use of onlyActionOn'. This commit was supported by the NSF-funded DataLad project.
*	Improve behavior when -J transfers multiple files that point to the same key	Joey Hess	2017-10-17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	After a false start, I found a fairly non-intrusive way to deal with it. Although it only handles transfers -- there may be issues with e.g. concurrent dropping of the same key, or other operations. There is no added overhead when -J is not used, other than an added inAnnex check. When -J is used, it has to maintain and check a small Set, which should be negligible overhead. It could output some message saying that the transfer is being done by another thread. Or it could even display the same progress info for both files that are being downloaded since they have the same content. But I opted to keep it simple, since this is rather an edge case, so it just doesn't say anything about the transfer of the file until the other thread finishes. Since the deferred transfer action still runs, actions that do more than transfer content will still get a chance to do their other work. (An example of something that needs to do such other work is P2P.Annex, where the download always needs to receive the content from the peer.) And, if the first thread fails to complete a transfer, the second thread can resume it. But, this unfortunately means that there's a risk of redundant work being done to transfer a key that just got transferred. That's not ideal, but should never cause breakage; the same thing can occur when running two separate git-annex processes. The get/move/copy/mirror --from commands had extra inAnnex checks added, inside the download actions. Without those checks, the first thread downloaded the content, and then the second thread woke up and downloaded the same content redundantly. move/copy/mirror --to is left doing redundant uploads for now. It would need a second checkPresent of the remote inside the upload to avoid them, which would be expensive. A better way to avoid redundant work needs to be found. This commit was supported by the NSF-funded DataLad project.
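
The deferral described in this commit can be illustrated with a small sketch. It is written in Python purely for illustration (git-annex itself is Haskell), and every name in it (`KeyTransferGuard`, `present`, `transfer`) is hypothetical. It assumes that a second thread asking for the same key simply waits until the first finishes and then re-checks whether the content is already present, which mirrors the extra inAnnex check mentioned above.

```python
import threading


class KeyTransferGuard:
    """Hypothetical helper: lets only one thread at a time transfer a given key."""

    def __init__(self):
        self._active = set()                 # keys currently being transferred
        self._cond = threading.Condition()

    def run(self, key, present, transfer):
        with self._cond:
            # Defer while another thread is transferring the same key.
            while key in self._active:
                self._cond.wait()
            self._active.add(key)
        try:
            # Re-check presence (stand-in for the extra inAnnex check),
            # so a deferred thread does not repeat a finished download.
            if present(key):
                return False
            transfer(key)
            return True
        finally:
            with self._cond:
                self._active.discard(key)
                self._cond.notify_all()
```

If the first thread's transfer fails, the key is still released in the `finally` clause, so a waiting thread gets the chance to try the transfer itself.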
*	Avoid repeated checking that files passed on the command line exist.	Joey Hess	2017-10-16
\| \| \| \| \| \| \| \| \| \| \|	git annex add, git annex lock etc make multiple seek passes, and each seek pass checked that files existed. That was unnecessary redundant work. Fixed by adding a new WorkTreeItem type, making seek actions use it, and checking that the files exist when constructing it. This commit was supported by the NSF-funded DataLad project.
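
A rough Python sketch of the "check once at construction" idea follows. Only the name WorkTreeItem comes from the commit message; the function `work_tree_items` and the error type are assumptions made for illustration, and the real Haskell code is not reproduced here.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkTreeItem:
    """A command-line path that has already been checked to exist."""
    path: str


def work_tree_items(paths):
    # Check existence once, up front. Later seek passes take
    # WorkTreeItem values and so have no need to check again.
    items = []
    for p in paths:
        if not os.path.lexists(p):
            raise FileNotFoundError(f"{p} not found")
        items.append(WorkTreeItem(p))
    return items
```

Because later passes accept only `WorkTreeItem` values, the existence check cannot be silently repeated or skipped.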
*	avoid warning	Joey Hess	2017-10-16
\|
*	copy, move: Behave same with --fast when sending to remotes located on a local disk as when sending to other remotes.	Joey Hess	2017-09-29
\| \| \| \| \| \|	Let --fast override use of hasKey even when hasKeyCheap.
*	sync: Added --cleanup, which removes local and remote synced/ branches.	Joey Hess	2017-09-28
\| \| \| \| \| \| \|	Also deletes any tagged pushes that the assistant might have done, since those would also prevent resetting a branch back. This commit was sponsored by andrea rota.
*	metadata: Added --remove-all.	Joey Hess	2017-09-28
\| \| \| \| \| \| \|	Motivation is to remove all metadata when it gets copied from a previous version of the file, and that is not desirable. This commit was supported by the NSF-funded DataLad project.
*	add exporter thread to assistant	Joey Hess	2017-09-20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is similar to the pusher thread, but a separate thread because git pushes can be done in parallel with exports, and updating a big export should not prevent other git pushes going out in the meantime. The exportThread only runs at most every 30 seconds, since updating an export is more expensive than pushing. This may need to be tuned. Added a separate channel for export commits; the committer records a commit in that channel. Also, reconnectRemotes records a dummy commit, to make the exporter thread wake up and make sure all exports are up-to-date. So, connecting a drive with a directory special remote export will immediately update it, and getting online will automatically update S3 and WebDAV exports. The transfer queue is not involved in exports. Instead, failed exports are retried much like failed pushes. This commit was sponsored by Ewen McNeill.
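
The "at most every 30 seconds" behaviour can be sketched as a small scheduler. This Python version is only an illustration under stated assumptions: the class and method names are hypothetical, the channel of export commits is reduced to a single pending flag, and the clock is injectable so the logic can be exercised without sleeping.

```python
import time


class ExportScheduler:
    """Hypothetical sketch: run an export at most once per min_interval seconds."""

    def __init__(self, min_interval=30.0, clock=time.monotonic):
        self.min_interval = min_interval
        self.clock = clock
        self.pending = False      # stand-in for the export commit channel
        self.last_run = None

    def record_commit(self):
        # Called by the committer (or on reconnect, with a dummy commit).
        self.pending = True

    def should_export(self):
        if not self.pending:
            return False
        now = self.clock()
        if self.last_run is not None and now - self.last_run < self.min_interval:
            return False          # too soon; the commit stays pending
        self.pending = False
        self.last_run = now
        return True
```

A commit recorded during the quiet period is not lost; it is picked up the first time `should_export` is called after the interval has passed.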
*	update transfer info and notify when exporting	Joey Hess	2017-09-20
\| \| \| \| \| \| \|	Same as is done for all other transfers of content, so the webapp will display progress bars etc. This commit was sponsored by Anthony DeRobertis on Patreon.
*	export --fast sets up but does not populate export	Joey Hess	2017-09-19
\| \| \| \|	sync --content finishes
*	git annex sync --content to exports	Joey Hess	2017-09-19
\| \| \| \| \| \|	Assistant still todo. This commit was sponsored by Boyd Stephen Smith Jr. on Patreon
*	configuration and docs for tracking exports	Joey Hess	2017-09-19
\| \| \| \| \| \|	Not yet handled by sync or assistant. This commit was sponsored by Nick Daly on Patreon.
*	merge changes made on other repos into ExportTree	Joey Hess	2017-09-18
\| \| \| \| \| \| \| \| \| \| \|	Now when one repository has exported a tree, another repository can get files from the export, after syncing. There's a bug: While the database update works, somehow the database on disk does not get updated, and so the database update is run the next time, etc. Wasn't able to figure out why yet. This commit was sponsored by Ole-Morten Duesund on Patreon.
*	update ExportTree table efficiently	Joey Hess	2017-09-18
\| \| \| \| \| \| \|	Use same diff and key lookup except when the whole tree has to be scanned. This commit was sponsored by Peter Hogg on Patreon.
*	add ExportTree table to export db	Joey Hess	2017-09-18
\| \| \| \| \| \| \| \| \| \| \| \|	New table needed to look up what filenames are used in the currently exported tree, for reasons explained in export.mdwn. Also, added smart constructors for ExportLocation and ExportDirectory to make sure they contain filepaths with the right direction slashes. And some code refactoring. This commit was sponsored by Francois Marier on Patreon.
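
A smart constructor of the kind mentioned here might look like the following Python sketch. The type name ExportLocation is from the commit message; the constructor name `mk_export_location` and the choice of forward slashes as the "right direction" are assumptions for illustration, and the sketch ignores the fact that a backslash can be a legitimate filename character on Unix.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExportLocation:
    """Path of a file within an export; assumed to always use forward slashes."""
    path: str


def mk_export_location(path: str) -> ExportLocation:
    # Normalise the separators in one place, so every ExportLocation
    # built through this function has the same slash direction.
    return ExportLocation(path.replace("\\", "/"))
```

For example, `mk_export_location("foo\\bar.txt")` and `mk_export_location("foo/bar.txt")` compare equal.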
*	lock to avoid more than one export to a remote at a time	Joey Hess	2017-09-18
\| \| \| \|	This commit was sponsored by Jack Hill on Patreon.
*	split out Types.Export	Joey Hess	2017-09-15
\|
*	avoid unnecessary db queries when exported directory can't be empty	Joey Hess	2017-09-15
\| \| \| \| \| \|	In rename foo/bar to foo/baz, foo can't be empty. In delete zxyyz, there's no exported directory (top doesn't count).
*	remove empty directories when removing from export	Joey Hess	2017-09-15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The subtle part of this is what happens when the remote fails to remove an empty directory. The removal from the export needs to fail in that case, so the removal will be tried again later. However, removeExportLocation has already been run and changed the export db, so if the next run checks getExportLocation, it might decide nothing remains to be done, leaving the empty directory. Dealt with that by making removeEmptyDirectories handle a failure by calling addExportLocation, reverting the database changes so the next run will be guaranteed to try deleting the empty directory again. This commit was sponsored by Thomas Hochstein on Patreon.
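
The revert-on-failure step can be sketched as follows. This is a simplified Python illustration, not the actual code: the export db is modelled as a set of (key, location) pairs, `remove_dir` is a hypothetical callable standing in for the remote's directory removal, and only the immediate parent directory is considered.

```python
def remove_from_export(db, remove_dir, key, location):
    """Remove a file's record, then its parent directory; revert the db on failure."""
    db.discard((key, location))            # stand-in for removeExportLocation
    directory = location.rpartition("/")[0]
    if directory and not remove_dir(directory):
        # Directory removal failed: put the record back (stand-in for
        # addExportLocation) so the next run sees work remaining.
        db.add((key, location))
        return False
    return True
```

With the record restored, a later run that consults the db still finds the location and retries the directory removal.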
*	trust level overridden message adjusted for forced untrusted export remotes	Joey Hess	2017-09-13
\|
*	remove debug print	Joey Hess	2017-09-12
\|
*	export: cache connections for S3 and webdav	Joey Hess	2017-09-12
\|
*	leave export logged as incomplete if initial renames fail	Joey Hess	2017-09-12
\| \| \| \| \| \| \| \| \| \|	This way, the temp files that might be left due to failure will be cleaned up next time. Also, nub the list of incomplete exports to avoid repeatedly adding the same tree to it when running export repeatedly when it's failing. This commit was supported by the NSF-funded DataLad project.
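
"Nub" here means removing duplicates while keeping first occurrences in order. A minimal Python sketch, with `record_incomplete` as a hypothetical helper name and tree identifiers shown as plain strings:

```python
def nub(items):
    """Drop duplicates, keeping the first occurrence of each item in order."""
    seen = set()
    out = []
    for x in items:
        if x not in seen:
            seen.add(x)
            out.append(x)
    return out


def record_incomplete(incomplete, tree):
    # Re-running a failing export of the same tree leaves the list unchanged.
    return nub(incomplete + [tree])
```

Repeated failures on the same tree therefore leave a single entry rather than a growing list.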
*	export to webdav	Joey Hess	2017-09-12
\| \| \| \| \| \| \| \| \| \| \|	This basically works, but there's a bug when renaming a file that leaves a .git-annex-temp-content-key file in the webdav store, that never gets cleaned up. Also, exporting files with spaces to box.com seems to fail; perhaps it does not support it? This commit was supported by the NSF-funded DataLad project.
*	interrupted export recovery bugfixes	Joey Hess	2017-09-07
\| \| \| \| \| \| \| \| \| \| \| \| \|	When an export was interrupted, the sqlite database won't have been committed necessarily. Also, the interrupted export might have been run in an entirely different repository. There's not a significant speed benefit in checking getExportLocation in this case anyway, so avoid it. Also, remove the old filename from the export database. Recovery from interrupted exports is now tested working. This commit was supported by the NSF-funded DataLad project.
*	avoid renaming to temp files before deleting	Joey Hess	2017-09-07
\| \| \| \| \| \| \| \| \| \|	Only rename when actually necessary. The diff gets buffered in memory. Probably git has to buffer a diff in memory when generating it as well, so this memory usage should not be a problem, even when the diff is very large. I hope. This commit was supported by the NSF-funded DataLad project.
*	prevent exporttree=yes on remotes that don't support exports	Joey Hess	2017-09-07
\| \| \| \| \| \| \| \| \|	Don't allow "exporttree=yes" to be set when the special remote does not support exports. That would be confusing since the user would set up a special remote for exports, but `git annex export` to it would later fail. This commit was supported by the NSF-funded DataLad project.
*	bugfix	Joey Hess	2017-09-06
\|
*	export file renaming	Joey Hess	2017-09-06
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is seriously super hairy. It has to handle interrupted exports, which may be resumed with the same or a different tree. It also has to recover from export conflicts, which could cause the wrong content to be renamed to a file. I think this works, or is close to working. See the update to the design for how it works. This is definitely not optimal, in that it does more renames than are necessary. It would probably be worth finding the keys that are really renamed and only renaming those. But let's get the "simple" approach to work first. This commit was supported by the NSF-funded DataLad project.
*	record incomplete exports in export.log	Joey Hess	2017-09-06
\| \| \| \| \| \| \| \| \| \|	Not yet used, but essential for resuming cleanly. Note that, in normal operation, only one commit is made to export.log during an export; the incomplete version only gets to the journal and is then overwritten. This commit was supported by the NSF-funded DataLad project.
*	use export db to correctly handle duplicate files	Joey Hess	2017-09-04
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Removed incorrect UniqueKey key in db schema; a key can appear multiple times with different files. The database has to be flushed after each removal. But when adding files to the export, lots of changes are able to be queued up w/o flushing. So it's still fairly efficient. If large removals of files from exports are too slow, an alternative would be to make two passes over the diff, one pass queueing deletions from the database, then a flush and then a second pass updating the location log. But that would use more memory, and need to look up exportKey twice per removed file, so I've avoided such optimisation for now. This commit was supported by the NSF-funded DataLad project.
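
Why a uniqueness constraint on the key alone is wrong can be seen in a small sqlite example. The table and column names below are hypothetical and do not reproduce git-annex's actual schema; the sketch assumes uniqueness is wanted only for the (key, file) pair.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# Hypothetical schema: unique on the (key, file) pair, not on key alone,
# so duplicate files sharing one key can both be recorded.
db.execute(
    "CREATE TABLE exported ("
    " key TEXT NOT NULL,"
    " file TEXT NOT NULL,"
    " UNIQUE (key, file))"
)
db.executemany(
    "INSERT INTO exported (key, file) VALUES (?, ?)",
    [("K1", "a.txt"), ("K1", "copy/a.txt")],
)


def files_for_key(conn, key):
    """All exported files whose content is the given key."""
    rows = conn.execute(
        "SELECT file FROM exported WHERE key = ? ORDER BY file", (key,)
    )
    return [r[0] for r in rows]
```

Had `key` been declared unique on its own, the second insert would have failed with an integrity error.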
*	flush queued changes to export db on exit	Joey Hess	2017-09-04
\|
*	remove some backtraces on user errors	Joey Hess	2017-09-04
\|
*	track exported files in a sqlite database	Joey Hess	2017-09-04
\| \| \| \| \| \| \| \| \|	Went with a separate db per export remote, rather than a single export database. Mostly because there will probably not be a lot of separate export remotes, and it might be convenient to be able to delete a given remote's export database. This commit was supported by the NSF-funded DataLad project.
*	implement exporttree=yes configuration	Joey Hess	2017-09-04
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Only export to remotes that were initialized to support it. * Prevent storing key/value on export remotes. * Prevent enabling exporttree=yes and encryption in the same remote. SetupStage Enable was changed to take the old RemoteConfig. This allowed only setting exporttree when initially setting up a remote, and not configuring it later after stuff might already be stored in the remote. Went with =yes rather than =true for consistency with other parts of git-annex. Changed docs accordingly. This commit was supported by the NSF-funded DataLad project.
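
The rules listed in this commit can be summarised as a validation step over the old and new remote configuration. The Python sketch below is an illustration only: configs are plain dicts, the function name is hypothetical, and treating `encryption=none` (or an absent setting) as "no encryption" is an assumption of the sketch.

```python
def setup_remote_config(new, old=None):
    """Validate a remote config; old is None when the remote is first initialized."""
    if new.get("exporttree") == "yes":
        # exporttree=yes and encryption may not be combined.
        if new.get("encryption", "none") != "none":
            raise ValueError("cannot enable both encryption and exporttree")
    if old is not None and new.get("exporttree") != old.get("exporttree"):
        # exporttree may only be chosen at initial setup, since content
        # might already be stored in the remote afterwards.
        raise ValueError("cannot change exporttree of an existing remote")
    return new
```

Passing the old configuration into the check is what makes the "only at initial setup" rule enforceable.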
*	refactor ExportActions	Joey Hess	2017-09-01
\| \| \| \| \| \| \| \|	This will allow disabling exports for remotes that are not configured to allow them. Also, exportSupported will be useful for the external special remote to probe. This commit was supported by the NSF-funded DataLad project
*	graft exported tree into git-annex branch	Joey Hess	2017-08-31
\| \| \| \| \| \| \| \| \| \| \|	So it will be available later and elsewhere, even after GC. I first thought to use git update-index to do this, but feeding it a line with a tree object seems to always cause it to generate a git subtree merge. So, fell back to using the Git.Tree interface to manipulate the trees, and not involving the git-annex branch index file at all. This commit was sponsored by Andreas Karlsson.
*	implement export.log and resolve export conflicts	Joey Hess	2017-08-31
\| \| \| \| \| \|	Incremental export updates work now too. This commit was sponsored by Anthony DeRobertis on Patreon.
*	resuming exports	Joey Hess	2017-08-31
\| \| \| \| \| \| \| \| \| \| \| \| \|	Make a pass over the whole exported tree, and upload anything that has not yet reached the export. Update location log when exporting. Note that the synthesized keys for non-annexed files are stored in the location log too. Some cases involving files in the tree with the same content are not handled correctly yet. This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
*	improve type	Joey Hess	2017-08-31
\|
*	fix error message when content to export is not locally available	Joey Hess	2017-08-31
\|
*	initial export command	Joey Hess	2017-08-29
\| \| \| \| \| \|	Very basic operation works, but of course this is only the beginning. This commit was sponsored by Nick Daly on Patreon.
*	toFeed was unused so remove	Joey Hess	2017-08-28
\|
*	Support building with feed-1.0, while still supporting older versions.	Joey Hess	2017-08-28
\| \| \| \|	This commit was sponsored by Jeff Goeke-Smith on Patreon.
*	add annex-ignore-command and annex-sync-command configs	Joey Hess	2017-08-17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Added remote configuration settings annex-ignore-command and annex-sync-command, which are dynamic equivalents of the annex-ignore and annex-sync configurations. For this I needed a new DynamicConfig infrastructure. Its implementation should be as fast as before when there is no dynamic config, and it caches so shell commands are only run once. Note that annex-ignore-command exits nonzero when the remote should be ignored. While that may seem backwards, it allows using the same command for it as for annex-sync-command when you want to disable both. This commit was sponsored by Trenton Cronholm on Patreon.
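
The exit-status convention and the caching can be sketched in a few lines. This Python version is an illustration under assumptions: the function names are hypothetical, the command result is assumed to take precedence over the static setting whenever a command is configured, and caching is done with `functools.lru_cache` rather than whatever DynamicConfig actually uses.

```python
import functools
import subprocess


@functools.lru_cache(maxsize=None)
def command_succeeds(command):
    """Run a shell command once per distinct string; True if it exits 0."""
    return subprocess.run(command, shell=True).returncode == 0


def remote_ignored(static_ignore, ignore_command=None):
    if ignore_command is None:
        return static_ignore            # no dynamic config: no command is run
    return not command_succeeds(ignore_command)   # nonzero exit means ignore


def remote_synced(static_sync, sync_command=None):
    if sync_command is None:
        return static_sync
    return command_succeeds(sync_command)         # nonzero exit means no sync
```

Because a nonzero exit disables the remote in both cases, one command that exits nonzero can be used for both settings, and with the cache it is only run once.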
*	move, copy: Support --batch.	Joey Hess	2017-08-15
\|
*	Added GIT_ANNEX_VECTOR_CLOCK environment variable	Joey Hess	2017-08-14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Can be used to override the default timestamps used in log files in the git-annex branch. This is a dangerous environment variable; use with caution. Note that this only affects writing to the logs on the git-annex branch. It is not used for metadata in git commits (other env vars can be set for that). There are many other places where timestamps are still used, that don't get committed to git, but do touch disk. Including regular timestamps of files, and timestamps embedded in some files in .git/annex/, including the last fsck timestamp and timestamps in transfer log files. A good way to find such things in git-annex is to grep for getPOSIXTime and getCurrentTime, although some of the results are of course false positives that never hit disk (unless git-annex gets swapped out...). So this commit does NOT necessarily make git-annex comply with some HIPAA privacy regulations; it's up to the user to determine if they can use it in a way compliant with such regulations. Benchmarking: It takes 0.00114 milliseconds to call getEnv "GIT_ANNEX_VECTOR_CLOCK" when that env var is not set. So, 100 thousand log files can be written with an added overhead of only 0.114 seconds. That should be by far swamped by the actual overhead of writing the log files and making the commit containing them. This commit was supported by the NSF-funded DataLad project.
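
The override and the benchmark arithmetic can be illustrated together. In this Python sketch the function name `log_timestamp` is hypothetical and the variable is assumed to hold a plain number; only the variable name and the figures 0.00114 ms and 100 thousand come from the commit message.

```python
import os
import time


def log_timestamp(environ=os.environ):
    """Timestamp for a log line: the override if set, else the current time."""
    override = environ.get("GIT_ANNEX_VECTOR_CLOCK")
    if override is not None:
        return float(override)      # assumed to be a plain number
    return time.time()


# Overhead estimate from the commit message:
per_call_ms = 0.00114               # one environment lookup, in milliseconds
log_files = 100_000
overhead_s = per_call_ms * log_files / 1000   # about 0.114 seconds
```

The lookup cost is paid once per log file written, which is why it scales linearly with the number of log files.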
*	fsck: Support --json.	Joey Hess	2017-06-26
\| \| \| \| \| \| \|	One use case is to get a list of files that fsck fails on, in order to e.g. drop them from a remote. This commit was sponsored by Nick Daly on Patreon.
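
The use case mentioned here could be handled by a small filter over the JSON output. The Python sketch below assumes one JSON object per line with `file` and `success` fields; that shape and the sample records are assumptions for illustration, not captured fsck output.

```python
import json


def failed_files(json_lines):
    """Return the files whose records do not report success."""
    failed = []
    for line in json_lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if not record.get("success", False):
            failed.append(record.get("file"))
    return failed


# Hypothetical sample records, one JSON object per line.
sample = [
    '{"command": "fsck", "file": "a.bin", "success": true}',
    '{"command": "fsck", "file": "b.bin", "success": false}',
]
```

The resulting list could then be fed to another command, such as a drop from the remote.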
*	support --to=. as shorthand for --to=here	Joey Hess	2017-06-01
\|
*	configuration to disable automatic merge conflict resolution	Joey Hess	2017-06-01
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Added annex.resolvemerge configuration, which can be set to false to disable the usual automatic merge conflict resolution done by git-annex sync and the assistant. * sync: Added --no-resolvemerge option. Note that disabling merge conflict resolution is probably not a good idea in a direct mode repo or adjusted branch. Since updates to both are done outside the usual work tree, if it fails the tree is not left in a conflicted state, and it would be hard to manually resolve the conflict. Still, made annex.resolvemerge be supported in those cases for consistency. This commit was sponsored by Riku Voipio.