summaryrefslogtreecommitdiff
path: root/doc/bugs/clean-duplicates_causes_data_loss.mdwn
blob: c5d545420a71b1070d6de977b229d369dc37031b (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
### Please describe the problem.

Use of git-annex import --clean-duplicates can cause data loss, where git-annex deletes content that it doesn't actually have a copy of (i.e. there is no duplicate).

### What steps will reproduce the problem?

I've written a quick'n'dirty test script which goes through a bunch of combinations and tests --clean-duplicates. Given an 'origin' repo containing 'a' and 'b' content and a clone of it ('import') which doesn't contain 'a' and 'b' content.

g-a import --clean-duplicates ~/tmp/importme (containing a, b and c) into 'import' after:

    Origin is set to trusted in import, b is dropped from within origin:
  b is deleted from importme even though no annexes have copies (reasonable, as origin is set to trusted and import thinks it has the content).

    Origin is set to semitrusted in import, b is dropped within origin:
  b is deleted from importme even though no annexes have copies (this is most likely one to bite people).

    Origin is set to untrusted in import, b is dropped within origin:
  b is deleted from importme even though no annexes have copies and git-annex has been explicitly told to not trust information about origin :( This is really surprising behaviour!

### What version of git-annex are you using? On what operating system?

* 5.20150409
* Arch Linux (git-annex-bin)

### Please provide any additional information below.

I can provide the script if it is wanted (coded in Perl, couple of non-core dependencies).

> Decided to go ahead and make it check remotes like drop does, so [[done]]
> --[[Joey]]