summaryrefslogtreecommitdiff
path: root/doc/todo/inject_on_import/comment_1_314de82da94f47539e754e9bec950d7b._comment
blob: 693f967cf0e6207b61d21a26d0504686fea898eb (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
[[!comment format=mdwn
 username="Xyem"
 subject="comment 1"
 date="2015-03-20T10:51:59Z"
 content="""
I just tested this and git-annex does not act according to its documentation.

    ## git init annex
    Initialized empty Git repository in ./annex/.git/
    ## mkdir to_import
    mkdir: created directory ‘to_import’
    ## echo A > annex/A
    ## echo B > annex/B
    ## echo A > to_import/A
    ## echo B > to_import/B
    ## echo C > to_import/C
    ## cd annex/
    ## git annex init
    init  ok
    (Recording state in git...)
    ## git annex add .
    add A ok
    add B ok
    (Recording state in git...)
    ## git commit -m \"commit\"
    [master (root-commit) f2628f0] commit
     2 files changed, 2 insertions(+)
     create mode 120000 A
     create mode 120000 B

    ## git annex drop B --force
    drop B ok
    (Recording state in git...)

Annex knows about A and B, A is present but B is not.

to_import contains A, B and C.

    ## git annex import --deduplicate ../to_import/
    import to_import/C ok
    import to_import/A (duplicate) ok
    import to_import/B ok
    (Recording state in git...)
    ## ls -R
    .:
    A  B  to_import
    
    ./to_import:
    B  C

According to the docs, it would look like this:

    ## ls -R
    .:
    A  B  to_import
    
    ./to_import:
    C

and B would be a dangling symlink (due to the content having been seen by the annex before, but not present).

So '--import --deduplicate' actually imports/deduplicates files whose content /is not present/ in the annex, rather than 'has not been seen before'.

Fortunately, this error is on the side of caution and data is never lost, but you do end up with duplicates (2 symlinks pointing to the same content).

The behaviour is acceptable (to me at least, better duplicate symlinks than lost data), but the documentation is misleading and should probably refer to content presence rather than having ever been present.
"""]]