## What steps will reproduce the problem?

Take a large sub-directory in a repository (e.g. `ccash`) with some files within,

     $ tar -xzf ccash.tar.gz
     $ du -sh ccash
     59M	ccash
     $ ls -l ccash/trunk/annotationinterface/src/edu/byu/nlp/annotationinterface/java/BasicAnnotation.java ccash/trunk/DataProvider/WebContent/WEB-INF/lib/dom4j.jar 
     -rw-r--r-- 1 dietz dietz   1748 Jul 27  2011 ccash/trunk/annotationinterface/src/edu/byu/nlp/annotationinterface/java/BasicAnnotation.java
     -rw-r--r-- 1 dietz dietz 313898 May 22 18:36 ccash/trunk/DataProvider/WebContent/WEB-INF/lib/dom4j.jar

Annex it,

     $ git annex add ccash
     ...
     $ ls -l ccash/trunk/annotationinterface/src/edu/byu/nlp/annotationinterface/java/BasicAnnotation.java ccash/trunk/DataProvider/WebContent/WEB-INF/lib/dom4j.jar 
     lrwxrwxrwx 1 dietz dietz 215 Jul 27  2011 ccash/trunk/annotationinterface/src/edu/byu/nlp/annotationinterface/java/BasicAnnotation.java -> ../../../../../../../../../../../.git/annex/objects/mv/zf/SHA256-s1748--5c0d1cbf104214b6d0ab85c53a85cadb975ec208f42a7b33a76d85e175352486/SHA256-s1748--5c0d1cbf104214b6d0ab85c53a85cadb975ec208f42a7b33a76d85e175352486
     lrwxrwxrwx 1 dietz dietz 210 Jul 27  2011 ccash/trunk/DataProvider/WebContent/WEB-INF/lib/dom4j.jar -> ../../../../../../../../.git/annex/objects/8G/gQ/SHA256-s313898--593552ffea3c5823c6602478b5002a7c525fd904a3c44f1abe4065c22edfac73/SHA256-s313898--593552ffea3c5823c6602478b5002a7c525fd904a3c44f1abe4065c22edfac73

Unannex it (before or after committing),

     $ git annex unannex ccash

Note that some fraction of the files will still be symbolic links, now pointing to non-existent files. This data has apparently been lost forever.

     $ ls -l ccash/trunk/annotationinterface/src/edu/byu/nlp/annotationinterface/java/BasicAnnotation.java ccash/trunk/DataProvider/WebContent/WEB-INF/lib/dom4j.jar 
     -rw-r--r-- 1 dietz dietz 1748 Jul 27  2011 ccash/trunk/annotationinterface/src/edu/byu/nlp/annotationinterface/java/BasicAnnotation.java
     lrwxrwxrwx 1 dietz dietz  210 Jul 27  2011 ccash/trunk/DataProvider/WebContent/WEB-INF/lib/dom4j.jar -> ../../../../../../../../.git/annex/objects/8G/gQ/SHA256-s313898--593552ffea3c5823c6602478b5002a7c525fd904a3c44f1abe4065c22edfac73/SHA256-s313898--593552ffea3c5823c6602478b5002a7c525fd904a3c44f1abe4065c22edfac73
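
To see how many files are affected rather than spot-checking two of them, the stale links can be listed in one pass. This is a minimal sketch, not part of git-annex; the function name `list_dangling_symlinks` is made up for illustration, and it assumes no file name contains a newline.

```shell
# Print every symlink under directory $1 whose target does not exist.
list_dangling_symlinks() {
    find "$1" -type l | while IFS= read -r link; do
        # [ -e ] follows the link, so it fails when the target is gone
        [ -e "$link" ] || printf '%s\n' "$link"
    done
}
```

Running `list_dangling_symlinks ccash` after the unannex should print nothing if every file was restored.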

It is unclear why some files are affected while others are not. That being said, unannexing small numbers of files at a time appears to avoid the issue,

     $ tar -zxf ccash.tar.gz
     $ git annex add ccash
     $ ls -l ccash/trunk/annotationinterface/src/edu/byu/nlp/annotationinterface/java/BasicAnnotation.java ccash/trunk/DataProvider/WebContent/WEB-INF/lib/dom4j.jar 
     lrwxrwxrwx 1 dietz dietz 215 Jul 27  2011 ccash/trunk/annotationinterface/src/edu/byu/nlp/annotationinterface/java/BasicAnnotation.java -> ../../../../../../../../../../../.git/annex/objects/mv/zf/SHA256-s1748--5c0d1cbf104214b6d0ab85c53a85cadb975ec208f42a7b33a76d85e175352486/SHA256-s1748--5c0d1cbf104214b6d0ab85c53a85cadb975ec208f42a7b33a76d85e175352486
     lrwxrwxrwx 1 dietz dietz 210 Jul 27  2011 ccash/trunk/DataProvider/WebContent/WEB-INF/lib/dom4j.jar -> ../../../../../../../../.git/annex/objects/8G/gQ/SHA256-s313898--593552ffea3c5823c6602478b5002a7c525fd904a3c44f1abe4065c22edfac73/SHA256-s313898--593552ffea3c5823c6602478b5002a7c525fd904a3c44f1abe4065c22edfac73
     $ git annex unannex ccash/trunk/DataProvider/WebContent/WEB-INF
     ...
     $ ls -l ccash/trunk/annotationinterface/src/edu/byu/nlp/annotationinterface/java/BasicAnnotation.java ccash/trunk/DataProvider/WebContent/WEB-INF/lib/dom4j.jar 
     lrwxrwxrwx 1 dietz dietz    215 Jul 27  2011 ccash/trunk/annotationinterface/src/edu/byu/nlp/annotationinterface/java/BasicAnnotation.java -> ../../../../../../../../../../../.git/annex/objects/mv/zf/SHA256-s1748--5c0d1cbf104214b6d0ab85c53a85cadb975ec208f42a7b33a76d85e175352486/SHA256-s1748--5c0d1cbf104214b6d0ab85c53a85cadb975ec208f42a7b33a76d85e175352486
     -rw-r--r-- 1 dietz dietz 313898 Jul 27  2011 ccash/trunk/DataProvider/WebContent/WEB-INF/lib/dom4j.jar

For this reason, it seems likely this is due to some sort of race condition.
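
Until the cause is understood, the observation above suggests unannexing in smaller batches and checking each batch before moving on. The following is only a sketch of that workaround: the helper name `unannex_in_batches` is hypothetical, splitting by immediate subdirectory is an arbitrary choice of batch size, and nothing here has been confirmed to avoid the problem in every case.

```shell
# Unannex each immediate subdirectory of $1 separately and stop at the
# first batch that leaves a dangling symlink behind.
# Files directly inside $1 are not handled by this sketch.
unannex_in_batches() {
    for sub in "$1"/*/; do
        [ -d "$sub" ] || continue
        git annex unannex "$sub" || return 1
        # find prints nothing when every remaining symlink still resolves
        if [ -n "$(find "$sub" -type l ! -exec test -e {} \; -print)" ]; then
            echo "dangling symlinks left under $sub" >&2
            return 1
        fi
    done
}
```

For the layout in this report, something like `unannex_in_batches ccash/trunk` would process `DataProvider`, `annotationinterface` and their siblings one at a time.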


## What version of git-annex are you using? On what operating system?

This is on Ubuntu 12.04 with git-annex revision a1e2bc4.


> There was no good solution to this, so I picked a bad one that
> will not have users complaining git-annex ate their data.
> They will complain that `git annex unannex` is slow since it now copies
> the file, and perhaps instead use `--fast`, and hopefully avoid destroying
> their own data by editing the resulting hard links.
> 
> [[done]] --[[Joey]]
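
The trade-off between copying and hard linking can be seen without git-annex at all. The sketch below uses plain files in a temporary directory; the names `annexed`, `hardlinked` and `copied` are stand-ins chosen for illustration, with `annexed` playing the role of the object under `.git/annex/objects`.

```shell
demo=$(mktemp -d)
printf 'original\n' > "$demo/annexed"    # stands in for the annexed object
ln "$demo/annexed" "$demo/hardlinked"    # second name for the same data
cp "$demo/annexed" "$demo/copied"        # independent copy of the data

printf 'edited\n' >> "$demo/hardlinked"  # in-place edit through the hard link
printf 'edited\n' >> "$demo/copied"      # same edit on the copy

# The edit through the hard link has also changed the "annexed" content;
# the edit to the copy has not touched it.
cat "$demo/annexed"
```

The final `cat` prints both `original` and `edited`, which is why editing a hard-linked file in place also changes the content kept in the annex, while the slower copy leaves it untouched.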