## What steps will reproduce the problem?

Take a large subdirectory of a repository (e.g. `ccash`) that contains some files,

     $ tar -xzf ccash.tar.gz
     $ du -sh ccash
     59M	ccash
     $ ls -l ccash/trunk/annotationinterface/src/edu/byu/nlp/annotationinterface/java/BasicAnnotation.java ccash/trunk/DataProvider/WebContent/WEB-INF/lib/dom4j.jar 
     -rw-r--r-- 1 dietz dietz   1748 Jul 27  2011 ccash/trunk/annotationinterface/src/edu/byu/nlp/annotationinterface/java/BasicAnnotation.java
     -rw-r--r-- 1 dietz dietz 313898 May 22 18:36 ccash/trunk/DataProvider/WebContent/WEB-INF/lib/dom4j.jar

Annex it,

     $ git annex add ccash
     ...
     $ ls -l ccash/trunk/annotationinterface/src/edu/byu/nlp/annotationinterface/java/BasicAnnotation.java ccash/trunk/DataProvider/WebContent/WEB-INF/lib/dom4j.jar 
     lrwxrwxrwx 1 dietz dietz 215 Jul 27  2011 ccash/trunk/annotationinterface/src/edu/byu/nlp/annotationinterface/java/BasicAnnotation.java -> ../../../../../../../../../../../.git/annex/objects/mv/zf/SHA256-s1748--5c0d1cbf104214b6d0ab85c53a85cadb975ec208f42a7b33a76d85e175352486/SHA256-s1748--5c0d1cbf104214b6d0ab85c53a85cadb975ec208f42a7b33a76d85e175352486
     lrwxrwxrwx 1 dietz dietz 210 Jul 27  2011 ccash/trunk/DataProvider/WebContent/WEB-INF/lib/dom4j.jar -> ../../../../../../../../.git/annex/objects/8G/gQ/SHA256-s313898--593552ffea3c5823c6602478b5002a7c525fd904a3c44f1abe4065c22edfac73/SHA256-s313898--593552ffea3c5823c6602478b5002a7c525fd904a3c44f1abe4065c22edfac73

Unannex it (before or after committing),

     $ git annex unannex ccash

Some fraction of the files are still symbolic links afterwards, and they now point to annex objects that no longer exist. The data in those files appears to be lost for good.

     $ ls -l ccash/trunk/annotationinterface/src/edu/byu/nlp/annotationinterface/java/BasicAnnotation.java ccash/trunk/DataProvider/WebContent/WEB-INF/lib/dom4j.jar 
     -rw-r--r-- 1 dietz dietz 1748 Jul 27  2011 ccash/trunk/annotationinterface/src/edu/byu/nlp/annotationinterface/java/BasicAnnotation.java
     lrwxrwxrwx 1 dietz dietz  210 Jul 27  2011 ccash/trunk/DataProvider/WebContent/WEB-INF/lib/dom4j.jar -> ../../../../../../../../.git/annex/objects/8G/gQ/SHA256-s313898--593552ffea3c5823c6602478b5002a7c525fd904a3c44f1abe4065c22edfac73/SHA256-s313898--593552ffea3c5823c6602478b5002a7c525fd904a3c44f1abe4065c22edfac73
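
To see how many files are affected, the stale links can be listed directly. This is a minimal sketch, not part of git-annex; the helper name `list_dangling_symlinks` is made up for illustration. It relies on `test -e` following symlinks, so the test fails exactly for links whose target is missing.

```shell
# Hypothetical helper: print every symlink under directory $1 whose
# target does not exist. `test -e` follows the link, so it fails for
# dangling links, and `! -exec ... \;` selects exactly those.
list_dangling_symlinks() {
    find "$1" -type l ! -exec test -e {} \; -print
}
```

Running `list_dangling_symlinks ccash` after the unannex should print `dom4j.jar` and any other file left in the same state.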

It is unclear why some files are affected while others are not. However, unannexing a small number of files at a time appears to avoid the issue,

     $ tar -zxf ccash.tar.gz
     $ git annex add ccash
     $ ls -l ccash/trunk/annotationinterface/src/edu/byu/nlp/annotationinterface/java/BasicAnnotation.java ccash/trunk/DataProvider/WebContent/WEB-INF/lib/dom4j.jar 
     lrwxrwxrwx 1 dietz dietz 215 Jul 27  2011 ccash/trunk/annotationinterface/src/edu/byu/nlp/annotationinterface/java/BasicAnnotation.java -> ../../../../../../../../../../../.git/annex/objects/mv/zf/SHA256-s1748--5c0d1cbf104214b6d0ab85c53a85cadb975ec208f42a7b33a76d85e175352486/SHA256-s1748--5c0d1cbf104214b6d0ab85c53a85cadb975ec208f42a7b33a76d85e175352486
     lrwxrwxrwx 1 dietz dietz 210 Jul 27  2011 ccash/trunk/DataProvider/WebContent/WEB-INF/lib/dom4j.jar -> ../../../../../../../../.git/annex/objects/8G/gQ/SHA256-s313898--593552ffea3c5823c6602478b5002a7c525fd904a3c44f1abe4065c22edfac73/SHA256-s313898--593552ffea3c5823c6602478b5002a7c525fd904a3c44f1abe4065c22edfac73
     $ git annex unannex ccash/trunk/DataProvider/WebContent/WEB-INF
     ...
     $ ls -l ccash/trunk/annotationinterface/src/edu/byu/nlp/annotationinterface/java/BasicAnnotation.java ccash/trunk/DataProvider/WebContent/WEB-INF/lib/dom4j.jar 
     lrwxrwxrwx 1 dietz dietz    215 Jul 27  2011 ccash/trunk/annotationinterface/src/edu/byu/nlp/annotationinterface/java/BasicAnnotation.java -> ../../../../../../../../../../../.git/annex/objects/mv/zf/SHA256-s1748--5c0d1cbf104214b6d0ab85c53a85cadb975ec208f42a7b33a76d85e175352486/SHA256-s1748--5c0d1cbf104214b6d0ab85c53a85cadb975ec208f42a7b33a76d85e175352486
     -rw-r--r-- 1 dietz dietz 313898 Jul 27  2011 ccash/trunk/DataProvider/WebContent/WEB-INF/lib/dom4j.jar

For this reason, it seems likely that the problem is caused by some sort of race condition.
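
Until the cause is known, the small-batch observation above could be scripted along these lines. This is only a sketch under assumptions: the function name `unannex_in_batches` and the default batch size of 50 are arbitrary, and it has not been verified that any particular batch size reliably avoids the problem. It should only be tried on data that has a backup.

```shell
# Hypothetical workaround: unannex the symlinks under directory $1 in
# groups of at most $2 files (default 50) instead of all at once.
unannex_in_batches() {
    dir=$1
    batch=${2:-50}
    # -print0 / -0 keep file names with spaces intact.
    # -r (GNU xargs) skips the command when no symlinks are found;
    # without it, `git annex unannex` would run with no path arguments.
    find "$dir" -type l -print0 |
        xargs -0 -r -n "$batch" git annex unannex
}
```

For example, `unannex_in_batches ccash 20` would call `git annex unannex` once per group of 20 annexed files.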


## What version of git-annex are you using? On what operating system?

This is on Ubuntu 12.04 with git-annex revision a1e2bc4.