doc/forum/Is_parallel_fsck_for_two_or_more_different_directories_possible__63__.mdwn



I can put `git-annex fsck` in a loop to check a large directory, as shown below. `-S` starts an incremental check, `-m` continues a previously started incremental check, and `&>>` appends all output (both `stdout` and `stderr`) to the `fsck.log` file.

```
$ git-annex fsck -S large-directory --from remote-repo --time-limit=60s &>>~/log/fsck.log
#...
#...
#...
$ while (sleep 10); do
  git-annex fsck -m large-directory --from remote-repo --time-limit=1h &>>~/log/fsck.log
#...
#...
#...
done;
```

I need the loop because the connection to `remote-repo` fails after some time (or because of a remote server error) and needs a reconnect; after that, everything is OK again.

Suppose I have many large directories, and it would be faster to check them if I could run the checks in parallel. They contain many small files, which do not take much bandwidth but do need more I/O and network communication.
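What I have in mind is roughly the following sketch: one resumable loop per directory, all started in the background at once. The names `ANNEX_CMD`, `LOG_DIR`, `PAUSE`, `MAX_ROUNDS`, `fsck_dir` and `fsck_all` are made up for illustration and are not part of git-annex. I also assume that a run exiting with status 0 means the check finished without hitting the time limit or an error. Whether the incremental state of such concurrent runs would interfere with each other is exactly what I do not know.

```sh
#!/bin/sh
# Sketch only: one resumable fsck loop per directory, all loops running at once.
# ANNEX_CMD, LOG_DIR, PAUSE, MAX_ROUNDS, fsck_dir and fsck_all are hypothetical
# names made up for this illustration; they are not part of git-annex.

ANNEX_CMD=${ANNEX_CMD:-git-annex}   # overridable, e.g. for a dry run
LOG_DIR=${LOG_DIR:-$HOME/log}       # one log file per directory goes here
PAUSE=${PAUSE:-10}                  # seconds to wait before each retry
MAX_ROUNDS=${MAX_ROUNDS:-24}        # give up after this many -m runs

# Check one directory: start an incremental fsck, then keep resuming it.
# The body runs in a subshell, so its variables stay local to one call.
fsck_dir() (
  dir=$1
  remote=$2
  mkdir -p "$LOG_DIR"
  # Separate log per directory so concurrent output does not interleave.
  log="$LOG_DIR/fsck-$(printf '%s' "$dir" | tr '/' '_').log"

  # -S starts a new incremental check; '>>file 2>&1' appends stdout and stderr.
  "$ANNEX_CMD" fsck -S "$dir" --from "$remote" --time-limit=60s >>"$log" 2>&1

  round=0
  while [ "$round" -lt "$MAX_ROUNDS" ]; do
    round=$((round + 1))
    sleep "$PAUSE"
    # -m continues the started check. ASSUMPTION: exit status 0 means it is done.
    if "$ANNEX_CMD" fsck -m "$dir" --from "$remote" --time-limit=1h >>"$log" 2>&1; then
      return 0
    fi
  done
  return 1   # still not finished after MAX_ROUNDS attempts
)

# Run fsck_dir for every given directory in parallel and wait for all of them.
fsck_all() (
  remote=$1
  shift
  pids=
  for dir in "$@"; do
    fsck_dir "$dir" "$remote" &
    pids="$pids $!"
  done
  failed=0
  for pid in $pids; do
    wait "$pid" || failed=1
  done
  return "$failed"
)
```

With hypothetical directory names, this would be called as `fsck_all remote-repo dir-a dir-b dir-c`. Unlike my loop above, it stops after `MAX_ROUNDS` attempts instead of running forever.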

I know that the progress of `fsck` is stored in a database (currently after every 1000 files, every 5 minutes, or when `--time-limit` is reached), but is the checked directory (`large-directory`) taken into account when starting or storing the progress?

**Is the checked directory/path part of the primary key?** Or is it much more complicated?

If I could start checking many directories at the same time, `fsck` would finish much faster (think of thousands of small icon files). Is it just me, or could somebody else profit from this?

(This is _not_ a feature request; I would like to know whether anybody needs this, and whether it is possible at all.)

Thanks,
parhuzamos