author Joey Hess <joeyh@joeyh.name> 2017-04-24 10:36:58 -0400
committer Joey Hess <joeyh@joeyh.name> 2017-04-24 10:36:58 -0400
commit f7f3273db89e7d8402178bf7e8736325c7c11370
tree fc922ee993fa5712d8e6e2be4b3e692d3ada88c2
parent 868153da1f17d07b176882261530f055e03795b1
parent ef6a2807e9b1ac5c4650560f4a9f702b67b4dcc2
Merge branch 'master' of ssh://git-annex.branchable.com
-rw-r--r-- doc/bugs/SMB__58___git_annex_clone_works__44___get_fails_on_transfer_lock.mdwn | 247
-rw-r--r-- doc/bugs/SMB__58___git_annex_clone_works__44___get_fails_on_transfer_lock/comment_1_d165e4895e7043192888751218261282._comment | 16
-rw-r--r-- doc/forum/Lots_of_4k_symlinks.mdwn | 2
-rw-r--r-- doc/forum/Lots_of_4k_symlinks/comment_1_96384eaeef1d067a24678c7aa3613063._comment | 21
-rw-r--r-- doc/forum/git_annex_init_timeout/comment_3_1e733fad01e6b420c7fd9f7832e9b3f7._comment | 10
-rw-r--r-- doc/tips/Repositories_with_large_number_of_files.mdwn | 15
-rw-r--r-- doc/tips/Repositories_with_large_number_of_files/comment_6_9169a33c06cf8aea231cdd8f51ce17b6._comment | 8
-rw-r--r-- doc/tips/migrating_two_seperate_disconnected_directories_to_git_annex.mdwn | 2
-rw-r--r-- doc/tips/splitting_a_repository.mdwn | 58
-rw-r--r-- doc/todo/build_a_user_guide.mdwn | 40
-rw-r--r-- doc/workflow.mdwn | 2
11 files changed, 416 insertions, 5 deletions
diff --git a/doc/bugs/SMB__58___git_annex_clone_works__44___get_fails_on_transfer_lock.mdwn b/doc/bugs/SMB__58___git_annex_clone_works__44___get_fails_on_transfer_lock.mdwn
new file mode 100644
index 000000000..3077bd017
--- /dev/null
+++ b/doc/bugs/SMB__58___git_annex_clone_works__44___get_fails_on_transfer_lock.mdwn
@@ -0,0 +1,247 @@
+### Please describe the problem.
+
+I have a Synology DS216+ NAS, which can export shares via SMB2, AFP or NFS. Since I have a lot of "media" data in git-annex, I'd like to be able to put a useful copy of that data (ie, original file names) onto the NAS, and have git-annex track that content as another "local" copy of the content that is available (rather than having to track it by hand).
+
+Because the NFS shares have UID mapping issues (AFAICT NFS v4 id mapping is not supported without Kerberos authentication, due to changes since about Linux 3.4; the Synology is based on a Linux 3.10 kernel), I have been trying to use SMBv2 shares, with symlinks enabled ("Allow symbolic links within shared folders"). I'm aware that `git-annex` on a network file system is less supported, but this is still a use case that would be quite useful to manage larger pools of files.
+
+After updating to the latest release (to [fix git annex init timeouts](https://git-annex.branchable.com/forum/git_annex_init_timeout/)), I am able to complete the `git clone ...`, `git annex init`, and `git annex sync` steps. (The SMB mount supports symlinks, but not hard links; hence pidlocks being needed.)
+
+But `git annex get` or `git annex copy` to transfer any of the content files fails with:
+
+    get README (transfer already in progress, or unable to take transfer lock)
+      Unable to access these remotes: ashram-data
+
+whether initiated from the SMB mounted `git-annex` or a `git-annex` with the data on a local file system. No content files get transferred.
+
+Of note, on this "SMB share with symlinks enabled" I can manually create symlinks, but `git annex` still detects symlinks as not possible and switches to direct mode. However it leaves the existing symlinks from `git clone ...` in place. It is unclear to me whether this is part of the cause of the "transfer lock" issue, or whether the transfer locks are, eg, not using pidlocks and thus failing because flock() does not work.
+
+### What steps will reproduce the problem?
+
+* Mount a remote file system via SMB v2, with symlinks enabled (eg mounting from a Samba share should be sufficient, as that's what is running on the Synology NAS)
+
+* `git clone` an existing `git-annex` onto a directory on that SMB mounted filesystem
+
+* `git remote ....` configure one or more remotes with the file data
+
+* `git annex init "SMB share"`
+
+* `git annex sync`
+
+* `git annex get ....` of an existing file
+
+or alternatively after the init/sync, go to a source `git-annex` and do:
+
+* `git remote add smb ....`
+
+* `git annex sync`
+
+* `git annex copy --to=smb ...` of an existing file with content in that `git-annex`
+
+### What version of git-annex are you using? On what operating system?
+
+`git-annex` 6.20170320 (downloaded 2017-04-23) on Mac OS X 10.11 (El Capitan):
+
+    ewen@ashram:~$ uname -a
+    Darwin ashram 15.6.0 Darwin Kernel Version 15.6.0: Fri Feb 17 10:21:18 PST 2017; root:xnu-3248.60.11.4.1~1/RELEASE_X86_64 x86_64
+    ewen@ashram:~$ git annex version
+    git-annex version: 6.20170320-g41c5d9d
+    build flags: Assistant Webapp Pairing Testsuite S3(multipartupload)(storageclasses) WebDAV FsEvents ConcurrentOutput TorrentParser MagicMime Feeds Quvi
+    key/value backends: SHA256E SHA256 SHA512E SHA512 SHA224E SHA224 SHA384E SHA384 SHA3_256E SHA3_256 SHA3_512E SHA3_512 SHA3_224E SHA3_224 SHA3_384E SHA3_384 SKEIN256E SKEIN256 SKEIN512E SKEIN512 SHA1E SHA1 MD5E MD5 WORM URL
+    remote types: git gcrypt p2p S3 bup directory rsync web bittorrent webdav tahoe glacier ddar hook external
+    ewen@ashram:~$
+
+### Please provide any additional information below.
+
+[[!format sh """
+# Example session
+ewen@ashram:/nas01/annex/purchased$ git clone /bkup/purchased/mediabytes
+Cloning into 'mediabytes'...
+done.
+ewen@ashram:/nas01/annex/purchased$ cd mediabytes/
+ewen@ashram:/nas01/annex/purchased/mediabytes$ git remote add ashram-data /data/purchased/mediabytes
+ewen@ashram:/nas01/annex/purchased/mediabytes$ git remote add ashram /bkup/purchased/mediabytes
+ewen@ashram:/nas01/annex/purchased/mediabytes$ git remote remove origin
+ewen@ashram:/nas01/annex/purchased/mediabytes$ git annex init 'nas01 file server'
+init nas01 file server
+ Detected a filesystem without POSIX fcntl lock support.
+
+ Enabling annex.pidlock.
+
+ Detected a filesystem without fifo support.
+
+ Disabling ssh connection caching.
+
+ Detected a crippled filesystem.
+
+ Disabling core.symlinks.
+
+ Enabling direct mode.
+ok
+(recording state in git...)
+ewen@ashram:/nas01/annex/purchased/mediabytes$ git annex sync
+commit ok
+pull ashram
+From /bkup/purchased/mediabytes
+ * [new branch] git-annex -> ashram/git-annex
+ * [new branch] master -> ashram/master
+ * [new branch] synced/git-annex -> ashram/synced/git-annex
+ * [new branch] synced/master -> ashram/synced/master
+ok
+pull ashram-data
+From /data/purchased/mediabytes
+ * [new branch] git-annex -> ashram-data/git-annex
+ * [new branch] master -> ashram-data/master
+ * [new branch] synced/git-annex -> ashram-data/synced/git-annex
+ * [new branch] synced/master -> ashram-data/synced/master
+ok
+(merging ashram-data/git-annex into git-annex...)
+(recording state in git...)
+push ashram
+Counting objects: 8, done.
+Delta compression using up to 8 threads.
+Compressing objects: 100% (6/6), done.
+Writing objects: 100% (8/8), 806 bytes | 0 bytes/s, done.
+Total 8 (delta 2), reused 1 (delta 0)
+To /bkup/purchased/mediabytes
+ 02b7699..0edf575 git-annex -> synced/git-annex
+ok
+push ashram-data
+Counting objects: 8, done.
+Delta compression using up to 8 threads.
+Compressing objects: 100% (6/6), done.
+Writing objects: 100% (8/8), 806 bytes | 0 bytes/s, done.
+Total 8 (delta 2), reused 1 (delta 0)
+To /data/purchased/mediabytes
+ 02b7699..0edf575 git-annex -> synced/git-annex
+ok
+ewen@ashram:/nas01/annex/purchased/mediabytes$ git annex sync
+commit ok
+pull ashram
+ok
+pull ashram-data
+ok
+ewen@ashram:/nas01/annex/purchased/mediabytes$ git annex get .
+get README (transfer already in progress, or unable to take transfer lock)
+ Unable to access these remotes: ashram-data
+
+ Try making some of these repositories available:
+ 00420eb2-1c9a-47e8-abfa-675b9ffb0782 -- naos_wd_1 backup drive
+ 0ac1326e-9e03-4c1e-bd51-6cd664988caa -- Naos (colo) fileserver
+ 0e5fd428-ba5b-4817-8489-9780bc62e655 -- naos_wd_2 backup drive
+ 1144edcc-b18f-4cb1-a828-6c272d7788d6 -- naos_wd_3 backup drive
+ 6142cfd4-83ba-410f-bf54-fd4b79e49fdb -- ashram external data drive [ashram-data]
+ 77aeca6a-a4f8-4734-befc-2a0b50687c52 -- tv home fileserver
+ cc1c45f3-7c96-4103-b9c0-cce0f5f2cc36 -- bethel_data_drive
+ eb0ca2b6-e8c6-43b0-a805-de17a4d7eeae -- naos_wd_4 backup drive
+failed
+get alex-lindsay-1.m4a (transfer already in progress, or unable to take transfer lock)
+ Unable to access these remotes: ashram-data
+
+ Try making some of these repositories available:
+ 00420eb2-1c9a-47e8-abfa-675b9ffb0782 -- naos_wd_1 backup drive
+ 0ac1326e-9e03-4c1e-bd51-6cd664988caa -- Naos (colo) fileserver
+ 0e5fd428-ba5b-4817-8489-9780bc62e655 -- naos_wd_2 backup drive
+ 1144edcc-b18f-4cb1-a828-6c272d7788d6 -- naos_wd_3 backup drive
+ 6142cfd4-83ba-410f-bf54-fd4b79e49fdb -- ashram external data drive [ashram-data]
+ 77aeca6a-a4f8-4734-befc-2a0b50687c52 -- tv home fileserver
+ cc1c45f3-7c96-4103-b9c0-cce0f5f2cc36 -- bethel_data_drive
+ eb0ca2b6-e8c6-43b0-a805-de17a4d7eeae -- naos_wd_4 backup drive
+failed
+[...]
+[...]
+failed
+git-annex: get: 59 failed
+ewen@ashram:/nas01/annex/purchased/mediabytes$
+ewen@ashram:/nas01/annex/purchased/mediabytes$ ls -l | head
+total 118
+lrwx------ 1 ewen staff 182 23 Apr 18:53 README -> .git/annex/objects/3M/8w/SHA256E-s273--1eb5c07e12d99c99af8c163b5e005162820518020cc9922152852bd1422858ae/SHA256E-s273--1eb5c07e12d99c99af8c163b5e005162820518020cc9922152852bd1422858ae
+lrwx------ 1 ewen staff 200 23 Apr 18:53 alex-lindsay-1.m4a -> .git/annex/objects/9M/xJ/SHA256E-s26905448--c8ecda8ca67c3ff8ab1061b34a2974be41eb4b44421e3c39d1dda63f4de6b910.m4a/SHA256E-s26905448--c8ecda8ca67c3ff8ab1061b34a2974be41eb4b44421e3c39d1dda63f4de6b910.m4a
+lrwx------ 1 ewen staff 194 23 Apr 18:53 alexlindsay.txt -> .git/annex/objects/P7/7g/SHA256E-s17467--2e7ae57eff6db5a832681443eb3787d1d7f9e57a52e4fedb2b693136a3df19f0.txt/SHA256E-s17467--2e7ae57eff6db5a832681443eb3787d1d7f9e57a52e4fedb2b693136a3df19f0.txt
+lrwx------ 1 ewen staff 198 23 Apr 18:53 bourne-1.mp3 -> .git/annex/objects/98/vg/SHA256E-s7560841--a24ce7acabcb0dd46b9a9df6df8aff50145d8b1d5253308a51573a7742652943.mp3/SHA256E-s7560841--a24ce7acabcb0dd46b9a9df6df8aff50145d8b1d5253308a51573a7742652943.mp3
+lrwx------ 1 ewen staff 192 23 Apr 18:53 bourne.txt -> .git/annex/objects/QK/2p/SHA256E-s1031--d45a0f0aba935126bf29d6265e9c252faae8b735abdc20edbec69eb3cc9edcd7.txt/SHA256E-s1031--d45a0f0aba935126bf29d6265e9c252faae8b735abdc20edbec69eb3cc9edcd7.txt
+lrwx------ 1 ewen staff 200 23 Apr 18:53 chasejarvis-1.m4a -> .git/annex/objects/M1/8z/SHA256E-s80616681--d1b63892a9d68c51ddef9eb5c0a56ced795a449afe13d4884663ca6ecc76cade.m4a/SHA256E-s80616681--d1b63892a9d68c51ddef9eb5c0a56ced795a449afe13d4884663ca6ecc76cade.m4a
+lrwx------ 1 ewen staff 194 23 Apr 18:53 chasejarvis.txt -> .git/annex/objects/35/jv/SHA256E-s50443--687b0cf63cc8f5724e3a694e1de6221ca4d1dd3449a6e6ad9b6f34501bfa30ce.txt/SHA256E-s50443--687b0cf63cc8f5724e3a694e1de6221ca4d1dd3449a6e6ad9b6f34501bfa30ce.txt
+lrwx------ 1 ewen staff 184 23 Apr 18:53 checksums -> .git/annex/objects/3Q/vv/SHA256E-s1672--31b3cb7ccbcafc96d5061775dc31aa974bb4ce2f16984b2ab0104eb66674d81b/SHA256E-s1672--31b3cb7ccbcafc96d5061775dc31aa974bb4ce2f16984b2ab0104eb66674d81b
+lrwx------ 1 ewen staff 200 23 Apr 18:53 chrismarquardt-1.m4a -> .git/annex/objects/k4/f3/SHA256E-s61861112--c14bacb5adfe3eca237547760183a986499808084dd1925a0202598981e4406b.m4a/SHA256E-s61861112--c14bacb5adfe3eca237547760183a986499808084dd1925a0202598981e4406b.m4a
+ewen@ashram:/nas01/annex/purchased/mediabytes$
+ewen@ashram:/nas01/annex/purchased/mediabytes$ git annex info
+repository mode: direct
+trusted repositories: 0
+semitrusted repositories: 13
+ 00000000-0000-0000-0000-000000000001 -- web
+ 00000000-0000-0000-0000-000000000002 -- bittorrent
+ 00420eb2-1c9a-47e8-abfa-675b9ffb0782 -- naos_wd_1 backup drive
+ 0ac1326e-9e03-4c1e-bd51-6cd664988caa -- Naos (colo) fileserver
+ 0ba93724-85a8-4ac7-855b-e656a6abaf09 -- ashram (laptop) internal drive [ashram]
+ 0e5fd428-ba5b-4817-8489-9780bc62e655 -- naos_wd_2 backup drive
+ 1144edcc-b18f-4cb1-a828-6c272d7788d6 -- naos_wd_3 backup drive
+ 4e5bfc02-1adc-44d2-a60a-885d9120ed5d -- nas01 file server [here]
+ 6142cfd4-83ba-410f-bf54-fd4b79e49fdb -- ashram external data drive [ashram-data]
+ 77aeca6a-a4f8-4734-befc-2a0b50687c52 -- tv home fileserver
+ bde87f8e-4740-411f-b068-3dbbd9db03d4 -- nas01 file server
+ cc1c45f3-7c96-4103-b9c0-cce0f5f2cc36 -- bethel_data_drive
+ eb0ca2b6-e8c6-43b0-a805-de17a4d7eeae -- naos_wd_4 backup drive
+untrusted repositories: 0
+transfers in progress: none
+available local disk space: 3.63 terabytes (+1 megabyte reserved)
+local annex keys: 0
+local annex size: 0 bytes
+annexed files in working tree: 59
+size of annexed files in working tree: 1.67 gigabytes
+bloom filter size: 32 mebibytes (0% full)
+backend usage:
+ SHA256E: 59
+ewen@ashram:/nas01/annex/purchased/mediabytes$
+"""]]
+
+For reference Synology DS216+ `/etc/samba/smb.conf`:
+
+    ewen@nas01:/volume1$ cat /etc/samba/smb.conf
+    [global]
+    printcap name=cups
+    winbind enum groups=yes
+    include=/var/tmp/nginx/smb.netbios.aliases.conf
+    min protocol=SMB2
+    security=user
+    local master=no
+    realm=*
+    passdb backend=smbpasswd
+    printing=cups
+    max protocol=SMB3
+    winbind enum users=yes
+    load printers=yes
+    workgroup=WORKGROUP
+    ewen@nas01:/volume1$
+
+and the relevant share:
+
+    ewen@nas01:/volume1$ cat /etc/samba/smb.share.conf
+    [annex]
+    recycle bin admin only=no
+    ftp disable modify=no
+    ftp disable download=no
+    write list=nobody,nobody
+    browseable=yes
+    mediaindex=no
+    hide unreadable=no
+    win share=yes
+    enable recycle bin=no
+    invalid users=nobody,nobody
+    read list=nobody,nobody
+    ftp disable list=no
+    edit synoacl=yes
+    valid users=nobody,nobody
+    writeable=yes
+    guest ok=yes
+    path=/volume1/annex
+    skip smb perm=yes
+    comment="Git Annexes"
+    [music]
+    [...]
+    ewen@nas01:/volume1$
+
+### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders)
+
+I'm an extremely regular user of git-annex on OS X and Linux, for several years, using it as a podcatcher and to manage most of my "large file" media. It's one of those "couldn't live without" tools. Thanks for writing it.
+
+(Touched, to ask for comments to be emailed to me)
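
The `git annex init` output in the report above lists several filesystem features as missing on the SMB mount, while the reporter notes that symlinks can be created by hand. One way to see what the mount itself allows, independent of git-annex, is to try each operation in a scratch directory. The following is only a sketch: `probe_fs` is a hypothetical helper name, this is not how git-annex performs its own detection (so the results may differ from what `git annex init` reports), and it does not test fcntl locking.

```shell
#!/bin/sh
# Probe a directory for symlink, hard link and fifo support.
# Illustrative only: NOT git-annex's own detection logic.
probe_fs() {
    dir=$1
    # Work in a scratch subdirectory so nothing is left behind.
    tmp=$(mktemp -d "$dir/fsprobe.XXXXXX") || return 1
    : > "$tmp/target"

    # Symlink: must be creatable and must show up as a symlink.
    if ln -s target "$tmp/symlink" 2>/dev/null && [ -L "$tmp/symlink" ]; then
        echo "symlinks: yes"
    else
        echo "symlinks: no"
    fi

    # Hard link to a regular file.
    if ln "$tmp/target" "$tmp/hardlink" 2>/dev/null; then
        echo "hardlinks: yes"
    else
        echo "hardlinks: no"
    fi

    # Named pipe.
    if mkfifo "$tmp/fifo" 2>/dev/null; then
        echo "fifos: yes"
    else
        echo "fifos: no"
    fi

    rm -rf "$tmp"
}
```

Calling `probe_fs /nas01/annex` on the mounted share and again on a local directory would show which of the three operations behave differently on SMB.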
diff --git a/doc/bugs/SMB__58___git_annex_clone_works__44___get_fails_on_transfer_lock/comment_1_d165e4895e7043192888751218261282._comment b/doc/bugs/SMB__58___git_annex_clone_works__44___get_fails_on_transfer_lock/comment_1_d165e4895e7043192888751218261282._comment
new file mode 100644
index 000000000..7b92837e3
--- /dev/null
+++ b/doc/bugs/SMB__58___git_annex_clone_works__44___get_fails_on_transfer_lock/comment_1_d165e4895e7043192888751218261282._comment
@@ -0,0 +1,16 @@
+[[!comment format=mdwn
+ username="ewen"
+ avatar="http://cdn.libravatar.org/avatar/605b2981cb52b4af268455dee7a4f64e"
+ subject="git annex standalone on Synology NAS"
+ date="2017-04-23T07:52:19Z"
+ content="""
+After posting this bug, I found a [git annex tip on running a standalone build on the Synology NAS](http://git-annex.branchable.com/tips/Synology_NAS_and_git_annex/), so that may provide another workaround for me, particularly since the Synology DS216+ is actually an x86-64 based system:
+
+    ewen@nas01:/volume1$ uname -a
+    Linux nas01 3.10.102 #15047 SMP Thu Feb 23 02:23:28 CST 2017 x86_64 GNU/Linux synology_braswell_216+II
+    ewen@nas01:/volume1$
+
+but it is probably still worth figuring out the cause of the transfer lock issue on SMB for other SMB use cases.
+
+Ewen
+"""]]
diff --git a/doc/forum/Lots_of_4k_symlinks.mdwn b/doc/forum/Lots_of_4k_symlinks.mdwn
index c9113d919..7f90f874d 100644
--- a/doc/forum/Lots_of_4k_symlinks.mdwn
+++ b/doc/forum/Lots_of_4k_symlinks.mdwn
@@ -1,6 +1,6 @@
Hi,
-this is a minor issue and probably there is no better solution, but nevertheless I would like to point out it and maybe discuss a little about the issue.
+this is a minor issue and probably there is no better solution, but nevertheless I would like to point it out and maybe discuss a little about the issue.
Given that the symlinks generated by annex are pretty large in size (they point to a file named by a large hash number), ext4 is using an entire block (4K) of storage instead of [embedding the symlink into the inode][inode] itself. For the "archivist use case" of annex, this might lead to tens or hundreds of MBs of disk occupied by symlinks which actually don't add up to more than a few MBs.
diff --git a/doc/forum/Lots_of_4k_symlinks/comment_1_96384eaeef1d067a24678c7aa3613063._comment b/doc/forum/Lots_of_4k_symlinks/comment_1_96384eaeef1d067a24678c7aa3613063._comment
new file mode 100644
index 000000000..10a9bbdc2
--- /dev/null
+++ b/doc/forum/Lots_of_4k_symlinks/comment_1_96384eaeef1d067a24678c7aa3613063._comment
@@ -0,0 +1,21 @@
+[[!comment format=mdwn
+ username="CandyAngel"
+ avatar="http://cdn.libravatar.org/avatar/15c0aade8bec5bf004f939dd73cf9ed8"
+ subject="comment 1"
+ date="2017-04-24T13:55:02Z"
+ content="""
+> For the \"archivist use case\" of annex, this might lead to tens or hundreds of MBs of disk occupied by symlinks which actually don't add up to more than a few MBs.
+
+    $ pwd
+    /home/sorting_annex/mnt/keyfile
+    $ du -shc *-*
+    ...
+    33M fd0dc9d3-ad62-429e-ba1b-acc26a453ca4
+    33M fd2fc989-bea7-4ffb-bbc8-2e34cd0e5be5
+    33M fd79bbd4-d41e-4ea8-acc8-86437c5eed7c
+    33M ffbd042e-f6d9-4450-9a57-8ed1086f587c
+    2.7G total
+    $
+
+Just a bit :P (yes, that is 2.7G of symlinks *so far*)
+"""]]
diff --git a/doc/forum/git_annex_init_timeout/comment_3_1e733fad01e6b420c7fd9f7832e9b3f7._comment b/doc/forum/git_annex_init_timeout/comment_3_1e733fad01e6b420c7fd9f7832e9b3f7._comment
new file mode 100644
index 000000000..a047306c0
--- /dev/null
+++ b/doc/forum/git_annex_init_timeout/comment_3_1e733fad01e6b420c7fd9f7832e9b3f7._comment
@@ -0,0 +1,10 @@
+[[!comment format=mdwn
+ username="ewen"
+ avatar="http://cdn.libravatar.org/avatar/605b2981cb52b4af268455dee7a4f64e"
+ subject="git annex get ... transfer lock issues"
+ date="2017-04-23T07:36:08Z"
+ content="""
+The fix a couple of months ago appears to allow `git annex init` to complete on an SMB file share, which is great. But FTR there are currently [issues with file transfers](https://git-annex.branchable.com/bugs/SMB__58___git_annex_clone_works__44___get_fails_on_transfer_lock/) (link is to bug report about failing to get transfer lock), so use with a NAS is still \"work in progress\".
+
+Ewen
+"""]]
diff --git a/doc/tips/Repositories_with_large_number_of_files.mdwn b/doc/tips/Repositories_with_large_number_of_files.mdwn
index c1f219eee..347f6f94a 100644
--- a/doc/tips/Repositories_with_large_number_of_files.mdwn
+++ b/doc/tips/Repositories_with_large_number_of_files.mdwn
@@ -1,5 +1,7 @@
Just as git does not scale well with large files, it can also become painful to work with when you have a large *number* of files. Below are things I have found to minimise the pain.
+[[!toc]]
+
# Using version 4 index files
During operations which affect the index, git writes an entirely new index out to index.lck and then replaces .git/index with it. With a large number of files, this index file can be quite large and take several seconds to write every time you manipulate the index!
@@ -40,9 +42,14 @@ If it takes a long time to list the files in a directory, naturally, git(-annex)
You can avoid this by keeping the number of files in a directory to between 5000 and 20000 (depends on the filesystem and its settings).
-[fpart](http://contribs.martymac.org/fpart/) can be a very useful tool to achieve this.
+[fpart](https://sourceforge.net/projects/fpart/) can be a very useful tool to achieve this.
+
+This sort of usage was discussed in [[forum/Handling_a_large_number_of_files]] and [[forum/__34__git_annex_sync__34___synced_after_8_hours]]. -- [[CandyAngel]]
+
+# Forget tracking information
+
+In addition to keeping track of where files are, git-annex keeps a *log* that keeps track of where files *were*. This can take up space as well and slow down certain operations.
-## Topics discussing this sort of usage
+You can use the [[git-annex-forget]] command to drop historical location tracking info for files.
-* [[forum/Handling_a_large_number_of_files]]
-* [[forum/__34__git_annex_sync__34___synced_after_8_hours]]
+Note: this was discussed in [[forum/scalability_with_lots_of_files]]. -- [[anarcat]]
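
The tip above suggests keeping each directory to somewhere between 5000 and 20000 files. To find directories that exceed a chosen limit, something like the following may help. It is a sketch that assumes file names without embedded newlines; `count_per_dir` is a made-up helper name, and the threshold is whatever figure suits the filesystem in question.

```shell
#!/bin/sh
# List directories under ROOT holding more than LIMIT entries
# (files and symlinks), largest first. The .git directory is skipped.
count_per_dir() {
    root=$1
    limit=$2
    find "$root" -name .git -prune -o ! -type d -print |
        awk -v limit="$limit" '
            {
                # Strip the final path component to get the parent directory.
                sub("/[^/]*$", "")
                n[$0]++
            }
            END {
                for (d in n)
                    if (n[d] > limit)
                        print n[d], d
            }' |
        sort -rn
}
```

For example, `count_per_dir . 20000` run from the top of a repository prints one line per oversized directory, with the entry count first.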
diff --git a/doc/tips/Repositories_with_large_number_of_files/comment_6_9169a33c06cf8aea231cdd8f51ce17b6._comment b/doc/tips/Repositories_with_large_number_of_files/comment_6_9169a33c06cf8aea231cdd8f51ce17b6._comment
new file mode 100644
index 000000000..1fb1de999
--- /dev/null
+++ b/doc/tips/Repositories_with_large_number_of_files/comment_6_9169a33c06cf8aea231cdd8f51ce17b6._comment
@@ -0,0 +1,8 @@
+[[!comment format=mdwn
+ username="anarcat"
+ avatar="http://cdn.libravatar.org/avatar/4ad594c1e13211c1ad9edb81ce5110b7"
+ subject="merge with scalability?"
+ date="2017-04-24T14:10:29Z"
+ content="""
+shouldn't this tip be merged into the [[scalability]] page directly?
+"""]]
diff --git a/doc/tips/migrating_two_seperate_disconnected_directories_to_git_annex.mdwn b/doc/tips/migrating_two_seperate_disconnected_directories_to_git_annex.mdwn
index 8f078c78b..c8faf340c 100644
--- a/doc/tips/migrating_two_seperate_disconnected_directories_to_git_annex.mdwn
+++ b/doc/tips/migrating_two_seperate_disconnected_directories_to_git_annex.mdwn
@@ -1,3 +1,5 @@
+Note: this is the reverse of [[splitting_a_repository]].
+
Scenario
--------
diff --git a/doc/tips/splitting_a_repository.mdwn b/doc/tips/splitting_a_repository.mdwn
new file mode 100644
index 000000000..89080f620
--- /dev/null
+++ b/doc/tips/splitting_a_repository.mdwn
@@ -0,0 +1,58 @@
+[[!meta title="Splitting a git-annex repository"]]
+
+Note: this is the reverse of [[migrating two seperate disconnected directories to git annex]].
+
+I have a [git annex](https://git-annex.branchable.com/) repo for all my media
+that has grown to 57866 files and git operations are getting slow, especially
+on external spinning hard drives, so I decided to split it into separate
+repositories.
+
+This is how I did it, with some help from `#git-annex`. Suppose the old big repo is at `~/oldrepo`:
+
+```
+# Create a new repo for photos only
+mkdir ~/photos
+cd ~/photos
+git init
+git annex init laptop
+
+# Hardlink all the annexed data from the old repo
+cp -rl ~/oldrepo/.git/annex/objects .git/annex/
+
+# Regenerate the git annex metadata
+git annex fsck --fast
+
+# Also split the repo on the usb key
+cd /media/usbkey
+git clone ~/photos
+cd photos
+git annex init usbkey
+cp -rl ../oldrepo/.git/annex/objects .git/annex/
+git annex fsck --fast
+
+# Connect the annexes as remotes of each other
+git remote add laptop ~/photos
+cd ~/photos
+git remote add usbkey /media/usbkey/photos
+```
+
+At this point, I went through all repos doing standard cleanup:
+
+```
+# Remove unneeded hard links
+git annex unused
+git annex dropunused --force 1-12345
+
+# Sync
+git annex sync
+```
+
+To make sure nothing is missing, I used `git annex find --not --in=here`
+to see if, for example, the usbkey that should have everything could be missing
+something.
+
+Update: Antoine Beaupré pointed me to
+[this tip about Repositories with large number of files](http://git-annex.branchable.com/tips/Repositories_with_large_number_of_files/)
+which I will try next time one of my repositories grows enough to hit a performance issue.
+
+> This document was originally written by [Enrico Zini](http://www.enricozini.org/blog/2017/debian/splitting-a-git-annex-repository/) and added to this wiki by [[anarcat]].
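
The `cp -rl` step in the tip above creates hard links rather than copying data, so the object files in the old and new repositories share the same data on disk. The sketch below shows that effect on a throwaway directory. It assumes GNU `cp` (the `-l` flag is not available in every `cp`) and a filesystem with hard link support; the directory layout and the `inode_of` and `demo_hardlink_copy` names are made up for illustration.

```shell
#!/bin/sh
# Print the inode number of a file.
inode_of() {
    ls -i "$1" | awk '{print $1}'
}

# Build a fake objects tree, hard-link-copy it with cp -rl,
# and report whether source and copy share an inode.
demo_hardlink_copy() {
    work=$(mktemp -d) || return 1
    mkdir -p "$work/old/objects/aa" "$work/new"
    echo "annexed content" > "$work/old/objects/aa/key"

    # Same form as the tip: recursive copy that links instead of copying.
    cp -rl "$work/old/objects" "$work/new/"

    if [ "$(inode_of "$work/old/objects/aa/key")" = "$(inode_of "$work/new/objects/aa/key")" ]; then
        echo "shared"
    else
        echo "copied"
    fi
    rm -rf "$work"
}
```

If `demo_hardlink_copy` prints `copied` on a given system, `cp -rl` is not producing hard links there, and the split would duplicate the annexed data instead.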
diff --git a/doc/todo/build_a_user_guide.mdwn b/doc/todo/build_a_user_guide.mdwn
index 44b350eb4..1fcf9bf93 100644
--- a/doc/todo/build_a_user_guide.mdwn
+++ b/doc/todo/build_a_user_guide.mdwn
@@ -3,3 +3,43 @@ there's a lot of good documentation on this wiki, but it's hard to find sometime
a good example of this problem is [[todo/document_standard_groups_more_extensively_in_the_UI]]. --[[anarcat]]
update: a beginning of this may be the [[workflow]] page but it lacks a lot of details...
+
+So we have those entry points so far:
+
+ * [[git-annex]] - the manpage
+ * [[walkthrough]] - "A walkthrough of some of the basic features of git-annex, using the command line", described as "only one possible workflow for using git-annex"
+ * [[assistant]] - a whole subtree of pages describing the assistant, includes a [[assistant/quickstart]] - introduction to the assistant with a series of screenshots, described in [[walkthrough]] as "If you don't want to use the command line, see quickstart instead.", linked from the [[assistant]] page
+ * [[workflow]] - a summary of the different workflows that git-annex can use
+ * [[special remotes]] - a good list of "supported backends", which may be a better wording
+ * inversely, [[not]] is what is *not* supported, obviously
+ * [[install]] - how to install git-annex, of course
+ * [[tips]] - a mish-mash list of "how to do X in git-annex", 68 pages at the time of writing
+ * there's the "details" section on the frontpage which covers lots of the [[internals]], [[design]] and so on
+ * there are also what i consider to be "leaf" pages like [[how it works]] or [[sync]] there
+
+So it seems the fundamentals of such a user guide are there. It's just a matter of grouping this in a meaningful way.
+
+I am thinking the following structure may be a good basis:
+
+ * Introduction
+ * [[how it works]]?
+ * ...?
+ * [[Install|install]]
+ * Walkthrough
+ * [[commandline|walkthrough]]
+ * [[graphical|assistant/quickstart]]?
+ * How do I...
+ * [[Supported backends|special remotes]]
+ * [[Unsupported features|not]]
+ * [[split repositories|tips/splitting_a_repository]]
+ * [[merge repositories|tips/migrating_two_seperate_disconnected_directories_to_git_annex]]
+ * deal with lots of files: [[tips/Repositories_with_large_number_of_files]] or merge into [[scalability/]]?
+ * decide which files go where? something about [[preferred_content]] and [[preferred_content/standard_groups/]]?
+ * sort and regroup the best [[tips]] pages
+ * Troubleshooting / FAQ?
+ * [[Common mistakes|tips/antipatterns]]
+ * sort out the best of [[forums]] and [[tips]] that commonly occur
+ * [[Design]]
+ * [[internals]]
+ * [[encryption]]
+ * ... etc - all the developer stuff users shouldn't usually have to know unless they care about performance or need to reimplement something
diff --git a/doc/workflow.mdwn b/doc/workflow.mdwn
index 1855649c1..e5d440c6d 100644
--- a/doc/workflow.mdwn
+++ b/doc/workflow.mdwn
@@ -27,6 +27,8 @@ produce file changes. When you move files into or out of your repository
folder, git-annex should record the changes and automatically propagate
them to other connected machines.
+The webapp is part of the larger [[assistant]].
+
# 2. [[git annex assistant|git-annex-assistant]] without the webapp
You could call [[`git annex assistant`|git-annex-assistant]] the