Diffstat (limited to 'doc/internals.mdwn')
-rw-r--r-- doc/internals.mdwn 247
1 files changed, 247 insertions, 0 deletions
diff --git a/doc/internals.mdwn b/doc/internals.mdwn
new file mode 100644
index 000000000..bf0fa668c
--- /dev/null
+++ b/doc/internals.mdwn
@@ -0,0 +1,247 @@
+In the world of git, we're not scared of internal implementation
+details, and sometimes we like to dive in and tweak things by hand. Here's
+some documentation to that end.
+
+## `.git/annex/objects/aa/bb/*/*`
+
+This is where locally available file contents are actually stored.
+Files added to the annex get a symlink checked into git that points
+to the file content.
+
+First there are two levels of directories used for hashing, to prevent
+too many things ending up in any one directory.
+See [[hashing]] for details.
+
+Each subdirectory has the [[name_of_a_key|key_format]] in one of the
+[[key-value_backends|backends]]. The file inside also has the name of the key.
+This two-level structure is used because it allows the write bit to be removed
+from the subdirectories as well as from the files. That prevents accidentally
+deleting or changing the file contents. See [[lockdown]] for details.
+
+In [[direct_mode]], file contents are not stored in here, and instead
+are stored directly in the files in the work tree. However, the same
+symlinks are still committed to git, internally.
+
+Also in [[direct_mode]], some additional data is stored in these directories.
+`.cache` files contain cached file stats used in detecting when a file has
+changed, and `.map` files contain a list of file(s) in the work tree
+that contain the key.
+
+## `.git/annex/tmp/`
+
+This directory contains partially transferred objects.
+
+## `.git/annex/misctmp/`
+
+This is a temp directory for miscellaneous other temp files.
+
+While `.git/annex/objects` and `.git/annex/tmp` can be put on different
+filesystems if desired, `.git/annex/misctmp`
+has to be on the same filesystem as the work tree and git repository.
+
+## `.git/annex/bad/`
+
+git-annex fsck puts any bad objects it finds in here.
+
+## `.git/annex/transfers/`
+
+Contains information files for uploads and downloads that are in progress,
+as well as any that have failed. Used especially by the assistant.
+It is safe to delete these files.
+
+## `.git/annex/ssh/`
+
+ssh connection caching files are written in here.
+
+## `.git/annex/index`
+
+This is a git index file which git-annex uses for commits to the git-annex
+branch.
+
+## `.git/annex/journal/`
+
+git-annex uses this to journal changes to the git-annex branch,
+before committing a set of changes.
+
+## The git-annex branch
+
+This branch is managed by git-annex, with the contents listed below.
+
+The file `.git/annex/index` is a separate git index file it uses
+to accumulate changes for the git-annex branch.
+Also, `.git/annex/journal/` is used to record changes before they
+are added to git.
+
+This branch operates on objects exclusively. No file names will ever
+be stored in this branch.
+
+The files stored in this branch are all designed to be auto-merged
+using git's [[union merge driver|git-union-merge]]. So each line
+has a timestamp, to allow the most recent information to be identified.
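The timestamps are what make union merging workable: a union merge keeps lines from both sides, so whoever reads a file afterwards has to pick the newest line for each repository. Here is a minimal Python sketch of that idea. It assumes the merge keeps every distinct line from both sides in first-seen order, and that each line is laid out like the `uuid.log` lines (UUID first, `timestamp=<seconds>s` last). The function names are hypothetical; this is not git-annex's own code.

```python
import re

# Assumed layout: "<uuid> ... timestamp=<seconds>s", as in uuid.log.
TIMESTAMP = re.compile(r"\btimestamp=(\d+(?:\.\d+)?)s$")


def union_merge(ours: str, theirs: str) -> list[str]:
    """Keep every distinct non-empty line from both sides, in first-seen order."""
    seen: set[str] = set()
    merged: list[str] = []
    for line in ours.splitlines() + theirs.splitlines():
        if line and line not in seen:
            seen.add(line)
            merged.append(line)
    return merged


def latest_by_uuid(lines: list[str]) -> dict[str, str]:
    """For each UUID (the first field), keep the line with the newest timestamp."""
    best: dict[str, tuple[float, str]] = {}
    for line in lines:
        match = TIMESTAMP.search(line)
        if match is None:
            continue  # no trailing timestamp; ignore the line
        ts = float(match.group(1))
        uuid = line.split(" ", 1)[0]
        if uuid not in best or ts > best[uuid][0]:
            best[uuid] = (ts, line)
    return {uuid: line for uuid, (_, line) in best.items()}


ours = "u1 laptop timestamp=100.5s\n"
theirs = "u1 old laptop timestamp=50.0s\nu2 usb disk timestamp=60.0s\n"
current = latest_by_uuid(union_merge(ours, theirs))
```

In this sketch the merge step can never conflict. The decision about which value wins is deferred to whoever reads the file.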
+
+### `uuid.log`
+
+Records the UUIDs of known repositories, and associates them with a
+description of the repository. This allows git-annex to display something
+more useful than a UUID when it refers to a repository that does not have
+a configured git remote pointing at it.
+
+The file format is simply one line per repository, with the uuid followed by a
+space and then the description, followed by a timestamp. Example:
+
+    e605dca6-446a-11e0-8b2a-002170d25c55 laptop timestamp=1317929189.157237s
+    26339d22-446b-11e0-9101-002170d25c55 usb disk timestamp=1317929330.769997s
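A description can itself contain spaces (`usb disk` above), so a reader has to split the UUID off the left and the timestamp off the right. Below is a minimal Python sketch that assumes the layout shown in the example. `parse_uuid_log` is a hypothetical name, not a git-annex function. When a UUID appears on several lines, the newest timestamp wins.

```python
def parse_uuid_log(text: str) -> dict[str, str]:
    """Map each repository UUID to its most recently recorded description."""
    best: dict[str, tuple[float, str]] = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        uuid, _, rest = line.partition(" ")
        # The description may contain spaces, so take the timestamp from the right.
        description, _, stamp = rest.rpartition(" ")
        if not (stamp.startswith("timestamp=") and stamp.endswith("s")):
            continue  # not in the expected layout
        ts = float(stamp[len("timestamp="):-1])
        if uuid not in best or ts > best[uuid][0]:
            best[uuid] = (ts, description)
    return {uuid: desc for uuid, (_, desc) in best.items()}


example = """\
e605dca6-446a-11e0-8b2a-002170d25c55 laptop timestamp=1317929189.157237s
26339d22-446b-11e0-9101-002170d25c55 usb disk timestamp=1317929330.769997s
"""
descriptions = parse_uuid_log(example)
```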
+
+### `numcopies.log`
+
+Records the global numcopies setting.
+
+The file format is simply a timestamp followed by a number.
+
+### `remote.log`
+
+Holds persistent configuration settings for [[special_remotes]] such as
+Amazon S3.
+
+The file format is one line per remote, starting with the uuid of the
+remote, followed by a space, and then a series of var=value pairs,
+each separated by whitespace, and finally a timestamp.
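Here is a minimal Python sketch of splitting one such line. It assumes that values contain no whitespace and that the trailing timestamp is written as `timestamp=<seconds>s`, as in `uuid.log`; neither point is stated above. The function name and the sample settings are illustrative only.

```python
def parse_remote_log_line(line: str) -> tuple[str, dict[str, str], float]:
    """Split one remote.log line into (uuid, settings, timestamp in seconds)."""
    uuid, *pairs = line.split()
    settings: dict[str, str] = {}
    timestamp = 0.0  # stays 0.0 if the line has no timestamp field
    for pair in pairs:
        var, _, value = pair.partition("=")
        if var == "timestamp" and value.endswith("s"):
            timestamp = float(value[:-1])
        else:
            settings[var] = value
    return uuid, settings, timestamp


uuid, settings, ts = parse_remote_log_line(
    "e605dca6-446a-11e0-8b2a-002170d25c55 type=S3 name=cloud timestamp=1317929189.157237s"
)
```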
+
+Encrypted special remotes store their encryption key here,
+in the "cipher" value. It is base64 encoded, and unless shared [[encryption]]
+is used, is encrypted to one or more gpg keys. The first 256 bytes of
+the cipher are used as the HMAC SHA1 key, which hashes the filenames
+stored on the special remote (HMAC is a one-way keyed hash, so the names
+are not reversibly encrypted). The remainder of the cipher is used as a gpg
+symmetric encryption key, to encrypt the content of files stored on the special
+remote.
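The split described above can be sketched in Python as follows. This sketch assumes shared encryption, where the `cipher` value is just the base64-encoded cipher with no gpg layer to remove first. `split_cipher` and `hash_name` are hypothetical helpers. The output is not claimed to match the names git-annex actually stores on a remote.

```python
import base64
import hashlib
import hmac


def split_cipher(cipher_b64: str) -> tuple[bytes, bytes]:
    """Decode the cipher and split it into (HMAC key, gpg symmetric passphrase)."""
    cipher = base64.b64decode(cipher_b64)
    return cipher[:256], cipher[256:]


def hash_name(mac_key: bytes, name: str) -> str:
    """HMAC-SHA1 a name; this is a one-way keyed hash, not reversible encryption."""
    return hmac.new(mac_key, name.encode("utf-8"), hashlib.sha1).hexdigest()


# A made-up 300-byte cipher, for illustration only.
demo = base64.b64encode(bytes(range(256)) + b"x" * 44).decode("ascii")
mac_key, passphrase = split_cipher(demo)
```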
+
+### `trust.log`
+
+Records the [[trust]] information for repositories. Does not exist unless
+[[trust]] values are configured.
+
+The file format is one line per repository, with the uuid followed by a
+space, and then either `1` (trusted), `0` (untrusted), `?` (semi-trusted),
+or `X` (dead), and finally a timestamp.
+
+Example:
+
+    e605dca6-446a-11e0-8b2a-002170d25c55 1 timestamp=1317929189.157237s
+    26339d22-446b-11e0-9101-002170d25c55 ? timestamp=1317929330.769997s
+
+Repositories not listed are semi-trusted.
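To look up a repository's trust level, a reader takes its newest line and falls back to semi-trusted when there is none. Below is a minimal Python sketch that assumes the `timestamp=<seconds>s` layout from the example. `trust_level` is a hypothetical name.

```python
TRUST_LEVELS = {"1": "trusted", "0": "untrusted", "?": "semi-trusted", "X": "dead"}


def trust_level(text: str, uuid: str) -> str:
    """Return the newest trust level recorded for uuid, defaulting to semi-trusted."""
    best = None  # (timestamp, level) of the newest matching line seen so far
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3 or fields[0] != uuid:
            continue
        code, stamp = fields[1], fields[2]
        if code not in TRUST_LEVELS or not stamp.startswith("timestamp="):
            continue
        ts = float(stamp[len("timestamp="):].rstrip("s"))
        if best is None or ts > best[0]:
            best = (ts, TRUST_LEVELS[code])
    return "semi-trusted" if best is None else best[1]


example = """\
e605dca6-446a-11e0-8b2a-002170d25c55 1 timestamp=1317929189.157237s
26339d22-446b-11e0-9101-002170d25c55 ? timestamp=1317929330.769997s
"""
```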
+
+### `group.log`
+
+Used to group repositories together.
+
+The file format is one line per repository, with the uuid followed by a space,
+and then a space-separated list of groups this repository is part of,
+and finally a timestamp.
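The group list has a variable length, so a reader takes the UUID from the front and the timestamp from the end. Here is a minimal Python sketch. It assumes group names contain no spaces and that the timestamp is written as `timestamp=<seconds>s`, as in `uuid.log`; no example is given above. The function name and the group names in the sample are illustrative.

```python
def parse_group_log(text: str) -> dict[str, set[str]]:
    """Map each repository UUID to the groups listed on its newest line."""
    best: dict[str, tuple[float, set[str]]] = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2 or not fields[-1].startswith("timestamp="):
            continue
        uuid, groups = fields[0], set(fields[1:-1])  # may be empty: no groups
        ts = float(fields[-1][len("timestamp="):].rstrip("s"))
        if uuid not in best or ts > best[uuid][0]:
            best[uuid] = (ts, groups)
    return {uuid: groups for uuid, (_, groups) in best.items()}


groups = parse_group_log(
    "u1 client backup timestamp=1.0s\n"
    "u1 archive timestamp=2.0s\n"
    "u2 timestamp=1.5s\n"
)
```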
+
+### `preferred-content.log`
+
+Used to indicate which repositories prefer to contain which file contents.
+
+The file format is one line per repository, with the uuid followed by a space,
+then a boolean expression, and finally a timestamp.
+
+Files matching the expression are preferred to be retained in the
+repository, while files not matching it are preferred to be stored
+somewhere else.
+
+### `required-content.log`
+
+Used to indicate which repositories are required to contain which file
+contents.
+
+The file format is identical to `preferred-content.log`.
+
+### `group-preferred-content.log`
+
+Contains standard preferred content settings for groups. (Overriding or
+supplementing the ones built into git-annex.)
+
+The file format is one line per group, starting with a timestamp, then a
+space, then the group name followed by a space and then the preferred
+content expression.
+
+### `aaa/bbb/*.log`
+
+These log files record [[location_tracking]] information
+for file contents. These are placed in two levels of subdirectories
+for hashing. See [[hashing]] for details.
+
+The filename is the name of the key, and each line
+consists of a timestamp, either 1 (present) or 0 (not present), and
+the UUID of the repository that has or lacks the file content.
+
+Example:
+
+    1287290776.765152s 1 e605dca6-446a-11e0-8b2a-002170d25c55
+    1287290767.478634s 0 26339d22-446b-11e0-9101-002170d25c55
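To decide which repositories currently have a key's content, only the newest line for each UUID matters. Here is a minimal Python sketch of that rule, using the example lines above. `repos_with_content` is a hypothetical name.

```python
def repos_with_content(text: str) -> set[str]:
    """Return the UUIDs whose newest line says the content is present (1)."""
    latest: dict[str, tuple[float, str]] = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3 or not fields[0].endswith("s"):
            continue
        stamp, status, uuid = fields
        ts = float(stamp[:-1])
        if uuid not in latest or ts > latest[uuid][0]:
            latest[uuid] = (ts, status)
    return {uuid for uuid, (_, status) in latest.items() if status == "1"}


example = """\
1287290776.765152s 1 e605dca6-446a-11e0-8b2a-002170d25c55
1287290767.478634s 0 26339d22-446b-11e0-9101-002170d25c55
"""
```

The `*.log.web` files are described as having a similar format with URLs in place of UUIDs. The same logic should therefore carry over to them, assuming the URLs contain no whitespace.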
+
+### `aaa/bbb/*.log.web`
+
+These log files record urls used by the
+[[web_special_remote|special_remotes/web]]. Their format is similar
+to the location tracking files, but with urls rather than UUIDs.
+
+### `aaa/bbb/*.log.rmt`
+
+These log files are used by remotes that need to record their own state
+about keys. Each remote can store one line of data about a key, in
+its own format.
+
+Example:
+
+    1287290776.765152s e605dca6-446a-11e0-8b2a-002170d25c55 blah blah
+    1287290767.478634s 26339d22-446b-11e0-9101-002170d25c55 foo=bar
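The per-remote data is free-form and may contain spaces, so a reader should split off only the first two fields. Below is a minimal Python sketch that keeps the newest line per remote. `parse_remote_state` is a hypothetical name.

```python
def parse_remote_state(text: str) -> dict[str, str]:
    """Map each remote UUID to the data on its newest line."""
    best: dict[str, tuple[float, str]] = {}
    for line in text.splitlines():
        parts = line.split(" ", 2)  # timestamp, uuid, then the rest verbatim
        if len(parts) != 3 or not parts[0].endswith("s"):
            continue
        stamp, uuid, data = parts
        ts = float(stamp[:-1])
        if uuid not in best or ts > best[uuid][0]:
            best[uuid] = (ts, data)
    return {uuid: data for uuid, (_, data) in best.items()}


example = """\
1287290776.765152s e605dca6-446a-11e0-8b2a-002170d25c55 blah blah
1287290767.478634s 26339d22-446b-11e0-9101-002170d25c55 foo=bar
"""
```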
+
+### `aaa/bbb/*.log.met`
+
+These log files are used to store arbitrary [[design/metadata]] about keys.
+Each key can have any number of metadata fields. Each field has a set of
+values.
+
+Lines are timestamped, and record when values are added (`field +value`),
+but also when values are removed (`field -value`). Removed values
+are retained in the log so that when merging an old line that sets a value
+that was later unset, the value is not accidentally added back.
+
+For example:
+
+    1287290776.765152s tag +foo +bar author +joey
+    1291237510.141453s tag -bar +baz
+
+The value can be completely arbitrary data, although it's typically
+reasonably short. If the value contains any whitespace
+(including `\r` or `\n`), it will be base64 encoded. Base64 encoded values
+are indicated by prefixing them with "!".
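Replaying the log in timestamp order recovers the current metadata: `+value` adds to the most recently named field and `-value` removes from it. Here is a minimal Python sketch. It assumes fields are whitespace-separated tokens as in the example and that a base64 payload decodes to UTF-8 text. The function names are hypothetical. Fields left with no values are treated as unset.

```python
import base64


def decode_value(raw: str) -> str:
    """Undo the "!" + base64 encoding used for values containing whitespace."""
    if raw.startswith("!"):
        return base64.b64decode(raw[1:]).decode("utf-8")
    return raw


def replay_metadata(text: str) -> dict[str, set[str]]:
    """Apply +value / -value changes in timestamp order to get current metadata."""
    entries: list[tuple[float, list[str]]] = []
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0].endswith("s"):
            entries.append((float(fields[0][:-1]), fields[1:]))
    metadata: dict[str, set[str]] = {}
    for _, tokens in sorted(entries, key=lambda entry: entry[0]):
        field = None
        for token in tokens:
            if token[0] not in "+-":
                field = token  # a bare word names the field for the changes after it
            elif field is not None:
                values = metadata.setdefault(field, set())
                value = decode_value(token[1:])
                if token[0] == "+":
                    values.add(value)
                else:
                    values.discard(value)
    return {name: values for name, values in metadata.items() if values}


example = """\
1287290776.765152s tag +foo +bar author +joey
1291237510.141453s tag -bar +baz
"""
```

Removed values only take effect in this replay because it runs in timestamp order. That matches the reason given above for keeping `-value` entries in the log.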
+
+### `schedule.log`
+
+Used to record scheduled events, such as periodic fscks.
+
+The file format is simply one line per repository, with the uuid followed by a
+space and then its schedule, followed by a timestamp.
+
+There can be multiple events in the schedule, separated by "; ".
+
+The format of the scheduled events is the same as described in
+the SCHEDULED JOBS section of the man page.
+
+Example:
+
+    42bf2035-0636-461d-a367-49e9dfd361dd fsck self 30m every day at any time; fsck 4b3ebc86-0faf-4892-83c5-ce00cbe30f0a 1h every year at any time timestamp=1385646997.053162s
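Here is a minimal Python sketch of splitting such a line into its parts, using the example above. It assumes the schedule is everything between the UUID and the trailing `timestamp=...s` field. `parse_schedule_line` is a hypothetical name, and the individual events are left as strings rather than parsed.

```python
def parse_schedule_line(line: str) -> tuple[str, list[str], float]:
    """Split a schedule.log line into (uuid, scheduled events, timestamp)."""
    uuid, _, rest = line.partition(" ")
    schedule, _, stamp = rest.rpartition(" ")
    if not (stamp.startswith("timestamp=") and stamp.endswith("s")):
        raise ValueError(f"no trailing timestamp in {line!r}")
    events = [event for event in schedule.split("; ") if event]
    return uuid, events, float(stamp[len("timestamp="):-1])


example = (
    "42bf2035-0636-461d-a367-49e9dfd361dd fsck self 30m every day at any time; "
    "fsck 4b3ebc86-0faf-4892-83c5-ce00cbe30f0a 1h every year at any time "
    "timestamp=1385646997.053162s"
)
```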
+
+### `transitions.log`
+
+Used to record transitions, e.g. by `git annex forget`.
+
+Each line of the file is a transition, followed by a timestamp.
+
+Example:
+
+    ForgetGitHistory 1387325539.685136s
+    ForgetDeadRemotes 1387325539.685136s
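Here is a minimal Python sketch of reading this file. If the same transition appears more than once, it keeps the newest timestamp. `parse_transitions` is a hypothetical name, and the sketch assumes the timestamp always ends in `s`, as in the example.

```python
def parse_transitions(text: str) -> dict[str, float]:
    """Map each recorded transition name to its newest timestamp in seconds."""
    transitions: dict[str, float] = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 2 or not fields[1].endswith("s"):
            continue
        name, ts = fields[0], float(fields[1][:-1])
        transitions[name] = max(ts, transitions.get(name, ts))
    return transitions


example = """\
ForgetGitHistory 1387325539.685136s
ForgetDeadRemotes 1387325539.685136s
"""
```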