Diffstat (limited to 'doc/tips')
6 files changed, 305 insertions, 0 deletions
diff --git a/doc/tips/How_to_retroactively_annex_a_file_already_in_a_git_repo/comment_7_603db6818d33663b70b917c04fd8485b._comment b/doc/tips/How_to_retroactively_annex_a_file_already_in_a_git_repo/comment_7_603db6818d33663b70b917c04fd8485b._comment new file mode 100644 index 000000000..5527c2b43 --- /dev/null +++ b/doc/tips/How_to_retroactively_annex_a_file_already_in_a_git_repo/comment_7_603db6818d33663b70b917c04fd8485b._comment @@ -0,0 +1,30 @@ +[[!comment format=mdwn + username="https://launchpad.net/~stephane-gourichon-lpad" + nickname="stephane-gourichon-lpad" + avatar="http://cdn.libravatar.org/avatar/02d4a0af59175f9123720b4481d55a769ba954e20f6dd9b2792217d9fa0c6089" + subject=""Hmm, guyz? Are you serious with these scripts?" Well, what's the matter?" + date="2016-11-15T10:58:32Z" + content=""" +## Wow, scary + +Dilyin's comment is scary. It suggests bad things can happen, but is not very clear. + +Bloated history is one thing. +Obviously broken repo is bad but can be (slowly) recovered from remotes. +Subtly crippled history that you don't notice can be a major problem (especially once you have propagated it to all your remotes to \"recover from bloat\"). + +## More common than it seems + +There's a case probably more common than people actually report: mistakenly doing `git add` instead of `git annex add` and realizing it only after a number of commits. Doing `git annex add` at that time will have the file duplicated (regular git and annex). + +Extra wish: when doing `git annex add` of a file that is already present in git history, `git-annex` could notice and tell. + +## Simple solution? + +Can anyone elaborate on the scripts provided here, are they safe? What can happen if improperly used or in corner cases? + +* \"files are replaced with symlinks and are in the index\" -> so what ? +* \"Make sure that you don't have annex.largefiles settings that would prevent annexing the files.\" -> What would happen? Also `.gitattributes`. + +Thank you. 
+"""]] diff --git a/doc/tips/How_to_retroactively_annex_a_file_already_in_a_git_repo/comment_8_834410421ccede5194bd8fbaccea8d1a._comment b/doc/tips/How_to_retroactively_annex_a_file_already_in_a_git_repo/comment_8_834410421ccede5194bd8fbaccea8d1a._comment new file mode 100644 index 000000000..2c36962aa --- /dev/null +++ b/doc/tips/How_to_retroactively_annex_a_file_already_in_a_git_repo/comment_8_834410421ccede5194bd8fbaccea8d1a._comment @@ -0,0 +1,82 @@ +[[!comment format=mdwn + username="StephaneGourichon" + avatar="http://cdn.libravatar.org/avatar/8cea01af2c7a8bf529d0a3d918ed4abf" + subject="Walkthrough of a prudent retroactive annex." + date="2016-11-24T11:27:59Z" + content=""" +Been using the one-liner. Despite the warning, I'm not dead yet. + +There's much more to do than the one-liner. + +This post offers instructions. + +# First simple try: slow + +Was slow (estimated >600s for 189 commits). + +# In tmpfs: about 6 times faster + +I have cloned repository into /run/user/1000/rewrite-git, which is a tmpfs mount point. (Machine has plenty of RAM.) + +There I also did `git annex init`, git-annex found its state branches. + +On second try I also did + + git checkout -t remotes/origin/synced/master + +So that filter-branch would clean that, too. + +There, `filter-branch` operation finished in 90s first try, 149s second try. + +`.git/objects` wasn't smaller. + +# Practicing reduction on clone + +This produced no visible benefit: + +time git gc --aggressive +time git repack -a -d + +Even cloning and retrying on clone. Oh, but I should have done `git clone file:///path` as said on git-filter-branch man page's section titled \"CHECKLIST FOR SHRINKING A REPOSITORY\" + +This (as seen on https://rtyley.github.io/bfg-repo-cleaner/ ) was efficient: + + git reflog expire --expire=now --all && git gc --prune=now --aggressive + +`.git/objects` shrunk from 148M to 58M + +All this was on a clone of the repo in tmpfs. 
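The before/after measurement of `.git/objects` described above can be scripted. The following is a minimal sketch, not part of the original comment: `objects_size` and `shrink_clone` are hypothetical helper names, and it assumes a git new enough to support `git -C`. It should only be pointed at a disposable clone, because expiring the reflog removes the usual way back to the old history.

```shell
#!/bin/sh
# Hypothetical helpers (not part of git or git-annex).

# Print the size of a repository's object store in KiB.
objects_size() {
    # du -sk prints "<KiB><TAB><path>"; keep only the number.
    du -sk "$1/.git/objects" | cut -f1
}

# Expire the reflog, run an aggressive gc, and report the size change.
# Only use this on a throwaway clone.
shrink_clone() {
    repo=$1
    before=$(objects_size "$repo")
    git -C "$repo" reflog expire --expire=now --all || return 1
    git -C "$repo" gc --prune=now --aggressive --quiet || return 1
    after=$(objects_size "$repo")
    echo "objects: ${before}K -> ${after}K"
}
```

Calling `shrink_clone /run/user/1000/rewrite-git` would print one line with both sizes, which makes it easy to compare against the plain `git gc --aggressive` attempt.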
+ +# Propagating cleaned up branches to origin + +This confirmed that filter-branch did not change last tree: + + git diff remotes/origin/master..master + git diff remotes/origin/synced/master synced/master + +This, expectedly, was refused: + + git push origin master + git push origin synced/master + +On origin, I checked out the hash of current master, then on tmpfs clone + + git push -f origin master + git push -f origin synced/master + +Looks good. + +I'm not doing the aggressive shrink now, because of the \"two orders of magnitude more caution than normal filter-branch\" recommended by arand. + +# Now what? Check if precious not broken + +I'm planning to do the same operation on the other repos, then : + +* if everything seems right, +* if `git annex sync` works between all those fellows +* etc, +* then I would perform the reflog expire, gc prune on some then all of them, etc. + +Joey, does this seem okay? Any comment? + +"""]] diff --git a/doc/tips/a_gui_for_metadata_operations.mdwn b/doc/tips/a_gui_for_metadata_operations.mdwn new file mode 100644 index 000000000..1e1180068 --- /dev/null +++ b/doc/tips/a_gui_for_metadata_operations.mdwn @@ -0,0 +1,13 @@ +Hey everyone. + +I wrote a GUI for git-annex metadata in Python: [git-annex-metadata-gui](https://github.com/alpernebbi/git-annex-metadata-gui). +It shows the files that are in the current branch (only those in the annex) in the respective folder hierarchy. +The keys that are in the repository, but not in the current branch are also shown in another tab. +You can view, edit or remove fields for individual files with support for multiple values for fields. +There is a file preview for image and text files as well. +I uploaded some screenshots in the repository to show it in action. + +While making it, I decided to move the git-annex calls into its own Python package, +which became [git-annex-adapter](https://github.com/alpernebbi/git-annex-adapter). 
+ +I hope these can be useful to someone other than myself as well. diff --git a/doc/tips/a_gui_for_metadata_operations/comment_1_1ce311d8328ea370a6a3494adea0f5db._comment b/doc/tips/a_gui_for_metadata_operations/comment_1_1ce311d8328ea370a6a3494adea0f5db._comment new file mode 100644 index 000000000..2a55de0be --- /dev/null +++ b/doc/tips/a_gui_for_metadata_operations/comment_1_1ce311d8328ea370a6a3494adea0f5db._comment @@ -0,0 +1,8 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 1""" + date="2016-12-07T19:58:11Z" + content=""" +Thank you for this, I've always wanted such a GUI, and it's been a common +user request! +"""]] diff --git a/doc/tips/peer_to_peer_network_with_tor.mdwn b/doc/tips/peer_to_peer_network_with_tor.mdwn new file mode 100644 index 000000000..0fdc34625 --- /dev/null +++ b/doc/tips/peer_to_peer_network_with_tor.mdwn @@ -0,0 +1,163 @@ +git-annex has recently gotten support for running as a +[Tor](https://torproject.org/) hidden service. This is a nice secure +and easy to use way to connect repositories in different +locations. No account on a central server is needed; it's peer-to-peer. + +## dependencies + +To use this, you need to get Tor installed and running. See +[their website](https://torproject.org/), or try a command like: + + sudo apt-get install tor + +You also need to install [Magic Wormhole](https://github.com/warner/magic-wormhole). + + sudo apt-get install magic-wormhole + +## pairing two repositories + +You have two git-annex repositories on different computers, and want to +connect them together over Tor so they share their contents. Or, you and a +friend want to connect your repositories together. Pairing is an easy way +to accomplish this. + +In each git-annex repository, run these commands: + + git annex enable-tor + git annex remotedaemon + +The enable-tor command may prompt for the root password, since it +configures Tor. 
Now git-annex is running as a Tor hidden service, but +it will only talk to peers after pairing with them. + +In both repositories, run this command: + + git annex p2p --pair + +This will print out a pairing code, like "11-incredible-tumeric", +and prompt for you to enter the other repository's pairing code. + +Once the pairing codes are exchanged, the two repositories will be securely +connected to one-another via Tor. Each will have a git remote, with a name +like "peer1", which connects to the other repository. + +Then, you can run commands like `git annex sync peer1 --content` to sync +with the paired repository. + +Pairing connects just two repositories, but you can repeat the process to +pair with as many other repositories as you like, in order to build up +larger networks of repositories. + +## how to exchange pairing codes + +When pairing with a friend's repository, you have to exchange +pairing codes. How to do this securely? + +The pairing codes can only be used once, so it's ok to exchange them in +a way that someone else can access later. However, if someone can overhear +your exchange of codes in real time, they could trick you into pairing +with them. + +Here are some suggestions for how to exchange the codes, +with the most secure ways first: + +* In person. +* In an encrypted message (gpg signed email, Off The Record (OTR) + conversation, etc). +* By a voice phone call. + +## starting git-annex remotedaemon on boot + +Notice the `git annex remotedaemon` being run in the above examples. +That command runs the Tor hidden service so that other peers +can connect to your repository over Tor. + +So, you may want to arrange for the remotedaemon to be started on boot. +You can do that with a simple cron job: + + @reboot cd ~/myannexrepo && git annex remotedaemon + +If you use the git-annex assistant, and have it auto-starting on boot, it +will take care of starting the remotedaemon for you. 
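The cron entry above can also be installed from a script. This is a minimal sketch under stated assumptions: the `crontab` command accepts `-` to read the new table from standard input, the repository path contains no spaces, and `remotedaemon_cron_line` and `install_cron_line` are hypothetical helper names rather than anything git-annex provides.

```shell
#!/bin/sh
# Hypothetical helpers (not part of git-annex).

# Build the @reboot crontab entry for one repository path.
remotedaemon_cron_line() {
    printf '@reboot cd %s && git annex remotedaemon\n' "$1"
}

# Add the entry to the current user's crontab, dropping any identical
# existing line first so that repeated runs do not create duplicates.
install_cron_line() {
    line=$(remotedaemon_cron_line "$1")
    { crontab -l 2>/dev/null | grep -vxF "$line"; echo "$line"; } | crontab -
}
```

Passing the path in single quotes, as in `install_cron_line '~/myannexrepo'`, keeps the `~` literal so that cron's shell expands it at boot time.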
+ +## speed of large transfers + +Tor prioritizes security over speed, and the Tor network only has so much +bandwidth to go around. So, distributing large quantities (gigabytes) +of data over Tor may be slow, and should probably be avoided. + +One way to avoid sending much data over Tor is to set up an encrypted +[[special_remote|special_remotes]] someplace. git-annex knows that Tor is +rather expensive to use, so if a file is available on a special remote as +well as over Tor, it will download it from the special remote. + +You can contribute to the Tor network by +[running a Tor relay or bridge](https://www.torproject.org/getinvolved/relays.html.en). + +## onion addresses and authentication + +You don't need to know about this, but it might be helpful to understand +how it works. + +git-annex's Tor support uses an onion address as the address of a git remote. +You can `git pull`, push, etc. with those onion addresses: + + git pull tor-annex::eeaytkuhaupbarfi.onion:4412 + git remote add peer1 tor-annex::eeaytkuhaupbarfi.onion:4412 + +Onion addresses are semi-public. When you add a remote, they appear in your +`.git/config` file. For security, there's a second level of authentication +that git-annex uses to make sure that only people you want to can access +your repository over Tor. That takes the form of a long string of numbers +and letters, like "7f53c5b65b8957ef626fd461ceaae8056e3dbc459ae715e4". + +The addresses generated by `git annex p2p --gen-addresses` +combine the onion address with the authentication data. + +When you run `git annex p2p --link`, it sets up a git remote using +the onion address, and it stashes the authentication data away in a file in +`.git/annex/creds/`. + +When you pair repositories, these addresses are exchanged using +[Magic Wormhole](https://github.com/warner/magic-wormhole). + +## security + +Tor hidden services can be quite secure. But this doesn't mean that using +git-annex over Tor is automatically perfectly secure. 
Here are some things +to consider: + +* Anyone who learns the address of a peer can connect to that peer, + download the whole history of the git repository, and any available + annexed files. They can also upload new files to the peer, and even + remove annexed files from the peer. So consider ways that the address + of a peer might be exposed. + +* While Tor can be used to anonymize who you are, git defaults to including + your name and email address in git commit messages. So if you want an + anonymous git-annex repository, you'll need to configure git not to do + that. + +* Using Tor prevents listeners from decrypting your traffic. But, they'll + probably still know you're using Tor. Also, by traffic analysis, + they may be able to guess if you're using git-annex over tor, and even + make guesses about the sizes and types of files that you're exchanging + with peers. + +* There have been past attacks on the Tor network that have exposed + who was running Tor hidden services. + <https://blog.torproject.org/blog/tor-security-advisory-relay-early-traffic-confirmation-attack> + +* An attacker who can connect to the git-annex Tor hidden service, even + without authenticating, can try to perform denial of service attacks. + +* Magic wormhole is pretty secure, but the code phrase could be guessed + (unlikely) or intercepted. An attacker gets just one chance to try to enter + the correct code phrase, before pairing finishes. If the attacker + successfully guesses/intercepts both code phrases, they can MITM the + pairing process. + + If you don't want to use magic wormhole, you can instead manually generate + addresses with `git annex p2p --gen-addresses` and send them over an + authenticated, encrypted channel (such as OTR) to a friend to add with + `git annex p2p --link`. This may be more secure, if you get it right. 
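For the point above about names and email addresses ending up in commits, one option is to set a placeholder identity in the repository's own configuration. This is a minimal sketch: `anonymize_identity` is a hypothetical helper name and the placeholder values are arbitrary. It does not rewrite commits that already exist, and commit timestamps are still recorded.

```shell
#!/bin/sh
# Hypothetical helper (not part of git or git-annex).

# Set a placeholder author/committer identity for one repository only.
# This writes to that repository's .git/config and leaves the user's
# global git configuration untouched.
anonymize_identity() {
    repo=$1
    git -C "$repo" config user.name "anonymous" &&
        git -C "$repo" config user.email "anonymous@example.invalid"
}
```

After running `anonymize_identity ~/myannexrepo`, `git -C ~/myannexrepo config user.name` should report the placeholder, which is a quick way to confirm the setting took effect before making new commits.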
diff --git a/doc/tips/using_Google_Cloud_Storage/comment_8_1b4eb7e0f44865cd5ff0f8ef507d99c1._comment b/doc/tips/using_Google_Cloud_Storage/comment_8_1b4eb7e0f44865cd5ff0f8ef507d99c1._comment new file mode 100644 index 000000000..1a71f7726 --- /dev/null +++ b/doc/tips/using_Google_Cloud_Storage/comment_8_1b4eb7e0f44865cd5ff0f8ef507d99c1._comment @@ -0,0 +1,9 @@ +[[!comment format=mdwn + username="scottgorlin@a32946b2aad278883c1690a0753241583a9855b9" + nickname="scottgorlin" + avatar="http://cdn.libravatar.org/avatar/2dd1fc8add62bbf4ffefac081b322563" + subject="Coldline" + date="2016-11-21T00:49:23Z" + content=""" +Wanted to add that \"storageclass=COLDLINE\" appears to work seamlessly, both from my mac and arm NAS. As far as I can tell, this appears to be a no-brainer vs glacier - builtin git annex client, simpler/cheaper billing, and no 4 hour delay! +"""]] |