Wednesday 25 January 2012

Version Control System (VCS)


This guide is purposefully high-level: most tutorials throw a bunch of text commands at you. Let’s cover the high-level concepts without getting stuck in the syntax (the Subversion manual is always there, don’t worry). Sometimes it’s nice to see what’s possible.
Checkins
The simplest scenario is checking in a file (list.txt) and modifying it over time.
[Figure: clip_image002]
Each time we check in a new version, we get a new revision (r1, r2, r3, etc.). In Subversion you’d do:
  • svn add list.txt
  • (modify the file)
  • svn ci list.txt -m "Changed the list"
The -m flag is the message to use for this checkin.
Checkouts and Editing
In reality, you might not keep checking in a file. You may have to check out, edit and check in. The cycle looks like this:
[Figure: clip_image003]
If you don’t like your changes and want to start over, you can revert to the previous version and start again (or stop). When checking out, you get the latest revision by default. If you want, you can specify a particular revision. In Subversion, run:
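  • svn co http://path/to/trunk (get the latest version; Subversion checks out a directory from a repository URL, not a single file)
  • edit file...
  • svn revert list.txt (throw away changes)
  • svn co -r2 http://path/to/trunk (check out a particular version)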
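If it helps to see the cycle as code, here is a minimal sketch of a toy, in-memory repository for a single file. It is not how Subversion works internally; the `Repo` class, its method names and the list contents ("Milk", "Waffles") are all made up for illustration.

```python
class Repo:
    """A toy stand-in for a repository that tracks one file (hypothetical)."""

    def __init__(self):
        self.revisions = []  # revisions[0] is r1, revisions[1] is r2, ...

    def checkin(self, lines, message):
        """Store a snapshot with its message and return the new revision number."""
        self.revisions.append((list(lines), message))
        return len(self.revisions)

    def checkout(self, rev=None):
        """Return a fresh copy of a revision (the latest one by default)."""
        if rev is None:
            rev = len(self.revisions)
        lines, _message = self.revisions[rev - 1]
        return list(lines)


repo = Repo()
repo.checkin(["Milk"], "Started the list")          # r1
repo.checkin(["Milk", "Eggs"], "Changed the list")  # r2

working = repo.checkout()    # get the latest version (r2)
working.append("Waffles")    # edit the working copy
working = repo.checkout()    # "revert": throw away the edit, back to r2
older = repo.checkout(1)     # check out a particular version (r1)

print(working)  # ['Milk', 'Eggs']
print(older)    # ['Milk']
```

The working copy is just a list you can scribble on; nothing changes in the repository until you check in.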
Diffs
The trunk has a history of changes as a file evolves. Diffs are the changes you made while editing: imagine you can “peel” them off and apply them to a file:
[Figure: clip_image004]
For example, to go from r1 to r2, we add eggs (+Eggs). Imagine peeling off that red sticker and placing it on r1, to get r2.
And to get from r2 to r3, we add Juice (+Juice). To get from r3 to r4, we remove Juice and add Soup (-Juice, +Soup).
Most version control systems store diffs rather than full copies of the file. This saves disk space: 4 revisions of a file doesn’t mean we have 4 copies; we have 1 copy and 4 small diffs. Pretty nifty, eh? In SVN, we diff two revisions of a file like this:
svn diff -r3:4 list.txt
Diffs help us notice changes (“How did you fix that bug again?”) and even apply them from one branch to another.
Bonus question: what’s the diff from r1 to r4?
+Eggs
+Soup
Notice how “Juice” wasn’t even involved — the direct jump from r1 to r4 doesn’t need that change, since Juice was overridden by Soup.
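The “peeling” idea can be sketched in a few lines of Python. This is a simplification: real tools such as svn diff compare files line by line and keep track of positions, while this sketch only tracks which items were added or removed. The function names and the starting item “Milk” in r1 are assumptions for the sake of the example.

```python
def diff(old, new):
    """Return (removed, added): what it takes to turn `old` into `new`."""
    removed = [item for item in old if item not in new]
    added = [item for item in new if item not in old]
    return removed, added


def apply_diff(lines, removed, added):
    """Stick a diff onto a list: drop the removed items, append the added ones."""
    return [item for item in lines if item not in removed] + added


r1 = ["Milk"]                    # assumed starting contents
r2 = ["Milk", "Eggs"]            # +Eggs
r3 = ["Milk", "Eggs", "Juice"]   # +Juice
r4 = ["Milk", "Eggs", "Soup"]    # -Juice, +Soup

print(diff(r3, r4))  # (['Juice'], ['Soup'])
print(diff(r1, r4))  # ([], ['Eggs', 'Soup'])
print(apply_diff(r1, *diff(r1, r2)) == r2)  # True
```

Note that the r1 to r4 diff never mentions Juice, matching the bonus question.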
Branching
Branches let us copy code into a separate folder so we can monkey with it separately:
[Figure: clip_image005]
For example, we can create a branch for new, experimental ideas for our list: crazy things like Rice or Eggo waffles. Depending on the version control system, creating a branch (copy) may change the revision number.
Now that we have a branch, we can change our code and work out the kinks. (“Hrm… waffles? I don’t know what the boss will think. Rice is a safe bet.”). Since we’re in a separate branch, we can make changes and test in isolation, knowing our changes won’t hurt anyone. And our branch history is under version control.
In Subversion, you create a branch simply by copying a directory to another.
svn copy http://path/to/trunk http://path/to/branch
So branching isn’t too tough of a concept: Pretend you copied your code into a different directory. You’ve probably branched your code in school projects, making sure you have a “fail safe” version you can return to if things blow up.
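As a rough sketch of the “copy into a different directory” idea, a repository can be modeled as a dictionary of paths. The dictionary, the path names and the list contents are assumptions for illustration, not Subversion's storage format.

```python
# A toy repository: each path holds its own version of the list.
repo = {"trunk": ["Milk", "Eggs", "Soup"]}

# Like `svn copy`: the branch starts out as an exact copy of the trunk.
repo["branch"] = list(repo["trunk"])

# Experiment in the branch only.
repo["branch"].append("Rice")

print(repo["trunk"])   # ['Milk', 'Eggs', 'Soup']
print(repo["branch"])  # ['Milk', 'Eggs', 'Soup', 'Rice']
```

The trunk is untouched, which is the whole point of working in isolation.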
Merging
Branching sounds simple, right? Well, it’s not — figuring out how to merge changes from one branch to another can be tricky.
Let’s say we want to get the “Rice” feature from our experimental branch into the mainline. How would we do this? Diff r6 and r7 and apply that to the main line?
Wrongo. We only want to apply the changes that happened in the branch! That means we diff r5 and r6, and apply that to the main trunk:
[Figure: clip_image006]
If we diffed r6 and r7, we would lose the “Bread” feature that was in main. This is a subtle point — imagine “peeling off” the changes from the experimental branch (+Rice) and adding that to main. Main may have had other changes, which is ok — we just want to insert the Rice feature.
In Subversion, merging is very close to diffing. Inside the main trunk, run the command:
svn merge -r5:6 http://path/to/branch
This command diffs r5-r6 in the experimental branch and applies it to the current location. Older versions of Subversion (before 1.5) had no easy way to keep track of which merges had been applied, so if you weren’t careful you could apply the same changes twice; the advice then was to keep a changelog message reminding you that you’d already merged r5-r6 into main. Subversion 1.5 and later record merges in the svn:mergeinfo property.
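Here is a small sketch of why the choice of revisions matters. It treats each revision as a plain list and a diff as “items removed, items added”; the helper names and the exact list contents (Milk, Eggs, Soup) are assumptions, and only Rice and Bread come from the example above.

```python
def diff(old, new):
    """Return (removed, added): what it takes to turn `old` into `new`."""
    removed = [item for item in old if item not in new]
    added = [item for item in new if item not in old]
    return removed, added


def apply_diff(lines, removed, added):
    """Drop the removed items, then append the added ones."""
    return [item for item in lines if item not in removed] + added


branch_r5 = ["Milk", "Eggs", "Soup"]           # branch created from main
branch_r6 = ["Milk", "Eggs", "Soup", "Rice"]   # +Rice in the branch
main_r7 = ["Milk", "Eggs", "Soup", "Bread"]    # +Bread in main

# Right: peel off only what changed inside the branch (r5 to r6).
good = apply_diff(main_r7, *diff(branch_r5, branch_r6))
print(good)  # ['Milk', 'Eggs', 'Soup', 'Bread', 'Rice']

# Wrong: the diff between main (r7) and the branch (r6) also says "-Bread".
bad = apply_diff(main_r7, *diff(main_r7, branch_r6))
print(bad)   # ['Milk', 'Eggs', 'Soup', 'Rice']
```

The second result is the trap: Rice arrives, but Bread quietly disappears.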
Conflicts
Many times, the VCS can automatically merge changes to different parts of a file. Conflicts can arise when changes appear that don’t gel: Joe wants to remove eggs and replace it with cheese (-eggs, +cheese), and Sue wants to replace eggs with a hot dog (-eggs, +hot dog).
[Figure: clip_image007]
At this point it’s a race: if Joe checks in first, that’s the change that goes through (and Sue can’t make her change).
When changes overlap and contradict like this, the VCS may report a conflict and not let you check in — it’s up to you to check in a newer version that resolves this dilemma. A few approaches:
  • Re-apply your changes. Sync to the latest version (r4) and re-apply your changes to this file: Add hot dog to the list that already has cheese.
  • Override their changes with yours. Check out the latest version (r4), copy over your version, and check your version in. In effect, this removes cheese and replaces it with hot dog.
Conflicts are infrequent but can be a pain. Usually I update to the latest and re-apply my changes.
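To make “changes that don’t gel” concrete, here is a minimal sketch of a three-way merge that compares both edits against their common ancestor, one line at a time. It assumes all three versions have the same number of lines (replacements only), which real merge tools do not require. The function names, the “Milk” line and the “Soy milk” edit are invented for illustration.

```python
def merge_line(base, mine, theirs):
    """Merge one line three ways; raise ValueError if both sides changed it differently."""
    if mine == theirs:
        return mine      # same result on both sides (or nobody touched it)
    if mine == base:
        return theirs    # only they changed it
    if theirs == base:
        return mine      # only I changed it
    raise ValueError(f"conflict: {mine!r} vs {theirs!r}")


def merge(base, mine, theirs):
    """Merge two edited copies of `base`, assuming equal line counts."""
    return [merge_line(b, m, t) for b, m, t in zip(base, mine, theirs)]


base = ["Milk", "Eggs"]
joe = ["Milk", "Cheese"]       # -eggs, +cheese
sue = ["Milk", "Hot dog"]      # -eggs, +hot dog
other = ["Soy milk", "Eggs"]   # hypothetical edit to a different line

print(merge(base, joe, other))  # ['Soy milk', 'Cheese']

try:
    merge(base, joe, sue)
except ValueError as err:
    print(err)  # conflict: 'Cheese' vs 'Hot dog'
```

Edits to different lines merge automatically; Joe and Sue collide because they both rewrote the same line.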
Tagging
Who would have thought a version control system would be Web 2.0 compliant? Many systems let you tag (label) any revision for easy reference. This way you can refer to “Release 1.0” instead of a particular build number:
[Figure: clip_image008]
In Subversion, tags are just branches that you agree not to edit; they are around for posterity, so you can see exactly what your version 1.0 release contained. Hence they end in a stub — there’s nowhere to go.
(in trunk)
svn copy http://path/to/revision http://path/to/tag
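A tag can be pictured as a named snapshot that nobody edits afterwards. In this sketch the snapshot is stored as a tuple so the “agree not to edit” rule is actually enforced, which is stricter than Subversion, where it is only a convention. The `tags` dictionary and the list contents are assumptions.

```python
trunk = ["Milk", "Eggs", "Soup"]

# Like `svn copy` into a tags folder: a named, frozen copy of the trunk.
tags = {"Release 1.0": tuple(trunk)}

trunk.append("Bread")  # the trunk keeps moving after the release

print(tags["Release 1.0"])  # ('Milk', 'Eggs', 'Soup')
print(trunk)                # ['Milk', 'Eggs', 'Soup', 'Bread']
```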
Real-life example: Managing Windows Source Code
We guessed that Windows was managed out of a shared folder, but it’s not the case. So how’s it done?
  • There’s a main line with stable builds of Windows.
  • Each group (Networking, User Interface, Media Player, etc.) has its own branch to develop new features. These are under development and less stable than main.
You develop new features in your branch and “Reverse Integrate (RI)” to get them into Main. Later, you “Forward Integrate (FI)” to get the latest changes from Main into your branch:
[Figure: clip_image009]
Let’s say we’re at Media Player 10 and IE 6. The Media Player team makes version 11 in their own branch. When it’s ready and tested, there’s a patch from 10 – 11 which is applied to Main (just like the “Rice” example, but a tad more complicated). This is a reverse integration, from the branch to the trunk. The IE team can do the same thing.
Later, the Media Player team can pick up the latest code from other teams, like IE. In this case, Media Player forward integrates and gets the latest patches from main into their branch. This is like pulling in the “Bread” feature into the experimental branch, but again, more complicated.
So it’s RI and FI. Aye aye. This arrangement lets changes percolate throughout the branches, while keeping new code out of the main line. Cool, eh?
In reality, there are many layers of branches and sub-branches, along with quality metrics that determine when you get to RI. But you get the idea: branches help manage complexity. Now you know the basics of how one of the largest software projects is organized.
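The RI and FI flow can be sketched with dictionaries that map a component to its version. This is only a toy model of the idea, not how the Windows source tree is actually managed; the `changes` helper and the IE version number 7 are assumptions for illustration.

```python
def changes(old, new):
    """Entries in `new` that differ from `old`: a toy patch."""
    return {name: version for name, version in new.items()
            if old.get(name) != version}


main = {"Media Player": 10, "IE": 6}

# Each team branches from main and remembers where it started.
media_base, media_branch = dict(main), dict(main)
ie_base, ie_branch = dict(main), dict(main)

media_branch["Media Player"] = 11  # Media Player team ships version 11
ie_branch["IE"] = 7                # hypothetical IE update

# Reverse integrate (RI): apply each branch's own changes to main.
main.update(changes(media_base, media_branch))
main.update(changes(ie_base, ie_branch))

# Forward integrate (FI): pull what changed in main into the branch.
media_branch.update(changes(media_base, main))

print(main)          # {'Media Player': 11, 'IE': 7}
print(media_branch)  # {'Media Player': 11, 'IE': 7}
```

After the forward integrate, the Media Player branch has the IE team's work without either team having edited the other's branch.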
