Sunday
24Jan2010

Submodules and Subrepos Done Right

An approach to managing Git or Mercurial sub-repos easily, safely, and simply, while allowing you to embed Git projects in a Mercurial repo and vice versa.

Background

Most software projects rely on other software projects to function. For example, Nitrogen depends on SimpleBridge, Coverize, and Mochiweb or Yaws. Riak depends on Webmachine and Mochiweb.

In the name of simplicity and ease of use, it's generally a good idea for the parent repo to contain the source code of any sub-projects it uses.

But then you are faced with a decision:

Should the parent project include the full history of the sub-project as well?

You currently have three options, all with tradeoffs:

  1. Remove revision history in your sub-projects by deleting the .git/.hg directory, and experience pain when you want to pull the latest updates or commit a patch on the sub-project.

  2. Track the entire .git/.hg directory for each sub-project, and accept that the .git/.hg directory of your parent project will now be huge.

  3. Use Git submodules (or Mercurial subrepos), and hope that you never have to include a Git project inside of a Mercurial project, or vice versa. Also, accept the unfortunate fact that your build process now requires a working Internet connection.

The Search for a Better Way

What do I really want out of sub-repo support?

  • I want the sub-repos to seem like part of the parent project to the user, while still seeming like distinct repositories to me.

  • I want to be able to work with the full history of the sub-repo, but I don't want this included in the parent repo, or sent out to anyone who downloads the code.

  • I want an easy process for contributors to get the history of the sub-repos, so that they can commit patches.

Furthermore:

  • I only want to use tested, core features of Git or Mercurial.

  • I want the solution to be "cross platform", so that I can stick Git repos in Mercurial and vice versa.

  • I want the solution to be simple to use and easy to understand for the most common use case. (In other words, hide complexity from the non power-users.)

Introducing subgit and subhg

After much thought and frustration, I think I've finally found a solution that meets all of my needs. It lets me work with the parent repo as I normally would, using the git or hg command. Furthermore, it gives me a different command to work with the sub-repos. Finally, it is cross platform, allowing me to mix and match Git and Mercurial projects. The only downside is that a contributor needs to jump through an extra hoop or two in order to get the history of a sub-repo.

In practice, the solution looks like this:

Change to the directory of a sub-project...

> cd ParentProject/SubProject1

Operate on the parent project...

> git status
...status info...

> git commit
...commit code...

Operate on the sub-project. Notice the use of the 'subgit' command...

> subgit status
...sub-project status info...

> subgit commit
...commit sub-project code...

Best of all, the subgit and subhg commands are just thin wrappers around git and hg. Each is about 25 lines of shell script.

Installation

To try this out on your computer:

  1. Save the following scripts to a location in your PATH: subgit and subhg. (Remember to run chmod 755 to make them executable.)

  2. Create a global excludes file for both Git and Mercurial, and add .subgit and .subhg to it.

    ~/.gitconfig

    [core]
    excludesfile = "~/.gitignore"
    

    ~/.gitignore

    .subgit
    .subhg
    

    ~/.hgrc

    [ui]
    ignore=~/.hgignore
    

    ~/.hgignore

    .subgit
    .subhg
    

That's all!
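
If you would rather script step 2, here is a minimal sketch. The helper names add_ignore and install_subrepo_ignores are my own and are not part of subgit or subhg. It assumes the same ~/.gitignore and ~/.hgignore file names used above, and it leaves the [ui] entry in ~/.hgrc for you to add by hand.

```shell
# add_ignore FILE PATTERN
# Append PATTERN to FILE unless an identical line is already there.
# (Hypothetical helper, for illustration only.)
add_ignore() {
    grep -qxF -e "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}

# Register the global Git excludes file, then ignore both sub-repo
# directories in the Git and Mercurial global ignore files.
install_subrepo_ignores() {
    git config --global core.excludesfile '~/.gitignore'
    for pattern in .subgit .subhg; do
        add_ignore "$HOME/.gitignore" "$pattern"
        add_ignore "$HOME/.hgignore" "$pattern"
    done
}
```

Running install_subrepo_ignores more than once is harmless, because add_ignore skips patterns that are already present.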

How Does It Work?

The core concept behind this approach is to store the version history of your sub-project in a non-standard directory, and then use special wrapper scripts when you want Git or Mercurial to operate against that directory.

In other words, your projects won't have a .git or .hg directory. Instead, they will have a .subgit or .subhg directory, which is not tracked by the parent repo.

|--ParentProject    <-- a Git repository
   |
   |--.git
   |
   |--SubProject1   <-- a Git sub-repo
   |  |--.subgit    
   |
   |--SubProject2
   |  |--.subhg     <-- a Mercurial sub-repo
   |
   |--src

This tricks Git or Mercurial into tracking the files inside of your sub-repo, even though the files actually belong to a different repository. (Normally Git won't track a Git repo nested inside of another Git repo.)

The wrapper scripts--subgit and subhg--do the heavy lifting to make Git or Mercurial use the .subgit and .subhg directories.

subgit simply searches upward for the closest parent directory that contains a .subgit directory. Once found, it calls git, telling git to use the .subgit directory for repository information. subhg works the same way.
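
To make the idea concrete, here is a rough sketch of that logic. It is not the actual subgit script, just an illustration of the technique: find_subgit_root is a name I made up for this example, and I am assuming Git's --git-dir and --work-tree options are used to point git at the .subgit directory.

```shell
# Print the closest directory, starting at $1 (default: the current
# directory) and walking upward, that contains a .subgit directory.
# Returns 1 if the filesystem root is reached without finding one.
find_subgit_root() {
    dir=$(cd "${1:-.}" && pwd -P) || return 1
    while :; do
        if [ -d "$dir/.subgit" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
        [ "$dir" = "/" ] && return 1
        dir=$(dirname "$dir")
    done
}

# Run git against the sub-repo's history instead of the parent's .git.
subgit() {
    # "subgit setup" renames .git to .subgit, as described under Usage.
    if [ "${1:-}" = "setup" ]; then
        [ -d .git ] || { echo "subgit: no .git directory here" >&2; return 1; }
        mv .git .subgit
        return
    fi
    root=$(find_subgit_root) || {
        echo "subgit: no .subgit directory found" >&2
        return 1
    }
    git --git-dir="$root/.subgit" --work-tree="$root" "$@"
}
```

With these functions loaded, subgit status run anywhere below SubProject1 reports on the sub-repo, while plain git status still reports on the parent.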

Usage

To create a sub-repo that can be managed with subgit or subhg:

  1. Inside of an existing parent repository, clone the sub-project's repository as you normally would:

    git clone git://hostname.com/repository.git sub_project
    
  2. Then, change to the new repository's directory and run subgit setup. This simply renames .git to .subgit:

    cd sub_project
    subgit setup
    
  3. Now, test that it worked by viewing the parent's Git log and the sub-repo's Git log:

    git log
    ...print out log for the parent project...
    
    
    subgit log
    ...print out log for the sub-project...
    

subhg works the same way.

Some Final Thoughts

First, this approach is intended for all of the projects out there using GitHub, BitBucket, Google Code, etc. as their main distribution channel. Most of these projects have a small group of contributors, and a much larger group of users.

If you distribute your project via a tar'd, gzip'd file, then this blog post is not for you.

Second, in order for other contributors to submit patches to the sub-project code, they will first need to obtain the full history of the sub-project. (Which makes sense, because the whole point of this was to NOT transfer the full history during a clone.)

As far as I know, the best approach to get the history is to just pull it from the sub-project's remote URL into a tmp directory:

git clone git://hostname.com/repository.git tmp
mv tmp/.git sub_project/.subgit
rm -rf tmp

-or-

hg clone http://hostname.com/repo/path/ tmp
mv tmp/.hg sub_project/.subhg
rm -rf tmp

Then, switch to the sub_project directory and checkout the right version. (This assumes your sub-project is in a directory named sub_project.)

Downloads:

Saturday
19Dec2009

Nitrogen/Riak Video from EUC2009, Stockholm

The fine folks over at Erlang Solutions, Ltd. just released the video of my talk "Nitrogen and Riak by Example" from Erlang User Conference 2009 in Stockholm.

Saturday
05Dec2009

Nitrogen, Riak, and 1,000 Lines of Erlang

UPDATE: See the video of my talk on Nitrogen, Riak, and SlideBlast.com.


Check out SlideBlast.com, a tool I created that lets you share and control a slide presentation on the web. SlideBlast was built using Nitrogen and Riak, and is an example of exactly how much you can do with the right tools and 1,000 lines of code. (Ok, it's more like 1,130 lines, but who's counting?)

The full source code is available on GitHub: http://github.com/rklophaus/SlideBlast

How it Works

  1. Upload a .pdf or .zip file.
  2. Share a link with your remote audience.
  3. Start presenting. As you flip through slides, your attendees' slides change, too.

Components

SlideBlast runs on Erlang and uses the following projects:

  • Nitrogen to create a slick, Comet-based user interface.
  • Riak to store slide show data (including the images) with a flexible schema.
  • Mochiweb as the underlying HTTP server.
  • Ghostscript to split and convert .pdf files into images.
  • Imagemagick to create image thumbnails.
  • SyntaxHighlighter to transform uploaded code files into beautiful HTML.
    (Try uploading an .erl, .cs, .cpp, .js, .java, .sh, or .sql file!)

Why I Built SlideBlast.com

I built SlideBlast for my talk Nitrogen and Riak by Example, presented at the Erlang User Conference 2009 in Stockholm, Sweden. In the presentation, I briefly cover both Nitrogen and Riak, and then describe some of the techniques used to build SlideBlast. The video should be online soon; check it out.

Thursday
03Dec2009

Blogging to SquareSpace with TextMate

TextMate has a Blogging Bundle. It allows you to edit and update blog posts from within TextMate and works with any blog that uses the MetaWeblog API. Unfortunately, it failed to work correctly with my SquareSpace blog, returning this cryptic error:

    Error: Error parsing request: Malformed request: \
    com.squarespace.framework.ResourceNotFoundException: Invalid Object Reference (2)

After a little debugging, I realized that it works if you include the ID of your SquareSpace blog in the endpoint XML-RPC URL. For SquareSpace, your XML-RPC endpoint URL should look like this:

    http://www.squarespace.com/do/process/external/PostInterceptor#BLOG_ID

To figure out the correct value for BLOG_ID, go to the URL of your SquareSpace journal (for example, mine is http://rklophaus.com/blog), view source, and search for "CURRENT_MODULE_ID". You will see a line that says:

    Squarespace.Constants.CURRENT_MODULE_ID = "<YOUR BLOG ID>";

This is the ID that you should use. The ID for this blog is 3142087, so my endpoint looks like this:

    http://www.squarespace.com/do/process/external/PostInterceptor#3142087
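
If you don't feel like digging through the page source by hand, the lookup can be sketched in a few lines of shell. The function names extract_blog_id and endpoint_for are hypothetical, and the sketch assumes the CURRENT_MODULE_ID line appears in the page exactly in the form shown above.

```shell
# Read HTML on stdin and print the value assigned to
# Squarespace.Constants.CURRENT_MODULE_ID (first match only).
extract_blog_id() {
    sed -n 's/.*CURRENT_MODULE_ID *= *"\([^"]*\)".*/\1/p' | head -n 1
}

# Build the XML-RPC endpoint URL for a given blog ID.
endpoint_for() {
    printf 'http://www.squarespace.com/do/process/external/PostInterceptor#%s\n' "$1"
}
```

For example, piping the output of curl -s http://rklophaus.com/blog into extract_blog_id should print the ID, which you can then pass to endpoint_for.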

Tuesday
01Dec2009

The Bilski Case and Software Patents

Something happened in the second week of November that could forever change the face of the software industry. (No, I'm not referring to the release of Go Lang.)

On November 9th, the Supreme Court heard oral arguments in re Bilski. (In re Bilski means "in the matter of Bilski." I looked it up for you.)

The outcome of this case determines the future of software patents.

What's a "Bilski"?

Back in 1997, Bilski and Warsaw filed a patent for managing risk in commodities trading through hedging. Using their process, for example, an oil company would be able to offer a locked-in rate to customers, and do some fancy purchasing behind the scenes to cover their asses so that if oil prices spiked, they would still turn a reasonable profit.

The US Patent Office (USPTO) denied the patent because it did not pass the "machine-or-transformation" test. This test says that for a business process to be patentable, it must either be implemented with a particular machine designed or adapted to carry out the process, OR it must transform/reduce an article into a different state.

The problem with the USPTO's decision is that it contradicted precedent set by the Federal Circuit Court.

"Ruh Roh"

Ruh Roh indeed, Scoob. The Federal Circuit Court previously held in State Street that an invention is patentable if "it produces a useful, concrete and tangible result" and applies to a specific application. Bilski's patent should have passed both tests.

The Bilski team appealed to the Board of Patent Appeals, with no luck. So they appealed again to the Federal Circuit Court, which you would expect to quickly overrule the Board's decision, citing State Street. The Circuit Court, however, agreed with the USPTO, saying in effect that they were wrong before, that the State Street ruling no longer applied, and that the machine-or-transformation test is now the one true test.

Interestingly, the Circuit Court included language in the decision stating more or less "holy crap guys, ever since the State Street decision we've been overwhelmed with business process patents, these damn patents are getting more and more abstract, and we need to streamline this process lest we get buried in paperwork."

When all else fails, talk to "Los Jefes"

Bilski appealed again, to the Supreme Court, which granted certiorari in June. (Certiorari means that they would hear the case. Again, I looked it up for you.) The surprisingly readable oral arguments occurred on November 9th, which brings us to the present day.

(See a timeline of these events.)

So, how does this relate to software?

Software was seen as patentable as a business process because it produced a useful, concrete and tangible result. By discarding the State Street precedent, the Federal Circuit Court questioned the foundation upon which business process patents, and thus software patents, are built.

If the Supreme Court upholds the Federal Circuit Court's decision, it means that all existing US software patents will be very, very suspect, and RMS will be a happy, happy man.

If the Supreme Court completely overrules the Federal Circuit Court's decision, it will mean we keep the status quo, a continuation of the muddy quagmire that surrounds software patents today.

This is a black-and-white summary of the outcomes, of course. It is likely that the Supreme Court will shoot for the gray area in between.

Don't Forget The Economy

As exciting as it may be, the Supreme Court won't shake things up too much. Technology is one of the few industries in the United States that hasn't been gutted by the recession. At best, sweeping away software patents will plant some serious concerns in the minds of investors. At worst, their concerns will be justified. I suspect that of all of the factors the Supreme Court considers, the health of the economy will be what guides them the most.

It's Broke, Fix It

Rather than rehash all of the arguments for and against software patents, I'll provide you with a Google search for software patent debate.

To be honest, I'm still not sure which side I support on the issue. I do know that software patents--in their current form--are broken, because:

  • Software patents alone don't do much. You need a patent plus gobs of money.
  • A good chunk of innovation in the software industry comes from teams of smart people at startup companies.
  • Startup companies don't have gobs of money, and if they did, they'd prefer to spend it building things, not fiddling with patents.

Can't wait to hear what the Supreme Court decides.