Do the best you can until you know better. Then when you know better, do better. – Maya Angelou

Mr. Walker's Classroom Blog

GitHub Question of the Day

I am using this article in class, and since it may disappear, I am keeping a local copy here.

THE SOFTWARE THAT BUILDS SOFTWARE


Complex, specialized tools are often made from simpler ones. Machines are built with power drills; software is built with code editors. And so the future of computing depends partly on coding platforms in much the same way that the future of the movie industry depends on camera technology.

Over the past five years, a rapidly growing San Francisco company called GitHub has become a dominant player in software development, largely because it has fine-tuned the tools used for “version control,” which is the process of logging all the changes made to a set of documents. Programs are fragile enough that even a small change (a single misplaced semicolon, for example) might cause them to crash. GitHub keeps track of those semicolons, and who put them where.
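Git's real data model is far more elaborate, but the core idea of the paragraph above (log every change to a file and remember who made it) fits in a few lines of Python. The `History` and `Commit` classes and the `blame` method below are hypothetical names for illustration, not Git's or GitHub's API. Line attribution is approximated with the standard library's `difflib.SequenceMatcher`, loosely in the spirit of the `git blame` command.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher
from typing import List, Tuple


@dataclass
class Commit:
    author: str
    message: str
    lines: List[str]  # full snapshot of the file after this change


@dataclass
class History:
    # Hypothetical toy history: one file, commits stored in order.
    commits: List[Commit] = field(default_factory=list)

    def commit(self, author: str, message: str, text: str) -> None:
        """Record a new snapshot of the file along with who made it and why."""
        self.commits.append(Commit(author, message, text.splitlines()))

    def blame(self) -> List[Tuple[str, str]]:
        """Attribute each line of the latest snapshot to the author who last changed it."""
        owners: List[str] = []    # owners[i] is the author of previous[i]
        previous: List[str] = []
        for c in self.commits:
            matcher = SequenceMatcher(a=previous, b=c.lines, autojunk=False)
            new_owners: List[str] = []
            for tag, i1, i2, j1, j2 in matcher.get_opcodes():
                if tag == "equal":
                    # Unchanged lines keep their earlier author.
                    new_owners.extend(owners[i1:i2])
                elif tag in ("replace", "insert"):
                    # New or edited lines belong to this commit's author.
                    new_owners.extend([c.author] * (j2 - j1))
                # "delete" opcodes contribute no lines to the new snapshot.
            owners, previous = new_owners, c.lines
        return list(zip(owners, previous))


history = History()
history.commit("ada", "first draft", "int total = 0\nreturn total;")
history.commit("linus", "add missing semicolon", "int total = 0;\nreturn total;")
for author, line in history.blame():
    print(f"{author:>6}  {line}")
```

Here the first line is credited to "linus," who added the semicolon, while the untouched second line stays credited to "ada."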

Last year, GitHub’s financial projections and cultural influence were enough to secure a hundred-million-dollar investment from Andreessen Horowitz, the Silicon Valley venture-capital firm that previously invested in Twitter, Facebook, and Skype. (It is Andreessen Horowitz’s single biggest investment to date, valuing GitHub at roughly seven hundred and fifty million dollars.) Today, GitHub has three and a half million users working on close to seven million projects, and it grows by ten thousand users on a typical weekday. You can find projects written in almost any programming language in existence, from common Web development languages like P.H.P. and JavaScript to the ancient low-level language Fortran, and even LOLCODE, an absurdist programming syntax jokingly assembled from LOLcat captions.

The GitHub Web site is built around an independent piece of version-control software called Git, which was created by the developer Linus Torvalds. Even before Git, Torvalds was a nerd hero. He created Linux, which is one of the most important operating systems in the world. It runs on more kinds of hardware than any other computer operating system, and it is a dominant platform for Web servers and supercomputers; it is also widely deployed on common devices like wireless routers, cell phones, and even TiVos. It is the crown jewel of the open-source-software movement, a school of thought in which even the most valuable code in a program is freely shared so it can be collaboratively improved by many developers at once.

With so many programmers working on it, building Linux is incredibly complicated, and by 2005, Torvalds and his team had concluded that the existing version-control systems were inadequate, so they decided to write their own. The result, which was named Git after a British slang insult, was much more powerful than the other available options.

In other development systems, there is often a hierarchy of gatekeepers who delegate access to chunks of code. In Git, every developer working on a project can have full access to every part of the code and its history. The best ideas and the strongest code for any aspect of the project can bubble up to the surface and be approved, no matter where they originate, dramatically flattening and democratizing the development workflow. “In order to avoid all those stupid political issues, you have to basically allow anybody to make changes, and then accept them based on technical merit after the fact, rather than on some pre-approval process,” Torvalds told me via e-mail.

After its release eight years ago, Git was quickly adopted by developers; its highly distributed model was far more powerful than existing version-control systems. “Source-code-management systems, like any social product, are extremely sticky,” Junio Hamano, the current lead developer of the Git project, wrote in an e-mail. “Your next system has to be ten times better than the system you are currently using in order to justify the cost of switching.”

As Git adoption grew, a developer named Tom Preston-Werner and a handful of collaborators saw the need for a site that made it easy to start new projects, duplicate existing ones, and run the most important Git functions from a desktop app or even a Web browser. The result, GitHub, was launched in 2008. It tremendously simplified Git, an inherently complex piece of software, and allowed coders to collaborate easily over the Internet, providing messaging and social features that would feel familiar to current users of social networks—for example, the ability to follow particular chunks of code in projects the way one might follow people on Twitter. The primary difference is that on GitHub the users share code, not photos and LOLcats.

Activity on GitHub takes place as transactions between “repositories,” which are essentially directories of code. Repositories can be cloned, changed, and then fluidly recombined, so coders can collaboratively build and experiment with software without complicating the development process: no matter what happens along the way, Git can almost always clean up the mess. For instance, HipHop, a tool used by Facebook to optimize its servers, has hundreds of clones in circulation on GitHub, and several thousand followers are tinkering with them; any one of them could produce a breakthrough in the software.
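The “recombining” step typically rests on a three-way merge: compare each edited copy against the common ancestor it was cloned from, and keep whichever side actually changed. Below is a minimal Python sketch under two stated assumptions. First, a repository snapshot is modeled as a dict from file path to contents. Second, merging happens at whole-file granularity, whereas Git itself merges line by line within files. The `three_way_merge` function is a hypothetical illustration, not Git's implementation. The conflict branch shows where the “almost” in “almost always” comes from.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical model: a snapshot maps each file path to its full contents.
Snapshot = Dict[str, str]


def three_way_merge(base: Snapshot, ours: Snapshot, theirs: Snapshot) -> Tuple[Snapshot, List[str]]:
    """Combine two edited copies of `base`, file by file.

    Returns the merged snapshot and the paths that need a human decision.
    """
    merged: Snapshot = {}
    conflicts: List[str] = []
    for path in sorted(set(base) | set(ours) | set(theirs)):
        # None means "this file does not exist on that side".
        b: Optional[str] = base.get(path)
        o: Optional[str] = ours.get(path)
        t: Optional[str] = theirs.get(path)
        if o == t:
            result = o  # both sides agree (or both deleted the file)
        elif o == b:
            result = t  # only the other clone changed this file
        elif t == b:
            result = o  # only our clone changed this file
        else:
            conflicts.append(path)  # both changed it differently
            result = o              # keep our version until someone resolves it
        if result is not None:
            merged[path] = result
    return merged, conflicts


base = {"README": "v1", "server.c": "v1"}
ours = {"README": "v1 + docs", "server.c": "v1"}
theirs = {"README": "v1", "server.c": "v1 + speedup", "test.c": "new tests"}
merged, conflicts = three_way_merge(base, ours, theirs)
print(merged)
print(conflicts)
```

Because each clone touched different files, both sets of work survive the merge automatically. Only when two clones edit the same file in different ways does the sketch hand the decision back to a person.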

Both projects and coders now call GitHub home base: many software projects now exist solely on GitHub, without an external site, and recruiters and employers are beginning to ask developers for profile links, much like an up-and-coming writer might be expected to provide a link to his or her blog.

While GitHub has greatly simplified and centralized software development, it comes with a price: it is now a single point of failure. The service recently suffered several wide-reaching outages that interfered with the ability of many developers to finalize changes to their code. Torvalds has repeatedly complained that its system for code annotations is too lax; the site may be training an entire generation of coders to leave behind weaker paper trails than truly complex software requires. And late last year, GitHub stopped allowing users to freely host large non-code files like videos; in a flash, all GitHub projects that needed anything other than core program code had to find another way to support it.

Strictly speaking, there’s nothing tying developers to GitHub—the clone-friendly nature of a Git repository makes it extremely portable, so migrating away from GitHub would be simple enough. But projects that aren’t on GitHub lose access to a critical mass of interested GitHub users and the stream of helpful code contributions that they might inject, dimming their chances of survival.

Thinking about GitHub as a social product rather than simply a software-development platform is perhaps the best way to consider its future: the big problem GitHub solves is about collaboration, not software. “For now this is about code, but we can make the burden of decision-making into an opportunity,” Preston-Werner, the site’s founder, told the New York Times last year. “It would be useful if you could capture the process of decision-making, and see who suggested the decisions that created a law or a bill.”

Those seem like lofty aspirations, but in early May, the start-up RapGenius, which originally crowdsourced explanations of lyrics from hip-hop songs, announced its expansion into breaking news, using the same text-annotation platform it had been refining for years. RapGenius could prove useful for parsing complicated legal documents, for example, and if rap enthusiasts can build a useful platform for legal analysis, why can’t coders?

GitHub’s most revelatory moments still may be ahead of it, simply because software projects are not the only things that need efficient version control. Short of sitting down in person with a red pen, for instance, it’s often hard for a writer and editor to work on a document together such that both totally understand what’s being changed and why. Writers and editors can theoretically use GitHub to collaborate on documents and articles instead of relying on inline bracketed notes or the clunky revision-tracking and annotation features of Microsoft Word.
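To make the writer-and-editor idea concrete, here is a small Python sketch that produces a line-by-line “redline” of two drafts using the standard library's `difflib.unified_diff`. Its output is in the same unified diff format that `git diff` prints and that GitHub displays. The `redline` helper and the file label `article.txt` are hypothetical names chosen for illustration.

```python
import difflib


def redline(before: str, after: str, name: str = "article.txt") -> str:
    """Return a unified diff showing what changed between two drafts."""
    diff = difflib.unified_diff(
        before.splitlines(),
        after.splitlines(),
        fromfile=f"{name} (writer)",
        tofile=f"{name} (editor)",
        lineterm="",  # lines were split without newlines, so don't add any
    )
    return "\n".join(diff)


writer = "Programs are fragile enough that a small change might cause it to crash.\nThe end."
editor = "Programs are fragile enough that a small change might cause them to crash.\nThe end."
print(redline(writer, editor))
```

Lines starting with "-" are the writer's version, lines starting with "+" are the editor's, and unchanged lines appear with a leading space as context. Because the drafts are plain text, they could be committed to a repository and discussed on GitHub the same way code is.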

This is already happening for certain writing projects—mostly technical manuals and software documentation. One extremely popular GitHub project, called Jekyll, was started by Preston-Werner to use the site as the backbone for a collaborative blogging service not unlike Tumblr. While there are already Web sites and software packages that facilitate collaborative writing—like MixedInk, Google Docs, and MediaWiki—it’s rare that all parties will warm to one of these simultaneously, which is why the story you’re reading right now was repeatedly passed back and forth as an e-mail attachment.

“That’s what our vision is—to make it easier for people to work together than to work alone,” Preston-Werner told me. GitHub has already done this for modern software development. It eventually could be the site that changes the way we collaborate on other kinds of projects, as well. Or, if not, at least the tool used to build whatever piece of software finally does.