2. Starting a Project

Few would dispute that the beginning is the most difficult period in a project's life for successful free software project management. Laying a firm foundation will help determine whether your project flourishes or withers away and dies. It is also the subject of most immediate interest to anyone reading this document as a tutorial.

Starting a project involves a dilemma that you as a developer must try to deal with: no potential user is interested in a program that doesn't work, yet the development process you want to employ treats the involvement of users as imperative.

It is in these dangerous initial moments that anyone working to start a free software project must try and strike a balance along these lines. One of the most important ways that someone trying to start a project can work toward this balance is by establishing a solid framework for the development process through some of the suggestions mentioned in this section.

2.1. Choosing a Project

If you are reading this document, there's a good chance you already have an idea for a project in mind. Chances are also pretty good that it fills a perceived gap by doing something that no other free software project does or by doing something in a way that is unique enough to necessitate a brand new piece of software.

2.1.1. Identify and articulate your idea

Eric S. Raymond writes about how free software projects start in his essay, "The Cathedral and the Bazaar," which is required reading for any free software developer. It is available online.

In "The Cathedral and the Bazaar," Raymond tells us that: "every good work of software starts by scratching a developer's itch." Raymond's now widely accepted hypothesis is that new free software programs are written, first and foremost, to solve a specific problem facing the developer.

If you have an idea for a program in mind, chances are good that it targets a specific problem or "itch" you want to see scratched. This idea is the project. Articulate it clearly. Write it out. Describe the problem you will attack in detail. The success of your project in tackling a particular problem will be tied to your ability to identify that problem clearly early on. Find out exactly what it is that you want your project to do.

Monty Manley articulates the importance of this initial step in an essay, "Managing Projects the Open Source Way." As the next section will show, there is a lot of work that needs to be done before software is even ready to be coded. Manley says, "Beginning an OSS project properly means that a developer must, first and foremost, avoid writing code too soon!"

2.1.2. Evaluate your idea

In evaluating your idea, you need to first ask yourself a few questions. This should happen before you move any further through this HOWTO. Ask yourself: Is the free software development model really the right one for your project?

Obviously, since the program scratches your itch, you are definitely interested in seeing it implemented in code. But, because one hacker coding in solitude fails to qualify as a free software development effort, you need to ask yourself a second question: Is anybody else interested?

Sometimes the answer is a simple "no." If you want to write a set of scripts to sort your MP3 collection on your machine, maybe the free software development model is not the best one to choose. However, if you want to write a set of scripts to sort anyone's MP3s, a free software project might fill a useful gap.

Luckily, the Internet is a place so big and so diverse that, chances are, there is someone, somewhere, who shares your interests and who feels the same "itch." It is the fact that there are so many people with so many similar needs and desires that introduces the third major question: Has somebody already had your idea or a reasonably similar one?

2.1.2.1. Finding Similar Projects

There are places you can go on the web to try and answer the question above. If you have experience with the free software community, you are probably already familiar with many of these sites. All of the resources listed below offer searching of their databases:

freshmeat.net

freshmeat.net describes itself as, "the Web's largest index of Linux and Open Source software" and its reputation along these lines is totally unparalleled and unquestioned. If you can't find it on freshmeat, it's doubtful that you (or anyone else) will find it at all.

Slashdot

Slashdot provides "News for Nerds. Stuff that matters," which usually includes discussion of free software, open source, technology, and geek culture news and events. It is not unusual for a particularly sexy development effort to be announced here, so it is definitely worth checking.

SourceForge

SourceForge houses and facilitates a growing number of open source and free software projects. It is also quickly becoming a nexus for free software developers. SourceForge's software map and new release pages should be necessary stops before embarking on a new free software project. SourceForge also provides a Code Snippet Library, which contains reusable chunks of code in an array of languages that can come in handy in any project.

Google and Google's Linux Search

Google and Google's Linux Search provide powerful web searches that may reveal people working on similar projects. They are not catalogs of software or news like freshmeat or Slashdot, but they are worth checking to make sure you aren't pouring your effort into a redundant project.

2.1.2.2. Deciding to Proceed

Once you have successfully charted the terrain and have an idea about what kinds of similar free software projects exist, every developer needs to decide whether to proceed with their own project. It is rare that a new project seeks to accomplish a goal that is not at all similar or related to the goal of another project. Anyone starting a new project needs to ask themselves: "Will the new project be duplicating work done by another project? Will the new project be competing for developers with an existing project? Can the goals of the new project be accomplished by adding functionality to an existing project?"

If the answer to any of these questions is "yes," try to contact the developer of the existing project(s) in question and see if he or she might be willing to collaborate with you.

For many developers this may be the single most difficult aspect of free software project management, but it is an essential one. It is easy to become fired up by an idea and get caught up in the momentum and excitement of a new project. It is often extremely difficult to do, but any free software developer should remember that the best interests of the free software community, and the quickest way to accomplish your own project's goals and the goals of similar projects, are often served by not starting a new development effort.

2.2. Naming your project

While there are plenty of projects that fail with descriptive names and plenty that succeed without them, I think naming your project is worth giving a bit of thought. Leslie Orchard tackles this issue in an Advogato article. His article is short and definitely worth looking over quickly.

The synopsis is that Orchard recommends you pick a name where, after hearing the name, many users or developers will both:

Humorously, Orchard's project, "Iajitsu," does neither. It is probably unrelated that development has effectively frozen since the article was written.

He makes a good point though. There are companies whose only job is to make names for pieces of software. They make ridiculous amounts of money doing it and are supposedly worth it. While you probably can't afford a company like this, you can afford to learn from their existence and think a little bit about the name you are giving your project because it does matter.

If there is a name you really want but it doesn't fit Orchard's criteria, you can still go ahead. I thought "gnubile" was one of the best I'd heard for a free software project ever and I still talk about it long after I've stopped using the program. However, if you can be flexible on the subject, listen to Orchard's advice. It might help you.

2.3. Licensing your Software

On one (somewhat simplistic) level, the difference between a piece of free software and a piece of proprietary software is the license. A license helps you as the developer by protecting your legal rights to have your software distributed under your terms and helps demonstrate to those who wish to help you or your project that they are encouraged to join.

2.3.1. Choosing a license

Any discussion of licenses is also sure to generate at least a small flame war as there are strong feelings that some free software licenses are better than others. This discussion also brings up the question of "Open Source Software" and the debate over the terms "Open Source Software" and "Free Software". However, because I've written the Free Software Project Management HOWTO and not the Open Source Software Project Management HOWTO, my own allegiances in this argument are in the open.

In attempting to reach a middle ground through diplomacy without sacrificing my own philosophy, I will recommend picking any license that conforms to the Debian Free Software Guidelines. Originally compiled by the Debian project under Bruce Perens, the DFSG forms the first version of the Open Source Definition. Examples of free licenses given by the DFSG are the GPL, the BSD license, and the Artistic License. As ESR mentions in his HOWTO[ESRHOWTO], don't write your own license if at all possible. The three licenses I mention all have long interpretive traditions. They are also definitely free software (and can therefore be distributed as part of Debian and in other places that permit the transfer of free software).

Conforming to the definition of free software offered by Richard Stallman in "The Free Software Definition", any of these licenses will uphold, "users' freedom to run, copy, distribute, study, change and improve the software." There are plenty of other licenses that also conform to the DFSG but sticking with a more well-known license will offer the advantage of immediate recognition and understanding.

In attempting a more in-depth analysis, I agree with Karl Fogel's description of licenses as falling into two groups: those that are the GPL and those that are not the GPL.

Personally, I license all my software under the GPL. Created and protected by the Free Software Foundation and the GNU Project, the GPL is the license for the Linux kernel, GNOME, Emacs, and the vast majority of GNU/Linux software. It's the obvious choice but I also believe it is a good one. Any BSD fanatic will urge you to remember that there is a viral aspect to the GPL that prevents the mixture of GPL'ed code with non-GPL'ed code. To many people (myself included), this is a benefit, but to some, it is a major drawback.

Many people write three or four sentences in a COPYING file and assume that they have written a free software license. As my long experience with the debian-legal mailing list attests, this is very often not the case. Such a license may not protect you, it may not protect your software, and it may make things very difficult for people who want to use your software but who pay a lot of attention to the subtle legal points of licenses. If you are passionate about a home-brewed license, run it by either people at OSI or the debian-legal mailing list first to protect yourself from unanticipated side-effects of your license.

The three major licenses can be found at the following locations:

In any case, please read through any license before you release your software under it. As the primary developer, you can't afford any license surprises.

2.3.2. The mechanics of licensing

The text of the GPL offers a good description of the mechanics of applying a license to a piece of software. My quick checklist for applying a license includes:

  • Make yourself or the FSF the copyright holder for the work. In a few rare cases, you might want to make a sponsoring organization (if it's big and powerful enough) the copyright holder instead. Doing this is as simple as putting the name in the blank when you modify the notice of copyright below. Contrary to popular belief, you don't need to file with any organization. The notice alone is enough to copyright your work.

  • If at all possible, attach and distribute a full copy of the license with the source and binary by including a separate file.

  • At the top of each source file in your program, attach a notice of copyright and include information on where the full license can be found. The GPL recommends that each file begin with:

    one line to give the program's name and an idea of what it does.
    Copyright (C) yyyy  name of author
    
    This program is free software; you can redistribute it and/or
    modify it under the terms of the GNU General Public License
    as published by the Free Software Foundation; either version 2
    of the License, or (at your option) any later version.
    
    This program is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.
    
    You should have received a copy of the GNU General Public License
    along with this program; if not, write to the Free Software
    Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA  02111-1307, USA.
           

    The GPL goes on to recommend attaching information on methods for contacting you (the author) via email or physical mail.

  • The GPL continues and suggests that if your program runs in an interactive mode, you should write the program to output a notice each time it enters interactive mode. The notice should include a message like this one, which points to full information about the program's license:

    Gnomovision version 69, Copyright (C) year name of author
    Gnomovision comes with ABSOLUTELY NO WARRANTY; for details
    type `show w'.  This is free software, and you are welcome
    to redistribute it under certain conditions; type `show c' 
    for details.
           
  • Finally, it might be helpful to include a "copyright disclaimer" from an employer or a school if you work as a programmer or if it seems like your employer or school might be able to make an argument for ownership of your code later on. These aren't often needed but there are plenty of free software developers who have gotten into trouble and wish they'd asked for one.
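The third item in the checklist above (a notice at the top of each source file) is easy to forget as a project grows. What follows is only a minimal sketch of a check you might run before a release. The function names, the two phrases it looks for, and the 30-line window are all assumptions made for illustration; they are not part of the GPL's own recommendations.

```python
import sys

# Phrases assumed to appear in a GPL-style notice; adjust for your license.
REQUIRED_PHRASES = ("Copyright (C)", "GNU General Public License")


def has_license_notice(source_text, lines_to_check=30):
    """Return True if every required phrase appears near the top of the text."""
    head = "\n".join(source_text.splitlines()[:lines_to_check])
    return all(phrase in head for phrase in REQUIRED_PHRASES)


def files_missing_notice(paths):
    """Return the subset of `paths` whose opening lines lack the notice."""
    missing = []
    for path in paths:
        with open(path, encoding="utf-8", errors="replace") as handle:
            if not has_license_notice(handle.read()):
                missing.append(path)
    return missing


if __name__ == "__main__":
    # Usage (hypothetical script name): python check_notice.py src/*.c
    for path in files_missing_notice(sys.argv[1:]):
        print("no license notice found near the top of", path)
```

A check like this only confirms that some notice is present. It cannot tell you whether the notice is correct, so it is no substitute for reading the license.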

2.3.3. Final license warning

Please, please, please, place your software under some license. It may not seem important, and to you it may not be, but licenses are important. For a piece of software to be included in the Debian GNU/Linux distribution, it must have a license that fits the Debian Free Software Guidelines. If your software has no license, it cannot be distributed as a package in Debian until you re-release it under a free license. Please save yourself and others trouble by releasing the first version of your software with a clear license.

2.4. Choosing a Method of Version Numbering

The most important thing about a system of version numbering is that there is one. It may seem pedantic to emphasize this point but you'd be surprised at the number of scripts and small programs that pop up without any version number at all.

The second most important thing about a system of numbering is that the numbers always go up. Automatic version tracking systems and people's sense of order in the universe will fall apart if version numbers don't rise. It doesn't really matter if 2.1 is a big jump and 2.0.005 is a small jump but it does matter that 2.1 is more recent than 2.0.005.

Follow these two simple rules and you will not go (too) wrong. Beyond this, the most common technique seems to be the "major level," "minor level," "patch level" version numbering scheme. Whether you are familiar with the name or not, you interact with it all the time. The first number is the major number and it signifies major changes or rewrites. The second number is the minor number and it represents added or tweaked functionality on top of a largely coherent structure. The third number is the patch number and it usually will only refer to releases fixing bugs.

The widespread use of this scheme is why I know the nature and relative degree in the differences between a 2.4.12 release of the Linux kernel and a 2.4.11, 2.2.12, and 1.2.12 without knowing anything about any of the releases.
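The rule that version numbers must always go up is simple for people but easy to get wrong in scripts, because comparing version strings as plain text gives the wrong answer once a component reaches two digits. The sketch below shows one way a tool might compare "major.minor.patch" versions. The function names are made up for illustration, and it assumes purely numeric components separated by dots.

```python
def parse_version(text):
    """Split a dotted version string such as "2.4.12" into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def is_newer(candidate, current):
    """Return True if `candidate` is a more recent release than `current`."""
    return parse_version(candidate) > parse_version(current)


if __name__ == "__main__":
    releases = ["2.2.12", "1.2.12", "2.4.12", "2.4.11"]
    # Sorting by the parsed tuple orders releases from oldest to newest.
    print(sorted(releases, key=parse_version))
    # As text, "2.4.9" sorts after "2.4.12"; as numbers, 2.4.12 is newer.
    print(is_newer("2.4.12", "2.4.9"))
```

Tuples compare element by element, so the major number decides first, then the minor number, then the patch level. This matches the way readers interpret the scheme.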

You can bend or break these rules, and people do. But beware, if you choose to, someone will get annoyed, assume you don't know, and try and educate you, probably not nicely. I always follow this method and I implore you to do so as well.

There are several version numbering systems that are well known, useful, and that might be worth looking into before you release your first version.

Linux kernel version numbering:

The Linux kernel uses a versioning system where any odd minor version number refers to a development or testing release and any even minor version number refers to a stable version. Think about it for a second. Under this system, 2.1 and 2.3 kernels were and always will be development or testing kernels and 2.0, 2.2, and 2.4 kernels are all production code with a higher degree of stability and more testing.

Whether you plan on having a split development model (as described in Section 3.3) or only one version released at a time, my experience with several free software projects and with the Debian project has taught me that use of Linux's version numbering system is worth taking into consideration. In Debian, all minor versions are stable distributions (2.0, 2.1, etc). However, many people assume that 2.1 is an unstable or development version and continue to use an older version until they get so frustrated with the lack of development progress that they complain and figure the system out. If you never release an odd minor version but only release even ones, nobody is hurt, and fewer people are confused. It's an idea worth taking into consideration.

Wine version numbering:

Because of the unusual nature of Wine's development, where the not-emulator is constantly improving but not working toward any immediately achievable goal, Wine is released every three weeks. Wine labels its releases in "Year Month Day" format: each release is named "wine-XXXXXXXX", so the version from January 04, 2000 would be "wine-20000104". For certain projects, "Year Month Day" format can make a lot of sense.

Mozilla milestones:

When one considers Netscape 6 and vendor versions, the Mozilla project's development structure is one of the most complex free software models available. The project's version numbering has reflected the unique situation in which it is developed.

Mozilla's version numbering structure has historically been made up of milestones. From the beginning of the Mozilla project, the goals of the project in the order and degree to which they were to be achieved were charted out on a series of road maps. Major points and achievements along these road maps were marked as milestones. Therefore, although Mozilla was built and distributed nightly as "nightly builds," on a day when the goals of a milestone on the road map had been reached, that particular build was marked as a "milestone release."

While I haven't seen this method employed in any other projects to date, I like the idea and think that it might have value in any testing or development branch of a large application under heavy development.
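Two of the schemes above lend themselves to a short sketch: the Linux kernel's odd/even minor number convention and Wine's "Year Month Day" labels. The function names here are hypothetical, and the first function assumes a version string with at least a major and a minor component.

```python
import datetime


def is_stable_series(version):
    """Return True when the minor number is even (the Linux kernel convention)."""
    minor = int(version.split(".")[1])
    return minor % 2 == 0


def snapshot_name(project, release_date):
    """Build a date-based release label such as "wine-20000104"."""
    return "{}-{}".format(project, release_date.strftime("%Y%m%d"))


if __name__ == "__main__":
    for version in ("2.2.12", "2.3.0", "2.4.11"):
        label = "stable" if is_stable_series(version) else "development"
        print(version, label)
    print(snapshot_name("wine", datetime.date(2000, 1, 4)))  # wine-20000104
```

One convenient property of the date-based labels is that zero-padded "Year Month Day" strings sort chronologically even when compared as plain text, so the numbers always go up without any extra bookkeeping.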

2.5. Documentation

A huge number of otherwise fantastic free software applications have withered and died because their author was the only person who knew how to use them fully. Even if your program is written primarily for a techno-savvy group of users, documentation is helpful and even necessary for the survival of your project. You will learn later in Section 4.3 that you should always release something that is usable. A piece of software without documentation is not usable.

There are lots of different people you should document for and there are lots of ways to document your project. The importance of documentation in source code to help facilitate development by a large community is vital but it falls outside the scope of this HOWTO. This being the case, this section deals with useful tactics for user-directed documentation.

A combination of tradition and necessity has resulted in a semi-regular system of documentation in most free software projects that is worth following. Both users and developers expect to be able to get documentation in several ways and it's essential that you provide the information they are seeking in a form they can read if your project is ever going to get off the ground. People have come to expect:

2.5.1. Man pages

Your users will want to be able to type "man yourprojectname" and end up with a nicely formatted man page highlighting the basic use of your application. Make sure that before you release your program, you've planned for this.

Man pages are not difficult to write. There is excellent documentation on the man page writing process in "The Linux Man-Page-HOWTO," which is written by Jens Schweikhardt and distributed by the Linux Documentation Project (LDP). It is available from Schweikhardt's site or from the LDP.

It is also possible to write man pages using DocBook SGML. Because man pages are so simple and the DocBook method relatively new, I have not been able to follow this up but would love help from anyone who can give me more information on exactly how this is done.

2.5.2. Command line accessible documentation

Most users will expect some basic amount of documentation to be easily available from the command line. For few programs should this type of documentation extend for more than one screen (24 or 25 lines) but it should cover the basic usage, a brief (one or two sentence) description of the program, a list of the commands with explanations, as well as all the major options (also with explanations), plus a pointer to more in-depth documentation for those who need it. The command line documentation for Debian's apt-get serves as an excellent example and a useful model:

apt 0.3.19 for i386 compiled on May 12 2000  21:17:27
Usage: apt-get [options] command
       apt-get [options] install pkg1 [pkg2 ...]

apt-get is a simple command line interface for downloading and
installing packages. The most frequently used commands are update
and install.

Commands:
   update - Retrieve new lists of packages
   upgrade - Perform an upgrade
   install - Install new packages (pkg is libc6 not libc6.deb)
   remove - Remove packages
   source - Download source archives
   dist-upgrade - Distribution upgrade, see apt-get(8)
   dselect-upgrade - Follow dselect selections
   clean - Erase downloaded archive files
   autoclean - Erase old downloaded archive files
   check - Verify that there are no broken dependencies

Options:
  -h  This help text.
  -q  Loggable output - no progress indicator
  -qq No output except for errors
  -d  Download only - do NOT install or unpack archives
  -s  No-act. Perform ordering simulation
  -y  Assume Yes to all queries and do not prompt
  -f  Attempt to continue if the integrity check fails
  -m  Attempt to continue if archives are unlocatable
  -u  Show a list of upgraded packages as well
  -b  Build the source package after fetching it
  -c=? Read this configuration file
  -o=? Set an arbitary configuration option, eg -o dir::cache=/tmp
See the apt-get(8), sources.list(5) and apt.conf(5) manual
pages for more information and options.
    

It has become a GNU convention to make this type of information accessible with the "-h" and the "--help" options. Most GNU/Linux users will expect to be able to retrieve basic documentation these ways so if you choose to use different methods, be prepared for the flames and fallout that may result.
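Many option-parsing libraries provide "-h" and "--help" for free. As a minimal sketch, the example below uses Python's standard argparse module, which adds both options automatically and builds the usage text from the declared arguments. The program name, commands, options, and version string are placeholders invented for illustration.

```python
import argparse


def build_parser():
    """Describe the command line once; argparse derives the help text from it."""
    parser = argparse.ArgumentParser(
        prog="yourprojectname",
        description="One or two sentences describing what the program does.",
        epilog="See the yourprojectname(1) manual page for more information.",
    )
    parser.add_argument("-q", "--quiet", action="store_true",
                        help="no output except for errors")
    parser.add_argument("-V", "--version", action="version",
                        version="%(prog)s 0.1.0")
    parser.add_argument("command", choices=["update", "install"],
                        help="action to perform")
    return parser


if __name__ == "__main__":
    # Running with -h or --help prints the generated help and exits.
    args = build_parser().parse_args()
    print("would run:", args.command)
```

The description and epilog arguments cover the short program description and the pointer to more in-depth documentation that users expect from command line help.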

2.5.3. Files users will expect

In addition to man pages and command-line help, there are certain files where people will look for documentation, especially in any package containing source code. In a source distribution, most of these files can be stored in the root directory of the source distribution or in a subdirectory of the root called "doc" or "Documentation." Common files in these places include:

README or Readme

A document containing all the basic installation, compilation, and even basic use instructions that make up the bare minimum information needed to get the program up and running. A README is not your chance to be verbose but should be concise and effective. An ideal README is at least 30 lines long and no more than 250.

INSTALL or Install

The INSTALL file should be much shorter than the README file and should quickly and concisely describe how to build and install the program. Usually an INSTALL file simply instructs the user to run "./configure; make; make install" and touches on any unusual options or actions that may be necessary. For most relatively standard install procedures and for most programs, INSTALL files are as short as possible and are rarely over 100 lines.

CHANGELOG, Changelog, ChangeLog, or changelog

A CHANGELOG is a simple file that every well-managed free software project should include. A CHANGELOG is simply the file that, as its name implies, logs or documents the changes you make to your program. The simplest way to maintain a CHANGELOG is to keep a file with the source code for your program and add a section to the top of the CHANGELOG with each release describing what has been changed, fixed, or added to the program. It's a good idea to post the CHANGELOG on the website as well because it can help people decide whether they want or need to upgrade to a newer version or wait for a more significant improvement.

NEWS

A NEWS file and a ChangeLog are similar. Unlike a CHANGELOG, a NEWS file is not typically updated with new versions. Whenever new features are added, the developer responsible will make a note in the NEWS file. NEWS files should not have to be changed before a release (they should be kept up to date all along) but it's usually a good idea to check first anyway because often developers just forget to keep them as current as they should.

FAQ

For those of you that don't already know, FAQ stands for Frequently Asked Questions and a FAQ is a collection of exactly that. FAQs are not difficult to make. Simply make a policy that if you are asked a question or see a question on a mailing list two or more times, add the question (and its answer) to your FAQ. FAQs are more optional than the files listed above but they can save your time, increase usability, and decrease headaches on all sides.
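Of the files above, the CHANGELOG is the one most easily maintained by a small script, since the routine is always the same: add a new section to the top with each release. The sketch below is one possible way to do that. The function name and the entry layout (a "version (date)" header followed by indented bullet lines) are assumptions for illustration, not a required format.

```python
import datetime


def prepend_entry(changelog_text, version, changes, when):
    """Return the changelog text with a new release section added at the top."""
    header = "{} ({})".format(version, when.isoformat())
    lines = [header] + ["  * {}".format(change) for change in changes]
    return "\n".join(lines) + "\n\n" + changelog_text


if __name__ == "__main__":
    old = "0.1.0 (2001-01-01)\n  * Initial release\n"
    new = prepend_entry(old, "0.1.1", ["Fix crash on empty input"],
                        datetime.date(2001, 2, 3))
    print(new)
```

Because the function returns new text rather than writing a file, it can be used equally well to update the CHANGELOG in the source tree and the copy posted on the website.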

2.5.4. Website

It's only indirectly an issue of documentation, but a good website is quickly becoming an essential part of any free software project. Your website should provide access to your documentation (in HTML if possible). It should also include a section for news and events around your program and a section that details the process of getting involved with development or testing and makes an open invitation. Finally, it should supply links to any mailing lists and similar websites, and provide a direct link to all the available ways of downloading your software.

2.5.5. Other documentation hints

  • All your documentation should be in plaintext, or, in cases where it is on your website primarily, in HTML. Everyone can cat a file, everyone has a pager, (almost) everyone can render HTML. You are welcome to distribute information in PDF, PostScript, RTF, or any number of other widely used formats but this information must also be available in plaintext or HTML or people will be very angry at you. In my opinion, info falls into this category as well. There is plenty of great GNU documentation that people simply don't read because it is only in info. And this does make people angry. It's not a question of superior formats; it is a question of accessibility and the status quo plays a huge role in this determination.

  • It doesn't hurt to distribute any documentation for your program from your website (FAQs etc) with your program. Don't hesitate to throw any of this in the program's tarball. If people don't need it, they will delete it. I can repeat it over and over: Too much documentation is not a sin.

  • Unless your software is particular to a non-English language (a Japanese language editor for example), please distribute it with English language documentation. If you don't speak English or are not confident in your skills, ask a friend for help. Like it or not, fair or unfair, English is the language of free software. However, this does not mean you should limit your documentation to only English. If you speak another language, distribute translations of documentation with your software if you have the time and energy to do so. They will invariably be useful to someone.

  • Finally, please spell-check your documentation. Misspellings in documentation are bugs. I'm very guilty of committing this error and it's extremely easy to do. If English is not your first language, have a native speaker look over or edit your documentation or web pages. Poor spelling or grammar goes a long way to making your code look unprofessional. In code comments, this type of thing is less important but in man pages and web pages these mistakes are not acceptable.

2.6. Other Presentation Issues

Many of the remaining issues surrounding the creation of a new free software program fall under what most people describe as common sense issues. It's often said that software engineering is 90 percent common sense combined with 10 percent specialized knowledge. Still, these issues are worth noting briefly in the hope that they may remind a developer of something they may have forgotten.

2.6.1. Package File Names

I agree with ESR when he says that: "It's helpful to everybody if your archive files all have GNU-like names -- all-lower-case alphanumeric stem prefix, followed by a dash, followed by a version number, extension, and other suffixes." There is more info (including lots of examples of what not to do) in his Software Release Practices HOWTO, which is included in this HOWTO's bibliography and can be found through the LDP.
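The naming rule quoted above is mechanical enough to enforce in a release script. The following is a rough sketch only: the function name is hypothetical, and the two patterns are one reading of "all-lower-case alphanumeric stem" and "version number" (digits separated by dots). They are not definitions taken from ESR's HOWTO.

```python
import re

# Assumed patterns: a lower-case alphanumeric stem and a dotted numeric version.
STEM_RE = re.compile(r"[a-z0-9]+")
VERSION_RE = re.compile(r"[0-9]+(\.[0-9]+)*")


def archive_name(stem, version, extension=".tar.gz"):
    """Build a GNU-like archive file name, rejecting stems or versions that don't fit."""
    if not STEM_RE.fullmatch(stem):
        raise ValueError("stem must be lower-case alphanumeric: %r" % stem)
    if not VERSION_RE.fullmatch(version):
        raise ValueError("version must be dotted numbers: %r" % version)
    return "{}-{}{}".format(stem, version, extension)


if __name__ == "__main__":
    print(archive_name("foobar", "1.2.3"))  # foobar-1.2.3.tar.gz
```

Raising an error instead of silently fixing the name keeps a badly named archive from ever reaching the download directory.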

2.6.2. Package formats

Package formats may differ depending on the system you are developing for. For Windows-based software, Zip archives (.zip) usually serve as the package format of choice. If you are developing for GNU/Linux, *BSD, or any UN*X, make sure that your source code is always available in tar'ed and gzip'ed format (.tar.gz). UNIX compress (.Z) has gone out of style and usefulness, and faster computers have brought bzip2 (.bz2) into the spotlight as a more effective compression format. I now make all my releases available in both gzip'ed and bzip2'ed tarballs.
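Producing both kinds of tarball from the same source tree can be scripted. The sketch below uses Python's standard tarfile module. The function name and arguments are hypothetical, and it assumes you want the archive to unpack into a single directory named after the stem and version.

```python
import os
import tarfile


def make_tarballs(source_dir, stem, version, dest_dir="."):
    """Write stem-version.tar.gz and stem-version.tar.bz2; return their paths."""
    base = "{}-{}".format(stem, version)
    created = []
    for mode, extension in (("w:gz", ".tar.gz"), ("w:bz2", ".tar.bz2")):
        path = os.path.join(dest_dir, base + extension)
        with tarfile.open(path, mode) as archive:
            # arcname makes everything unpack under one "stem-version" directory.
            archive.add(source_dir, arcname=base)
        created.append(path)
    return created


if __name__ == "__main__":
    # Hypothetical layout: the source tree lives in ./src
    for path in make_tarballs("src", "yourprojectname", "0.1.0"):
        print("wrote", path)
```

Unpacking into a single versioned directory is a courtesy to users, who otherwise find the contents of your archive scattered across whatever directory they happened to be in.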

Binary packages should always be distribution specific. If you can build binary packages against a current version of a major distribution, you will only make your users happy. Try to foster relationships with users or developers of large distributions to develop a system for the consistent creation of binary packages. It's often a good idea to provide Red Hat RPMs (.rpm), Debian debs (.deb), and source RPMs (SRPMs) if possible. Remember: While these binary packages are nice, getting the source packaged and released should always be your priority. Your users or fellow developers can and will do the binary packages for you.

2.6.3. Version control systems

A version control system can make a lot of these problems of packaging (and a lot of other problems mentioned in this HOWTO) less problematic. If you are using *NIX, CVS is your best bet. I recommend Karl Fogel's book on the subject (and the posted HTML version) wholeheartedly.

CVS or not, you should probably invest some time into learning about a version control system because it provides an automated way of solving many of the problems described by this HOWTO. I am not aware of any free version control systems for Windows or Mac OS but I know that CVS clients exist for both platforms. Websites like SourceForge do a great job as well with a nice, easy-to-use web interface to CVS.

I'd love to devote more space in this HOWTO to CVS because I love it (I even use CVS to keep versions straight on this HOWTO!) but I think it falls outside the scope of this document and already has its own HOWTOs. Most notable is the CVS Best Practices HOWTO[CVSBESTPRACTICES] which I've included in the attached bibliography.

2.6.4. Useful tidbits and presentation hints

Other useful hints include:

  • Make sure that your program can always be found in a single location. Often this means that you have a single directory accessible via FTP or the web where the newest version can be quickly recognized. One effective technique is to provide a symlink called "yourprojectname-latest" that always points to the most recent released or development version of your free software application. Keep in mind that this location will receive many requests for downloads around releases so make sure that the server you choose has adequate bandwidth.

  • Make sure that there is a consistent email address for bug reports. It's usually a good idea to make this something that is NOT your primary email address like yourprojectname@host or yourprojectname-bugs@host. This way, if you ever decide to hand over maintainership or if your email address changes, you simply need to change where this email address forwards. It also will allow for more than one person to deal with the influx of mail that is created if your project becomes as huge as you hope it will.
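The "latest" symlink from the first hint above is worth updating from a script so that it never points at a missing file during a release. The sketch below is one way to do it on a POSIX system. The function name is hypothetical, and it assumes the link lives in the same directory as the release files.

```python
import os


def point_latest(release_path, link_path):
    """Repoint the symlink at `link_path` to the release file at `release_path`."""
    temporary = link_path + ".tmp"
    if os.path.lexists(temporary):
        os.remove(temporary)
    # A relative target keeps the link valid if the whole directory is moved.
    os.symlink(os.path.basename(release_path), temporary)
    # Renaming over the old link swaps it in a single step on POSIX systems.
    os.replace(temporary, link_path)


if __name__ == "__main__":
    # Hypothetical file names for illustration.
    point_latest("/srv/ftp/pub/yourprojectname-0.1.1.tar.gz",
                 "/srv/ftp/pub/yourprojectname-latest.tar.gz")
```

Creating the new link under a temporary name and renaming it over the old one means a download that arrives mid-release sees either the old version or the new one, never a broken link.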