The Linux Foundation released its annual report on “Who Writes Linux” in April of this year. It reveals interesting facts about how such a large-scale open-source community operates.
Some of the Linux Kernel facts that jump out of the report include:
- Releases are done every 80 days on average.
- Every release includes an average of 10,000 patches; that’s an average of 5.6 patches per hour.
- About 1,200 developers contribute to any given release.
- The number of lines of code increases by about 1.1 million per year, which averages out to 136 lines of code per hour.
- About 10,000 developers have been involved over the full 20-year history of the project.
- The most active contributor made only 1.2% of the changes, and the 20th most active contributor made 0.6% of the changes; this shows how flat the distribution of contributions is.
- Out of all the changes made, 50% were distributed across the top four organizations.
- The largest number of contributions came from non-affiliated developers (18%).
- Very experienced developers focus on merging changes: Greg Kroah-Hartman has signed off on 5.8% of line changes, and Linus Torvalds, the 4th-ranked signer, has merged 2.4% of line changes.
What all this illustrates is the behavior of a fully grown and properly provisioned open source project. In particular:
- A large number of contributors.
- A high ratio of contributors to number of lines of code.
- A rather flat distribution of labor.
- No large dependency on any given developer.
- All of them are loved and appreciated, but none of them is indispensable.
The Linux Kernel is not the only project to have flourished to that level. Similar scale projects include, for example:
These contributors are, of course, not full-time developers on average; rather, their commitments follow the typical power-law distribution of communities where participation is open and volume is regulated only by level of interest and the availability of contributor time. This distribution is explained by Yochai Benkler in his article “Coase’s Penguin”.
With this context in mind, we looked back at the Insight Toolkit (ITK) project and realized that its community is vastly underpowered. ITK is about 1.1 million lines of code (LOC), of which 655 KLOC come from third-party libraries such as PNG, TIFF, JPEG, GDCM, and HDF5. This leaves 468 KLOC of native ITK code. If we apply the Linux Kernel’s rule-of-thumb ratio of developers to LOC to ITK, we find that our community should have about 468 active contributors. The statistics of the Git repository, however, reveal that the project has had 162 contributors over its full history; of those, only 74 contributed during the two years of the ITKv4 refactoring effort.
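The estimate above can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: it assumes the rule of thumb amounts to roughly one active contributor per 1,000 lines of native code, which is the ratio implied by the 468 KLOC and 468 contributor figures in the text. The constant and function names are hypothetical.

```python
# Figures quoted in the text (thousands of lines of code, and head counts).
THIRD_PARTY_KLOC = 655
NATIVE_KLOC = 468
ALL_TIME_CONTRIBUTORS = 162
ITKV4_CONTRIBUTORS = 74

# Assumption: about one active contributor per KLOC, the ratio implied by
# the text's estimate (468 KLOC -> about 468 contributors).
CONTRIBUTORS_PER_KLOC = 1.0


def expected_contributors(kloc, per_kloc=CONTRIBUTORS_PER_KLOC):
    """Return the community size the rule of thumb suggests for `kloc`."""
    return round(kloc * per_kloc)


def staffing_ratio(actual, expected):
    """Return actual contributors as a fraction of the expected number."""
    return actual / expected


total_kloc = THIRD_PARTY_KLOC + NATIVE_KLOC
target = expected_contributors(NATIVE_KLOC)

print(f"Total size: {total_kloc} KLOC")
print(f"Expected contributors: {target}")
print(f"All-time contributors: {staffing_ratio(ALL_TIME_CONTRIBUTORS, target):.1%} of target")
print(f"ITKv4 contributors: {staffing_ratio(ITKV4_CONTRIBUTORS, target):.1%} of target")
```

With these inputs, the contributors active during the ITKv4 effort come to roughly 16% of the target, and the all-time total to roughly 35%.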
Based on this realization, we have launched a new initiative to grow the ITK community to a size that matches the complexity of the toolkit. Our estimate above indicates that we should grow to about 500 developers, following, as always, the power-law distribution in which 20% of developers do 80% of the work and a long tail of many developers takes care of the remaining 20%.
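To make the 80/20 shape concrete, here is a minimal sketch that models contribution volume with a Zipf-like power law, in which the contributor ranked k produces work proportional to 1/k^s. The exponent s = 1 and the function names are assumptions chosen for illustration, not measurements of the ITK or Linux communities.

```python
def zipf_weights(n_contributors, exponent=1.0):
    """Relative work per contributor, ranked from most to least active."""
    return [1.0 / (rank ** exponent) for rank in range(1, n_contributors + 1)]


def top_share(weights, top_fraction=0.2):
    """Fraction of total work done by the most active `top_fraction`."""
    ranked = sorted(weights, reverse=True)
    n_top = max(1, round(len(ranked) * top_fraction))
    return sum(ranked[:n_top]) / sum(ranked)


weights = zipf_weights(500)  # 500 is the community size targeted in the text
share = top_share(weights)
print(f"Top 20% of contributors do {share:.0%} of the work")
print(f"Remaining 80% (the long tail) do {1 - share:.0%}")
```

Under this assumed exponent, the top fifth of contributors accounts for roughly three quarters of the work, close to the 80/20 rule of thumb; a larger exponent concentrates the work further in the most active contributors.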
To get there, we are pursuing two major initiatives: Intensive Training, and Engagement and Retention.
The natural place to start the training and recruiting process is the large community of what we used to call “users,” but whom we now more respectfully refer to as “community members.”
Based on the 2,200 subscribers to the mailing list and the 3,200 monthly downloads of ITK release packages from SourceForge, we estimate that about 5,000 people are using ITK worldwide. We cannot establish an exact number of ITK adopters because we adhere to the practice of allowing the free flow of software downloads without requiring registration. In other words, we refrain from tracking downloads and getting in your way when you are downloading the toolkit.
From these numbers, our mission is to engage 10% of these community members and to help them become active participants in the development and maintenance of the toolkit. That 10% corresponds to the 500 maintainers who can take care of all the needs of the toolkit: bug reporting, bug fixing, documentation, training, support on the mailing list, and development of new features and improvements.
To provide the grounds for training, we’ve launched the ITK BarCamp initiative. A BarCamp is an open space for collaboration, with particular emphasis on education and improvement of skills.
Being born in the Information Age, the ITK BarCamp takes advantage of the most popular online sites to reach members of the larger ITK community, wherever they are.
The ITK BarCamp has a Google+ page, and regular hangouts are organized to bring community members together to work on training and development activities. These hangouts are open to the public and are recorded for later viewing by those who missed them. The ITK BarCamp is also one of the first organizations to have a G+ Community page.
ITK BarCamp also maintains a YouTube channel where we post short video tutorials and recordings of hangout activities. The collection of short video tutorials follows the approach of Khan Academy: building up a body of five- to fifteen-minute videos that can be watched in any order. They cover a large variety of topics related to the software development skills needed to become a master ITK contributor.
These short videos are accompanied by text instructions and are linked from the documentation page, which is generated from RST files that are processed by Sphinx to produce the final HTML pages. The sources of these RST files and the associated Sphinx configuration are publicly available in the GitHub repository.
Please join us in this initiative to grow the ITK community in the number of contributors, level of programming skills, and spirit of collaboration. Your suggestions are greatly appreciated.
Luis Ibáñez is a Technical Leader at Kitware, Inc. He is one of the main developers of the Insight Toolkit (ITK). Luis is a strong supporter of Open Access publishing and the verification of reproducibility in scientific publications.
Matt McCormick is a medical imaging researcher working at Kitware, Inc. His research interests include medical image registration and ultrasound imaging. Matt is an active member of scientific open source software efforts such as the Insight Toolkit, TubeTK, and scientific Python communities.
Xiaoxiao Liu is an R&D Engineer at Kitware. Her research interests are in medical image analysis and applications, including statistical shape analysis for anatomical structures, deformable shape modeling and segmentation, diffeomorphic image registration techniques and image-guided radiotherapy.