Factors Contributing to the Success of Open Source (Part 2 of 3)
In Part 1, I laid out the background for a recent study at Carleton University that determined which factors contribute to the success of Open Source projects.
The six factors were:
1 - Number of developers
2 - Experience of developers
3 - Target audience
4 - Language popularity
5 - Project type
6 - License type
The two factors that had no statistically significant correlation with success were license type and language popularity. In my opinion, the lack of correlation for programming language makes a lot of sense. It suggests that end users of applications and tools really don't care what is under the hood, as long as the product is good at what it does. Language does matter for frameworks and infrastructure types of projects, but the more popular a language is, the more competition the project will have, and therefore the benefits of having a larger possible user base will be muted.
I am a bit surprised that licensing does not have more of an impact. I realize that end users probably do not care, but I would have expected a lot of developer-based projects to tend towards less restrictive licenses. We will see later this week in Part 3 that this factor actually comes back into play during the second part of the study.
All four of the remaining factors had a statistically significant correlation with the likelihood of success of the project. In order of strength:
1 - Project type (end user app, versus tools versus frameworks and infrastructure).
2 - Developer Experience
3 - Number of developers
4 - Target audience.
So it turns out that infrastructure/framework projects have the strongest correlation to success. Tools less so, and end user apps show the least correlation. I did find this a bit surprising given the popularity of some end user applications like Azureus. It does make sense that developers are more sophisticated about open source and more likely to use infrastructure/frameworks, but they would also be just as likely to use the end user applications and tools. One possibility is that tools and applications face a lot of competition from non open source incarnations. For example, look at the hundreds of BitTorrent applications that are also free (but not open source) that compete with Azureus. On the other hand, infrastructure and frameworks may have competition, but it seems more likely to be commercial and restrictive in nature. Simply put - applications and tools do not scream to be extended and modified, and therefore it doesn't really matter if they are open source so much as it matters that they are free. It does seem to matter that infrastructure is free and open source.
The strong correlation with experience of developers makes a lot of sense. If I were starting a new open source project, I would want to find people with experience to help. If I were starting anything, I would want to stack it with experience. But it is reassuring to see conclusively that Open Source is no different from other endeavours - the playing field is level and experience is beneficial.
The number of developers correlation is a bit more surprising. I would have expected that after a certain "core" number of developers on a project there would start to be a negative impact. The data did not show this. I know that all projects experience some level of churn during their existence, and I guess having a larger base improves the ability to adapt.
Projects that target a developer audience show the strongest correlation to success, sysadmin-focused projects less so, and end user applications least of all. I think this comes back to the same discussion as application type. It can't simply be because the developer audience is more accepting of open source, because presumably they would also be likely to use end user applications. I think it boils down to the fact that it matters less that end user applications and tools are open source - as long as they're free/cheap. But for stuff that developers might use - stuff that needs to be extended, changed, distributed and adapted - being open source is critical.
One final note - there was actually another dimension to this study. The core data set was 350 random projects from SourceForge. Of those 350 projects, 108 were classified as being in production. Do you think the correlations stay the same on this subset of 108 "production" projects?
On to Part 3.
- Don
2 Comments:
At 7:09 PM, Michael Scharf said…
Hi Don,
I'm not sure if taking 350 random projects from SourceForge is a good sampling method. Like with shareware, there are many "hobby programmers" trying to write end user applications, and some of them make them open source in the hope that someone will pick them up and help out. They add noise. Many of them are of poor quality (open source, freeware and shareware). Often they are single-developer projects. There are lots of those at SourceForge. I think there should be a higher bar for being considered in such a study than simply being listed at SourceForge.
Like with eclipse, or apache, there has to be a minimal quality for something to be considered an open source project.
Therefore, a more interesting study would be to look at projects hosted at apache.org or eclipse.org. What are the factors of success and failure in those projects?
How many open source projects are out there? The question is similar to the question of how long the coastline of a country is (http://en.wikipedia.org/wiki/Coastline_paradox).
It depends on how you count. If someone dumps some source code into a public place, is this already an open source project?
In the context of Eclipse we don't have to be scared when one-person hobby projects fail. We have to look at big open source projects. The factors of success and failure of such projects matter. Even big projects can fail!
Michael
At 2:57 PM, Donald Smith said…
The class of SourceForge projects is not necessarily indicative of supersets and subsets of other Open Source segmentations. The author of the research also notes that the definition of "success" is debatable and that there are many other factors to consider which may or may not be correlated with each other.
But I would argue that SourceForge is a pretty good litmus test overall. You mentioned that some SourceForge projects are poor quality - there is a continuum in all communities, and Eclipse and Apache are by no means perfect. You mentioned that many are single-developer projects, but it's not uncommon for people to argue that "the less people involved in a project, the better off it will be" - this proves otherwise.
I do agree wholeheartedly about testing "hobby" programmers versus those who do open source as part of their career, and in fact we've recommended this as a factor to test in ongoing research. Any other suggestions?
- Don