How Open are Commercial Scientific Software Packages?

A revised version of this post has been published as a Viewpoint in the Journal of Physical Chemistry Letters, DOI: 10.1021/acs.jpclett.5b02609.

Most scientific research nowadays relies on some kind of software. This is particularly true in fields such as my own, quantum chemistry. Such software is used in applications to study various problems in chemistry, often in close connection with experiment, and it also serves as a platform for the development of new theoretical and computational methods.

In quantum chemistry, many program packages are available [1] that differ in functionality, usability (from easy to use by non-specialists to usable only by the person who wrote them), and computational efficiency. What basically all available codes have in common is that they have been developed with public (i.e., taxpayers’) money. Nevertheless, the terms under which they are made available differ very significantly: Some program packages are available under open-source licenses (meaning that anyone can “study, change, and distribute the software to anyone and for any purpose” [2]), others are owned by commercial companies who sell them to both academic groups and industry users for a small or large fee. Intermediate models (free, but not open source) also exist, such as closed-source software that is distributed for free by academic groups [3] or software for which the source code is available to academic users, but with license terms that prohibit changes or redistribution.

Pushing towards Open-Source in Science

Open-source scientific software offers a number of advantages for science as a whole [4]. The most important one is that publicly funded scientific software should be available for everyone to use and extend. This has led some funding agencies, in particular in the US, to require software developed under certain grants to be open source. Recently, Krylov et al. published a viewpoint in J. Phys. Chem. Lett. that criticizes such open-source mandates [5].

The piece is written by eminent scientists, whose work in quantum-chemical method and software development I admire. All of them have in common that, besides being professors at research universities, they are co-owners of companies selling quantum-chemical software packages [6]. I find many of the arguments put forward in this opinion piece flawed, I miss a consistent use of terminology (free as in speech vs. free as in beer), and I think that it is full of contradictory statements.

Perspective of the Method Developer

Here, I want to focus on one particular perspective: The one I have as a developer of new quantum-chemical methods. A new idea needs to be implemented in software before it can be developed, tested, and finally used. Usually, this requires using a lot of well-established tools, such as integral codes, basic methods developed many decades ago, and advanced numerical algorithms. All of these are a prerequisite for new developments, but are not “interesting” by themselves anymore today. Even though all these tools are well-documented in the scientific literature, recreating them would be a major effort that cannot be repeated every time and by every research group – because both time and funding are limited resources, especially for young researchers with rather small groups, such as myself.

Therefore, method developers in quantum chemistry need some existing program package as a “development platform”. Both open-source and commercial codes can offer such a platform. Open-source codes have the advantage that there is no barrier to access. Anyone can download the source code and start working on a new method. I have so far mostly contributed my developments to commercial codes. These also offer a lot of advantages: For successful codes, the revenue from selling licenses can be used by the companies owning them to employ software developers who maintain and document the code. These can further improve code contributed by academic groups in order to make it maintainable, efficient, and easily extendable. This can speed up new developments and improve the quality and efficiency of the resulting new software.

Commercial Codes as “Open Teamware”?

The authors of the opinion piece in Ref. [5] argue that there is no need for open-source development platforms because many commercial codes, such as Q-Chem [7] and others, operate under what they call an “open teamware” model. As they point out, many commercial codes have assembled rather large communities of academic developers.

However, I would argue that access to commercial codes as a development platform is not as open as the authors of Ref. [5] claim. First of all, it is subject to signing a developer agreement, the terms of which are dictated by the companies owning the source code and that are drafted to protect their commercial interests. Usually, they include a transfer of intellectual property rights for the new developments to these companies as well as non-disclosure clauses concerning the source code and algorithms implemented in it. (Here and in the following, I am not talking about specific software packages, because the terms of developer agreements are usually covered by non-disclosure clauses themselves. Therefore, I either do not know the precise terms, or I am not allowed to reveal them).

Often, such developer agreements require exclusiveness, meaning that new source code cannot be contributed to different commercial packages. Sometimes, developers are even banned from using competing program packages [8]. Such requirements for exclusiveness prevent scientific collaborations. I have encountered this on several occasions, when fellow scientists told me that they would love to collaborate, but that they cannot do so because we are contributing to competing packages. Thus, the commercial interests of software companies lead to a segregation of the scientific community based on affiliation with certain codes. Often, methods developed in one program package are reinvented in others because scientists cannot collaborate or use each other’s software.

Perpetuating Power Structures

The use of commercial codes as development platforms also puts the few scientists owning the corresponding companies into a gatekeeper position. It is up to them to decide who is allowed to contribute new ideas and developments. The policies of different companies may differ significantly. However, all of them will require developers to reveal novel research ideas to the scientists in these gatekeeper positions. These will in many cases be competing scientists, who might refuse access because the ideas are opposed to their own “scientific beliefs” or because they might interfere with their own lines of research.

These mechanisms perpetuate power structures that put very few individual scientists, the owners of commercial software packages, in control of most method development in our field. It should be pointed out here that many of the authors of Ref. [5] are not the original developers of the commercial codes they now own, but that they have inherited these from their academic teachers. Such decisions will certainly have been based on scientific achievements, but they have not been taken by the academic community as a whole through peer review and funding panels, but by the few pioneers who developed the software infrastructure our whole field relies on today.

This contradicts the merit-based access to scientific resources that the authors of Ref. [5] so keenly advertise. The possibility to carry out new method developments should only be based on the quality of new ideas (as judged by grant reviewers and panels of funding agencies) and not on whether or not a scientist is part of a certain school. The “track record of productivity” [5] rewarded by funding agencies with grant money should have been established with competitive ideas, not because of access to a software infrastructure built by a researcher’s academic ancestors. (Again, let me point out that I admire the track record of all of the authors of Ref. [5] – but I think that the playing field has to be leveled for the next generation of scientists.)

Finally, I have to admit that, at least in part, the problems discussed above also exist for open-source program packages. Often, these codes are less well documented and maintained (because of the lack of revenue from selling licenses), with the consequence that the barrier to contributing to them might be significant. Often, it can only be overcome by collaborating with one of the lead authors of such open-source codes, which again puts these authors into a similar gatekeeper position. In addition, open-source code is often not immediately released to the public in order to maintain a competitive advantage over scientists who might want to improve or build upon new methods.

Possible Solutions

A first step towards a solution would be to remove the conflict of interest that many scientists owning and running scientific software companies face. If these companies are run by businesspeople instead of active scientists, then decisions to grant access to new external developers will be based on the possible merits for the (paying) users of the software packages and will not be influenced by fear of scientific competition. Some commercial codes use such a model [9]. At least, the policies underlying decisions whether or not to grant access to external developers should be made transparent.

Second, I believe that funding initiatives to create open-source packages and to sustain their maintenance are an important piece in creating truly open platforms for method development. Apparently, such initiatives are being implemented in the US, both through national laboratories and via funding agencies [5]. Such initiatives provide a means to level the playing field, by giving open-source packages the kind of funding that commercial codes can obtain via their revenues from selling licenses to academic and industrial users. Such initiatives should, of course, not destroy commercial codes, but only level the playing field. In fact, there are also funding opportunities that are exclusively available to commercial codes, such as technology grants. In Europe, many programs under the Horizon2020 framework encourage or require the involvement of small or medium enterprises, and some quantum chemistry software companies have been very successful in securing such grants [10].

Concerning funding for fundamental research, open-source mandates might indeed have severe consequences for commercial codes because they would cut them off from academic method development. This could be mitigated by requiring such codes – if they want to profit from public funding for basic research – to implement a truly open platform strategy that allows non-discriminatory access to the source code for interested developers. With strict open-source mandates, commercial codes would still have the possibility to create new developments in the form of modular libraries released under open-source licenses.

Conclusions

I have focused here on the perspective of the quantum-chemical method developer. Of course, there are other aspects of this discussion that are equally relevant, such as the perspective of the users and the global perspective of science as a whole. Related discussions on open access and open data policies are often mixed with those on open-source software, which I find detrimental because the players are very different (small software companies run by scientists in the case of open source vs. huge publishers with monopolies in the case of open access). In any case, I want to repeat that this blog post only records some of my personal thoughts, and I welcome any comments and discussions.

Conflict-of-Interest Statement

I am a university professor in theoretical chemistry whose research depends on funding by public money – via government funding of our university and via funding agencies. Our research is also supported by industry grants from Volkswagen AG, Wolfsburg.
Most of my past method development has been contributed to the commercial software package ADF, owned by Scientific Computing and Modeling (SCM) B.V., Amsterdam, under a developer agreement. I also have access to the Turbomole program package under a developer agreement with Turbomole GmbH, Karlsruhe. I have no financial assets in SCM, Turbomole, or other scientific software companies, and I did not receive direct or indirect financial compensation for these contributions. I have also contributed to the Dirac and Dalton packages, which are free for academic users, but not open source (yet). Some software developed in my research group is – or will soon be – available under open-source licenses.

References

[1] https://en.wikipedia.org/wiki/List_of_quantum_chemistry_and_solid-state_physics_software
[2] https://en.wikipedia.org/wiki/Open-source_software
[3] see e.g., the ORCA code, http://www.cec.mpg.de/forschung/mts-forschungsprojekte/orca-prof-frank-neese-dr-frank-wennmohs.html
[4] J. D. Gezelter, “Open Source and Open Data Should Be Standard Practices”, J. Phys. Chem. Lett. 6, 1168–1169 (2015). DOI: 10.1021/acs.jpclett.5b00285
[5] A. I. Krylov, J. M. Herbert, F. Furche, M. Head-Gordon, P. J. Knowles, R. Lindh, F. R. Manby, P. Pulay, C.-K. Skylaris, H.-J. Werner, J. Phys. Chem. Lett. 6, 2751–2754 (2015). DOI: 10.1021/acs.jpclett.5b01258
[6] see the “conflict-of-interest statements” at the end of Ref. [5]
[7] http://www.q-chem.com/
[8] http://www.bannedbygaussian.org/
[9] http://www.scm.com/
[10] http://www.scm.com/EUprojects/