Date Published: February 01, 2018
Publisher: International Union of Crystallography
Author(s): Evgeny Krissinel, Ville Uski, Andrey Lebedev, Martyn Winn, Charles Ballard.
The paper describes recent CCP4 initiatives and projects aimed at bringing software and data services which utilize distributed computational resources to users.
During the last decade, there has been remarkable progress in methods for structure solution, in large part associated with the development of automated expert systems that are capable of delivering almost complete solutions with minimal or no user intervention. Such systems represent complex scripts (commonly called pipelines) built on top of more elementary programs. Such automation is based on three fundamental components: intrinsic knowledge (expressed in the algorithms and decision makers that are used); extrinsic knowledge (coming from external sources such as structure and sequence databases); and sufficiently deep exploration of known approaches and various parameters.
Automatic pipelines for structure solution, which consume significant CPU resources and utilize custom databases, are naturally suitable for deployment as web services. Fig. 1 ▸ presents a schematic of a conventional web-service setup, which is characterized by the presence of a single internet portal (1, the web server), data storage for keeping user data and calculation results (2, the database), and number crunchers (3). Depending on the anticipated workload, elements of the web service may be deployed on individual machines, or combined on fewer machines to optimize hardware resources. It is not uncommon to keep the database and web server on the same machine, although this is not desirable for security reasons.
The computational cloud (Hassan, 2011 ▸) represents a new model for everyday computing which is becoming increasingly popular. Recent years have been marked by an explosive development and spreading of ultraportable devices, such as tablets, Chromebooks and smartphones, which may be seen as mere cloud terminals. There is also a clear trend towards cloud-based operations in software installed on conventional desktops and laptops. Although cloud computing may be seen as less suitable in terms of data security, for the majority of users it offers at least four essential advantages: (i) the elimination of local data maintenance; (ii) the elimination of data exchange between user devices; (iii) independence of the type of user device; and (iv) access to considerable computational resources.
Distributed computations become increasingly more attractive an alternative to the more traditional approach based on the possession and maintenance of local resources. Associated benefits include not only access to potentially unlimited computational power, but also to a centrally maintained CCP4 setup, databases and user projects available on any device without the need for explicit export and import. The only drawbacks of a distributed model include the requirement for permanent internet access and concerns regarding data security. While the former is a next-to-negligible factor these days, the latter may still be a factor of deterrence in corporate environments. However, even there distributed computing may be employed locally, for example within the network of a particular company or a university laboratory.
Most of the work presented in this paper is currently in progress and the corresponding software and service will be released in the shortest time subject to progress. CCP4 web services are fully available at https://www.ccp4.ac.uk/ccp4online/ with free registration. CCP4 Cloud is in test mode; registration is available on demand to users with a UK FedID. The CCP4 web application, jsCoFE (http://rcccp4serv1.rc-harwell.ac.uk/jscofe/), is being prepared for alpha release and is available for test volunteers upon registration at the jsCoFE home page.
In the present publication, we have given an overview of the current CCP4 activities towards distributed computations for the MX crystallographic community. While ordinary desktops and laptops will be in use for the foreseeable future, the rising popularity of mobile devices and cloud-based approaches to data and software services is fairly obvious. With respect to structure-solution tasks requiring access to supercomputing and databases, the move to cloud models is already recognized as a necessity. From the approaches that we have discussed, CCP4 Cloud represents a good solution for corporate setups with a developed computing infrastructure. CCP4 web application is a relatively lightweight solution that is easy to install and maintain virtually anywhere, but it needs further development to match the functionality of CCP4 Cloud. CCP4 web services remain a well targeted collection of useful CPU-intensive tasks with easy and efficient access, obviously appealing to its ∼600 regular users. It may take a few more years before clarity is reached on which approach, if any, will appeal to CCP4 users most.