Appendix B
Implementation Scenarios: Informatics and Information-Sharing
Chapter 6 discussed the implementation of a fully integrated research strategy. Implementation of the strategy requires an infrastructure capable of expanding current institutional arrangements for interagency coordination, stakeholder engagement, public-private partnerships, and management of potential conflicts of interest. Implementation also requires new mechanisms for integrating informatics and information-sharing into the research structure. This appendix clarifies how a systems approach to the planning and development of a facile and agile informatics infrastructure might help to implement a research strategy that is responsive to the input of stakeholders and to accelerate research on nanotechnology applications and implications. This infrastructure would borrow heavily from advances in digital technologies and the Semantic Web to reduce collaboration timescales and to ensure more effective communication among stakeholders.
Implementation scenarios will be summarized here for the development of methods and protocols, predictive and risk models, a federated data-sharing network, and a semantic informatics infrastructure. There are a great many similarities between challenges in improving model development and challenges in improving method development. Both require a systems overview of the problem to encompass the needs of the entire community, both need continual iterative development of pilot efforts to elicit user requirements and enlist user support, and both have an interest in implementing advanced digital tools and applications. The intent of the scenarios is to illustrate how a systems approach could accelerate progress in nanoscience and nanotechnology research and translation; however, they should not be viewed as blueprints for implementation. Involving stakeholders at the beginning and throughout the development process is critical for successful implementation. The needs and requirements of the user base must be satisfied if best practices are to be advanced and critical participation in the informatics effort is to be attained.
In all the scenarios presented below, input from existing and emerging stakeholder and user communities is needed. Standard development organizations (SDOs) such as the International Organization for Standardization and the American Society for Testing and Materials (ASTM) continue to adopt new experimental protocols, guides, and practices from metrology institutes such as the National Institute of Standards and Technology, laboratories such as the National Cancer Institute’s (NCI) Nanotechnology Characterization Laboratory, and industry. Interlaboratory testing of those protocols has been performed by the Asia-Pacific Economic Cooperation (Wang et al. 2007), the International Alliance for NanoEHS Harmonization (IANH 2011), and ASTM, among others, using nanomaterials recommended or developed by the Organisation for Economic Co-operation and Development, provided by metrology institutes, or purchased commercially. Support for nanomaterial registration is becoming established through the National Institute of Biomedical Imaging and Bioengineering Registry. The stakeholder and user groups for modeling, data-sharing, and informatics infrastructure also are broadly based (Nanoinformatics 2011a). Historically, this larger nanotechnology community has interacted through workshops designed to support and harmonize increased collaboration and has adopted successful informatics implementations from other fields. More recently, a series of workshops on nanoinformatics has provided a roadmap for collaborations in informatics (InterNano 2011) and has supported the development and review of pilot nanoinformatics applications (Nanoinformatics 2011b), including those relevant to the following scenarios.
Method Development and Validation Scenario
This scenario closely follows international recommendations for standard method development and validation and adds video technologies to accelerate the development of methods and to support training. Key principles underlying this approach are that it should be efficient, be flexible, add value, be amenable to establishing data rights, provide for continual improvement of the protocol, document experience in its use, and be tailored to develop and maintain the entire needed dataset. A benefit of the scenario is that it provides a basis for training in new and revised methods and for accrediting contract research organizations (CROs) to allow more outsourcing of extensive nanomaterial characterization with validated methods. Finally, the scenario provides for publication of sensitivity data that are not normally published but that are useful for establishing quantitative structure-activity relationships (QSARs), for designing and redesigning products and processes, and for assessing risk.
A possible set of best practices for establishing validated analytic methods on the basis of current practice and adapting them for nanomaterials to address the needs discussed in Chapter 4 includes the following:
1) Encourage early practitioners to document and publish protocols as individual methods are developed.
2) Begin collaborative development of new methods and protocols only after preliminary video protocols and sufficient well-characterized nanomaterials are available to support method development and testing.
3) Use a material registry system to designate unique lot descriptors for each nanomaterial sample, to maintain a catalog of descriptors to capture lot-to-lot variability in engineered nanomaterials (ENMs), and to correlate possibly different effects seen in various uses and analyses of material lots. Monitor shipping and record environmental conditions during transit of the nanomaterial and any biologic materials needed for use in interlaboratory studies (ILSs), including calibration efforts.
4) Accelerate progress in developing standard analytic methods by using video (particularly common video equipment and applications, such as cellular telephones) and digital collaboration environments (for example, wikis, RSS feeds, and Facebook) to facilitate broad participation and communication.
5) Conduct informal interlaboratory testing of the protocols to identify causes of laboratory bias and to investigate the ruggedness and reliability of the methods before development of the documentary standard. It may be faster to achieve consensus on the documentary standard because of the prior vetting of the protocol and the existence of the informal ILS results, robustness data, and video (JoVE 2011).
6) Rapidly modify the video protocol through a small ILS testing group so that consensus is reached and the (informal) error and uncertainty of the method are satisfactory.
7) Collect data to substantiate that the ruggedness and robustness of the method are adequate (quantification of the sensitivity of the method to variation in any experimental procedure, materials, or conditions is archived with the video protocol).
8) Develop a consensus documentary standard based on the video and the results of the informal ILSs and a more polished video illustrating the method.
9) Use the documentary standard and supplementary video in formal ILSs to determine the error and uncertainty of the method.
10) Publish
a) The documentary consensus standard.
b) The final video (to be used for training).
c) The error and uncertainty data and analysis.
d) The sensitivity data and analysis.
11) Establish a reporting standard for the level of validation achieved by a laboratory that is using the method (for example, full validation, partial validation, corroboration with at least one other laboratory, or a single laboratory result).
12) Continue to engage the stakeholder group through the collaborative environment to
a) Aggregate and organize information on experience with the method, especially details of sample preparation and additional controls required for testing of other ENMs.
b) Rapidly update the method, video, and databases.
c) Aid in establishing the minimum required characterization for each ENM.
d) Provide a virtual helpdesk for the method.
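To make step 3 concrete, the Python sketch below shows how a registry record might bundle a unique lot descriptor, characterization descriptors, transit-condition readings, and the validation-reporting level of step 11. This is an illustrative sketch only; the class and field names are assumptions, not an existing registry schema.

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class TransitReading:
    """One environmental reading logged while a sample lot is in transit."""
    timestamp: datetime
    temperature_c: float
    humidity_pct: float

@dataclass
class NanomaterialLot:
    """Hypothetical registry record for one manufactured lot of an ENM."""
    registry_id: str       # unique lot descriptor assigned by the registry (step 3)
    material_name: str
    batch_date: datetime
    descriptors: dict = field(default_factory=dict)  # e.g., size, coating
    transit_log: list = field(default_factory=list)  # TransitReading entries
    # One of the step-11 reporting levels for the method used on this lot:
    validation_level: str = "single laboratory result"

    def record_transit(self, reading: TransitReading) -> None:
        """Append a shipping-condition reading to the lot's transit log."""
        self.transit_log.append(reading)

# Example: registering a lot and logging one shipment reading.
lot = NanomaterialLot(
    registry_id="ENM-2011-0001-L03",
    material_name="citrate-stabilized gold nanoparticles",
    batch_date=datetime(2011, 6, 1),
    descriptors={"core_diameter_nm": 30.0, "coating": "citrate"},
)
lot.record_transit(TransitReading(datetime(2011, 6, 3, 14, 0), 22.5, 40.0))
print(len(lot.transit_log))  # -> 1
```

Keeping lot-to-lot descriptors and transit conditions in the same record is what allows effects observed in different laboratories to be correlated with a specific material lot, as step 3 requires.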
This scenario provides an overarching context for the different activities involved in developing a standard and a framework for collaboration among the various partners in the effort, such as SDOs, metrology institutes, national user facilities, federally funded research and development centers, and CROs. The information infrastructure allows coordination among all participants in the collaboration, and although each participant may continue to perform its usual role, expansion of roles to new activities and products is possible.
Model Development and Validation Scenario
The informatics needs identified in Chapter 4 specified increasing the pace of model development and validation by leveraging digital technologies and applications; creating incentives for collaborative development and validation of models to estimate model error, uncertainty, and sensitivity more efficiently; providing platforms for continued improvement and testing of models; and archiving and curating model results. The needs assessment specified both specific capabilities required for an acceptable informatics infrastructure and those related to establishing new collaborative mechanisms. Model development and validation in biochemistry, pharmaceuticals, and genomics have a rich history that can be drawn on to delineate feasible recommendations for implementation. Model development and validation differ from method development and validation in that most models are already in digital form and can readily be shared electronically. The scenario presented below poses a series of activities to provide common environments for interdisciplinary model building and validation for all the stakeholders involved in nanoscience and nanotechnology research and translation.
Nanomaterials require a variety of structural descriptors. For small nanoparticles such as dendrimers, an exact structural description may be obtained, as with polymers or proteins, even though the particle’s conformation may be time-dependent and change with its local environment and its interaction with other molecules and surfaces. For nanoparticles within the range of ultrafine particles, there may be large polydispersity and polymorphism, because these particles are produced primarily through batch synthesis rather than self-assembly. The descriptors at this particle-size range are used primarily in materials science and include cruder measures, such as core or shell sizes and ranges, as well as grain size and lattice defects. For both types of nanoparticles, modification of the particle surfaces contributes to polydispersity, and interface chemistry and its descriptors become increasingly important. Finally, the nanoparticles may be embedded in a matrix to fabricate a nanomaterial or nanoproduct (for example, colloids), resulting in a material with even greater polydispersity and thus a larger number of required descriptors.
Because of the variation in the properties of nanomaterials, understanding the mechanisms underlying these properties requires input from many different scientific and engineering fields. To develop a database for a particular nanomaterial, it is useful to have a large lot of material from the same manufactured batch, so that all disciplines contributing to the database are using the same material in their studies. Use of standard analytic methods is needed to generate high-quality data and to determine the sensitivity of the experimentally measured values to changes in the material’s environment. Similarly, modeling experiments should use a representative range of nanomaterial structures and validated, documented, and predictive models that can produce reliable results. The material repositories and registries and the validated experimental protocols discussed previously have their analogs in modeling, but models have the advantage that they can be more easily shared. The scenario below describes means of improving the quality and reliability of data derived from structural, predictive, and risk models.
Developing reliable structure-property relationships and their underlying physical, chemical, and biologic mechanisms for nanomaterials requires joint, informed, experimental modeling activities across many disciplines. Although it is difficult to determine experimentally the mechanisms for the interactions of nanomaterials with their biologic environments using polydisperse and polymorphic materials, a modeling effort that applies representative structures and validated models may provide important insights.
1) Develop key structural descriptors that provide basic definition of nanomaterial dimensions (and their dispersities), compositions, and surface chemistries. Although it may be possible to define only small nanoparticles precisely, descriptions for the purposes of informatics efforts must be defined as thoroughly as possible.
2) For selected existing applications, develop, archive, organize, curate, and validate molecular models for ENMs, including their surface coatings (and possibly weathered or transformed coatings). One existing pilot project is the Collaboratory for Structural Nanobiology (CSN 2011), which is modeled on the Worldwide Protein Data Bank (wwPDB 2011) and contains models for several classes of nanomaterials, viewing applications, model-building tools, and a wiki environment to facilitate collaborative activities.
3) Develop structural models and validate them by using blinded data on newly determined ENM structures. Compare the model’s functionality with that of similar structures, and capture comment on the model’s functionality and accuracy to develop requirements for an improved model.
4) Establish a user group (that represents all stakeholders) to draft a governance document for the structural model portal, taking advantage of the history and current governance of the wwPDB, especially regarding data rights, security, and the formation of a scientific advisory group for the next-generation portal.
5) Begin attempts to formulate classification schemes based on ENM “types,” structural motifs, or correlations between molecular models and descriptors used in formulating QSARs for ENMs. This approach extends model development and validation efforts to predictive models, risk models, and functional models for biologic environments. A repository for similar models establishes a new means of collaborating on model development and of accelerating model development while ensuring that credit is assigned to the researchers involved in developing and improving the models.
6) Use an initial pilot to archive, organize, curate, validate, and share predictive and probabilistic models and to set user requirements for a collaborative environment for developing and validating predictive models and submodels; organelle, cell, tissue, organ, system, organism, and ecosystem models; and probabilistic and risk models and submodels—with their associated files, runtime parameters, and test suites.
7) Collect data to substantiate that the ruggedness and robustness of the models are adequate (this involves quantification of the sensitivity of the models to variation in any model parameters, materials, or conditions).
8) Establish a user group and a governance document for the portal for predictive model validation, especially with regard to data rights, security, and formation of a scientific advisory group.
9) Use the collaborative environment to create a readily accessible portal to aggregate and organize information on experience with the models tested, to rapidly update the models, and to provide a virtual helpdesk for use of the models.
10) Once a model is adequately validated and has a large enough user group, port a copy of the model to a facility, such as NanoHUB (NanoHUB.org 2011), that hosts an optimized version of the model, sample run-time parameters, and associated files and test suites for use by the larger user community. If computational models are stored in a repository with all the applications, programs, data files, scripts, and run-time parameters needed to replicate the original results, a common means of collaborating on model development is created, and the timescale for establishing effective collaborations is reduced from years (in print media) to days.
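Step 10’s point, that an archived model should carry its run-time parameters and reference results so the original output can be replicated, can be illustrated with a minimal Python sketch. The model, the parameter names, and the repository format here are hypothetical; the intent is only to show how bundling a model with its inputs and expected outputs makes replication a one-line check.

```python
import math

def archive_entry(model_fn, params, reference_results, tol=1e-6):
    """Bundle a model with its run-time parameters and reference results,
    mimicking a repository entry that a later user can re-run and verify."""
    return {"model": model_fn, "params": params,
            "reference": reference_results, "tol": tol}

def replicate(entry):
    """Re-run the archived model and compare each output with the archived
    reference value: a minimal validation 'test suite' for the entry."""
    out = entry["model"](**entry["params"])
    return all(math.isclose(out[key], val, rel_tol=entry["tol"])
               for key, val in entry["reference"].items())

# Hypothetical model: hydrodynamic diameter from core size plus coating.
def hydro_diameter(core_nm, coating_nm):
    return {"diameter_nm": core_nm + 2 * coating_nm}

entry = archive_entry(hydro_diameter,
                      {"core_nm": 30.0, "coating_nm": 2.5},
                      {"diameter_nm": 35.0})
print(replicate(entry))  # -> True
```

Because the entry holds everything needed to reproduce the archived result, a collaborator can confirm a model before building on it, which is the timescale reduction the step describes.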
Models of biologic systems, including organelle, cell, tissue, and organ models, are used for many applications but are usually developed and validated in isolated efforts with little public record of either the process or the reference data. Such models could benefit from efforts to accelerate their development, validation, and reuse. Increased collaboration in QSAR and quantitative structure-property relationship (QSPR) development would provide a more complete picture of their accuracy and of the correlation of their results with genomic, proteomic, and metabolomic data. Finally, there is a possible benefit in providing tools for collaborative code development for risk modeling and management. Although regulatory agencies have highly developed methods and processes for their own needs that are referenced to specific data streams, cross-fertilization may provide more rapid incorporation of novel computational methods, submodels, and techniques (Haimes 2009; IOM 2011).
A Scenario for Nomenclature and Terminology
Development of functional and adaptable nomenclature and terminologies for ENMs is critical for all informatics efforts. Adaptability is crucial because the precision with which ENM structure and composition are known is continuously improving. There have been many attempts to develop namespace terminologies by various organizations, such as SDOs. In many cases, the high-level concept definitions are inconsistent because the experts developing the terminology defined terms relevant to specific sub-disciplines, applications, or interest groups. As a result, concept definitions for a given term may differ substantially within a single SDO because sufficient resources are rarely available to achieve consistency or to produce properly framed definitions. Developing relationships among terms through simple taxonomies or ontologies also introduces a high degree of variability because different namespaces will set priorities for those relationships differently. That disparity has led to the development of mapping tools to aid in generating consistent mapping among terminologies, taxonomies, and ontologies within and among namespaces.
This scenario outlines steps that may be taken to accelerate the use of ontologies in achieving Semantic Web implementation for nanotechnology. It draws on the existing body of ontologies and expertise in semantic-tool development to provide a common infrastructure for semantic search that allows interoperable searching among databases curated by different scientific disciplines and returns only what is specifically requested. The approach could promote common terminology but recognizes that different disciplines may use substantially different definitions for a given term or may impart nuance to such differences in ways that should be captured. That is, a semantic-search capability recognizes that dictionaries may have a number of definitions for a given term, each relevant to a different namespace, and allows retrieval of all data relevant to the namespaces being searched. As in the previous scenarios, the relevance of this capability to implementation lies in the ability to query a federated system of databases, permitting transparent access into each discipline’s data while using the terminology of the requester’s discipline.
1) Solidify and continuously advance the precision of structural descriptors (for example, composition, size, shape, and surface coating) for ENMs to establish a basis for a functional nomenclature for these materials.
2) Tap existing pilot efforts in nanotechnology nomenclature, concept definitions, vocabulary, metadata, and ontologies to coalesce a Semantic Web (W3C 2011a) community of interest among stakeholders in nanoinformatics that spans several distinct disciplines. Examples include the NanoParticle Ontology for nanomedicine, the Annotation Ontology for image and document annotation, and the Gene Ontology. Each represents a specific discipline with its own common terminology (“namespace”), and the terminologies overlap in nanomedicine.
3) Form a core ontology to describe ENMs and share environmental, animal, and human nanotoxicology concepts among the different namespaces to construct a mapping of synonymous terms. The metadata should consistently describe data and models in a particular namespace.
4) For common terms having definitions that differ substantially, define the differences in underlying concepts by determining the additional concepts, attributes, or relationships that are embodied in different definitions to develop a mapping among the namespace terms, including the conceptual differences or nuances as modifiers. At this level, an ontology-user community has been defined that can begin to set user requirements for the larger nanoscience and nanotechnology community.
5) Expand the mapping among ontologies by including other ontologies, such as the Open Biological and Biomedical Ontology Foundry.
6) Develop mapping tools in conjunction with other mapping efforts. For example, a pilot project to develop an ontology crawler to enlarge and update mappings automatically may be undertaken with commercial, academic, and government participation.
7) Begin to develop a high-level ontology or taxonomy for the sciences that calls for participation by experts in major scientific disciplines that are integral to nanotechnology, such as physics, chemistry, materials science, biology, and medicine.
8) Map the nanotechnology ontology to other scientific ontologies and develop standards for mapping based on the combined user-group experience.
9) Extend the mapping to metadata synonyms expressed in different natural languages, using existing or emerging ontologies for nanotechnology. Include mapping among terms having conceptual differences in terminology in different natural languages.
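The synonym mapping of steps 3 and 4 can be illustrated with a small Python sketch in which a core-ontology concept links namespace-specific terms, so that a query posed in one discipline’s vocabulary retrieves records annotated in another’s. The concept identifiers, namespaces, and terms below are invented for illustration and do not come from any actual ontology.

```python
# Core-ontology concept IDs mapped to namespace-specific synonyms (step 3).
# All names here are hypothetical examples, not real ontology terms.
synonym_map = {
    "CONCEPT:particle_size": {
        "nanomedicine": "hydrodynamic diameter",
        "materials_science": "particle diameter",
        "toxicology": "agglomerate size",
    },
}

def expand_query(term):
    """Return every namespace synonym of the concept a term belongs to,
    so a search in one discipline's vocabulary also matches records
    annotated in another's (the semantic-search behavior of the scenario)."""
    for concept, names in synonym_map.items():
        if term in names.values():
            return set(names.values())
    return {term}  # unmapped terms fall back to a literal match

def search(records, term):
    """Retrieve records whose 'property' field matches any synonym."""
    terms = expand_query(term)
    return [r for r in records if r["property"] in terms]

# Records annotated by different disciplines, using their own terms.
records = [
    {"property": "particle diameter", "value_nm": 31.2},
    {"property": "agglomerate size", "value_nm": 120.0},
    {"property": "zeta potential", "value_mV": -38.0},
]
print(len(search(records, "hydrodynamic diameter")))  # -> 2
```

Step 4’s conceptual modifiers would extend each mapping entry with the attributes that distinguish the definitions, rather than treating the terms as exact synonyms as this sketch does.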
A Scenario for a Data-Sharing Infrastructure for Nanotechnology
The previous scenarios illustrated activities related to implementing required aspects of an informatics infrastructure: information content related to data quality and reliability, model quality and reliability, and the use of nomenclature and terminology for sharing, curating, and annotating information among disciplines. This scenario illustrates aspects of implementation of the informatics infrastructure itself and its capability to support collaboration.
1) Identify and assess existing pilot databases and knowledge bases that have been independently established to share data among specific sectors or by specific institutions.
2) Develop and adopt freely available software to federate databases to provide resources for the entire user community. Although a single, central database might conceivably satisfy all user requirements for sharing nanotechnology data across all stakeholder agencies and institutions, establishing a centralized, monolithic system is rarely possible, because each agency and entity must support its own particular mission and requirements. Such an approach would have difficulty in accommodating the heterogeneity of current database structures, security requirements, semantics, and applications while respecting agency autonomy, and it would have difficulty in providing uniform, expert data curation. An example of a possible federating system is the Cancer Biomedical Informatics Grid (caBIG®) sponsored by the NCI, supervised by the NCI Center for Biomedical Informatics and Information Technology (NCI-CBIIT), and used by caBIG’s Nano Working Group. Others are the NIH National Center for Research Resources (NCRR) Biomedical Informatics Research Network (BIRN) and the EU-funded INFOBIOMED project on medical and biologic data interoperability and management.
3) Develop the infrastructure openly, leveraging available tools already in use where possible. At any stage, demonstrations or trials of an existing capability can be used to elicit needs and requirements from the relevant user community to inform the next iteration of updates and to improve software implementation. Examples of open scientific informatics software and tool development are Linked Open Data initiatives (W3C 2011b), which include measures of data quality, and the OpenScience project (The OpenScience Project 2011).
4) Establish governance of the development process so that it is directly under the control of the stakeholder and user communities. Issues of data rights, intellectual property, and academic credit should be investigated early in the process to ensure satisfaction of user needs and requirements as part of the iterative cycle.
5) Develop, evaluate, and implement relevant technologies that enable the incorporation of new social and institutional mechanisms of interaction among users. One example is Google Wave, a digital application for rapid video development, annotation, and modification. Although Google has recently withdrawn Google Wave from the market, the tools for video annotation and update are still available for use (Google Wave 2010). Other tools, such as wikis, are now used in a number of scientific applications and are freely available through commercial vendors and through such applications as the CSN. Scientific web sites routinely allow feeds and blogs, remote participation in experiments at user facilities is becoming common, and sites for collaborative software development reflect standard practice. Cloud computing also has rapidly emerged as a freely available tool for management of data archives and computational resources.
6) Where possible, incorporate expert users into prototyping activities throughout the data life cycle (from data generation to data-mining) to evaluate possible new system capabilities, tools, or applications and to estimate the resources required for the incorporation of those capabilities into the infrastructure.
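The federation approach of step 2 can be sketched in Python: each autonomous member database keeps its own schema, and a thin adapter layer rewrites its records into a common result format at query time. The databases, schemas, and field names below are hypothetical and serve only to show why federation can respect member autonomy where a monolithic central database cannot.

```python
class ToxDB:
    """Hypothetical toxicology database with its own local schema."""
    def __init__(self):
        self._rows = [{"enm": "nano-TiO2", "assay": "cytotoxicity",
                       "ec50_ugml": 55.0}]
    def query(self, enm):
        return [r for r in self._rows if r["enm"] == enm]

class CharDB:
    """Hypothetical characterization database with a different schema."""
    def __init__(self):
        self._rows = [{"material": "nano-TiO2", "size_nm": 21.0}]
    def query(self, enm):
        return [r for r in self._rows if r["material"] == enm]

def federated_query(enm, adapters):
    """Fan the query out to each autonomous member database through its
    adapter, which rewrites local records into a shared result format."""
    results = []
    for name, (db, to_common) in adapters.items():
        for row in db.query(enm):
            results.append({"source": name, **to_common(row)})
    return results

# Each member contributes a database plus a translation function; the
# members never change their internal schemas.
adapters = {
    "tox": (ToxDB(), lambda r: {"enm": r["enm"], "measure": "EC50",
                                "value": r["ec50_ugml"]}),
    "char": (CharDB(), lambda r: {"enm": r["material"], "measure": "size_nm",
                                  "value": r["size_nm"]}),
}
print(len(federated_query("nano-TiO2", adapters)))  # -> 2
```

Because translation happens at the adapter boundary, each agency retains control of its own structure, security, and curation, which is the property the scenario cites as making federation more feasible than centralization.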
REFERENCES
CSN. 2011. Nanocollaboratory. Collaboratory for Structural Nanobiology [online]. Available: http://nanobiology.utalca.cl or http://nanobiology.ncifcrf.gov [accessed Dec. 14, 2011].
Google Wave. 2010. The Google Wave Blog: News & Updates from the Google Wave Team [online]. Available: http://googlewave.blogspot.com/ [accessed July 8, 2011].
Haimes, Y.Y. 2009. P. xii in Risk Modeling, Assessment and Management, 3rd Ed. Hoboken, NJ: John Wiley & Sons.
IANH (International Alliance for NanoEHS Harmonization). 2011. International Alliance for NanoEHS Harmonization [online]. Available: http://www.nanoehsalliance.org/ [accessed Nov. 9, 2011].
InterNano. 2011. InterNano. Resources for Manufacturing. Nanoinformatics 2020 Roadmap [online]. Available: http://eprints.internano.org/607/ [accessed Nov. 9, 2011].
IOM (Institute of Medicine). 2011. Building a Framework for the Establishment of Regulatory Science for Drug Development, Y. Lebovitz, A.R. English, and A.B. Clairborne, eds. Washington, DC: National Academies Press.
JoVE. 2011. Journal of Visualized Experiments [online]. Available: http://www.jove.com/About.php?sectionid=0 [accessed July 8, 2011].
NanoHUB.org. 2011. NanoHUB [online]. Available: http://nanohub.org/ [accessed Dec. 14, 2011].
Nanoinformatics. 2011a. Nanoinformatics: NanoinformaticsCommunity [online]. Available: http://www.internano.org/nanoinformatics/index.php/Nanoinformatics:NanoinformaticsCommunity [accessed Nov. 8, 2011].
Nanoinformatics. 2011b. Nanoinformatics 2011 Meeting Program, December 7-9, 2011, Arlington, VA [online]. Available: http://nanotechinformatics.org/program [accessed Nov. 8, 2011].
The OpenScience Project. 2011. The OpenScience Project [online]. Available: http://www.openscience.org/blog/?p=269 [accessed Nov. 8, 2011].
W3C (World Wide Web Consortium). 2011a. Semantic Web [online]. Available: http://www.w3.org/standards/semanticweb/ [accessed July 8, 2011].
W3C (World Wide Web Consortium). 2011b. LinkingOpenData [online]. Available: http://www.w3.org/wiki/SweoIG/TaskForces/CommunityProjects/LinkingOpenData [accessed Nov. 8, 2011].
Wang, C.Y., W.E. Fu, H.L. Lin, and G.S. Peng. 2007. Preliminary study on nanoparticle sizes under the APEC technology cooperative framework. Meas. Sci. Technol. 18:487-495.
wwPDB. 2011. Worldwide Protein Data Bank [online]. Available: http://www.wwpdb.org/ [accessed Dec. 14, 2011].