Previous sections of this report outline the vision of distributed geolibraries, discuss the problems and issues related to their social and institutional context and define their services and functions. This chapter addresses the process of building distributed geolibraries, the steps that will need to be taken to implement the vision, and related issues. It is impossible to be precise, of course, because of uncertainties surrounding future technologies, because the outcomes of research are in principle impossible to anticipate, and because many issues can only be resolved by constructing and working with prototypes. Given these constraints, this report attempts to address a number of key questions and to find answers where possible:
At a higher level one might ask how it is possible to know the answers to these questions. Complex software systems and new institutions arise through an iterative process in which the end result may not be apparent until the process has been under way for some time. Creating a vision is part of that process, but the vision may be wrong or unachievable. Large-scale prototypes are sometimes built in part because it is difficult or impossible to know what is possible without such large-scale experimentation. Without building a distributed geolibrary prototype, it may not be possible to identify exactly what it will do successfully and what it will not do. It may be difficult to know at an early stage how much a distributed geolibrary will cost or whether its costs will be exceeded by its benefits.
The Panel's vision of distributed geolibraries views them as a primary distribution mechanism for getting geospatial data and geographic knowledge resources into the hands of all stakeholders. Traditionally, the primary source of geospatial data in the United States, as in many other countries, has been the national mapping agency. Dissemination has been predominantly a one-to-many operation, as a single source provided information to a distributed user base. The vision of the National Spatial Data Infrastructure (NSDI) is very different and reflects an increasing degree of empowerment of individuals and agencies as significant producers of geospatial data. This vision is many-to-many, replacing a single source with a much more complex array. It is also complicated by the fact that the user/producer distinction is no longer as clear. Many users of geospatial data add value and become producers, and many users serve their own networks of clients. Many users of geospatial data are producers of geographic knowledge, which they may want to publish or make available through the mechanism of distributed geolibraries.
The many-to-many paradigm is familiar to librarians, who have traditionally acted as brokers between the publishers and the users of information. Thus, the paradigm shift that is occurring in geospatial data dissemination, in part through a process of technological empowerment, provides a strong reason to look to the library as a metaphor for new dissemination models and suggests that the library is a good place to look for models of distributed geolibraries and for solutions to problems and issues that may arise in building them. On the other hand, the timescale of library operations has been far slower than is normal in digital data dissemination. It may take years for information to pass fully through the complex process of publication and cataloging until it is finally available to the traditional library user. Users of the WWW are accustomed to delays on the order of minutes, not years. Thus, the library model will be useful only if its customary timescales can be compressed by many orders of magnitude.
The following sections address the needs of distributed geolibraries in terms of standards and protocols, data sets, georeferencing, cataloging, visualizations, and knowledge creation. Later sections discuss research needs and institutional arrangements. The final section of the chapter discusses the measurement and assessment of progress in building distributed geolibraries.
Geospatial applications are already supported by a large number of standards and protocols, and many more are in various stages of development. The set of particular relevance to distributed geolibraries includes:
Other standards of relevance to distributed geolibraries include those under discussion on intellectual property rights in digital data, standards of geospatial data quality, definitions of geographic feature types, and general mapping standards. They are being developed through a multitude of standards organizations, including, for example, the ISO, the American National Standards Institute (ANSI), the FGDC, and the International Cartographic Association.
The Internet and the WWW are built on a series of standards and protocols that have been widely accepted not because of any compulsion or mandate but because they clearly work and enable interesting applications. They include TCP/IP and HTTP. In the coming years it is likely that these standards will be extended repeatedly, and it appears that the architecture of the Next-Generation Internet will be significantly enhanced. Although none of these developments have been driven or are likely to be driven by the special needs of distributed geolibraries, as in the past we can expect them to be exploited in whatever ways are interesting, valuable, and appropriate.
Finding 11. New technological initiatives such as the Next-Generation Internet and Internet II are likely to provide extensions to Internet and WWW protocols and orders-of-magnitude increases in bandwidth. Many of these developments are expected to be relevant to distributed geolibraries.
Libraries assist their users in many ways; some of the most important are the mechanisms of abstraction employed to help users find relevant information. The process of cataloging is assisted by a number of data sets known as authorities that provide essential indices and lists.
In distributed geolibraries an essential authority is the gazetteer. A distributed geolibrary's gazetteer will differ in several key respects from the traditional version found in the back pages of atlases:
Although the footprint of a city name may vary depending on context and usage, the official footprint is most often defined by the city limits. Users of distributed geolibraries will also want to be able to search on place names that are not officially recognized but are nevertheless in common usage, such as "downtown."
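As an illustrative sketch, such a gazetteer could be modeled as a mapping from place names, including informal ones, to geographic footprints. All names, coordinates, and the lookup interface below are hypothetical, not drawn from any actual gazetteer:

```python
# Sketch of a gazetteer authority linking place names, including informal
# ones, to geographic footprints. All entries are hypothetical examples.

GAZETTEER = {
    # name (lowercased) -> footprint as (min_lon, min_lat, max_lon, max_lat)
    "santa barbara":          (-119.90, 34.39, -119.63, 34.46),
    "downtown santa barbara": (-119.71, 34.41, -119.69, 34.43),
}

def footprint(place_name):
    """Return the bounding-box footprint for a place name, or None.

    Lookup is case-insensitive so that common usage ("Downtown ...")
    resolves the same way as an official entry.
    """
    return GAZETTEER.get(place_name.strip().lower())
```

A query such as `footprint("Downtown Santa Barbara")` would then return a coordinate footprint that the library can use to search its collections spatially.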
Finding 12. A comprehensive gazetteer, linking named places and geographic locations, would be an essential component of a distributed geolibrary. A national gazetteer would be a valuable addition to the framework data sets of the NSDI. These framework data sets are being coordinated by the FGDC, which also has responsibility for associated standards and protocols. Production and maintenance of the national gazetteer could be through the National Mapping Division of the U.S. Geological Survey (USGS), in collaboration with other agencies, and could be an extension of the USGS's Geographic Names Information System.
Another type of authority used by libraries is the thesaurus. In the geoinformation case, various kinds of authorities would be useful: lists of standard feature types, standard data themes, and standard attribute definitions. For example, it would be useful if the meaning of "vegetation" and associated terms could be standardized, and much effort by the FGDC has been devoted over the past few years toward this end. In a world in which everyone can be a data producer, it is no longer possible to rely solely on the federal government to define essential mapping terms.
At the same time it is important that distributed geolibraries reflect the contemporary social norms of their users. The very term authority suggests a command-and-control philosophy that may be at odds with the prevailing culture of the Internet and the WWW, which is dominated by individual empowerment and voluntary consensus. An authority for a distributed geolibrary is clearly something different from a traditional library authority, and digital technology must be used to serve different ends. Instead of a single authority created by a central agency and enforced top-down on the community through regulation, mandate, or incentive, digital technology should be used to support translation and interoperability among a variety of meanings and interpretations in a bottom-up process that accommodates diverse communities and groups and their associated terminologies. If the term downtown means something different to user A than to user B, distributed geolibraries should use the power of digital technology to make the two meanings interoperable rather than to support the imposition of a single interpretation on all users.
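One way to picture this bottom-up translation is a registry in which each community contributes its own definition of a term, and the library maps between definitions rather than enforcing one. The communities, term, and footprints below are hypothetical illustrations:

```python
# Sketch of bottom-up term interoperability: each community registers its
# own meaning of a term, and the geolibrary translates between meanings
# instead of imposing a single authoritative definition. All entries here
# are hypothetical.

DEFINITIONS = {
    # (community, term) -> that community's footprint for the term
    ("planning_dept", "downtown"): (-119.71, 34.41, -119.69, 34.43),
    ("residents",     "downtown"): (-119.72, 34.40, -119.68, 34.44),
}

def translate(term, from_community, to_community):
    """Map a term used by one community onto another community's meaning.

    Returns None if either community has no registered definition.
    """
    if (from_community, term) not in DEFINITIONS:
        return None
    return DEFINITIONS.get((to_community, term))
```

The design choice is that no entry in the registry outranks another; a search phrased in one community's vocabulary can still be answered against holdings described in another's.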
The system of latitude and longitude has been subject to international standards since the late nineteenth century. However, the definitions of latitude and elevation depend on the mathematical function used to approximate the shape of the Earth, and many such functions are in use. Thus, latitude is not fully interoperable: coordinates measured for nearby points from opposite sides of certain international boundaries do not agree perfectly. Additional complications arise in the use of other coordinate systems, such as UTM (the Universal Transverse Mercator system) and the U.S. State Plane coordinate systems. If distributed geolibraries are to be useful to people who do not understand the complexities of geodetic datums and cartographic projections, systems will have to be developed that are capable of hiding such details, making them fully transparent to the user. Thus, a user ought to be able to access data sets in different projections and based on different datums and expect the system to handle the differences automatically. Such transparency is not yet available in standard geospatial software products and data sets, and its feasibility has not been demonstrated.
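The shape such transparency might take can be sketched as a front end that normalizes coordinates tagged with a named datum into a single internal reference frame before search or overlay. The shift values below are illustrative placeholders only, not real geodetic parameters; a production system would apply full datum transformations from a geodesy library:

```python
# Sketch of datum transparency: incoming coordinates tagged with a datum
# are normalized to one internal frame so the user never sees the
# differences. Shift values are ILLUSTRATIVE ONLY, not real geodetic
# parameters.

DATUM_SHIFTS = {
    "WGS84": (0.0, 0.0),           # internal reference frame
    "NAD27": (0.00010, -0.00090),  # hypothetical local offset, degrees
}

def to_internal(lat, lon, datum):
    """Return (lat, lon) expressed in the internal reference frame."""
    if datum not in DATUM_SHIFTS:
        raise ValueError(f"unknown datum: {datum}")
    dlat, dlon = DATUM_SHIFTS[datum]
    return lat + dlat, lon + dlon
```

The point of the sketch is architectural: every data set declares its datum once, in metadata, and conversion happens automatically at the system boundary rather than being left to the user.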
Other general ways of referencing the surface of the Earth are gaining popularity because of interest in global environmental change and other processes that operate at the global level. These include standard hierarchical grids such as QTM (Dutton, 1984) and the sampling grids used by the EMAP program (White et al., 1992).
Such hierarchical systems may be important internally as indexing schemes for distributed geolibraries (Goodchild and Yang, 1992).
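The indexing value of such hierarchical schemes can be illustrated with a simplification: QTM subdivides triangular facets, but a rectangular quadtree over latitude and longitude shows the same essential property, namely that nearby points share code prefixes. The function below is a hypothetical sketch, not an implementation of QTM itself:

```python
# Sketch of a hierarchical index in the spirit of QTM, simplified to a
# rectangular quadtree over latitude/longitude. Each digit of the code
# names one of four quadrants at successively finer levels, so nearby
# points share code prefixes, which is what makes such codes useful as
# library indexes.

def cell_code(lat, lon, depth):
    """Return a quadtree code (string of digits 0-3) for a point."""
    lat0, lat1, lon0, lon1 = -90.0, 90.0, -180.0, 180.0
    code = []
    for _ in range(depth):
        mid_lat = (lat0 + lat1) / 2
        mid_lon = (lon0 + lon1) / 2
        quad = 0
        if lat >= mid_lat:           # upper half
            quad += 2
            lat0 = mid_lat
        else:
            lat1 = mid_lat
        if lon >= mid_lon:           # eastern half
            quad += 1
            lon0 = mid_lon
        else:
            lon1 = mid_lon
        code.append(str(quad))
    return "".join(code)
```

Because a deeper code always extends a shallower one, a geolibrary can match holdings to a query region at any resolution by simple prefix comparison.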
Reference was made earlier to the need to compress the traditional timescales of the library world. Nowhere is this more important than in cataloging, which serves the critical function of abstracting the information users need to find, examine, assess, and retrieve data. In effect, metadata are the key to the many-to-many structure that allows many users to search across many potential suppliers, and their timely creation will be crucial if distributed geolibraries are to function. Unfortunately, the process of metadata creation for digital geospatial data can be as lengthy and labor intensive as its traditional equivalent. The task of creating a full metadata record for a geospatial data set using the FGDC metadata standard can be much greater than the task of cataloging a simple book. The geospatial data community appears to have accepted the notion that metadata creation is largely the responsibility of the producer, whereas the prevailing notion in the library community is that cataloging is the responsibility of the librarian. This reflects a distinct difference in philosophy, since library practice is based on the notion that the librarian may be more skilled than the producer at abstracting information on behalf of the user.
If time is of the essence in the digital world of the Internet, it makes good sense to try to replace the labor-intensive cataloging process with automated methods. The Internet world's solution to this problem has been the WWW search service, exemplified by AltaVista, Yahoo, and Excite. To be successful, a search service designed to help the user of distributed geolibraries find geospatial data and geographic knowledge would have to place heaviest emphasis on determining an information object's geographic footprint, either by detecting or inferring coordinates or by identifying an appropriate place name, to be converted to coordinates using a gazetteer. Such tools would perform the functions of abstracting and metadata creation automatically. Such automated discovery, indexing, and abstracting tools do not yet exist and will require extensive research and development. Three models that provide alternatives to the search service are described in Chapter 4. They are technically much simpler but require practices that appear to be incompatible or only partially compatible with the culture of the Internet.
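A minimal sketch of the footprint-determination step, assuming a small hypothetical gazetteer, is to scan a document's text for known place names and take the union of their bounding boxes as the document's inferred footprint:

```python
# Sketch of automated footprint determination: scan text for gazetteer
# place names and return the union of their bounding boxes. The gazetteer
# entries are hypothetical; a real tool would also need named-entity
# detection and disambiguation.

GAZETTEER = {
    # name (lowercased) -> (min_lon, min_lat, max_lon, max_lat)
    "santa barbara": (-119.90, 34.39, -119.63, 34.46),
    "goleta":        (-119.88, 34.41, -119.79, 34.47),
}

def infer_footprint(text):
    """Return the union bounding box of all gazetteer names found in text."""
    text = text.lower()
    boxes = [box for name, box in GAZETTEER.items() if name in text]
    if not boxes:
        return None
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))
```

Even this naive version suggests why the problem is research-grade: place names are ambiguous, footprints overlap, and many documents reference places only implicitly.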
One of the most powerful advantages of the concept of distributed geolibraries is the ability for the user to interact with a representation of the surface of the Earth. Information about the Earth's surface is naturally conceptualized as belonging to the surface, and globes, which are actual scaled representations of the Earth, provide a familiar and easily understood information source. The notion of doing the same in the digital world, of presenting information as if it were actually located on the surface of the globe, is termed the Digital Earth metaphor, and lies behind the idea described earlier in Chapter 2.
Some types of geoinformation closely approximate the actual appearance of the Earth's surface and can be rendered by draping them onto a curved surface. These include optical imagery and false-color imagery, in which colors are used to render information from some other, possibly invisible, part of the spectrum.
Other information in distributed geolibraries is not rendered so easily. How, for example, would one portray economic information such as average household income using the Digital Earth metaphor? In some cases there may be clever ways of making visible what is normally invisible; in other cases it may be necessary to represent the presence of information using symbols that exploit some other metaphor, such as books or library shelves. This is a novel area with no obvious guideposts, and research will be needed to determine how best to make the user of distributed geolibraries aware of the existence of information and of its important characteristics. In particular, we know almost nothing about how to render dynamic geospatial data or how to indicate its availability, yet we anticipate that such data will be increasingly available to the users of distributed geolibraries.
Users of distributed geolibraries will need tools for analysis, modeling, simulation, decision making, and the creation of new geographic knowledge. An important component will be the workspace in which the user can process data using many of the functions found in today's GIS, along with other functions such as those described earlier in Chapter 4. Given the massive investment in GIS, the easiest way to achieve this will be through collaboration between the builders of distributed geolibraries and the developers and vendors of GIS software. Compatibility and interoperability between GIS products and distributed geolibraries will be needed. For example, the metadata used to discover, assess, and retrieve data should be processed and updated by the GIS as data are manipulated and used to create new data sets. Metadata should be generated automatically when new knowledge is created by analysis and modeling. Current software products are generally incapable of these functions, and much research remains to be done to make them generally available.
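One concrete form of the metadata automation described above can be sketched as a workspace operation that, whenever an analysis derives a new data set, generates its metadata record, footprint and lineage included, from the inputs rather than requiring manual cataloging. The record schema here is hypothetical, loosely in the spirit of lineage sections in geospatial metadata standards:

```python
# Sketch of automatic metadata propagation: a derived data set's record
# (footprint, lineage) is generated from its inputs at the moment of
# creation. The schema is hypothetical.

def derive(name, operation, inputs):
    """Create a record for a data set derived from one or more inputs.

    Each input is a dict with at least "name" and "footprint",
    where footprint = (min_lon, min_lat, max_lon, max_lat).
    """
    boxes = [d["footprint"] for d in inputs]
    footprint = (min(b[0] for b in boxes), min(b[1] for b in boxes),
                 max(b[2] for b in boxes), max(b[3] for b in boxes))
    return {
        "name": name,
        "footprint": footprint,             # union of input footprints
        "lineage": {
            "operation": operation,          # how the data set was made
            "sources": [d["name"] for d in inputs],
        },
    }
```

Under this design, new knowledge created by analysis enters the library already discoverable, because its metadata exist from the moment the data set does.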
Many of the topics discussed in this report fall under the heading of "things we do not yet know how to do." In some cases, such as the building of a distributed geolibrary itself, there may be no obviously missing piece of theory or understanding; rather, it may simply be that we have not yet tried, and that given sufficient resources the necessary knowledge will emerge. Other items, however, require more focused research. Among them are the following:
Finding 13. The success of a distributed geolibrary will be largely dependent on the ability to integrate information available about a place. That ability is severely impeded today by differences in formats and standards, access mechanisms, and organizational structures. Removal of impediments to integration should become a high priority of government agencies that provide geospatial data.
Finding 14. Significant research problems will have to be solved to enable the vision of distributed geolibraries. Research is needed on indexing, visualization, scaling, automated search and abstracting, and data conflation. Research on these issues targeted to improve access to integrated geoinformation might be pursued by the National Science Foundation and other agencies sponsoring basic science, as well as by the National Mapping Division of the USGS and the National Imagery and Mapping Agency.
Many mechanisms and programs already exist to move this research agenda forward. Examples include the following:
In addition to these formal mechanisms, significant research and development activities are under way in the private sector among vendors of GIS software and among defense and intelligence contractors that can be expected to push in the direction of distributed geolibraries over the next few years. For example, the vendors of new commercial space imagery could use systems like distributed geolibraries for the dissemination of their data products to the broad user community. The FGDC is also a potential source of research initiatives in this area, given its relevance to the future dissemination mechanisms of the NSDI.
Many of the research needs identified here are basic in nature, and it may be many years before solutions can be found. On the other hand, some issues, such as the need for better methods of data integration, are so widely recognized, so technical in nature, and so strongly motivated that significant progress can be expected in a comparatively short period.
Although elements of a distributed geolibrary already exist in the form of prototype clearinghouses and other projects, it is easy to lose sight of the broader concept and the degree to which it represents a radical departure from current and past practices as reflected in our institutions and their accepted functions. More specifically:
The empowerment that has occurred as a result of the almost universal adoption of information technologies, especially geographic information technologies, over the past two decades has called traditional institutional arrangements into question. Yet institutions such as the national mapping agencies still reflect this legacy. The vision of distributed geolibraries represents a broadly based restructuring of past institutional arrangements for the dissemination of geospatial data, one that is much more bottom-up, decentralized, and voluntary. The institutional arrangements of the WWW provide an excellent model.
Finding 15. While traditional production of geospatial data has been relatively centralized, the vision of distributed geolibraries represents a broadly based restructuring of past institutional arrangements for the dissemination of geospatial data, one that is much more bottom-up, decentralized, and voluntary.
Some of these issues are specific to geoinformation and geospatial data, but others apply generally to the emerging information society, which is being driven by technological change and by the desire for greater access to information. Lopez and Larsgaard (1998) discuss the relationship between the needs of geospatial data and the broader institutional setting of the evolving digital library. That relationship is complex, and it is clear that distributed geolibraries are part of a larger vision of the digital library of the future. But the central role they give to searches based on location makes them clearly distinct, as do the research problems identified in the previous section. The development of distributed geolibraries will require a unique set of partnerships among developers of information technologies, geographic information scientists, application domain specialists, and user communities. It is unlikely, therefore, that the vision of distributed geolibraries will be realized through broadly based efforts to research and develop digital libraries in general; instead, efforts are needed that are directed specifically at distributed geolibraries and geoinformation. Funding and coordination are needed to develop prototypes, stimulate basic research, and build partnerships that specifically address the vision of distributed geolibraries.
The workshop convened by the Mapping Science Committee (see preface) was designed to help identify a vision of distributed geolibraries and the steps needed to realize that vision. An important element of building distributed geolibraries is, therefore, the measurement of progress: how will we know how much progress has been made and how much remains to be done? In this section we offer some possible bases for measurement.
One measure of progress would be the ability of the system to respond to queries of that nature. Some of the sites listed in Appendix D can already respond to such queries. Any simple measure would be complicated, however, by the varying conditions under which information is available, such as cost, intellectual property restrictions, and quality.
In addition, progress toward the vision of distributed geolibraries could be measured through the volume of accumulated research results, the sophistication of prototypes, and the lessons learned from each.