WERA1015: Developing the US National Virtual Herbarium
(Multistate Research Coordinating Committee and Information Exchange Group)
Duration: 10/01/2014 to 09/30/2019
Statement of Issues and Justification
The idea of digitizing all the nation's herbarium collections and making them web accessible was first endorsed in 2004 at a meeting for invited participants. Although 2020 was suggested as a target date, there was no discussion of how the goal might be achieved. The US Virtual Herbarium (USVH) project was initiated in 2008 when botanists attending an open meeting endorsed the goal and the formation of a multistate coordinating committee to help achieve it. Some herbaria had web-accessible databases but, with a few exceptions, these were individual efforts. Potential users of the information had to go to multiple sites, each using a different design, to find information. The goal of the US Virtual Herbarium project was identified as enabling access via a single site to all specimen information for all the specimens in all the nation's herbaria. The primary mechanism identified for accomplishing this goal was to work with regional herbarium networks in disseminating knowledge and promoting collaboration. The nature of the US Virtual Herbarium, whether a single huge database, a federation of databases, or some other architecture, was not specified.
The landscape has changed since 2008. In 2010 the National Science Foundation established a program for development of a national hub for aggregating data from non-federal biological collections, iDigBio, and for funding digitization of specimens that would enable addressing major scientific questions. This funding, together with support from other sources, both public and private, and ongoing commitment of those working in herbaria, has led to the development of improved software, several regional networks and four organismal networks. It has also increased the number of herbaria in which digitization, the process of making records web-accessible, is taking place. iDigBio also organizes workshops and makes information about collection digitization available online. During this same period, the National Biological Information Infrastructure program, which was an active partner of the USVH project in 2008, was terminated.
The USVH project has assisted in developing the existing resources by helping those in charge of herbaria become aware of ongoing developments and encouraging them to contact the network managers best placed to assist them. The project's annual meeting draws 70-100 people and is appreciated as a forum for discussing digitization issues facing herbaria. Nevertheless, about 60% of the country's herbaria have not yet started digitizing their collections, large parts of the country are not in a regional network (Barkworth et al. 2013), and there is one (but only one) organismal group (microalgae) for which no network is in development. Moreover, users still need to visit multiple sites, although fewer than before, to obtain all available data, and coverage is very uneven, both geographically and taxonomically. Equally important, few collectors provide their specimen records in a manner that eliminates the need for retyping each record when their specimens are incorporated into a herbarium. In addition, relatively few undergraduate students are being exposed to the possibilities opening up in the realm of biodiversity informatics. Thus, there is still a need for a project that has as its goals:
- Incorporating digitization [= capturing label data, georeferencing the collection site, imaging appropriate specimens, and posting the resulting files to the web in a form that enables interoperability] into all US herbaria;
- Enabling integrated access to specimen information for all specimens in all US herbaria, regardless of ownership;
- Developing and promoting resources that will encourage collectors to maximize the value of their collections by assisting them in providing high quality data in a format that permits easy ingestion into the databases of recipient herbaria;
- Expanding use of the information being made available by educators and professionals.
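As an illustration of the third goal, the following minimal sketch shows one way a collector might supply records in a form a recipient herbarium could ingest without retyping. It assumes a CSV file whose column names are borrowed from the Darwin Core vocabulary; the proposal itself does not name a standard, and the function name and the sample record are hypothetical.

```python
import csv
import io

# Column names borrowed from Darwin Core. This is an assumption made for
# illustration; the proposal does not prescribe a particular standard.
FIELDS = [
    "recordedBy", "recordNumber", "eventDate", "scientificName",
    "country", "stateProvince", "locality",
    "decimalLatitude", "decimalLongitude",
]


def records_to_csv(records):
    """Serialize collector records to CSV text, rejecting incomplete rows."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for row in records:
        # A field counts as missing if it is absent, None, or an empty string.
        missing = [f for f in FIELDS if row.get(f) in (None, "")]
        if missing:
            raise ValueError(
                f"record {row.get('recordNumber')!r} is missing {missing}"
            )
        writer.writerow({f: row[f] for f in FIELDS})
    return buffer.getvalue()


# Hypothetical specimen record, invented purely for illustration.
example = {
    "recordedBy": "A. Collector",
    "recordNumber": "101",
    "eventDate": "2014-06-15",
    "scientificName": "Poa secunda",
    "country": "United States",
    "stateProvince": "Utah",
    "locality": "Hypothetical canyon, 2 km N of trailhead",
    "decimalLatitude": 41.75,
    "decimalLongitude": -111.80,
}

csv_text = records_to_csv([example])
print(csv_text)
```

Rejecting incomplete rows at the time of export, rather than at ingestion, places the quality check with the person best able to supply the missing information.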
Stakeholders in the US Virtual Herbarium project can conveniently be divided into a) those that use, or could use, the resources USVH would make available; b) those responsible for developing and maintaining herbaria (for brevity, we call these individuals Collection Managers (CMs)); and c) the institutions that own herbaria. The first category, data users, is diverse. It includes biological consultants, research scientists, educators, students, and members of the general public. They use herbarium information for development of preliminary checklists for a region; preliminary identification; determination of the environmental factors limiting a species' distribution; planning field work; verifying reports of occurrences that seem questionable; exploring changes in phenology; locating specimens for use in chemical analyses; tracking the spread of invasive species; and examining the association between morphological characteristics and plant distributions. Digitizing specimens makes such studies not just possible but feasible. Usage data for existing networks demonstrate that they are already well used, e.g., SEINet (http://swbiodiversity.org) received almost 262,000 visits between Nov 25, 2012 and Nov 24, 2013, the average time on site being over 5 minutes, and the average number of pages per visit being over 4.5; most of the visitors were under 45 (data from Google Analytics®).
At larger herbaria, the CM is a full time position but, for many, managing the herbarium is added to other responsibilities, such as teaching, research, outreach, and organization of displays, responsibilities that may be more important to their employer than the herbarium. Digitization adds to their workload but it also makes it easier for CMs to demonstrate the unique aspects of the collection while encouraging more widespread use of the information it holds. It also exposes the collection to experts who can identify specimens meriting closer examination. Another benefit comes when an expert has reviewed the specimens in a herbarium. If the herbarium is part of a network, changes in distribution or identification can be made immediately available to others and herbaria with duplicates can be advised of the changes. If a herbarium is not part of a network, such changes only become known when they are published in a scholarly paper or when someone visits the herbarium concerned.
Joining a network also increases digitization efficiency by enabling importing data from duplicate specimens in other herbaria, georeferencing sites across multiple collections, and the use of off-site individuals to assist in data capture and georeferencing. In the process, all involved become more aware of the standards that need to be followed to maximize the value of a specimen and of tools that enable them to minimize errors. CMs also benefit from the ability of a herbarium network to distribute the work required for digitization. There are several crowd-sourcing tools that enable volunteers to assist in capturing data from specimen images but these tend to work on a collection basis. Such tools would be more efficient if they enabled working across all collections and allowed use of duplicate discovery and batch georeferencing. All these benefits would be increased if there were a single herbarium network for the US.
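Duplicate discovery and the reuse of georeferences can be pictured with a short sketch. It assumes, purely for illustration, that duplicates are recognized by matching collector name, collection number, and date across herbaria; existing networks may use other criteria. The record layout, herbarium codes, and function names are hypothetical.

```python
from collections import defaultdict


def duplicate_key(record):
    """Build a matching key from collector, collection number, and date."""
    # Normalize the collector name so "J. Doe" and "J Doe" compare equal.
    collector = "".join(ch for ch in record["collector"].lower() if ch.isalnum())
    return (collector, record["number"].strip(), record["date"])


def find_duplicates(records):
    """Group records that share a key and are held by more than one herbarium."""
    groups = defaultdict(list)
    for record in records:
        groups[duplicate_key(record)].append(record)
    return [
        group for group in groups.values()
        if len({r["herbarium"] for r in group}) > 1
    ]


def share_coordinates(group):
    """Copy coordinates from a georeferenced record to duplicates lacking them."""
    source = next(
        (r for r in group if r.get("lat") is not None and r.get("lon") is not None),
        None,
    )
    if source is None:
        return 0
    filled = 0
    for r in group:
        if r.get("lat") is None or r.get("lon") is None:
            r["lat"], r["lon"] = source["lat"], source["lon"]
            filled += 1
    return filled


# Hypothetical records from three invented herbaria ("AAA", "BBB", "CCC").
records = [
    {"herbarium": "AAA", "collector": "J. Doe", "number": "512",
     "date": "1998-07-04", "lat": 44.05, "lon": -121.31},
    {"herbarium": "BBB", "collector": "J Doe", "number": " 512",
     "date": "1998-07-04", "lat": None, "lon": None},
    {"herbarium": "CCC", "collector": "R. Roe", "number": "7",
     "date": "2001-05-20", "lat": None, "lon": None},
]

duplicate_groups = find_duplicates(records)
filled_total = sum(share_coordinates(g) for g in duplicate_groups)
print(len(duplicate_groups), filled_total)
```

In practice any coordinates copied this way would presumably be flagged for review by the receiving herbarium rather than accepted automatically.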
Participating in a network also exposes those who work in herbaria (many of whom are students) to what is needed to maximize the value of their collections and attracts a generation that is accustomed to working interactively. These are advantages we need to build on if we are to engage more people in the study of plants, fungi, and algae and their interaction with other components of the biosphere. It can also be used to help expand the pool of biodiversity informaticists in the US.
The benefit of digitization to institutions owning herbaria is that it makes the herbarium globally visible. It enables them to demonstrate how their holdings contribute to the nation's knowledge infrastructure. No herbarium, not even the largest, can provide the wealth of information that will come from providing integrated access to the holdings of all the nation's herbaria. Some herbaria have holdings from around the world, others only from their own backyard. All enhance our understanding of the past and present distribution of the organisms housed in herbaria.
In the absence of a national herbarium network, many analyses have to be conducted using less than optimal data and others cannot be completed. There will be parts of the country for which there is little information available and institutions in which students have minimal exposure to the demands of providing high quality data in this digital age and to the kinds of analyses that are possible once the data are available. The best US students will be attracted to countries that provide more data to answer big questions and provide greater encouragement for the exploration of biodiversity data.
Specimens. Protocols for capturing collection data and georeferencing collection sites for herbarium specimens exist and are being used in multiple herbaria. Methods of increasing the efficiency of the existing protocols and for enabling crowd sourcing continue to be developed but the impediments to completing these tasks are funding and training, not technical. Protocols for imaging most groups have also been developed. Imaging is simple for standard specimens of vascular plants and macroalgae because they are preserved as two-dimensional samples attached, together with a label, to standard sized paper sheets. Imaging specimens of lichens, bryophytes, and fungi is more difficult because they are usually stored in packets and are often 3-dimensional. Protocols are still needed for specimens stored on microscope slides, e.g., microfungi, microalgae, and pollen grains. The difficulty is efficiently recording the location of the imaged specimen on the slide. There are also no protocols for imaging plant fossils and wet collections but procedures developed for zoological specimens can be adapted for use in herbaria.
Networks. There are multiple networks that present information from herbarium specimens. Four are national networks, one each for lichens, bryophytes, fungi, and macroalgae. The other networks are based on regions, some covering a multistate region, others a single state. Some regional networks present information only about specimens from their region; others present information from specimens collected anywhere. Some are explicitly for vascular plants; others include all organisms stored in the region's herbaria. This means that many herbaria have to provide their records to multiple networks, a less than optimal situation. Such issues can be resolved with appropriate programming. It would be far more efficient if there were a single network for all organisms housed in herbaria. There are currently no networks storing information about fossil plants or microalgae but creating them is a financial, not a technical, challenge.
USVH, a National Herbarium Network. A de facto national herbarium network would provide real time access to all records currently stored in the multiple networks and provide expert IT support to all who need it. It can be envisioned as enabling queries between the networks, a relatively simple technical task. It would take advantage of the features provided by existing networks and support their continued development. Very large herbaria could connect to it directly, thereby enabling them to provide records to a single network rather than multiple networks, as is currently necessary. The national herbarium network could also incorporate records from all herbaria, regardless of ownership.
A de facto national virtual herbarium will make it easier to address a major concern of many landowners and conservationists, that of restricting access to locality information for species of concern and records of occurrences on private lands whose owners support conservation but do not want to provide open access to information about their property's biodiversity. Herbarium networks record but restrict access to such information. Data aggregators, such as iDigBio, BISON, and GBIF, do not restrict information access. Thus a national herbarium network would serve as a useful intermediate, one that restricts the information made available to aggregators but allows authorized users access to complete information.
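The intermediate role described above can be sketched as a simple filtering step. The sketch assumes a hypothetical record layout in which a `restricted` flag marks sensitive records and in which locality text and coordinates are the fields to withhold; neither the field names nor the policy is taken from an existing network.

```python
# Fields withheld from unauthorized requesters (an assumed policy).
SENSITIVE_FIELDS = ("locality", "lat", "lon")


def view_for(record, authorized):
    """Return a copy of the record appropriate for the requester."""
    if authorized or not record.get("restricted", False):
        return dict(record)
    redacted = dict(record)
    for field in SENSITIVE_FIELDS:
        redacted[field] = None
    return redacted


def export_for_aggregator(records):
    """Prepare records for an open aggregator, which is never treated as authorized."""
    return [view_for(r, authorized=False) for r in records]


# Hypothetical records, invented for illustration.
records = [
    {"id": 1, "species": "Species of concern", "state": "Utah",
     "locality": "Private ranch, north pasture", "lat": 40.1, "lon": -111.2,
     "restricted": True},
    {"id": 2, "species": "Common species", "state": "Utah",
     "locality": "Roadside, public land", "lat": 40.3, "lon": -111.6,
     "restricted": False},
]

public_view = export_for_aggregator(records)
full_view = [view_for(r, authorized=True) for r in records]
print(public_view[0]["locality"], full_view[0]["locality"])
```

The stored record is never altered; only the copy sent outward is redacted, so authorized users continue to see complete information.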
A national network would add value to tools developed by different networks. For instance, the Pacific Northwest (PNW) Network enables searching for species within a polygon, rather than a circle. At present, it can only access data in the PNW network; its value would be increased if it could access data from all the networks.
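A polygon search over records pooled from several networks might look like the following sketch. It uses the standard ray-casting test for point-in-polygon and treats longitude and latitude as planar coordinates, which is a simplification that is reasonable only for small areas. It is not the PNW network's implementation, and the records and network labels are hypothetical.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test; polygon is a list of (lon, lat) vertices in order."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Consider only edges that straddle the horizontal line through the point.
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def species_in_polygon(records, polygon):
    """Return the sorted species names with at least one record inside the polygon."""
    return sorted({
        r["species"] for r in records
        if point_in_polygon(r["lon"], r["lat"], polygon)
    })


# Hypothetical search area and records pooled from two invented networks.
search_area = [(-112.0, 41.0), (-111.0, 41.0), (-111.0, 42.0), (-112.0, 42.0)]
pooled_records = [
    {"network": "Network A", "species": "Poa secunda", "lon": -111.5, "lat": 41.5},
    {"network": "Network B", "species": "Elymus elymoides", "lon": -111.2, "lat": 41.9},
    {"network": "Network B", "species": "Achnatherum hymenoides", "lon": -110.5, "lat": 41.5},
]

found = species_in_polygon(pooled_records, search_area)
print(found)
```

Because the search function accepts any list of records, the same tool gains value as soon as records from additional networks are pooled before the query is run.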
One problem is that the National Science Foundation (NSF) has funded the iDigBio program at the University of Florida to aggregate and provide access to collections data from the non-federal biological collections, including herbaria. iDigBio's first obligation is to the NSF-funded networks. At present iDigBio provides access to 3.1 million records. The existing herbarium networks already house over 5 million records, many from herbaria that are not part of an NSF-funded network and some of which are federal herbaria.
The advantages of a multistate effort
Development of a US Virtual Herbarium is inherently a multistate task. Each state has multiple herbaria and the challenges they face in making their specimens internet accessible are similar. Most contain specimens from outside their own state. In some instances, the first record of a species has been found in an out-of-state herbarium. At the same time, it is essential to work with the individuals within each herbarium; for this the most effective approach is to develop human networks of individuals with overlapping interests such as those represented in the organismal and regional networks. In its second five years, the US Virtual Herbarium initiative will continue to function primarily as a promoter of discussion, dissemination, and collaboration among herbaria, emphasizing aspects not addressed or not emphasized by iDigBio.
Encourage sharing of records and images among herbarium networks.
Create free, online instructional modules concerning the concepts and processes required to digitize specimens for integration in a network. At present, almost all instruction is individual and focused primarily on training personnel in performing particular tasks. The modules will increase the benefits of being engaged in digitization while reducing the time demands on CMs by providing the conceptual background and instructions on different tasks. These resources will be open access (CC-A).
Create materials and data to support requests for maintenance and development of herbarium networks. There are costs associated with the ongoing maintenance of any infrastructure, whether physical, such as highways, or knowledge, such as herbarium networks. Currently, much of this support comes from grant proposals but the cost of preparing such proposals is significant. The materials developed will be designed so that they can be modified for appealing to different groups, both public and private. Government agencies and private companies must be persuaded that it is in their best interests to provide ongoing support for basic maintenance of herbarium networks that their personnel use.
Disseminate information about how digitization and its products can be integrated into research, teaching, and outreach.
Promote development of tools and resources that will make collection information more valuable.
Obtain and share data concerning progress towards the project's objectives.
Procedures and Activities
1. Continue surveying US herbaria for progress in digitization. Results of previous surveys (http://www.wiu.edu/USvirtualherbarium/files/Download/2012Survey.pdf) have been helpful in identifying areas and regions needing attention. Future surveys need to ask for information about use of the information being developed. (AL, AR, CA, CO, CT, DE, IA, ID, IL, IN, KS, LA, MI, MN, NC, NH, NJ, NM, OH, OK, OR, RI, SD, TX, UT, WA, WI, WV)
2. Develop free, online instructional modules concerning the concepts and processes required to digitize specimens for integration in a network. At present, almost all instruction is individual and focused primarily on training personnel in performing particular tasks. The modules will increase the benefits of being engaged in digitization while reducing the time demands on CMs by providing the conceptual background and instructions on different tasks. They will also provide an accessible introduction for students and others interested in exploring the potential of biodiversity informatics. (AL, AR, CA, CO, DE, IA, ID, IL, IN, KS, LA, MI, MN, NC, NH, NJ, OH, OK, OR, RI, SD, TX, UT, WA, WI, WV)
3. Develop materials for seeking support for maintenance and development of herbarium networks. There are costs associated with the ongoing maintenance of any infrastructure, whether physical, such as highways, or knowledge, such as herbarium networks. Government agencies and private companies must be persuaded that it is in their best interests to provide ongoing support for basic maintenance of herbarium networks that their personnel use, thereby freeing network and herbarium personnel to focus on making more information available. Currently, much of the support comes from grant proposals but the temporal and fiscal cost of preparing such proposals is significant. The materials developed will be designed so that they can be modified for appealing to different groups, both public and private. (CA, CO, DE, IA, ID, IL, IN, KS, LA, MI, MN, NH, NJ, NM, OH, OR, RI, SD, TX, UT, WA, WI, WV)
4. Promote integration of digitization activities into the activities of all herbaria, including digitization of new specimens. At present fewer than 40% of herbaria are engaged in the usual first step, data capture, and only about 30% provide their records to a regional network. Furthermore, some of the data capture is into spreadsheets or databases that lack tools for ensuring data quality and some herbaria rely on an outside institution for digitization. This last is not a long term solution, nor does it enhance the capabilities of the US workforce. Unfortunately, many CMs have so many obligations that adding digitization to them is unreasonable. The project will seek ways of reducing the costs of digitization so that it is more valuable, in terms of the institution's priorities, to digitize a collection than not. (CA, CO, DE, IA, ID, IL, IN, KS, LA, MI, MN, NH, NJ, OH, OR, RI, SD, TX, UT, WA, WI, WV)
5. Promote and publicize development of tools and resources that will make collection information more valuable. The project will encourage sharing of ideas, initiatives, and lesson plans that use the information being provided to address research questions, improve and increase educational opportunities, and engage more members of the public with plants, fungi, and algae. These will include resources designed for use by researchers, educators, and the general public. (CA, CO, DE, IA, ID, IL, IN, KS, LA, MI, MN, MO, NC, NH, NJ, NM, OH, OR, RI, SD, TX, UT, WA, WI, WV)
6. Encourage data sharing among herbarium networks while promoting their continued development. (AL, AR, CA, CO, DE, IA, ID, IL, IN, KS, LA, MI, MN, MO, NC, NH, NJ, NM, OH, RI, SD, TX, UT, WA, WI, WV)
Expected Outcomes and Impacts
- Change the way in which people search for information about plants, fungi, and algae in the US.
- Reduce the inequalities now experienced by students at institutions with different sized herbaria.
- Introduce the next generation of botanists, mycologists, and phycologists to the new ways of thinking about collections and collection information.
- Provide an additional impetus to encourage US students to learn how to work with big data and, in the process, enable them to address new questions.
- Reduce the amount of time Collection Managers need to spend on answering questions about plant distributions and flowering times and handling loans by increasing access to such information.
Projected Participation
View Appendix E: Participation
1. Create free, open access resources about the concepts and principles underlying the protocols for digitizing the organisms found in herbaria, including making the information discoverable on the web. Offer certificate courses based on the contents of these resources.
2. Promote development of a system for use by collectors in recording information and links to images that would encourage recording of more complete information (including images) at the time of collection and enable more efficient ingestion of information into the databases of recipient herbaria.
3. Share educational modules that enhance or integrate use of information derived from the existing networks.
4. Promote knowledge of the project at meetings attended by mycologists, phycologists, and paleobotanists as well as meetings attended primarily by vascular plant botanists.
The US Virtual Herbarium project is run by an executive committee. It currently includes six people: two co-chairs, a secretary, a webmaster, an IT representative, and an at-large representative. All current members were appointed. Three of the individuals, the two co-chairs and the at-large member, have been members of the executive since the start of the project, a period of 5.5 years (including the preliminary year).
We shall initiate a more open committee structure this spring. To make this possible, we shall declare that the maximum term for any member of the executive committee is five years. There will not be a limit on the number of terms a member may serve. We shall hold an election in the spring for two positions, Chair and at-large member. This will reduce the size of the executive committee to five. Newly elected individuals will serve for only three years; this will become the standard term for members of the executive.
At the 2014 annual meeting, we shall present an outline of the changes and a schedule for the terms of each position that will ensure a balance between continuity and change in the project's leadership. We shall also seek volunteers to form committees to address critical areas, e.g., development of online certificate courses; development of fund-raising resources and training; creation and sharing of educational resources.
Barkworth, M.E., E.A. Dean, B. Legler, M. Mayfield, Z. Murrill, and E. Ribbens. 2013. Progress in Digitizing US Herbaria, 2011-2012. http://www.wiu.edu/USvirtualherbarium/files/Download/2012Survey.pdf