- Facilitate public access to interoperable digital earth science data
- Reduce the cost of online data publication
- Preserve ownership, credit, and control of existing data
- Distribute the logistical overhead associated with sharing digital information
- Minimize reliance on proprietary software applications
To accomplish these objectives, USGIN participants join or construct data-sharing networks that conform to USGIN specifications.
Public Access to Interoperable Digital Earth Science Data
Broken down, “public access to interoperable digital earth science data” means the following:
- Earth science data is data that describes geophysical resources such as petroleum, natural gas, or geothermal energy
- Digital data is data that is stored in a computer
- Interoperable data can be processed by a computer with minimal user interpretation or input
- Public access to data is achieved when the general public can access the data with little more than a web browser
To become part of a data-sharing network, data must first be digitized – that is, converted from a physical object such as a map, photograph, or well log into a computer file. This can be both labor- and resource-intensive, particularly when analog data has not been adequately inventoried.
The digitization process usually involves a combination of scanning analog documents and manually entering data into appropriate software applications. For example:
- Analog maps can be scanned as raster images; some maps require a wide-bed scanner
- Well logs can be digitized via a combination of scanning and hand-digitization
- Data records that use quantitative or qualitative values to describe geophysical resources can be manually entered into databases
In general, agencies contributing data to USGIN data-sharing networks use their own resources and staff to digitize any data they wish to share.
Interoperability is usually achieved by structuring data so that it can be exchanged with minimal user interpretation or input. Data can be divided into three tiers of interoperability, described below. The higher the tier, the greater the interoperability. In general, data contributed to a USGIN data-sharing network must achieve Tier-3 interoperability.
Tier 1: Text, Images, or Recorded Sound
Text, images, and recorded sound are unstructured; extensive user interpretation is required before they can be processed by computers or used in analyses:
- Text files, such as PDFs or Microsoft Word documents, must be parsed by human operators, who mine them for data and then structure that data so that it can be processed by computers or used for analysis
- Images must be processed by human operators in a variety of ways, depending on the context in which the image will be used:
  - If the image will be used to create a web map service, it must be georeferenced
  - If the image will be part of a web feature service, it must be vectorized, often by hand
- Audio files must be transcribed and, like text files, parsed
Sometimes, text files, images, and audio files will be parsed into database form. For example, data derived from field notes or log files can be manually entered into database fields.
Tier 2: Data Structured by a Non-USGIN Schema
Tier-2 data is structured: it conforms to a schema that determines what kind of data goes where within a given document.
- Tier-2 data must be encoded in a file format that can be determined by inspection, from the file name, or from metadata
- Web feature services and web map services may be deployed using Tier-2 data, but users will have to determine how to extract useful information on a case-by-case basis
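As a rough illustration of what determining a file format "by inspection" or from the file name might look like, the following Python sketch checks a file's leading bytes and then falls back to its extension. The function name `guess_format` and both lookup tables are assumptions made for this example; they are not part of any USGIN specification.

```python
from pathlib import Path

# Leading bytes ("magic numbers") for a few common formats.
# Illustrative only; a real tool would cover many more formats.
MAGIC_BYTES = {
    b"%PDF": "pdf",
    b"\x89PNG": "png",
    b"PK\x03\x04": "zip",  # also the container used by .xlsx and .docx
}

# Fallback lookup by file extension (hypothetical, incomplete table).
EXTENSIONS = {".csv": "csv", ".xml": "xml", ".json": "json", ".xls": "xls"}


def guess_format(name: str, head: bytes) -> str:
    """Guess a format from a file's first bytes, then from its name."""
    for magic, fmt in MAGIC_BYTES.items():
        if head.startswith(magic):
            return fmt
    return EXTENSIONS.get(Path(name).suffix.lower(), "unknown")
```

Inspection is tried first because file names can be misleading; the extension is consulted only when the leading bytes are not recognized.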
Tier 3: Data Structured by a USGIN Schema
Tier-3 data is data that has been structured by a USGIN schema, such as a schema provided by a National Geothermal Data System (NGDS) content model. NGDS content models are available on the USGIN Schemas subdomain.
Tier-3 data may be stored in a file-based table with field names matching those in the content model, or in an XML document that validates against the XML schema corresponding to the appropriate version of the content model.
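For the file-based table case, a field-name check could be sketched in Python roughly as follows. The field names in `CONTENT_MODEL_FIELDS` are hypothetical placeholders, not taken from an actual NGDS content model, and the sketch does not cover XML schema validation.

```python
import csv
import io

# Hypothetical field names; a real content model defines its own.
CONTENT_MODEL_FIELDS = ["WellName", "LatDegree", "LongDegree", "Depth"]


def missing_fields(table_text: str, required=CONTENT_MODEL_FIELDS) -> list[str]:
    """Return the required field names absent from the table's header row."""
    reader = csv.reader(io.StringIO(table_text))
    header = next(reader, [])  # an empty table yields an empty header
    present = {name.strip() for name in header}
    return [name for name in required if name not in present]
```

An empty result means every required field name appears in the header; it says nothing about whether the values in each column are valid.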
Web Accessibility and Discoverability
Data must be physically stored on a server in order to be web-accessible.
Any server may be part of a USGIN data-sharing network, provided that both the data and the server are appropriately configured; USGIN provides specifications to guide this configuration. This allows data providers to host and control their own data.
In addition to being web-accessible, data that is part of a USGIN data-sharing network must also be discoverable: users must be able to locate and access it.
USGIN facilitates data discovery by means of a metadata catalog that can be accessed via web browser. Data that is part of a USGIN data-sharing network is described by metadata registered in a metadata catalog (Figure 1). For example, any data that is part of the National Geothermal Data System (NGDS) is described by a metadata record in the NGDS Catalog. The NGDS Catalog is an interface that permits users to locate NGDS metadata records; these records are then used to locate and access the data resources they describe, in much the same way as a card catalog in a public library is used to locate books.
Figure 1: The relationship between USGIN data and metadata records in a USGIN catalog
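The relationship between metadata records and the data they describe can be sketched with a small in-memory catalog in Python. This is a toy model under stated assumptions: the `MetadataRecord` fields, the `search` function, and the example URLs are all invented for illustration and do not reflect the actual NGDS Catalog interface or its metadata format.

```python
from dataclasses import dataclass


@dataclass
class MetadataRecord:
    title: str
    keywords: list[str]
    resource_url: str  # where the described data actually lives


def search(catalog: list[MetadataRecord], term: str) -> list[MetadataRecord]:
    """Return records whose title or keywords contain the search term."""
    term = term.lower()
    return [
        record for record in catalog
        if term in record.title.lower()
        or any(term in keyword.lower() for keyword in record.keywords)
    ]


# Hypothetical records; the URLs are placeholders on a reserved example domain.
catalog = [
    MetadataRecord("Borehole temperatures", ["geothermal", "wells"],
                   "https://data.example.org/boreholes.csv"),
    MetadataRecord("Fault map", ["geology", "faults"],
                   "https://data.example.org/faults.xml"),
]

for record in search(catalog, "geothermal"):
    print(record.title, "->", record.resource_url)
```

The catalog holds only descriptions and locations; the data itself stays on the provider's server, much as a card catalog points to a shelf rather than containing the book.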
Preservation of Data Ownership, Credit, and Control
Data ownership and control are common problems associated with digital data distribution. To wit: if a data provider uploads their data to a server they do not own or otherwise control, the data provider effectively loses control and ownership of their data.
Although USGIN specifies that any server can be used to deploy data, USGIN data providers deploy their data on servers that they own or control. This can, and often does, impose a financial burden: servers and Internet connections are not free. But by paying for and controlling the computing resources used to deploy their data, USGIN data providers retain ownership and control of that data. For many data providers, this is a worthwhile tradeoff.
Provision for Open-Source Software Applications
USGIN specifications make room for free-and-open-source alternatives to proprietary software applications such as ESRI ArcGIS. Free-and-open-source software alternatives make it possible for USGIN users to deploy and access data without assuming what can become an unsustainable financial burden. For example, users wishing to deploy a web service can choose to contact ESRI and purchase an ArcGIS Server license; alternatively, users may take advantage of GeoServer, which is free-and-open-source web server software that is capable of providing a web service. USGIN supports both options.
Though many open-source software applications are free, their primary disadvantage is the need for expertise: applications such as GeoServer can be described as expert-friendly, and are therefore difficult for inexperienced users to approach.
To compound this problem, it is impractical for USGIN to provide detailed guidance on the use of open-source software: open-source software tends to be developed by a community of volunteers, and development cycles are often unpredictable.
Data that is part of a USGIN data-sharing network is:
- Structured by a USGIN schema
- Described by metadata records registered in a web-accessible catalog
- Publishable and accessible via open-source software applications
As a result of these characteristics, USGIN data supports interoperable discovery and analysis.