This page defines terms as they are used within the context of the United States Geoscience Information Network (USGIN). Vocabulary terms that are used only in a specialized context (such as a specific tutorial or a specific NGDS content model) are not defined here, but are instead defined within the specific instance in which they are used. Likewise, geoscience vocabulary is not defined here.
For example, attributes describing a fault feature might include:
Attribute (Markup Language)
For a much more detailed overview of attributes, see the USGIN XML Tutorial.
An explicit logical association between any two things. For example, a binding can exist between two resources; a binding can also exist between a location and the resource found there.
Content models provide NGDS schemas that are designed to facilitate interoperability. Any data submitted for the AASG Geothermal Data project by Arizona Geological Survey subrecipients must be structured by an NGDS schema provided by an NGDS content model.
Data constitutes observations or measurements that are used to describe things; data often describes features.
Though the terms data and information are often used interchangeably, data technically denotes raw observations, whereas information connotes interpreted observations. A discrete cluster of related data is known as a dataset.
A method of storing data. In a database, data is divided up into database records; in turn, database records are divided up into database fields. The advantage of a database is that it can be sorted and searched by field contents.
Though modern databases are usually digital, a physical example of a database is a card catalog in a public library. In a card catalog, data is divided up into individual cards, which are directly analogous to database records. Each card (record) in the catalog corresponds with and describes a book. The information about each book is divided up into fields: title; author; subject; publication date; etc.
Digital databases can be in tabular format (that is, a table) in which rows represent individual records and columns constitute fields; or they can be viewed record-by-record.
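As a minimal sketch of records, fields, and the sorting and searching they make possible, the card catalog can be modeled in Python as a list of dictionaries. The titles, authors, and years are made-up placeholder data, and the helper names `sort_by_field` and `search_by_field` are hypothetical.

```python
# A tiny "card catalog": each dict is a record, each key is a field.
# The titles, authors, and years below are made-up placeholder data.
catalog = [
    {"title": "Geology of Arizona", "author": "Smith", "year": 1998},
    {"title": "Basin Hydrology", "author": "Jones", "year": 1985},
    {"title": "Fault Mechanics", "author": "Smith", "year": 2004},
]


def sort_by_field(records, field):
    """Return a new list of records ordered by the contents of one field."""
    return sorted(records, key=lambda record: record[field])


def search_by_field(records, field, value):
    """Return every record whose field exactly matches the given value."""
    return [record for record in records if record[field] == value]


by_year = sort_by_field(catalog, "year")
smith_titles = [r["title"] for r in search_by_field(catalog, "author", "Smith")]
```

Here each row of the table is one dictionary and each column is one dictionary key, so sorting or searching by a field only needs to look at that key.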
Using the analogy of a card catalog in a public library: if the card catalog is directly analogous to a database, and if the cards in the catalog are directly analogous to database records, then the different subdivisions of information found on each card (title, author, publication date, etc.) are all database fields.
A database field designed to contain values that are used to organize and maintain the uniqueness of records within a database. Database keys identify records in such a way that they can be referenced by other databases; consequently, database keys allow databases to refer to records in other databases.
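A rough illustration of how a key lets one database refer to a record in another, using plain Python dictionaries in place of real databases. The `book_id` and `loan_id` fields, their values, and the `book_for_loan` helper are all hypothetical.

```python
# Two separate "databases" (plain Python structures here), both hypothetical.
# book_id is the key: it is unique within `books`, so `loans` can point at it.
books = {
    "B001": {"title": "Fault Mechanics", "author": "Smith"},
    "B002": {"title": "Basin Hydrology", "author": "Jones"},
}
loans = [
    {"loan_id": "L100", "book_id": "B002", "borrower": "A. Reader"},
]


def book_for_loan(loan):
    """Follow the key stored in a loan record to the book record it references."""
    return books[loan["book_id"]]


borrowed_title = book_for_loan(loans[0])["title"]
```

Because every key value identifies exactly one book, the loan record never needs to copy the book's title or author; it stores the key and looks the rest up.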
A subdivision of a database. Using the analogy of a card catalog in a public library, each record in a database is analogous to an index card in the card catalog; each card (record) corresponds with and describes an individual book. Database records are further subdivided into fields, which contain specific kinds of data.
A more concrete example is a card catalog in a public library: if one chose to look only at the Title field of each card in the catalog, the act of doing so would constitute a discrete view of the catalog.
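The Title-only view described above might be sketched as follows. This is an assumption-laden toy: the records are placeholder data and `make_view` is a hypothetical helper, not part of any database product.

```python
def make_view(records, fields):
    """Return a view: the same records, restricted to the chosen fields."""
    return [{field: record[field] for field in fields} for record in records]


# Placeholder catalog cards with three fields each.
cards = [
    {"title": "Fault Mechanics", "author": "Smith", "subject": "Geology"},
    {"title": "Basin Hydrology", "author": "Jones", "subject": "Hydrology"},
]

title_view = make_view(cards, ["title"])
```

The underlying cards are unchanged; the view only selects which fields are shown.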
For example: academic papers and articles often cite other papers or articles as sources; the act of following one of these references and displaying the source document is the act of dereferencing.
In the context of the World Wide Web, dereferencing usually involves the act of displaying the document to which a given hyperlink refers.
Elements are logical document components found in markup languages such as XML and HTML. Elements simultaneously define the structure and content of a document. An element is demarcated by markup language tags. For example:
The first tag opens the element; the second tag closes it; everything between the opening tag and the closing tag constitutes the content of the element.
In a more concrete example, HTML uses the <em> element to demarcate text that should be emphasized. So, an emphasized element of an HTML document would appear as follows:
In a web browser, this element would produce the following result:
Markup language elements do not always need an opening tag and a closing tag: some elements can close themselves by ending the tag with a forward slash (/), conventionally preceded by a space. For example, the following element is self-closing:
Markup language elements are functionally similar to database fields, in that both serve to subdivide the content of a document.
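To see opening tags, closing tags, element content, and self-closing elements in practice, the sketch below parses a small document with Python's standard `xml.etree.ElementTree` module. The markup string is an illustrative assumption (the `note` wrapper and the choice of `br` as the self-closing element are placeholders).

```python
import xml.etree.ElementTree as ET

# A small illustrative document; the wrapper tag name is a placeholder.
markup = "<note><em>emphasized text</em><br /></note>"
root = ET.fromstring(markup)

em = root.find("em")  # element written with an opening and a closing tag
br = root.find("br")  # self-closing element: no separate closing tag

em_content = em.text  # the text between the opening and closing tags
br_content = br.text  # a self-closing element has no content, so this is None
```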
For a much more detailed overview of elements, see the USGIN XML Tutorial.
The definition of the term feature therefore depends on the context within which it is used.
Note: GIS features do not always correspond with geologic features because GIS software can be used to represent anthropogenic objects such as buildings, roads, or canals.
A feature class can be either a method of storing GIS features of the same geometry (point, line, or polygon), or a discretionary or subjective grouping of homogeneous GIS features. For example, highways, primary roads, and secondary roads can be grouped into a line feature class named "roads."
Geographic Information Systems (GIS)
A system designed to capture, store, manipulate, analyze, manage, and present all types of geographically referenced data.
HTML stands for Hypertext Markup Language. HTML is the predominant language in which web pages are written.
HTTP stands for Hypertext Transfer Protocol. HTTP is the networking protocol that is used to transfer information over the World Wide Web.
HTTP defines four basic operations (requests) made by clients to servers:
These HTTP requests correspond to standard database CRUD operations:
HTTP also defines a variety of header parameters that may be included with requests; these header parameters specify language, desired media type for response, character encoding, time stamps for resources, etc. In addition, HTTP defines a collection of codes automatically used in response to HTTP requests; these codes indicate various success, error, or redirect conditions. A particularly infamous HTTP code is 404: Not Found.
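The following sketch ties these pieces together. It assumes the four requests are GET, POST, PUT, and DELETE and pairs them with create, read, update, and delete in the conventional way; that pairing, the sample header values, and the `describe_status` helper are assumptions made for illustration. The status phrases come from Python's standard `http.HTTPStatus` enumeration.

```python
from http import HTTPStatus

# Conventional pairing of HTTP request methods with CRUD operations.
# The pairing is a common convention, assumed here for illustration.
HTTP_TO_CRUD = {
    "POST": "create",
    "GET": "read",
    "PUT": "update",
    "DELETE": "delete",
}

# Example header parameters a client might send (values are placeholders).
example_request_headers = {
    "Accept-Language": "en",      # preferred language
    "Accept": "application/xml",  # desired media type for the response
    "Accept-Charset": "utf-8",    # preferred character encoding
}


def describe_status(code):
    """Return the broad class of an HTTP response code and its standard phrase."""
    status = HTTPStatus(code)
    if 200 <= code < 300:
        kind = "success"
    elif 300 <= code < 400:
        kind = "redirect"
    elif code >= 400:
        kind = "error"
    else:
        kind = "informational"
    return kind, status.phrase


not_found = describe_status(404)
ok = describe_status(200)
```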
HTTP is defined by an Internet Engineering Task Force (IETF) Request for Comments (RFC) document: http://tools.ietf.org/html/rfc2616.
Functionally, identifiers can be compared to names. We give people, places, and things names to distinguish them from one another.
A URI is a specific kind of identifier.
Interchange formats are file formats that can be used to exchange data between hardware platforms and software applications, regardless of platform or application configuration. A useful example can be found in modern printers: files sent to the printer are exported in a format that all printers can read; this format constitutes an interchange format.
Interchange formats facilitate interoperability in two ways:
As web services are used for data exchange, web-accessible data that is formatted and structured for deployment as a web service can be said to be an implementation of an interchange format. Likewise, data that has been conformed to an application-neutral schema or file format constitutes an interchange format.
The capability to communicate, execute programs, or transfer data among various functional units in a manner that requires the user to have little or no knowledge of the unique characteristics of those units. Often, interoperability is facilitated by structuring data in such a way that it is consistently machine-processable without user interpretation or input.
For more information on interoperability as a goal of USGIN, see the USGIN Objectives page.
In the context of data, mapping is the process of interpreting and restructuring data. Often, data mapping takes place from one schema to another, a process referred to as schema mapping.
Schema mapping is typically accomplished by conforming data to fit the structure of a given document. A simple example of schema mapping is the conversion of dates in a given document from the MM-DD-YYYY format to the YYYY-MM-DD format. Another example would be the conversion of units of measure from inches to meters, or converting unit notation from millimeters to mm.
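The simple mappings just described can be sketched in Python as below. The function names and the contents of the `UNIT_NOTATION` table are hypothetical; the inch-to-meter factor (0.0254) is exact by definition.

```python
from datetime import datetime

METERS_PER_INCH = 0.0254  # exact by definition of the international inch

# Hypothetical lookup table for mapping unit names to unit notation.
UNIT_NOTATION = {"millimeters": "mm", "meters": "m", "inches": "in"}


def map_date(value):
    """Convert a date string from MM-DD-YYYY to YYYY-MM-DD."""
    return datetime.strptime(value, "%m-%d-%Y").strftime("%Y-%m-%d")


def inches_to_meters(inches):
    """Convert a measurement in inches to meters."""
    return inches * METERS_PER_INCH


def map_unit_notation(unit):
    """Replace a spelled-out unit name with its abbreviation."""
    return UNIT_NOTATION[unit]
```

For instance, `map_date("12-07-1941")` yields `"1941-12-07"`. Parsing with `strptime` rather than rearranging substrings means an impossible date such as `13-45-2000` raises an error instead of being silently mapped.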
Often, schema mapping is slightly more complex than the above examples would indicate. Sometimes, data must be mapped from a single field into multiple fields; here, an example would be the act of mapping dates from a single Date field into three separate fields corresponding with Day, Month, and Year. Likewise, schema mapping sometimes involves combining data from multiple fields into one field.
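A minimal sketch of one-to-many and many-to-one field mapping, assuming the source Date field holds a YYYY-MM-DD string. The field names follow the example above; the `Name` field, the sample record, and the function names are placeholders.

```python
def split_date(record):
    """Map a single Date field (YYYY-MM-DD) into Day, Month, and Year fields."""
    year, month, day = record["Date"].split("-")
    mapped = {k: v for k, v in record.items() if k != "Date"}
    mapped.update({"Day": int(day), "Month": int(month), "Year": int(year)})
    return mapped


def combine_date(record):
    """Inverse mapping: combine Day, Month, and Year into a single Date field."""
    mapped = {k: v for k, v in record.items() if k not in ("Day", "Month", "Year")}
    mapped["Date"] = f"{record['Year']:04d}-{record['Month']:02d}-{record['Day']:02d}"
    return mapped


source = {"Name": "Sample well", "Date": "1941-12-07"}
split = split_date(source)
round_trip = combine_date(split)
```

Both functions copy every other field unchanged, so mapping in one direction and then back reproduces the original record.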
The word mapping can also be used as a noun. A mapping is an instance in which data has been mapped from one schema to another.
Markup languages, such as HTML and XML, use elements to structure documents in such a way that the structure of the document is visible and readily distinguishable from the content of the document. Consequently, markup language documents can be used to store data, because the visible structure of the document permits users to subdivide data into elements in much the same way that data in a database is subdivided into records and fields.
Literally, "beyond data," metadata is often conceived as "data about data."
Any data used to organize, categorize, locate, or discover something is metadata. Because metadata is merely data that is used to find something, metadata can be (and often is) stored in databases.
For more information about metadata, see the USGIN Metadata Tutorial.
Software can be considered open-source when it complies with the criteria of the Open Source Initiative. Briefly summarized: to be considered open-source, the software or license must...
A more detailed list of these conditions may be found in the Open Source Initiative's Open Source Definition.
A profile is a limited, specific implementation of a standard. Standards often permit a wide array of possible implementations; profiles implement a specific configuration of values selected from the range of values provided by the standard.
For example: MPEG-4 is an international audio-video encoding standard (ISO/IEC CD 14496); MPEG-4 Part 2 deals specifically with encoding video. MPEG-4 Part 2 standards specify that media should be encoded between 64 and 8000 kilobits of visual data per second (kbit/s), a range of data rates that accommodates anything from the audio stream of a digital telephone to the video stream of a DVD video (at 40,000 kbit/s, Blu-ray video is well beyond the scope of MPEG-4 Part 2 and instead conforms to Part 10 of the MPEG-4 standard).
To simplify things, the MPEG-4 Part 2 standard lists several specific implementations, or profiles, of the MPEG-4 Part 2 standard, each of which specifies a maximum bitrate (in concert with other variables, such as frame rate, that are not included here):
So, digital video encoded at 1200 kbit/s would conform to Level 3b of the Advanced Simple Profile of the MPEG-4 Part 2 standard.
A useful physical analogy is a traffic light: on green lights, cars pass through intersections; on red lights, cars are not permitted to pass through an intersection. The rules represented by traffic lights can be considered traffic protocols; the rules for computer-related tasks such as data transfer are computing protocols.
HTTP is one example of a computing protocol.
Rasters use a data model in which data, usually images or continuous datasets, can be stored and represented visually as values within a grid of cells. Raster grid cells are assigned values that represent specific properties; these grid cell values usually can be decoded as colors. Consequently, a raster dataset is rather like a sheet of graph paper in which each cell contains a color that corresponds with the data represented by the cell. The resolution of a raster dataset is the number of cells on the X and Y axes of the raster grid.
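As a rough sketch of the grid-of-cells model, a raster can be represented in Python as a list of rows. The cell values are arbitrary illustrative numbers, and the linear gray-scale decoding in `cell_to_gray` is just one assumed way to turn values into colors.

```python
# A raster as a grid of cells: each inner list is one row of cell values.
# The values are arbitrary illustrative numbers (for example, temperatures).
raster = [
    [10, 12, 15, 18],
    [11, 13, 16, 19],
    [12, 14, 17, 20],
]


def resolution(grid):
    """Return (cells along X, cells along Y) for a rectangular grid."""
    return len(grid[0]), len(grid)


def cell_to_gray(value, low, high):
    """Decode a cell value as a gray level from 0 (black) to 255 (white)."""
    return round(255 * (value - low) / (high - low))


width, height = resolution(raster)
gray = [[cell_to_gray(v, 10, 20) for v in row] for row in raster]
```

Here the lowest cell value decodes to black and the highest to white, so the decoded grid could be drawn directly as a small gray-scale image.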
Raster datasets with which most users will be familiar are digital images, including JPEG, TIFF, or GIF images. The individual raster grid cells of these images are referred to as pixels. If the resolution of such images is high enough, individual pixels will be difficult to discern with the naked eye (depending on the scale at which the image is viewed).
The amount of color data that can be stored in a given pixel depends on the format of the raster image. Common raster formats are JPEG, TIFF, GIF, PNG, and BMP.
Raster images are optimal primarily for the storage and display of continuous data sets, which model phenomena without distinct boundaries such as temperature gradients over a given area. Continuous data sets are difficult to display as vectors. A disadvantage of rasters is that raster image files can be very large depending on the resolution, color depth, and compression of the image.
A feature that fulfills a specific requirement.
Almost anything can be a resource, as long as it is identifiable and fulfils a requirement.
For example, a database is one form of document; a database schema describes:
As a practical example, a database schema might determine whether or not dates should be entered in a given field according to the MM-DD-YYYY or YYYY-MM-DD formats.
Database schemas thereby dictate where and how data should be entered into a database.
Schema validation is the process of checking data in a database against a schema. A database record containing data that has not been entered in accordance with the appropriate schema, such as data that has not been entered into the appropriate column or formatted in the appropriate way, is invalid.
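A minimal sketch of schema validation, assuming a hypothetical schema that simply maps each field name to a regular expression the field's value must match. The `SCHEMA` contents and the `validate` helper are illustrative only and do not represent an NGDS schema.

```python
import re

# Hypothetical schema: field name -> pattern the field's value must match.
SCHEMA = {
    "Name": re.compile(r".+"),
    "Date": re.compile(r"\d{4}-\d{2}-\d{2}"),  # YYYY-MM-DD
}


def validate(record, schema=SCHEMA):
    """Return a list of problems; an empty list means the record is valid."""
    problems = []
    for field, pattern in schema.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not pattern.fullmatch(str(record[field])):
            problems.append(f"badly formatted field: {field}")
    for field in record:
        if field not in schema:
            problems.append(f"unexpected field: {field}")
    return problems


valid = validate({"Name": "Sample well", "Date": "1941-12-07"})
invalid = validate({"Name": "Sample well", "Date": "12-07-1941"})
```

The first record passes with no problems, while the second is reported as invalid because its date is in MM-DD-YYYY form. Note that the pattern checks only the shape of the date, not whether it is a real calendar date.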
A server can be:
The ability to serve requests is usually conferred by installing server software on a computing platform and then configuring the computing platform appropriately (though individual applications are also capable of functioning as servers on their own). Server software designed to create a web server (a server capable of serving HTTP requests) is called web server software.
Subrecipients are the state geological surveys (or equivalent state agencies) that are subcontracted by the Arizona Geological Survey (AZGS) under the Department of Energy (DOE) contract No. DE-EE0002850 for the National Geothermal Data System (NGDS) project. Their subcontracts require them to make "at risk" geothermal data available online to promote geothermal development throughout the United States.
A token is a discrete, logical, non-elementary component of an information stream (here, non-elementary means that a token is not irreducible and can be reduced to smaller components). For example, a sentence is an information stream; words are non-elementary components of a sentence and therefore tokens.
Computing information streams, such as URIs, can also be broken down into tokens. For example, USGIN URIs conform to the following syntax:
...wherein the following are tokens:
For more information about USGIN URIs, see the USGIN URI tutorial.
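Tokenizing can be sketched in a few lines of Python. The URI used here is an `example.org` placeholder and is not meant to follow the USGIN URI syntax; `sentence_tokens` and `uri_tokens` are hypothetical helper names.

```python
from urllib.parse import urlsplit


def sentence_tokens(sentence):
    """Split a sentence into word tokens on whitespace."""
    return sentence.split()


def uri_tokens(uri):
    """Split a URI into scheme, host, and the non-empty segments of its path."""
    parts = urlsplit(uri)
    return [parts.scheme, parts.netloc] + [s for s in parts.path.split("/") if s]


words = sentence_tokens("words are tokens of a sentence")
tokens = uri_tokens("http://example.org/uri/authority/type/123")
```

Each resulting token could itself be broken down further (a word into letters, a host name into labels), which is what makes tokens non-elementary.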
Uniform Resource Identifier (URI)
Uniquely identifies a resource; in the context of USGIN, URIs typically identify database records, features, and vocabulary terms. USGIN URIs are also designed to dereference to representations of the resources they identify. See the USGIN URI Tutorial for more information about URIs.
Vectors use a data model in which data, usually categorical or discrete data, is stored and represented visually as coordinate points, or vertices. A single vertex is a point; multiple vertices strung together can form lines (referred to as arcs in ESRI products), polygons, or even three-dimensional objects in virtual space; these vector-based objects can be rasterized with relative ease.
GIS software takes advantage of the characteristics of vector data to generate maps in which the features on the map are defined by vertices. Each vertex is georeferenced, often taking advantage of Global Positioning System (GPS) satellites; each feature on the map is then described by attributes, which provide the user with information that can be used to locate features and perform geospatial analysis.
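A minimal sketch of vector features as vertices plus attributes, assuming simple planar (x, y) coordinates rather than georeferenced longitude and latitude. The coordinates, attribute values, and helper names are made up for illustration.

```python
import math

# Each feature pairs a geometry (built from vertices) with descriptive attributes.
# Coordinates and attribute values are made-up illustrative data.
point = {"type": "point", "vertices": [(2.0, 3.0)], "attributes": {"name": "Well A"}}
line = {
    "type": "line",
    "vertices": [(0.0, 0.0), (3.0, 4.0)],
    "attributes": {"name": "Fault 1"},
}
polygon = {
    "type": "polygon",
    "vertices": [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)],
    "attributes": {"name": "Study area"},
}


def line_length(vertices):
    """Sum the straight-line distances between consecutive vertices."""
    return sum(math.dist(a, b) for a, b in zip(vertices, vertices[1:]))


def polygon_area(vertices):
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(vertices, vertices[1:] + vertices[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2


fault_length = line_length(line["vertices"])
study_area = polygon_area(polygon["vertices"])
```

Because the geometry is stored as a handful of vertices, measurements such as length and area can be computed directly from the coordinates.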
Vector datasets are flexible, and their file size is usually small: they grow in size only as more data is added to the dataset. The primary disadvantage of vector datasets is their inability to represent phenomena without distinct boundaries.
Web Server Software
Web server software is a software package that allows a computer to listen for, and respond to, incoming HTTP requests. An appropriately configured computer on which web server software has been installed can act as a web server.
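As a toy illustration of listening for and responding to HTTP requests, Python's standard `http.server` module can act as very small web server software. This is a sketch for experimentation, not a production configuration; the handler name, the response text, and `make_server` are hypothetical.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Answer every GET request with a short plain-text body."""

    def do_GET(self):
        body = b"hello from a minimal web server\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet; the default logs each request to stderr


def make_server(port=0):
    """Create a server on localhost; port 0 asks the OS for any free port."""
    return HTTPServer(("127.0.0.1", port), HelloHandler)
```

Calling `make_server(8000).serve_forever()` would keep the server answering requests at `http://127.0.0.1:8000/` until the process is interrupted.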
In addition, the term web service is often applied to data hosted using an application that provides web services.
Web services facilitate interoperability by allowing the client and the server to develop independently. This means that regardless of client or server make and model, regardless of changes to content on the server, a web service will be able to respond to requests as long as those requests are made using correct syntax.
The Open Geospatial Consortium has produced several different flavors of web service that are relevant to geographic information systems, USGIN, the National Geothermal Data System (NGDS), and the AASG Geothermal Data project. These include:
XML stands for Extensible Markup Language. XML acts as the basis for more specialized markup languages such as GML, GeoSciML, and KML.
Because XML elements are functionally similar to database fields, USGIN specifies the usage of XML documents as an interchange format for database records. To use XML documents as an interchange format, USGIN defines XML schemas in which each XML document corresponds with a specific database record and each element in a given XML document corresponds with data entered in a database field. In these documents, elements define the database fields.
For example, a Date field containing the date 12/7/1941 would appear as follows in an XML document:
For a much more detailed overview of XML, see the USGIN XML Tutorial.
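The correspondence between database records and XML documents described above can be sketched with Python's standard `xml.etree.ElementTree` module. The `Record` root tag, the `Name` field, and the `record_to_xml` helper are hypothetical and do not represent an actual USGIN schema.

```python
import xml.etree.ElementTree as ET


def record_to_xml(record, root_tag="Record"):
    """Build one XML document for one record: one child element per field."""
    root = ET.Element(root_tag)
    for field, value in record.items():
        ET.SubElement(root, field).text = str(value)
    return ET.tostring(root, encoding="unicode")


xml_text = record_to_xml({"Name": "Sample", "Date": "12/7/1941"})
```

With this input, `xml_text` is `<Record><Name>Sample</Name><Date>12/7/1941</Date></Record>`: the element names carry the field names and the element content carries the field values.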