Information science becomes easier to understand once its key terms are made practical
Information science can look abstract because many of its most important words are used casually in everyday speech. People say information, data, metadata, search, relevance, archive, and classification as if the meanings were obvious. In professional work, those terms are much more precise. A practical glossary therefore does more than define jargon. It shows how the field thinks about finding, organizing, preserving, describing, connecting, and evaluating knowledge. The terms below are some of the ones readers need most often when they move from casual talk about information to serious study of how information systems actually work.
Information, data, document, and record
Information is meaningful content that reduces uncertainty or supports understanding. The same raw fact may become useful information in one context and meaningless noise in another. Data is a narrower term. It refers to encoded observations, measurements, entries, or symbols that can be processed, stored, or analyzed. Data becomes information when it is interpreted in context. Document refers to a bounded item that carries content: a letter, report, image, webpage, dataset, audio file, or article. Record is a document or data object preserved as evidence of an activity, decision, or transaction. Records management cares not only about content but about authenticity, retention, and accountability.
This distinction matters because information science often studies not ideas in the abstract, but information objects moving through systems. A report is not just a text. It may be a document with metadata, a record with legal value, a searchable item in an index, a citation source in a database, and a preservation object in an archive.
Metadata, schema, and standard
Metadata is data about a resource. It can describe title, creator, subject, format, date, rights, provenance, technical properties, and relationships to other resources. Metadata makes discovery, management, and preservation possible at scale. Schema is the structured set of elements and rules used to organize metadata. It tells a system which fields exist and how they relate. Standard is a shared specification that allows systems and institutions to exchange or interpret information consistently. Without standards, one database may call a creator field “author,” another “originator,” another “responsible agent,” and interoperability becomes far more difficult.
Readers should think of metadata as the descriptive layer, schema as the design pattern for that layer, and standards as the agreement that lets many systems speak in compatible ways. These concepts are foundational because information science is rarely concerned with a single isolated collection. It asks how information moves across systems, over time, and between communities.
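The naming problem described above can be sketched in code. The following is a minimal illustration, not a real standard: the three local field names come from the text, while the shared element name "creator", the CROSSWALK table, the normalize function, and the sample records are all assumptions made for this example.

```python
# Hypothetical crosswalk: local field names mapped to one shared element.
# "creator" as the shared target is an assumption made for this sketch.
CROSSWALK = {
    "author": "creator",
    "originator": "creator",
    "responsible agent": "creator",
}


def normalize(record: dict) -> dict:
    """Rename known local fields to the shared element; keep other fields."""
    return {CROSSWALK.get(field, field): value for field, value in record.items()}


# Invented sample records from three imaginary databases.
records = [
    {"author": "A. Example", "title": "Report One"},
    {"originator": "B. Example", "title": "Report Two"},
    {"responsible agent": "C. Example", "title": "Report Three"},
]

normalized = [normalize(r) for r in records]
creators = [r["creator"] for r in normalized]
print(creators)
```

Once every record uses the same element name, a single query on that element reaches all three sources. Real crosswalks between published schemas are more involved, since fields rarely map one to one.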
Classification, taxonomy, ontology, and controlled vocabulary
Classification is the practice of placing items into categories according to an organized principle. It helps users browse large collections and identify relationships. Taxonomy usually refers to a hierarchical arrangement of categories, often moving from broad classes to narrower subdivisions. Ontology goes further by modeling entities, properties, and relationships in a formal way, especially in knowledge representation and semantic systems. Controlled vocabulary is a curated set of approved terms used consistently for indexing and retrieval.
These tools exist because ordinary language is flexible, ambiguous, and full of synonyms. One user searches for automobiles, another for cars, another for vehicles. One archive describes a creator by full name, another by abbreviated form. Controlled vocabularies and classification systems reduce that chaos. They do not eliminate judgment, however. Every classification scheme reflects choices about what distinctions matter, whose language counts, and what kinds of knowledge fit poorly into fixed categories.
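A controlled vocabulary can be pictured as a lookup from the words users type to the one approved term used for indexing. The sketch below is illustrative only: the choice of "Automobiles" as the preferred term, the PREFERRED_TERMS table, and the to_preferred function are assumptions, not entries from any published vocabulary.

```python
from typing import Optional

# Hypothetical entry vocabulary: variant forms point to one preferred term.
# "vehicles" is kept separate here because it is a broader concept.
PREFERRED_TERMS = {
    "automobile": "Automobiles",
    "automobiles": "Automobiles",
    "car": "Automobiles",
    "cars": "Automobiles",
    "vehicle": "Vehicles",
    "vehicles": "Vehicles",
}


def to_preferred(term: str) -> Optional[str]:
    """Return the approved indexing term, or None if the term is unknown."""
    return PREFERRED_TERMS.get(term.strip().lower())


print(to_preferred("Cars"))
print(to_preferred("automobiles"))
print(to_preferred("trucks"))
```

Deciding that cars and automobiles share one term while vehicles stays separate is itself a judgment call, which is the kind of choice the paragraph above describes.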
Indexing, cataloging, and description
Indexing is the assignment of terms or descriptors that make a resource searchable by topic, name, date, place, or concept. Cataloging is broader. It creates the descriptive record that identifies and organizes a resource within a collection or system. Description refers to the explanatory metadata that tells users what a resource is, what it contains, and how it relates to other resources.
In practice these functions overlap. A catalog record may include both descriptive metadata and indexed subject headings. The important point is that search systems do not become useful by accident. Someone or something has to create structured pathways between a user’s question and a resource that may answer it.
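One common way to build such a pathway is an index that maps each subject heading to the records carrying it. The following is a minimal sketch under stated assumptions: the catalog records, their identifiers, and the build_subject_index function are invented for illustration and do not reflect any particular cataloging system.

```python
from collections import defaultdict

# Invented catalog records: descriptive metadata plus assigned subject headings.
catalog = {
    "rec1": {"title": "Annual Water Report", "subjects": ["water supply", "public health"]},
    "rec2": {"title": "Clinic Survey", "subjects": ["public health"]},
    "rec3": {"title": "Reservoir Maps", "subjects": ["water supply"]},
}


def build_subject_index(records: dict) -> dict:
    """Map each subject heading to the set of record ids indexed under it."""
    index = defaultdict(set)
    for record_id, record in records.items():
        for subject in record["subjects"]:
            index[subject].add(record_id)
    return index


subject_index = build_subject_index(catalog)
print(sorted(subject_index["public health"]))
```

The catalog record holds the description, and the index holds the route from a heading back to the record, which is why the two functions overlap in practice.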
Information retrieval, query, relevance, precision, and recall
Information retrieval is the field concerned with finding useful information in response to a need. Search engines, library catalogs, legal databases, discovery layers, recommendation systems, and archival search tools all operate within this broad area. A query is the expression of the user’s request, whether typed in keywords, spoken, selected through filters, or implied by behavior. Relevance is the degree to which a retrieved item matches the user’s need. This may depend on topic, authority, timeliness, format, or task.
Two classic evaluation terms are precision and recall. Precision asks what proportion of the retrieved results are actually useful. Recall asks what proportion of the useful items in the collection were successfully retrieved. High precision with low recall may give a clean but incomplete answer. High recall with low precision may bury the good items in clutter. Information science studies how to balance these outcomes depending on the user’s goal.
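Both measures reduce to simple ratios over two sets: the items a system retrieved and the items that are actually useful. The sketch below shows the arithmetic with invented document identifiers. The function name precision_recall and the convention of returning 0.0 for an empty set are assumptions made for this example.

```python
def precision_recall(retrieved: set, relevant: set) -> tuple:
    """Return (precision, recall) for one query.

    precision = useful retrieved items / all retrieved items
    recall    = useful retrieved items / all useful items in the collection
    """
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Invented example: four results returned, five useful items exist.
retrieved = {"d1", "d2", "d3", "d4"}
relevant = {"d2", "d4", "d5", "d6", "d7"}

precision, recall = precision_recall(retrieved, relevant)
print(precision, recall)
```

Here two of the four results are useful, so precision is 2/4, and two of the five useful items were found, so recall is 2/5.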
Ranking, signal, noise, and relevance feedback
Ranking is the ordering of results so that more useful items appear earlier. Search systems rank using many signals, including term matching, citation structure, link patterns, popularity measures, field weighting, freshness, user behavior, or learned models. A signal is any feature that helps the system infer usefulness. Noise is irrelevant or low-value material that interferes with discovery. Relevance feedback occurs when the system uses explicit judgments or user behavior to refine later results.
These terms matter because search is not just about matching words. It is about predicting usefulness under uncertainty. Ranking is therefore one of the most consequential parts of modern information systems. It influences what is found first, what is overlooked, and which sources become effectively visible or invisible to users.
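A very simple way to combine several signals is a weighted sum, sketched below. This is an illustrative assumption, not a description of how any real search engine ranks: the signal names, the weights, the candidate scores, and the score function are all invented, and production systems often use learned models instead.

```python
# Hypothetical weights expressing how much each signal counts.
WEIGHTS = {"term_match": 0.6, "freshness": 0.3, "popularity": 0.1}

# Invented signal values between 0 and 1 for three candidate documents.
candidates = {
    "doc_a": {"term_match": 0.9, "freshness": 0.2, "popularity": 0.5},
    "doc_b": {"term_match": 0.6, "freshness": 0.9, "popularity": 0.9},
    "doc_c": {"term_match": 0.1, "freshness": 0.1, "popularity": 0.1},
}


def score(signals: dict) -> float:
    """Weighted sum of signals; a missing signal counts as zero."""
    return sum(weight * signals.get(name, 0.0) for name, weight in WEIGHTS.items())


# Order document ids from highest to lowest score.
ranked = sorted(candidates, key=lambda doc_id: score(candidates[doc_id]), reverse=True)
print(ranked)
```

With these weights, doc_b is placed ahead of doc_a even though doc_a matches the query terms more closely, which shows how the choice of signals and weights decides what users see first.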
Corpus, collection, repository, and archive
Collection is a general term for an organized group of information objects. Corpus is often used for a body of texts or other data assembled for analysis, especially in computational work. Repository refers to a managed system that stores and provides access to digital objects such as papers, datasets, images, or institutional outputs. Archive has a more specific meaning in many professional settings: records or materials preserved because of enduring value, along with the institutions and practices that manage them.
People often use archive loosely to mean any storage of old material, but archival practice is more exacting. It cares about provenance, original order, authenticity, and long-term preservation. Information science pays attention to these distinctions because finding something and preserving it are related but not identical problems.
Provenance, authenticity, preservation, and curation
Provenance is the history of origin, custody, and transmission of an information object. It helps users judge context and trust. Authenticity concerns whether an object is what it claims to be and whether it has remained trustworthy as evidence. Preservation is the set of actions taken to keep information accessible and usable over time. In digital settings, this may include format migration, checksum verification, redundancy, and documentation of technical dependencies. Curation is the ongoing selection, organization, enhancement, and stewardship of resources so they remain meaningful and usable for intended communities.
These ideas are increasingly important in an era of synthetic media, unstable links, disappearing platforms, and rapidly aging software environments. Information science does not only help users find content now. It also helps societies prevent cultural, scientific, legal, and administrative memory from collapsing later.
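Checksum verification, mentioned above as one preservation action, can be sketched with Python's standard hashlib module. The sample content and the helper names sha256_of and is_intact are assumptions made for illustration. A real preservation workflow would read files from storage and keep the recorded checksums separately.

```python
import hashlib


def sha256_of(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


def is_intact(data: bytes, expected: str) -> bool:
    """Check whether the data still matches the checksum recorded earlier."""
    return sha256_of(data) == expected


# Invented content standing in for a preserved digital object.
original = b"Minutes of the records committee."
stored_checksum = sha256_of(original)  # recorded at the time of ingest

print(is_intact(original, stored_checksum))
print(is_intact(original + b" ", stored_checksum))
```

A matching checksum shows that the bits have not changed since the checksum was recorded. It does not by itself show that the object was authentic to begin with, which is why provenance is documented as well.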
Interoperability, identifier, and linked data
Interoperability is the ability of different systems to exchange and meaningfully use information. Identifier is a stable label assigned to an entity, resource, or concept so it can be referenced reliably across systems. Linked data is an approach to publishing structured data in ways that make entities and relationships machine-readable and connectable across the web or across repositories.
These terms matter because modern information environments are distributed. A researcher may move between a catalog, an institutional repository, a citation database, a publisher platform, an authority file, and a preservation system in one workflow. Without identifiers and interoperable standards, each transition introduces ambiguity, duplication, or loss of context.
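The role of identifiers in linked data can be sketched with plain subject, predicate, object statements. The example below is a simplified assumption and not real RDF tooling: the example.org identifiers, the predicate names, and the helper functions objects_of and subjects_with are invented for illustration.

```python
# Invented statements: (subject, predicate, object). Entities are named by
# hypothetical identifiers rather than by free-text labels.
PERSON = "https://example.org/person/7"
triples = {
    ("https://example.org/work/1", "creator", PERSON),
    ("https://example.org/work/2", "creator", PERSON),
    (PERSON, "name", "A. Example"),
}


def objects_of(subject: str, predicate: str) -> set:
    """All objects stated for a given subject and predicate."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def subjects_with(predicate: str, obj: str) -> set:
    """All subjects that have the given predicate and object."""
    return {s for s, p, o in triples if p == predicate and o == obj}


works_by_person = subjects_with("creator", PERSON)
print(sorted(works_by_person))
print(objects_of(PERSON, "name"))
```

Because both works point to the same identifier rather than to a typed name, a catalog and a repository can agree that they describe the same creator even if each displays the name differently.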
Information behavior, literacy, and usability
Information behavior refers to how people seek, encounter, avoid, evaluate, share, and use information. Information literacy is the set of abilities needed to recognize an information need and to locate, evaluate, and use information effectively and ethically. Usability concerns how easily users can interact with a system to accomplish a task. These terms remind readers that information science is not just about databases. It is also about human beings with limited time, partial knowledge, emotional pressures, and unequal access to tools.
That final point is crucial. An elegant retrieval system is not truly successful if ordinary users cannot understand its options, trust its outputs, or navigate it without frustration. Information science cares about systems, but it studies them in relation to human needs.
Authority, access, and trust are also core information-science concepts
Authority refers to the degree to which a source is regarded as credible or reliable for a given purpose. In information science, authority is contextual. A source may be authoritative for a technical specification but not for a community’s lived experience, or vice versa. Access means the ability to discover, reach, and use information. This includes legal permission, technical availability, accessibility for disabled users, and practical usability. Trust concerns whether users believe a system or source is dependable, transparent, and worthy of reliance.
These terms are essential because information systems fail in more ways than one. A perfectly indexed collection is of limited value if licensing blocks use, interfaces exclude disabled readers, or users do not trust provenance and ranking. Information science pays close attention to these layered barriers because they shape whether information is merely stored or genuinely usable.
Algorithm, model, knowledge graph, and interoperability in modern systems
Algorithm is a set of steps for processing data or solving a problem. In information systems, algorithms help rank search results, recommend items, deduplicate records, detect entities, or classify content. Model is a representation, often statistical or conceptual, used to describe patterns and support prediction or organization. Knowledge graph is a structured representation of entities and their relationships that supports richer search, linking, and inference. Interoperability, already noted above, becomes especially important here because modern models and graph-based systems depend on data that can move cleanly across institutions.
Readers should note that these terms point to a major modern shift. Information science is no longer only about shelves, catalogs, and documents in a narrow sense. It is also about computational structures that shape discovery, recommendation, summarization, and entity resolution across vast digital environments. The classical vocabulary of description and the modern vocabulary of modeling now increasingly belong in the same conversation.
For the research frame that uses this vocabulary in practice, see How Information Science Is Studied.