Theme 1: How are knowledge infrastructures changing?
What does it mean to “know” in an age of social networks, big data, interdisciplinary research, and new modes of access to “bigger,” “wider,” “longer,” and “faster” information? How is knowledge now being generated, maintained, revised, and spread? How are open data, web publication, and commodity tools affecting concepts of expertise, processes of peer review, and the quality of knowledge?
Building on extensive literatures in science & technology studies, including previous work by members of this group (Edwards et al. 2007), Edwards (2010) defined knowledge infrastructures as “robust networks of people, artifacts, and institutions that generate, share, and maintain specific knowledge about the human and natural worlds.” This framing aimed to capture routine, well-functioning knowledge systems such as the world weather forecast infrastructure, the Centers for Disease Control, or the Intergovernmental Panel on Climate Change. Under this definition, knowledge infrastructures include individuals, organizations, routines, shared norms, and practices.
Key to the infrastructure perspective is the modular, multi-layered, rough-cut character of infrastructures. Infrastructures are not systems, in the sense of fully coherent, deliberately engineered, end-to-end processes. Rather, infrastructures are ecologies or complex adaptive systems; they consist of numerous systems, each with unique origins and goals, which are made to interoperate by means of standards, socket layers, social practices, norms, and individual behaviors that smooth out the connections among them. This adaptive process is continuous, as individual elements change and new ones are introduced, and it is not always successful. The current situation for knowledge infrastructures is characterized by rapid change in existing systems and introduction of new ones, resulting in severe strains on those elements with the greatest inertia.
The workshop concluded that at least the following phenomena require sustained attention:
Knowledge in perpetual motion. A transition is underway from what Weinberger (2012) calls “knowledge as a series of stopping points” (printed journal articles, books, textbooks, and other fixed products) to a world where knowledge is perpetually in motion. Today, what we call “knowledge” is constantly being questioned, challenged, rethought, and rewritten. As Weinberger describes the current situation, we face a world of abundant information, hyperlinked ideas, permission-free resources, highly public interaction, and massive, unresolved disagreement. Individual expertise is (many argue) being replaced by the wisdom of crowds: noisy and endlessly contentious, but also rich, diverse, and multi-skilled. In part, this means that the divide between knowledge producers and knowledge consumers is increasingly and radically blurred. In such a world, the missions of educational institutions such as schools and colleges, research institutions such as laboratories and universities, and memory institutions such as libraries, archives, and museums are bleeding into each other more than ever before. New forms of collective discovery and knowledge production, such as crowdsourced encyclopedias, wikis of all sorts, shared scientific workflows, and citizen science, are springing up within and across many academic disciplines (De Roure et al. 2011; De Roure et al. 2010; Goble & De Roure 2007; Shilton 2009; Shirky 2009, 2010; Takeda et al. 2013; Wade & Dirks 2009). The quality and durability of knowledge produced by such efforts remain uncertain, but their tremendous vigor and growing utility cannot be questioned.
Shifting borders of tacit knowledge and common ground. Consider that the study of knowledge by the social sciences and the humanities has been based on the same premises now being challenged by emerging forms. For example, several decades of scholarship in sociology and anthropology of knowledge established the difficulty of communicating local practices and understandings without face-to-face contact (H. M. Collins 1985; H. M. Collins & Pinch 1993). The phrase “distance matters” (because technology-mediated communication makes it more difficult to establish common ground) became a watchword in computer-supported cooperative work. Tacit knowledge and common ground were, and still are, regarded as major stumbling blocks to long-distance collaboration (Olson & Olson 2000; Olson et al. 2009). Yet an increasing amount of important knowledge work occurs under precisely these conditions; both technology and human skills are evolving to meet the challenge (Rosner 2012; Rosner et al. 2008; Vertesi 2012; Wiberg et al. 2012). In a world of Skype, Google Hangouts, Twitter, YouTube videos, and highly developed visualization techniques, the roles of tacit knowledge and common ground are changing, and a renewal of our understanding is required (Cummings et al. 2008).
Complexities of sharing data across disciplines and domains. Excitement continues to mount over new possibilities for sharing and “mining” data across scientific disciplines. Vast data repositories are already available to anyone who cares to use them, and many more are on the way. Yet data sharing raises urgent questions (Borgman 2012). In science, at least, the meaning of data is tightly dependent on a precise understanding of how, where, and when they were created (Bechhofer et al. 2010; Burton & Jackson 2012; Gitelman 2013; Ribes & Jackson 2013; Vertesi & Dourish 2011). But the rapid “commodification” of data (the presentation of datasets as complete, interchangeable products in readily exchanged formats) may encourage misinterpretation, overreliance on weak or suspect data sources, and “data arbitrage” based more on availability than on quality. Will commodified data lead to dangerous misunderstandings, in which scientists from one discipline misinterpret or misuse data produced by another? How far can the standardization of data and metadata be carried, and at what scale? What new kinds of knowledge workers are needed to bridge the gaps, both technical and social, among the many disciplines called on to address major scientific and social issues such as climate change, biodiversity triage, or health care for an aging population? Can the reputation systems of science be re-tuned to recognize and compensate these vital, but too often invisible and unrewarded, workers?
New norms for what counts as knowledge. Scientific data analysis increasingly uncovers significant and useful patterns we cannot explain, while simulation models too complex for any individual to grasp make robust predictions (e.g., of weather and climate change). Will these phenomena add up, as some predict, to an “end of theory” (Anderson 2008)? The question of how to evaluate simulation models (of whether they can be “validated” or “verified,” and whether they require a fundamentally different epistemology than theory and experiment) had already been puzzling both scientists and philosophers for several decades (Giere 1999; Heymann 2010; Jackson 2006; Morgan & Morrison 1999; Norton & Suppe 2001; Oreskes et al. 1994; Petersen 2007; Sismondo 1999; Sundberg 2009, 2010a, 2010b; Suppe 2000). Data-driven science poses a similar, even harder problem of evaluation. Do we “know” things if we cannot explain why they are true? Whatever the case, norms for what can count as “knowledge” are clearly changing (Anderson 2008; Hey et al. 2009).
Massive shifts in publishing practices, linked to new modes of knowledge assessment. Historically, knowledge institutions depended on costly, hierarchically organized forms of credentialing, certification, and publishing. These set severe limits not only on outputs (in the form of published articles, books, etc.), but also on who could count as a valid participant in knowledge assessment practices such as peer review. Today, these mechanisms are challenged on all fronts. Much less costly modes of publication permit the early release and broad dissemination of virtually all data and models used in science; one result is a broad-based movement toward publication practices that permit results to be readily reproduced, at least in the computational sciences (Stodden 2010a, 2010b, 2011). Commodified data analysis tools and widely available software skills permit a much larger number of participants to analyze data and run models. Networked social forms permit many more participants to comment publicly on knowledge products, bypassing traditional credentialing and certification mechanisms (De Roure et al. 2011; De Roure et al. 2010; Kolata 2013).
Challenges to traditional educational institutions. Both research universities and teaching colleges face extraordinary challenges. For decades, costs to students have risen faster than inflation, while Coursera, open courseware, and online universities offer new, lower-cost alternatives. The majority of university students no longer attend 4-year residence programs. Many of those who do appear more motivated by the university as a rite of passage and a lifestyle than by learning itself, as reflected in numerous measures of student learning and the amount of time spent studying (Babcock & Marks 2010, 2011; Mokhtari et al. 2009). Classroom teaching competes directly with online offerings; professors are no longer seen as infallible experts, but as resources whose facts can be checked in real time. As institutions, research universities display patent-seeking behavior that makes them increasingly difficult to distinguish from corporations, and indeed corporate sponsorship and values have penetrated deeply into most universities. Some have been more effective than others at building firewalls between sponsors’ interests and researchers to protect their objectivity, but no institution is immune to these challenges (Borgman et al. 2008). K-12 education faces related but different challenges, as schools struggle to adapt teacher training, equipment, and teaching methods to the screen-driven world most children now inhabit. Major benefits will accrue to institutions and students that find effective ways to meet these challenges, and doing so will require new visions of their place in larger infrastructures of knowledge, from national science foundations to corporate laboratories to the education of new generations of researchers.
Navigating across scales of space and time, and rates of change. Given the layered nature of infrastructure, navigating among different scales — whether of time and space, of human collectivities, or of data — represents a critical challenge for the design, use, and maintenance of robust knowledge infrastructures. A single knowledge infrastructure must often track and support fluid and potentially competing or contradictory notions of knowledge. Often invisible, these notions are embodied in the practices, policies, and values embraced by individuals, technical systems, and institutions. For example, sustainable knowledge infrastructures must somehow provide for the long-term preservation and conservation of data, of knowledge, and of practices (Borgman 2007; Bowker 2000, 2005; Ribes & Finholt 2009). In the current transformation, sustaining knowledge requires not only resource streams, but also conceptual innovation and practical implementation. Both historical and contemporary studies are needed to investigate how knowledge infrastructures form and change, how they break or obsolesce, and what factors help them flourish and endure.
Standards and ontologies. A quintessential tension surrounds the deployment of standards and ontologies in knowledge infrastructures. Fundamentally, it consists in the opposition between the desire for universality and the need for change.
Robust hypotheses require information in standardized formats. Thus the spread of a particular disease around the world cannot be tracked unless everyone is calling it the same thing. At the same time, medical researchers frequently designate new diseases, thus unsettling the existing order. For example, epidemiologists have sought to track the phenomenon of AIDS to periods predating its formal naming in the 1980s (Grmek 1990; Harden 2012). However, using historical medical records to do so has proven difficult because prior record-keeping standards required the specification of a single cause of death, precluding recognition of the more complex constellation of conditions that characterize diseases such as AIDS.
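The constraint described above can be made concrete with a small sketch. It assumes a hypothetical legacy record format with a single cause-of-death field and a hypothetical newer format that lists several conditions. The class names, field names, year, and condition labels are all invented placeholders for illustration; they do not reproduce any actual record-keeping standard or diagnosis.

```python
from dataclasses import dataclass
from typing import FrozenSet, Set


@dataclass(frozen=True)
class SingleCauseRecord:
    """Hypothetical legacy format: exactly one cause of death per record."""
    year: int
    cause: str


@dataclass(frozen=True)
class MultiConditionRecord:
    """Hypothetical newer format: any number of co-occurring conditions."""
    year: int
    conditions: FrozenSet[str]


def to_single_cause(record: MultiConditionRecord, primary: str) -> SingleCauseRecord:
    """Force a multi-condition record into the legacy format.

    Only the condition named as primary survives; the others are dropped.
    """
    if primary not in record.conditions:
        raise ValueError("primary must be one of the recorded conditions")
    return SingleCauseRecord(year=record.year, cause=primary)


def shows_constellation(observed: Set[str], constellation: Set[str]) -> bool:
    """True if every condition in the constellation appears in the record."""
    return constellation <= observed


# Placeholder condition names and year, not real diagnoses or data.
syndrome = {"condition_a", "condition_b"}
full = MultiConditionRecord(
    1975, frozenset({"condition_a", "condition_b", "condition_c"})
)
legacy = to_single_cause(full, primary="condition_a")

print(shows_constellation(set(full.conditions), syndrome))
print(shows_constellation({legacy.cause}, syndrome))
```

In this toy setup the first check succeeds and the second fails: once a record has been reduced to a single cause, no constellation of two or more conditions can be recognized in it, however carefully the record is read afterward.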
How might one solve this problem (if it is solvable at all)? One could review the old records and try to conjure them into modern forms. This could work to an extent; some fields, such as climate science, routinely investigate historical data before adjusting and re-standardizing them in modern forms to deepen knowledge of past climates (Edwards 2010). Yet this is possible largely because the number of records and their variety is relatively limited. In many other fields such a procedure would be extremely difficult and prohibitively expensive. Alternatively, one could introduce a new classificatory principle, such as the Read Clinical Classification, which would not permit that kind of error to propagate. Here too, due to the massive inertia of the installed base, it would cost billions of dollars to make the changeover. On top of that, it would complicate backward compatibility: every new archival form challenges the old (Derrida 1996). In practice, this adds up to very slow updating of classification standards and ontologies, marked by occasional tectonic shifts.
Today, hopes for massively distributed knowledge infrastructures operating across multiple disciplines consistently run headlong into this problem. Such infrastructures are vital to solving key issues of our day: effective action on biodiversity loss or climate change depends on sharing databases among disciplines with different, often incompatible ontologies. If the world actually corresponded to the hopeful vision of data-sharing proponents, one could simply treat each discipline’s outputs as an “object” in an object-oriented database (to use a computing analogy). Discipline X could simply plug discipline Y’s outputs into its own inputs. One could thus capitalize on the virtues of object-orientation: it would not matter what changed within the discipline, since the outputs would always be the same. Unfortunately, this is unlikely — perhaps even impossible — for both theoretical and practical reasons (Borgman et al. 2012).
An “object-oriented” solution to these incompatibilities is theoretically improbable because the fundamental ontologies of disciplines often change as those disciplines evolve. This is among the oldest results in the history of science: Kuhn’s term “incommensurability” marks the fact that “mass” in Newtonian physics means something fundamentally different from “mass” in Einsteinian physics (Kuhn 1962). If Kuhnian incommensurability complicates individual disciplines, it has even larger impacts across disciplines. A crisis shook virology, for example, in the 1960s when it was discovered that “plant virus” and “animal virus” were not mutually exclusive categories. Evolutionary biology suffered a similar, and related, crisis when it was learned that some genes could jump between species within a given genus, and even between species of different genera (Bowker 2005). Suddenly, disciplines that previously had no need to communicate with each other found that they had to do so, which then required them to adjust both their classification standards and their underlying ontologies.
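The object-oriented analogy, and the way ontological change undermines it, can be sketched loosely in Python. This is an illustration under stated assumptions, not a model of any real database: the classes `CatalogBefore` and `CatalogAfter`, the function `count_by_category`, and the item identifiers are all hypothetical. The sketch assumes a provider discipline that once assigned each item exactly one category and later accepted that categories can overlap, while a consumer in a neighboring discipline still relies on the old interface.

```python
from collections import Counter
from typing import Dict, FrozenSet, Iterable


class CatalogBefore:
    """Hypothetical provider: each item belongs to exactly one category."""

    def __init__(self, data: Dict[str, str]) -> None:
        self._data = data

    def category(self, item_id: str) -> str:
        return self._data[item_id]


class CatalogAfter:
    """Hypothetical provider after its ontology changed: categories may overlap."""

    def __init__(self, data: Dict[str, FrozenSet[str]]) -> None:
        self._data = data

    def categories(self, item_id: str) -> FrozenSet[str]:
        """New interface: the full set of categories for an item."""
        return self._data[item_id]

    def category(self, item_id: str) -> str:
        # Backward-compatible shim: keeps the old signature by returning the
        # alphabetically first category and silently dropping the others.
        return min(self._data[item_id])


def count_by_category(catalog, item_ids: Iterable[str]) -> Counter:
    """Consumer in a neighboring discipline; it only knows the old interface."""
    return Counter(catalog.category(item_id) for item_id in item_ids)


item_ids = ["item_1", "item_2"]
before = CatalogBefore({"item_1": "plant", "item_2": "animal"})
after = CatalogAfter({
    "item_1": frozenset({"plant"}),
    "item_2": frozenset({"animal", "plant"}),
})

# The consumer's result is identical before and after the shift.
print(count_by_category(before, item_ids) == count_by_category(after, item_ids))
# Only the new interface reveals that item_2 now sits in two categories.
print(sorted(after.categories("item_2")))
```

The consumer's tallies are the same in both cases, which is exactly the difficulty: keeping the output format stable hides the change in meaning rather than absorbing it. The alternative, exposing the new `categories` interface, forces every downstream discipline to revise its own assumptions.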
In practice, an object-oriented solution to ontological incompatibilities is unlikely because we have not yet developed a cadre of metadata workers who could effectively address the issues, and we have not yet fully faced the implications of the basic infrastructural problem of maintenance. We do know that it takes enormous work to shift a database from one medium to another, let alone to adjust its outputs and algorithms so that it can remain useful both to its home discipline and to neighboring ones. Thus three results of today’s scramble to post every available scrap of data online are, first, a plethora of “dirty” data, whose quality may be impossible for other investigators to evaluate; second, weak or nonexistent guarantees of long-term persistence for many data sources; and finally, inconsistent metadata practices that may render reuse of data impossible — despite their intent to do the opposite.
We expect our knowledge infrastructures to permit effective action in the world; this is the whole impulse behind Pasteur’s Quadrant or Mode II science (Gibbons et al. 1994; Jackson et al. 2013; Stokes 1997). And yet, in general, scientific knowledge infrastructures have not been crafted in such a way as to make this easy. What policymakers need and what scientists find interesting are often too different; to put it another way, a yawning gap of ontology and standards separates the two. Consider biodiversity knowledge. In a complex series of overlapping and contradictory efforts, taxonomists have been trying to produce accounts of how species are distributed over the Earth. However, the species database of the Global Biodiversity Information Facility, which attempts to federate the various efforts and is explicitly intended for policy use, does not produce policy-relevant outputs (Slota & Bowker forthcoming). The maps of distribution are not tied to topography (necessary to consider alternative proposals such as protecting hotspots or creating corridors), they give single observations (where what is needed is multiple observations over time, so one can see trends), and for political reasons, they do not cover many parts of the planet (which one needs in order to make effective global decisions). Similarly, in the case of climate change, for decades the focus on “global climate” (an abstraction relevant for science, but not for everyday life) has shaped political discourse in ways that conflicted with the local, regional, and national knowledge and concerns that matter most for virtually all social and political units. Climate knowledge infrastructures have been built to produce global knowledge, whereas the climate knowledge most needed for policymaking is regional, culturally specific, and focused on adaptation (Hulme 2009).