What does Wong mean when she says "private decisions have individual and collective effects"?
"The private, individual decisions to put up video doorbells have collective effects. Each neighbor might be doing so for justifiable reasons, but in effect, each doorbell collects information not just about that neighbor but whoever passes by that neighbor's house. In other words, these private decisions have individual and collective effects." (Ch. 1) -> Wong argues that "private" "individual" decisions like installing a Ring doorbell, even for "good intentions" such as convenience or package theft prevention, have profound collective effects. For instance, as illustrated by the Madeups’ neighborhood, a proliferation of video doorbells creates a "video doorbell surveillance community" by capturing pedestrians and transgressing their rights to move in public without fear of surveillance.
How is "data visceralization" different from visual minimalism?
Data visceralization is a design approach that moves beyond mere visual display and aims to create data experiences that engage the whole body, "emotionally, as well as physically." This is exemplified by the performance "A Sort of Joy," which made the audience feel the gender imbalance of an art collection through the performers' silence rather than merely see it displayed. This approach differs from visual minimalism (like Tufte's "data-ink ratio") because it actively centers emotion and embodiment, the very "excess" cast out by traditional data science in its pursuit of plainness and distance to achieve a mythical sense of 'objectivity.'
What is AI hype, what functions does it serve, and why is it harmful?
The aggrandizement of a technology sold to consumers and investors with the repeated message that they absolutely must buy or invest in it, lest they miss out. By relying on two seemingly contradictory visions (boosterism and doomerism), hype serves a commercial function, boosting sales and attracting investment, often by connecting this commercial goal with a popular fantasy of sentient machines or existential risk (its cultural function). Bender and Hanna argue that AI hype generates unwarranted user trust, which is dangerous because it obfuscates real-world problems, including daily harms such as the false arrest and detention of Black people due to faulty facial recognition, or the use of AI systems in military operations. A leading example of AI hype is Elon Musk's repeated claim that Tesla's humanoid robot is on the verge of full autonomy, despite repeated failures in demonstrations and the concealment of the extensive human labor behind it; such promises boost investor confidence and consumer demand while obscuring the system's risks and limitations.
What are hyperscale data centers, and how do they affect communities?
Hyperscale data centers are massive computing facilities operated by tech giants like Amazon, Google, and Microsoft, built to handle the immense data and processing demands of AI and big data. While they can offer economic benefits like temporary construction jobs, they also negatively affect communities through their intense energy consumption, the large quantities of water they require for cooling, and the noise and air pollution they cause. For example, Google's data center in Quilicura, Chile, has been widely criticized for drawing millions of liters of water per day from an already drought-stricken region.
What, according to Wong, is the difference between humans as data subjects and data stakeholders?
She suggests it's insufficient to be seen as "data subjects" because this term implies passivity and restricts participation in decisions about data, like political subjects who lack agency in lawmaking. Instead, Wong proposes we are rights-bearing "co-creators of data" and should strive to be data citizens or stakeholders. This shift in framing is important because it asserts our inherent right and entitlement to act, demand protection, and participate in shaping datafication's impact on our relationships and society.
Why is data never "raw"?
Data is never "raw" because it is not a neutral or objective entity that exists independently in the world. All data is the result of human choices and power relations at every stage of the data lifecycle. These choices (e.g., what to count or how to categorize) are always influenced by the social, political, and historical contexts of the people creating the dataset. The label "raw" is dangerous because it masks this situatedness and implies the data is an unbiased reflection of reality when it is, in fact, a manufactured artifact that embeds the worldview of its creators.
What is the ELIZA Effect and why is it dangerous?
The tendency of users to uncritically ascribe human-like qualities or intelligence to a computational system, even when they know the system is merely pattern-matching or generating text based on statistical likelihoods. The term originates from Joseph Weizenbaum's 1960s chatbot ELIZA, where users readily believed the program genuinely cared and understood them even though they were familiar with the simple mechanism behind it. In the context of LLMs, Bender and Hanna emphasize that this effect is dangerous because it encourages users and developers to overestimate the system's competence and knowledge and overlook its profound limitations, including its inability to reason, its reliance on biased training data, or its potential for generating fluent but nonsensical or harmful output.
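To make "merely pattern-matching" concrete, here is a minimal ELIZA-style sketch. The keyword rules, templates, and example input are illustrative inventions, not Weizenbaum's actual DOCTOR script, but they show how little machinery is needed to produce replies that feel attentive.

```python
# A toy ELIZA-style responder: keyword spotting plus canned reply templates.
# Illustrative sketch only; not Weizenbaum's original rules.
import re

RULES = [
    (r"\bI need (.+)", "Why do you need {0}?"),
    (r"\bI am (.+)", "How long have you been {0}?"),
    (r"\bmy (mother|father|family)\b", "Tell me more about your {0}."),
]
DEFAULT = "Please, go on."

def respond(utterance: str) -> str:
    # Return the template for the first matching keyword pattern,
    # filling in whatever text the user happened to type.
    for pattern, template in RULES:
        match = re.search(pattern, utterance, re.IGNORECASE)
        if match:
            return template.format(*match.groups())
    return DEFAULT

print(respond("I am feeling anxious about exams"))
# -> "How long have you been feeling anxious about exams?"
```

The program has no model of the user or the conversation; it only echoes fragments of the input back inside fixed templates, yet people readily read care and understanding into the output.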
What's the economic principle that explains why efficiency gains in training and operating LLMs or data centers may not reduce, but instead increase, overall energy and resource consumption?
The principle is known as the Jevons Paradox. It states that as technological improvements increase the efficiency with which a resource is used (e.g., making data centers consume less energy per operation), the effective cost of using that resource decreases. This lower effective cost drives a sharp increase in demand and frequency of use (e.g., users running more LLM queries and companies training larger models), which can ultimately result in higher overall consumption than before the efficiency gains were introduced.
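A back-of-the-envelope sketch of the arithmetic, with purely hypothetical numbers chosen for illustration (not figures from the readings):

```python
# Jevons Paradox in miniature: total consumption = energy per query x number
# of queries. The figures below are invented for illustration only.
energy_per_query_before = 1.0      # arbitrary energy units
queries_before = 1_000_000

energy_per_query_after = 0.5       # a 2x efficiency improvement...
queries_after = 3_000_000          # ...but cheaper queries invite 3x the usage

total_before = energy_per_query_before * queries_before
total_after = energy_per_query_after * queries_after

print(total_before)  # 1,000,000 units
print(total_after)   # 1,500,000 units: overall consumption still rises
```

The efficiency gain is real, but because it lowers the effective cost per query, usage grows faster than the savings, and total consumption ends up higher.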
What is "datafication" and why does Wong argue it's fundamentally different from other kinds of technological changes?
The recording, analysis, and archiving of our everyday activities as digital data. This constant recording of "nearly the entirety of our daily activities" transforms human life into digital information. It's fundamentally different from other technological changes because "it changes humanity in a personal way." "Where railroads, electrification, and the shipping container all shifted our economies and relationships by massively changing what was possible, datafication does so at the individual and collective human level by recording (nearly) the entirety of our daily activities." One example is the use of a fitness tracking app: before datafication, activities like taking a walk or sleeping were private & subjective. Now, digital apps continuously transform them into machine-readable data points: steps taken, distance covered, minutes of deep sleep, heart rate variability, etc. This shift means that the everyday act of exercising or resting becomes a constantly recorded and analyzable data stream, which can then be used by insurance companies, advertisers, or health researchers.
What is the "god trick," and what alternative knowledge framework can challenge it?
Philosopher Donna Haraway uses the concept of the "god trick" to describe the dangerous illusion of achieving "a view from nowhere," an omniscient & universally objective perspective that claims to see everything as it really is, without bias. A common example is Google Maps, which presents a seemingly all-knowing and objective "view from above," masking the fact that it is built from partial data and political decisions about what counts as a border or landmark, while users experience it as if it were the single true representation of the world. This perspective is inherently flawed because it attempts to mask the specific position from which all knowledge is actually produced. "Situated knowledge" challenges this trick by asserting that all perception and knowledge are necessarily incomplete and partial: each of us is always rooted in a specific body, location, and cultural context. By forcing us to acknowledge where and how we know something, situated knowledge dismantles the claim of universal objectivity and demands intellectual accountability while making space for diverse perspectives.
What is "fauxtomation"?
"Fauxtomation" is a term coined by author and filmmaker Astra Taylor to describe processes marketed as technologically advanced or fully automated when, in reality, they are significantly powered by human labor that has been deskilled, concealed, or shifted onto the user or consumer. In other words, fauxtomation is not the elimination of work, but rather the displacement and externalization of work that forces the consumer or a network of hidden "ghostworkers" to perform the necessary tasks. This deceptive practice serves to reinforce the perception that unpaid or marginalized work (e.g., social reproduction) holds no economic value, while simultaneously acclimating society to the misleading idea of human obsolescence in the face of supposedly seamless technology. The McDonald's self-service kiosk is a prime example: the system is presented as efficient automation, but it effectively shifts the work of order-taking, previously performed by a compensated employee, directly onto the customer, who now needs to perform data entry, order customization, and payment processing themselves.
What are the main forms of extractivism that structure the AI supply chain?
According to writers like Karen Hao and Kate Crawford, the AI supply chain operates as a layered system of extractivism with at least three main dimensions: materials, data, and human labor. Material extraction starts deep in the ground, with the mining of minerals like lithium and copper for chips and servers. These activities often cause severe and lasting damage to local ecosystems. This is accompanied by human labor extraction, from dangerous and underpaid mining jobs to the largely invisible and psychologically damaging "ghost work" of annotating, moderating, and refining data for very low wages. On top of that, the system depends on large-scale data extraction, in which vast amounts of human-generated content (often collected without meaningful consent) are scraped and stored to train AI models.
Why does Wong argue that the "the right to be forgotten" is insufficient and what alternative framework does she propose in its place?
Wong finds the "right to be forgotten" insufficient for several reasons: it is hard to enforce when "public interest" exceptions apply; Google, not us, decides whose information is forgotten; and it is largely reactive, taking effect only after data creation, when data stickiness means data are effectively forever even if the original post has been deleted. Instead she argues for the "right against data collection," which confronts the problem at its source: "That such comprehensive data about our daily activities exists is the problem." It addresses data stickiness by limiting data creation from the outset and moves us from being "subjects" of datafication to "stakeholders" in its creation, use, and processing.
How did we define "missing data," and how does this differ from the definitions commonly used in data science or statistics?
From a feminist data justice perspective, "missing data" means information that institutions fail to prioritize, collect, maintain, or publish, even when communities are actively demanding it. Following Mimi Onuoha, it's political because it signals both a lack and an ought: the data isn't there, but people insist it should be. This notion is situated and relational, not 'neutral,' and often covers data that's sparse, unreliable, misclassified, or inaccessible, not only data points that are entirely absent, as in conventional statistics and data science. In class, we discussed the example of feminicide in Puerto Rico: for years, nonprofits showed that official systems undercounted cases (by as much as 27%) because there was no clear legal definition (until 2021) and no adequate counting infrastructure. In other words, the data wasn't "missing" in a technical sense - it was inadequately collected and poorly classified, which made it impossible to know the actual number of feminicide cases.
What is the "Clever Hans Effect" in LLMs?
The Clever Hans Effect in LLMs refers to the misleading appearance of comprehension or reasoning, where a model seems to solve a task correctly but is actually relying on spurious patterns or shortcuts in the data rather than any genuine understanding. In other words, it is "right for the wrong reasons": it exploits statistical regularities and learned linguistic patterns at massive scale rather than engaging in real intelligence. Just like the horse Clever Hans, who only appeared to do arithmetic by unconsciously reading human cues, an LLM produces fluent text by predicting the next most plausible token, which can lead users to overestimate its cognitive abilities, e.g., summarizing a familiar topic flawlessly because it has been trained on thousands of similar summaries, yet failing on a genuinely novel problem because it is optimizing for linguistic plausibility, not 'truth.'
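As an illustration of what "right for the wrong reasons" can look like computationally, here is a small hypothetical sketch (not from Bender and Hanna) of shortcut learning: a classifier scores near-perfectly by latching onto a spurious feature that co-occurs with the label in training, then collapses once that shortcut no longer holds.

```python
# Hypothetical sketch of the Clever Hans / shortcut-learning pattern:
# high scores driven by a spurious cue, not by the genuinely informative signal.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 2000

# Training data: a weak "real" signal plus a shortcut feature that almost
# perfectly tracks the label (think of a watermark that co-occurs with one class).
y_train = rng.integers(0, 2, n)
real_signal = y_train + rng.normal(0, 2.0, n)
shortcut = y_train + rng.normal(0, 0.1, n)
X_train = np.column_stack([real_signal, shortcut])

# Test data where the spurious correlation is broken: the shortcut is pure noise.
y_test = rng.integers(0, 2, n)
X_test = np.column_stack([y_test + rng.normal(0, 2.0, n),
                          rng.normal(0, 0.1, n)])

clf = LogisticRegression().fit(X_train, y_train)
print("train accuracy:", clf.score(X_train, y_train))  # near-perfect, via the shortcut
print("test accuracy: ", clf.score(X_test, y_test))    # falls toward chance
```

Just as Hans keyed on his questioner's posture rather than on arithmetic, the classifier keys on the watermark-like artifact rather than on the underlying signal, so the impressive score says little about genuine competence.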
What are "AI's sacrifice zones"?
AI's "sacrifice zones" are marginalized places and communities treated as expendable and forced to absorb the disproportionate environmental and health burdens that make the AI industry possible. This pattern extends a much older logic of extractive industries like mining, where land, water, and local well-being are exploited for the benefit of distant corporations. In the context of AI, these zones host the most resource-intensive parts of the supply chain, from high-pollution mining of rare minerals for hardware to the construction and operation of massive data centers, such as those in Memphis, Tennessee, or Quilicura, Chile. Residents in these areas face harms like constant noise pollution from industrial cooling systems, air pollution from diesel backup generators, and huge withdrawals of local water supplies for cooling, while the economic gains flow upward to multinational tech companies. The communities bearing the brunt of these impacts typically see little, if any, meaningful benefit in return.
What is the "Costcoization of data"?
Legal scholar Sarah Lamdan uses the term to critique the consolidation of academic publishers into powerful "information dealers" that market themselves as a "one-stop shop" for government agencies' and corporate clients' massive data needs: "[these] companies are informational Costcos or Sam's Clubs selling economy-sized information by the bucket. But unlike Costco… the data analytics companies aren't so careful about quality control." The model packages vast quantities of personal data, gathered from multiple sources, into a single, convenient product that institutions can buy and use with little oversight. One example is LexisNexis' Risk Solutions products, which promise a seamless "360° view of identity" by turning unsuspecting people's continuously updated personal information into a bulk commodity, sold as an all-in-one data product rather than as discrete, consent-based records.
What is "helicopter research" and what's an example of an alternative framework?
It's the extractive practice where researchers descend upon a local community (often marginalized or vulnerable) to collect valuable data and insights, only to quickly depart without sharing findings or ensuring the research benefits the community studied. E.g., when AI developers harvest speech recordings from a community for training a voice-recognition model, then leave without sharing results, compensating participants fairly, or ensuring the technology serves (or even reaches) the people whose data made it possible. This dynamic often reinforces existing power imbalances, and treats local people merely as data sources. Alternative frameworks are "participatory research," "co-design," "design justice," or RTDS, all of which mandate that the community be involved as equal partners throughout the entire research lifecycle, from framing the initial research questions and collecting data to analyzing the results and jointly determining how the findings are disseminated and applied for local benefit.
What is "automated austerity," and what are three concerns Bender and Hanna (2025) raise about adopting it?
"Automated austerity" describes the strategy governments (and increasingly, companies) use when facing shrinking budgets, where instead of adequately funding public services, they turn to automated systems as cheaper substitutes for human labor in key areas like schools, healthcare, social services, and administrative agencies. Bender and Hanna (2025) argue that this approach is deeply flawed for many reasons. First, automated systems rely on problematic benchmarks, meaning they measure success according to narrow and often decontextualized metrics (proxies, often lacking sufficient construct validity), e.g., "risk scores," that distort the real goals of public services. Second, these systems frequently appear to work, but often only because of the Clever Hans effect (they're merely exploiting statistical correlations rather than demonstrating any real understanding, which creates a dangerous illusion of competence). Third, automated austerity promotes what they call a "cartoon understanding" of specialized work, where complex human-centered processes (like teaching or diagnosing patients) are reduced to simplistic tasks that ignore care and professional judgment or the broader context. Additionally, AI systems tend to have high error rates for marginalized populations and widen inequality rather than improving access. Most importantly, automation becomes a political distraction: it does nothing to solve the underlying problem (i.e. chronic underfunding) and instead allows governments and companies to outsource responsibility under the guise of innovation. A key example is the COMPAS risk assessment algorithm, used across the United States to help judges determine bail and sentencing decisions. It was introduced as an "efficient" tool to reduce caseload burdens and speed up judicial decision-making. Investigations by ProPublica found that it wrongly labeled Black defendants as "high risk" at nearly twice the rate of white defendants and labeled white defendants "low risk" far more often, despite similar or worse re-offense histories. These disparities arose because the system relied on biased training data, including arrest history and neighborhood policing patterns. COMPAS exemplifies how automated austerity amplifies inequality and outsources state responsibility to a flawed algorithm instead of addressing structural problems in the legal system.
What is the "Big Tech Playbook"?
The "Big Tech playbook" refers to the set of tactics employed by multinational hyperscalers, predominantly based in the Global North, to systematically secure critical resources (like land and water) while simultaneously shielding themselves from environmental and social accountability in the communities they enter. A core tactic is obfuscation and secrecy, where companies enter communities under the guise of "shell companies" and then rely on classifying technical details, such as water consumption and energy use, as "confidential" or "proprietary," which makes it exceedingly difficult to accurately track their environmental footprints. This is often paired with superficial engagement: offering shallow community programs (like Google's Quilicura Urban Forest project) that distract from profound structural harms. The playbook also relies on imbalanced power dynamics exemplified by incidents where company representatives use language barriers and legal power to disregard local concerns and leave communities feeling intimidated rather than heard.