This study path asks learners to evaluate ontology creation as it relates to automated metadata creation for audiovisual digital materials, and to reflect on ways to disrupt the Anglo and Western ontologies that are often embedded in these systems.
Software helps companies coordinate the supply chains that sustain global capitalism. How does the code work, and what does it conceal? Posner’s article is both brilliant and approachable. It investigates the ramifications of the modular design of supply-chain software: because both the code and the supply chain are modular, it is impossible to fully know what happens at every level of the supply chain, and therefore impossible to ensure fair labor practices. This article pairs particularly well with discussions of object-oriented programming models.
Noble writes evocatively about the effect of search algorithm biases on users: in this case, young black girls who find that Google searches for “black girls” lead not to books about black girls or to communities in which young black girls might connect, but to pornography as the top results. Noble investigates how search engines can maintain unequal access and representation, yet are so foundational to modern life that they often go unquestioned. She also notes that commercial interests often subvert a diverse, or at least realistic, range of representations.
A Google search for a person’s name, such as “Trevon Jones”, may yield a personalized ad for public records about Trevon that may be neutral, such as “Looking for Trevon Jones?”, or may be suggestive of an arrest record, such as “Trevon Jones, Arrested?”. This writing investigates the delivery of these kinds of ads by Google AdSense using a sample of racially associated names and finds statistically significant discrimination in ad delivery based on searches of 2184 racially associated personal names across two websites. First names, assigned at birth to more black or white babies, are found predictive of race (88% black, 96% white), and those assigned primarily to black babies, such as DeShawn, Darnell and Jermaine, generated ads suggestive of an arrest in 81 to 86 percent of name searches on one website and 92 to 95 percent on the other, while those assigned at birth primarily to whites, such as Geoffrey, Jill and Emma, generated more neutral copy: the word “arrest” appeared in 23 to 29 percent of name searches on one site and 0 to 60 percent on the other. On the more ad trafficked website, a black-identifying name was 25% more likely to get an ad suggestive of an arrest record. A few names did not follow these patterns. All ads return results for actual individuals and ads appear regardless of whether the name has an arrest record in the company’s database. The company maintains Google received the same ad text for groups of last names (not first names), raising questions as to whether Google’s technology exposes racial bias.
Sweeney, L. (2013). Discrimination in Online Ad Delivery. Communications of the ACM, 56(5), 44–54. arXiv.org version available online.
This heavily theoretical piece provides a vital counterweight to the pressure for “scale” in technological projects, and can give cultural heritage project managers a useful vocabulary for questioning demands to follow tightly regulated software development processes when those processes are not appropriate for community-driven, humanistic work. Tsing defines “scalability” as the ability of a project to become larger without changing its nature (to expand without changing), and shows that such scalability is possible “only if project elements do not form transformative relationships that might change the project as elements are added.” Tsing then highlights that those transformative relationships are necessary for the emergence of diversity, and argues powerfully that meaningful diversity is “diversity that might change things,” so the model of “scalability” is antithetical to meaningful diversity. These theoretical concepts can be applied to almost any digital community archive project.
“When small projects can become big without changing the nature of the project, we call that design feature “scalability.” Scalability is a confusing term because it seems to mean something broader, the ability to use scale; but that is not the technical meaning of the term. Scalable projects are those that can expand without changing. My interest is in the exclusion of biological and cultural diversity from scalable designs. Scalability is possible only if project elements do not form transformative relationships that might change the project as elements are added. But transformative relationships are the medium for the emergence of diversity. Scalability projects banish meaningful diversity, which is to say, diversity that might change things.
Scalability is not an ordinary feature of nature. Making projects scalable takes a lot of work. Yet we take scalability so much for granted that scholars often imagine that, without scalable research designs, we would be stuck in tiny microworlds, unable to scale up. To “scale up,” indeed, is to rely on scalability—to change the scale without changing the framework of knowledge or action. There are alternatives for changing world history locally and for telling big stories alongside small ones, and “nonscalability theory” is an alternative for conceptualizing the world. But before considering these alternatives, let me return to that familiar domain for experience with scalability: digital technology.”
I generally prefer to write about big picture subjects for my Learning pieces at Source. But today, let’s start from something small that illuminates the way even simple choices affect what we can represent and the stories we can tell. Let’s talk about the most basic datatype we often build our databases from: Boolean fields.
This article explores prototype theory as an alternative to classical theories of classification, pointing to more fine-grained methods than traditional systems with rigid boundaries and hierarchies. While it does not delve into the technical systems needed to implement prototype theory, it is a very useful foundation for discussions of how such a system could be designed.
“Classical theories of classification and concepts, originating in ancient Greek logic, have been criticized by classificationists, feminists, and scholars of marginalized groups because of the rigidity of conceptual boundaries and hierarchical structure. Despite this criticism, the principles of classical theory still underlie major library classification schemes. Rosch’s prototype theory, originating from cognitive psychology, uses Wittgenstein’s “family resemblance” as a basis for conceptual definition. Rather than requiring all necessary and sufficient conditions, prototype theory requires possession of some but not all common qualities for membership in a category. This paper explores prototype theory to determine whether it captures the fluidity of gender to avoid essentialism and accommodate transgender and queer identities. Ultimately, prototype theory constitutes a desirable conceptual framework for gender because it permits commonality without essentialism, difference without eliminating similarity. However, the instability of prototypical definitions would be difficult to implement in a practical environment and could still be manipulated to subordinate. Therefore, at best, prototype theory could complement more stable concept theories by incorporating contextual difference.”
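To ground a discussion of how such a system might be designed, the contrast between the two models can be sketched in a few lines of Python. This is a minimal illustration, not anything proposed in the article: the `Category` type, the feature names, the bird example, and the 0.5 threshold are all assumptions chosen for clarity. Classical membership requires every defining feature; prototype-style membership asks only how many of a category's typical features an item shares.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Category:
    """A category described by a set of features (hypothetical toy model)."""
    name: str
    features: frozenset


def classical_member(item_features, category):
    """Classical theory: the item must possess every defining feature."""
    return category.features <= item_features


def prototype_score(item_features, category):
    """Share of the category's typical features the item has (0.0 to 1.0)."""
    if not category.features:
        return 0.0
    return len(item_features & category.features) / len(category.features)


def prototype_member(item_features, category, threshold=0.5):
    """Prototype-style membership: some, but not all, features suffice.

    The threshold is an arbitrary assumption; choosing it is exactly the
    kind of unstable, contestable decision the abstract warns about.
    """
    return prototype_score(item_features, category) >= threshold


# Invented example data, for illustration only.
bird = Category("bird", frozenset({"feathers", "flies", "lays_eggs", "sings"}))
robin = frozenset({"feathers", "flies", "lays_eggs", "sings"})
penguin = frozenset({"feathers", "lays_eggs", "swims"})

for label, item in (("robin", robin), ("penguin", penguin)):
    print(label,
          classical_member(item, bird),
          prototype_score(item, bird),
          prototype_member(item, bird))
```

In this toy model the penguin fails the classical test because it lacks one defining feature, yet it still counts as a member under the graded test. Moving the threshold changes who is included, which is a small-scale version of the implementation difficulty the abstract describes.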
We set out to assess one of the commercial tools made by Northpointe, Inc. to discover the underlying accuracy of their recidivism algorithm and to test whether the algorithm was biased against certain groups.
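One common way to make a question like “is the algorithm biased against certain groups” measurable is to compare error rates between groups, for example how often people who did not reoffend were nevertheless flagged as high risk. The Python sketch below shows that calculation under stated assumptions: it is not the authors' analysis, and the group labels, record layout, and records themselves are invented purely for illustration.

```python
from collections import defaultdict


def false_positive_rates(records):
    """Return the false positive rate per group.

    Each record is a (group, predicted_high_risk, reoffended) tuple.
    The false positive rate is the share of people who did NOT reoffend
    but were still predicted to be high risk.
    """
    flagged = defaultdict(int)    # non-reoffenders flagged as high risk
    negatives = defaultdict(int)  # all non-reoffenders
    for group, predicted_high_risk, reoffended in records:
        if not reoffended:
            negatives[group] += 1
            if predicted_high_risk:
                flagged[group] += 1
    return {group: flagged[group] / count
            for group, count in negatives.items() if count > 0}


# Invented toy records; NOT data from the study.
toy_records = [
    ("A", True, False),
    ("A", True, False),
    ("A", False, False),
    ("A", True, True),
    ("B", True, False),
    ("B", False, False),
    ("B", False, False),
    ("B", False, True),
]

rates = false_positive_rates(toy_records)
for group, rate in sorted(rates.items()):
    print(f"group {group}: false positive rate {rate:.2f}")
```

A gap between the groups' rates in real data would suggest that the tool's mistakes fall more heavily on one group, even if its overall accuracy looks similar for both.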
A foundational article in both Artificial Intelligence and critical technical practice, containing a powerful theoretical framework for thinking about the ways that human assumptions and bias enter programming decisions at even the most basic level.
“A critical technical practice will, at least for the foreseeable future, require a split identity — one foot planted in the craft work of design and the other foot planted in the reflexive work of critique. Successfully spanning these borderlands, bridging the disparate sites of practice that computer work brings uncomfortably together, will require a historical understanding of the institutions and methods of the field, and it will draw on this understanding as a resource in choosing problems, evaluating solutions, diagnosing difficulties, and motivating alternative proposals. More concretely, it will require a praxis of daily work: forms of language, career strategies, and social networks that support the exploration of alternative work practices that will inevitably seem strange to insiders and outsiders alike.”
By delving into the material processes of Optical Character Recognition (OCR), as well as the history of OCR tools, this article shows how the statistical models used for automatic transcription can embed cultural biases into the output. It is particularly relevant to multilingual projects, as it unpacks the effects of OCR software that generally assumes monolingual and orthographically simple documents.
“Early modern printed books pose particular challenges for automatic transcription: uneven inking, irregular orthographies, radically multilingual texts. As a result, modern efforts to transcribe these documents tend to produce the textual gibberish commonly known as “dirty OCR” (Optical Character Recognition). This noisy output is most frequently seen as a barrier to access for scholars interested in the computational analysis or digital display of transcribed documents. This article, however, proposes that a closer analysis of dirty OCR can reveal both historical and cultural factors at play in the practice of automatic transcription. To make this argument, it focuses on tools developed for the automatic transcription of the Primeros Libros collection of sixteenth century Mexican printed books. By bringing together the history of the collection with that of the OCR tool, it illustrates how the colonial history of these documents is embedded in, and transformed by, the statistical models used for automatic transcription. It argues that automatic transcription, itself a mechanical and practical tool, also has an interpretive effect on transcribed texts that can have practical consequences for scholarly work.”