
Realism and Utopianism in Discussions of Digital Labor

All of the readings we did for this week touch on a tension that exists in many forms of Left politics. On the one hand, there is a desire to find sites of possible resistance and encourage the development of an affective basis for modes of activity that exist outside of Capitalism, which seems to be what Barbrook is attempting to do in his discussion of the early open source community. On the other hand, there is the desire to bring injustices into the light—hence, the “Wages for Facebook” manifesto’s attempt to enable a discussion of the exploitative nature of Facebook by referring to people’s use of the service as “work.” One thing I find particularly striking about this juxtaposition is the way in which naming an injustice produced by Capitalism seems to require that we take on the language of Capitalism, referring to an ostensibly personal form of activity as “work,” while Barbrook’s claim that the Internet is “anarcho-communist”—as radical as the terminology of this argument is—seems to imply a much less critical attitude towards the world as it is.

While I am generally more sympathetic to Ptak than Barbrook, I am concerned that interventions like “Wages for Facebook” will only get us so far. Referring to Facebook activity as “work” brings an issue that was hidden to the foreground, but it also seems to go right along with a broad trend in American discourse towards framing more and more activities within Capitalist and managerial categories. Referring to clicking like buttons as “work” and stating that ad-driven services treat users as “products” may serve the interest of social realism, but this rhetorical move could also backfire, feeding right into the madness that, for instance, leads people to obsess over their “personal brands” and use the word “metric” to refer to any sort of standard against which something is to be judged, acting as if the entirety of the world worked like a marketing firm.

One writer who has noted this specific issue in oppositional discourse is Theodor Adorno. At the end of his book Minima Moralia, Adorno arrives at a critique of Hegelian dialectic from the perspective of the anti-fascist Left. One of the issues that he raises is the tendency of dialectical thought to inadvertently reinforce the system that it is meant to critique. To use Adorno’s example, the Left must acknowledge that the romantic view of marriage can cover up the exploitative economic relations that underlie the institution. But if we instead reframe marriage as a purely economic arrangement—realistic as this view may be—we can lose sight of the possibility that it could or should be something more. The structural analogy between the two sides of a dialectic, Adorno argues, makes the immediate division of Hegel’s followers into Left and Right factions inevitable—since the vocabulary of pro- and anti-Capitalist writing is necessarily similar, politics comes to be discernible less in the formal character of a work than in the social, institutional, and discursive formations in which it is enmeshed.

We might find an example of this structural analogy between Left and Right in Franco Moretti’s work on the literary marketplace, which we discussed last semester. In a cursory reading, Moretti’s Marxist account of the “slaughterhouse of literature” could easily be mistaken for a Capitalist analysis—the terms of discussion (market, product, consumer) are largely the same. Christopher Prendergast’s accusation that Moretti’s use of evolutionary ideas makes him a Social Darwinist is unfair, but I don’t think it’s too implausible that Moretti’s work could be mistaken for a Social Darwinist project on a cursory reading, given how much his terms of analysis borrow from this viewpoint. The difference is a hair’s breadth.

Adorno responds to this seeming bind with one of his most famous aphorisms: “The only philosophy which can be responsibly practiced in the face of despair is the attempt to contemplate all things as they would present themselves from the standpoint of redemption” (247). This formula suggests a different grounding for criticism that aims for transcendence rather than for “realism” of the sort that borrows its terms from the prevailing order. There are some ways in which this response is problematic, and it certainly dates itself as pre-1968. Adorno’s approach bears a suspicious resemblance to the positivist idea of a “view from nowhere”; and his big problem, of course, is his tendency to presume knowledge of what is best for other people. But I think this formula might still be useful in thinking about our own motivations in undertaking online labor. How would we articulate our reasons for blogging, for contributing to Wikipedia, or for clicking a like button if exploitation finally came to an end? How distorted would our activity appear from this standpoint? Framing the question in this way allows us to name the negative aspects of reality within terms that are not determined by the current order—but unlike the sort of techno-utopianism that peaked in the 1990s, it allows us to keep in mind that the alternative is, and perhaps will always remain, not wholly real.

Course Blogs and the Effects of Exposure

I am teaching with a blog for the first time this semester, and I found the Davis and Halavais articles both useful and resonant with my experiences. The blog helps, first of all, to make sure that the students do the readings carefully, but it also has the potential to improve the quality of in-class discussions. I have found it useful to tailor my lesson plans based on what students write in their posts, making sure to cover things that they have shown an interest in or seem to be having trouble with. Finally, I am using the blog as a part of the scaffolding for the major paper assignments, giving the students opportunities to try things out and get feedback on their writing and ideas before they begin writing their papers.

One reservation I have, though, is that opening students’ work to a wider audience could have a negative effect on some students. Halavais notes that having students read each other’s writing puts more pressure on them—many students feel more embarrassed when they share bad work with their peers than when they share it with their instructors only. My own experience confirms this. But I worry that this sort of exposure could have a chilling effect on students who feel marginalized within the university community or within a particular class. Imagine a queer student who is only partially out in their university. They might be interested in writing a paper about queer themes, but discouraged from it because they don’t want to reveal their identity to their peers. Of course, in most classes they would have the option to write about something unrelated to sexuality—but this situation would encourage a sort of self-censorship that is eerily reminiscent of the panopticon.

I don’t have a particular solution to this problem in mind. One idea that occurs to me would be to have the students write pseudonymously, so that they don’t know who is writing what—but it would only partially solve the problem (people still might fear rejection in the online space), and it would also make it more difficult to transfer discussions between the blog and the classroom. I am also not sure whether this potential drawback outweighs the benefits of exposing students to each other’s writing—which seem, in my as-yet limited experience, significant.

I’m wondering what the rest of you think about this—and I’d love to hear about your experiences incorporating blogging into a class.

Jeff Binder’s Project Proposals: Language Models and Clichés

#1: The Distance Machine

Over the past year, I have been working on a program called the Distance Machine, the primary function of which is to identify words in a text that were uncommon at a given point in time according to a statistical model of Google’s Ngrams data. At present, though, this program doesn’t quite accomplish what I ultimately want to do in this project, which is to look at how the statistical approach to studying the English language relates to earlier forms such as the dictionary. In the current version, the user is required to select a corpus upon entering a text, and there is no way to change the selection short of re-entering the text. As such, although it provides an easy way of finding exceptions to the patterns that appear in one particular corpus, it also makes it far too easy to take a single model as a ground truth about how the language has changed over time. I would like to rework the program so that it is easier to compare different representations.
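To make the current behavior concrete, here is a minimal sketch of the kind of lookup the program performs, written in Python for brevity rather than the PHP the actual program uses; the frequency table, threshold, and handling of missing words are all illustrative stand-ins, not the real Ngrams model:

```python
import re

# Stand-in frequency table: (word, year) -> relative frequency.
freq = {("omnibus", 1850): 2.1e-6, ("omnibus", 1990): 3.0e-7}

def uncommon_words(text, year, threshold=1e-6):
    """Flag words whose relative frequency fell below the threshold
    in the given year; words missing from the table are skipped here,
    though a real model would need to handle them as well."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if (w, year) in freq and freq[(w, year)] < threshold]

print(uncommon_words("The omnibus arrived late.", 1990))  # ['omnibus']
```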

Personae:

Dr. Casaubon is a scholar of 19th-century American literature. He is working on a critical edition of Charles Brockden Brown’s political writings, for which he is trying to understand the implications of certain political terms at the time when Brown was writing.

Annie Cratylus is an undergraduate English major. She has taken an interest in the ways in which language can uphold hierarchical systems. She is currently working on a paper about how the language of radical feminist writings from the 20th century deviates from the ordinary usage of the time.

Prof. Trotsky is a Marxist literary critic doing a project on working-class British poetry from the early 19th century. She is interested in investigating the class aspects of language standardization efforts in that time period, especially in regard to the choices of vocabulary used in the poetry of John Clare.

Use case:

After hearing her project idea, Annie Cratylus’s professor tells her she might want to look into the idea of a corpus. In researching the concept, she comes across the Distance Machine. She pastes a copy of a chapter from Judith Butler’s Gender Trouble into the program and clicks “Go.” The program shows a number of instances where Butler’s language deviates from the expectations set up by the Google Books corpus. Using some of these examples to illustrate her point, Annie writes a paper arguing that radical writing has to confound the expectations created by attempts to contain language within the bounds of the statistical.

Ideal version:

A full-fledged version would expand the program so that it can work with language models that incorporate information about word order in addition to word frequency. One way of doing this would be to incorporate the full Ngrams data set, rather than just the frequencies for single words. Based on this, the tool could highlight phrases of up to five words. Processing the full data set would require supercomputing resources, and the program would have to be transferred to a server with at least a few terabytes of storage capacity. I would also have to change the interface so that it could highlight overlapping units of the text, rather than discrete words. A somewhat less computationally intensive way of experimenting with more complex language models would be to generate a model based on a smaller corpus, which would present somewhat less of a challenge in terms of data management.
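As a rough illustration of what the phrase-level version would need, here is a sketch (again in Python, with the lookup against the full Ngrams data elided) of extracting every phrase of up to five words along with character offsets, which is the information the interface would need in order to highlight overlapping spans:

```python
import re

def ngrams_with_offsets(text, max_n=5):
    """Yield every phrase of up to max_n words, with character
    offsets, so that overlapping spans could be highlighted."""
    tokens = [(m.group(), m.start(), m.end())
              for m in re.finditer(r"\w+", text)]
    for i in range(len(tokens)):
        for n in range(1, max_n + 1):
            if i + n <= len(tokens):
                span = tokens[i:i + n]
                yield " ".join(w for w, _, _ in span), span[0][1], span[-1][2]

for phrase, start, end in ngrams_with_offsets("time is money", max_n=3):
    print(phrase, start, end)
```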

Dealing with a data set this large would require some skills that I don’t have at present. The scripts that I wrote to process the data would have to be changed so that they could run in parallel. There is also a chance that MySQL wouldn’t be up to the task of storing that much data, so I might have to learn another database system; and I might also have to change my PHP code to be more efficient so that the program is not excessively slow. This is a project that I would be unable to do without getting a major grant.

Simple version:

In this version, I would stick with simple word-frequency models, but add a number of different corpora, covering various time periods and genres of literature. One corpus that would be particularly useful is Phase I of the EEBO-TCP (Early English Books Online-Text Creation Partnership) corpus, which includes over 25,000 books published between 1500 and 1700. Another one would be the English Fiction version of the Google Ngrams corpus. I am also interested in creating corpora based on the full text of long-running periodicals like the New England Quarterly or The Atlantic. After preparing the corpora, I would have to change the PHP code so that it produces annotations for all corpora rather than just one and add a user interface for switching between them.
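The core of this change is easy to sketch: keep one frequency table per corpus and produce annotations for all of them up front, so that switching corpora in the interface requires no reprocessing. The tables below are invented stand-ins for models built from EEBO-TCP, English Fiction, and the like:

```python
# Hypothetical per-corpus frequency tables; real ones would be built
# from the corpora named above.
corpora = {
    "eebo-tcp": {"thou": 5.0e-4, "anon": 1.2e-5},
    "english-fiction": {"thou": 2.0e-6, "anon": 4.0e-8},
}

def annotate_all(words, threshold=1e-6):
    """Precompute, for every corpus, which words fall below the
    frequency threshold, so the interface can switch between
    corpora without re-entering the text."""
    return {name: {w: table[w] < threshold for w in words if w in table}
            for name, table in corpora.items()}

print(annotate_all(["thou", "anon"]))
```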

I don’t think this will require any major skills that I don’t already have, apart from learning the quirks of the various data sets that I will be using and possibly figuring out how to get the texts I need from a database. It would likely take a few months to get this done, since some of the data processing scripts could take days to run and, being realistic, I will likely have to try them multiple times before I get them to work right.

 

#2: Reading Clichés

Even in its most radical forms, literary criticism has generally centered its analysis on the text, tending to refer back to the original even when its authority is under suspicion. But one of the ways in which literary works can resonate most strongly in a culture is through modes of transmission other than the reproduction of a text. This project will attempt to provide a way of “reading” a very different cultural form from the text: the cliché. It will provide visualizations representing the history of particular clichés, including graphs showing trends in what sorts of books used them over the past few hundred years and markers indicating significant events (prominent usages, relevant historical events, shifts in usage or meaning). It will also include some text explaining the project and providing a theoretical context for it.

Personae:

Dr. Casaubon is working on a book about the reception history of Herman Melville’s work. He wants to include a section about the process by which people adopted the “White Whale” as a general way of referring to an object of obsession.

Prof. Trotsky is investigating the spread of political slogans. She wants to understand the ways in which phrases that originally had a political charge can come to be drained of it through repetition, and she is curious whether the numbing effect that repetition can have is dependent on a particular set of social conditions.

Mr. Shandy is a college-educated administrative assistant with an enthusiasm for language trivia. He is fascinated by the history of phrases, and is interested in finding out how the most famous quotations from his favorite books came to be widely known.

Use case:

Mr. Shandy comes across the site while searching for information about the phrase “time is money.” Looking at the visualization for this phrase, he learns that, although the phrase is often thought to have originated in Benjamin Franklin’s Poor Richard’s Almanack, it appeared decades earlier in a periodical called The Free-Thinker. He is also able to see some of the other books that used the phrase, such as Charles Dickens’s Nicholas Nickleby, and to get a sense of the way in which the phrase came into increasing prominence in the late 20th century.

Ideal version:

A full-fledged version of this project would consist of a large database of clichés, either taken from a curated list or identified automatically, with information about the usage history of each. If the database is to be large, it would be necessary either to produce the data for the timelines automatically or to crowdsource it. For each cliché, the user would be able to open a page with a visualization showing the usage over time in a number of broad genres of books (Fiction, Biography, Biology, etc.), along with markers showing prominent and representative usages. I also might be able to include information about which clichés tend to co-occur in the same books.
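The co-occurrence information, at least, would be straightforward to compute once each book is mapped to the clichés found in it; here is a sketch with invented stand-in data:

```python
from collections import Counter
from itertools import combinations

# Stand-in mapping of book identifiers to the cliches found in them.
books = {
    "book1": {"time is money", "white whale"},
    "book2": {"time is money", "tip of the iceberg"},
    "book3": {"time is money", "white whale"},
}

# Count how often each pair of cliches appears in the same book.
pair_counts = Counter()
for cliches in books.values():
    for pair in combinations(sorted(cliches), 2):
        pair_counts[pair] += 1

print(pair_counts.most_common(3))
```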

If this project is feasible at all, it would probably be doable in about 6 months. The major difficulties would be in procuring the necessary data and defining the bounds of what counts as a cliché. Assuming I could figure out a way of doing that, I should be able to get together what I need using scripting tools I am already familiar with. I could make the visualizations I am envisioning using JavaScript and D3. I would also need some sort of content management system for the site; I am not really sure which one would be best for this project, so I would need to do some research and maybe learn a new system.

Simple version:

A simpler way of going about this would be to pick a small number of clichés (about 12) and semi-manually gather the information I need. For each cliché, I would present a graph showing its usage in various types of books, along with some indications of prominent and representative usages. The prominent usages could be identified manually or on the basis of some data about the popularity of books (although this could potentially become problematic, so I would have to think hard about my choices). I could identify representative usages by taking random samples from various time periods and manually going through them.

It should be possible to get this mostly done in a few months. I believe I could get all the data I need using HathiTrust’s Solr API, although I ought to get in touch with them before I run large queries. Once I have access, I could easily write a script that downloads bibliographic data about all books that match a particular phrase. I could build the visualization using tools I already know, and the simple version would not be too demanding on the content management system.
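To give a sense of what that script might look like, here is a hedged sketch using generic Solr query syntax; the endpoint URL and response fields are placeholders rather than the actual HathiTrust API, whose details I would need to get from them directly:

```python
import json
import urllib.parse
import urllib.request

SOLR_URL = "https://example.org/solr/select"  # placeholder, not the real endpoint

def search_phrase(phrase, rows=25, start=0):
    """Run a generic Solr exact-phrase query and return the matching
    documents; the actual field names and paging limits would depend
    on the real API."""
    params = urllib.parse.urlencode({
        "q": '"%s"' % phrase,  # exact-phrase query
        "wt": "json",
        "rows": rows,
        "start": start,
    })
    with urllib.request.urlopen(SOLR_URL + "?" + params) as resp:
        return json.load(resp)["response"]["docs"]

# Paging with `start` would walk through the full result set, and the
# bibliographic fields in each doc would feed the timeline.
```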

Jeffrey’s project ideas: text manipulations

Here are some ideas I had for my project this semester.

  1. One idea would be to try to develop a way of visualizing the establishment of clichés over time—especially ones that originate in quotations from literary texts. It would be possible to track the histories of relatively short clichés using the Google Ngrams data set, although that would require Big Data-level computing. I could also do this on a smaller scale (and with a lot more flexibility) using the just-released EEBO-TCP corpus, which includes manually transcribed versions of over 25,000 early modern English books.
  2. I might try to do something with computerized outlining tools. The work that I’ve done so far is way on the complicated side, so in the spirit of this class it might be useful to try to come up with a minimum viable product. In an ordinary outline, one line might be indented beneath another for any number of reasons—it might expand on an idea, provide an example, give a possible counterargument, etc. By including symbols that make these relationships explicit, it is possible to manipulate the structure using a computer—something that can be used, for instance, to play around with different possible structures for a paper in an interactive way. (A minimal sketch of this idea follows the list.)
  3. I’ve been toying around with the idea of developing a programming environment specifically designed for working with texts. There was an attempt to create a programming language for humanists way back in 1970, but nothing this century as far as I know. We have mostly picked up general-purpose languages like Python. But some of the basic operations that we have to do in manipulating texts—stripping tags, parsing document structures, tokenizing—can be awkward in these systems, and it can be difficult for the user to tell whether these operations are working right with a particular body of text. It would be much easier to work in an environment with immediate feedback. Imagine having your code on one side of the screen and a visualization of a text on the other, with annotations that indicate how the text is being chopped up, and that change immediately when you change the code. (The second sketch after the list illustrates this.) This project would consist of a desktop application along with either an interpreter for a new programming language or a library for an existing one that includes functions for the interactive manipulation of texts.
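Here is a minimal sketch of the outlining idea from the second item: each line carries an explicit relation symbol, so the structure can be queried or rearranged by a program. The symbols and the relation names are invented for illustration:

```python
# Invented relation symbols: how a line relates to its parent.
RELATIONS = {"+": "expands", "e": "example", "!": "counterargument"}

# An outline as (depth, relation, text) tuples; content is illustrative.
outline = [
    (0, None, "Blogs improve class discussion"),
    (1, "+", "Students read each other's work"),
    (1, "e", "Tailoring lesson plans to posts"),
    (1, "!", "Exposure may chill marginalized students"),
]

def lines_with_relation(outline, rel):
    """Pull out all lines standing in a given relation to their parent,
    one simple example of manipulating the structure by machine."""
    return [text for depth, r, text in outline if r == rel]

print(lines_with_relation(outline, "!"))
```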
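And here is a sketch of the immediate-feedback idea from the third item: a tokenizer that reports character offsets, so a front end could re-render the annotated text every time the code changes. The bracket rendering stands in for a real visualization:

```python
import re

def tokenize(text):
    """Return (token, start, end) triples for words and punctuation."""
    return [(m.group(), m.start(), m.end())
            for m in re.finditer(r"\w+|[^\w\s]", text)]

def render(text):
    """Show how the text is being chopped up, token by token."""
    return " ".join("[%s]" % tok for tok, _, _ in tokenize(text))

print(render("It was a dark, stormy night."))
```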

Bio: Jeff Binder

I am a 2nd year doctoral student in the English program. My focus is on technologies of reading and writing in the 18th and early 19th centuries; some of the things my work has focused on include the development of the back-of-the-book index, dictionaries and language standardization efforts in the early U.S., and imaginary accounts of poetry-writing machines in the long 18th century. I am interested in bringing these older technologies into dialogue with contemporary computer technologies, including digital publishing and text mining/corpus linguistic techniques.

Before I came to the Graduate Center, I worked as a database and data visualization programmer, first at Nature Publishing Group and then at NYU Medical School. Although I studied literature in college and at the MA level, I also have an extensive background in computers, especially in graphics and compilers/programming language design. The inspiration for the work that I am doing now came in part from my experience working with faculty data in a large university. One of my overall goals is to historicize the roles that databases and other computer technologies play in organizations like universities. What assumptions, I want to ask, underlie the way in which we incorporate these technologies into our institutions at the present moment?

Like many people who have switched from programming to the humanities, I am strongly committed to a humanistic approach, and I am wary of scholarly approaches that straightforwardly attempt to apply computer science methodology to the humanities. What I have tried to do instead is engage with computers as a historically-situated object of study that sits on a level with material from the past. I began taking this approach in my first major project, a collaboration with Collin Jennings in which we looked at the index from the 1784 edition of The Wealth of Nations in comparison with a topic model generated using the text. I also wrote a sort of manifesto about the possibility of a critical approach to text mining for Core 1, and I am hoping to carry on in this vein in Core 2.

On a more practical level, I have been working on developing software to help with scholarship and teaching in the humanities. One project that is fairly far along in its development is the Distance Machine, a program that identifies words in a text that were unusual at a particular point in time based on a statistical model of the Google Ngrams corpus. I also have been experimenting with ways of manipulating outlines of texts using computer logic, either as a way of helping people come up with ideas for writing or for playing with conjectures about the structure of an existing text (this program is not online at the moment, but I have a prototype that I could demo on my computer if anyone is interested). Last but not least, I am a fan of Twitter bots. I have created one so far—Coleridge Bot—and I have a few more ideas in the pipeline.