I have seen some new trends in the social sciences and in the humanities, where data mining and frequency analysis methods are used to examine an increasing variety of works of art, literature, and even communications. Reducing Shakespeare’s Hamlet to a database, which can be searched for the actions and personal connections every character has in the play, seems to have yielded some interesting new perspectives… This also creates new categories of questions one can bring to a deep examination of some of these cultural artifacts…
The following article about ‘hacking’ Skype describes the use of frequency analysis techniques, something which the NSA has been doing for a long time (albeit not to Skype…necessarily…?). This could be seen as a form of cryptanalysis…a way to ‘decode’ the content of what is being communicated.
For me, the idea that there are some new and powerful methods to dig deeper into great works of literature is wonderful…I do, however, prefer to spend the time with the play using a pen and paper…I guess that I am far too old school for some things…
Hamlet and the region of death
http://www.boston.com/bostonglobe/ideas/articles/2011/05/29/hamlet_and_the_region_of_death/?page=full
For as long as anyone can remember, the basic task of literary scholarship has been close reading. Sit down with a book, pencil in hand, read, pay attention, and then tell the world what you noticed.
Franco Moretti, however, often doesn’t read the books he studies. Instead, he analyzes them as data. Working with a small group of graduate students, the Stanford University English professor has fed thousands of digitized texts into databases and then mined the accumulated information for new answers to new questions. How far, on average, do characters in 19th-century English novels walk over the course of a book? How frequently are new genres of popular fiction invented? How many words does the average novel’s protagonist speak? By posing these and other questions, Moretti has become the unofficial leader of a new, more quantitative kind of literary study.
To many readers — and to some of Moretti’s fellow academics — the very notion of quantitative literary studies can seem like an offense to that which made literature worth studying in the first place: its meaning and beauty. For Moretti, however, moving literary scholarship beyond reading is the key to producing new knowledge about old texts — even ones we’ve been studying for centuries.
Most recently Moretti has turned his attention to what might be the most familiar text in English literature: “Hamlet.” Using the play as a kind of test case, Moretti diagrammed and quantified the plot, charting the relationships among characters as a network based strictly on whether they speak to one another at any point in the play. He published the results in an article, “Network Theory, Plot Analysis,” in the March/April 2011 issue of the New Left Review.
Seen through Moretti’s network diagrams, “Hamlet” often seems brand new. One notices, for example, that of all the characters who speak to both Hamlet and Claudius, only two manage to survive the play (Moretti calls this part of the network the “region of death”). Or one notices that Rosencrantz and Guildenstern, the most famous pair of minor characters in all of Shakespeare, never speak to each other.
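To make the network idea concrete, here is a minimal sketch in Python of the kind of question described above: link two characters whenever they speak to each other, then look at who is linked to both of two central figures. The cast, the exchanges, and the helper names (build_network, shared_contacts) are all made up for illustration; this is not Moretti's data or his method, only a toy under those assumptions.

```python
from collections import defaultdict

def build_network(exchanges):
    """Undirected graph: an edge joins two characters who ever speak to each other."""
    graph = defaultdict(set)
    for speaker, listener in exchanges:
        graph[speaker].add(listener)
        graph[listener].add(speaker)
    return graph

def shared_contacts(graph, a, b):
    """Characters linked to both a and b (excluding a and b themselves)."""
    return (graph[a] & graph[b]) - {a, b}

# Hypothetical toy cast, NOT the actual dialogue network of "Hamlet".
exchanges = [
    ("Prince", "King"), ("Prince", "Queen"), ("King", "Queen"),
    ("Prince", "Counselor"), ("King", "Counselor"),
    ("Prince", "Friend"), ("King", "Friend"),
    ("Prince", "Soldier"),
]
# Hypothetical list of characters who die by the end of the toy play.
dead = {"Prince", "King", "Queen", "Counselor"}

graph = build_network(exchanges)
region = shared_contacts(graph, "Prince", "King")
survivors = sorted(region - dead)

print(sorted(region))  # everyone who speaks to both central characters
print(survivors)       # those among them who survive
```

With a real play, the exchanges would have to be tabulated from the text by hand or from a marked-up edition; the rest of the sketch would stay the same.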
Linguists break into Skype conversations
http://www.newscientist.com/blogs/onepercent/2011/05/words-leak-from-encrypted-onli.html
Chatting over internet phone networks like Skype may not be as secure as once thought:
Security researchers have shown that encrypted voice-over-internet-protocol (VoIP) conversations can be partially understood by an eavesdropper.
Transmitting voice data through the internet securely involves encoding and then encrypting speech. This combination of two signal-processing techniques means the size of the encrypted data packets reflects properties of the original speech, a key vulnerability that allowed a team of computer scientists and linguists at the University of North Carolina at Chapel Hill to reconstruct words and phrases from a VoIP call.
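A toy illustration of why packet size can leak information: if the encoder produces frames of different sizes for different sounds, and the encryption step keeps each frame's length, then an eavesdropper still sees the sequence of lengths. The frame sizes below are invented, and toy_encrypt is a stand-in (XOR with a random keystream), not the cipher Skype or any real VoIP system uses.

```python
import secrets

def toy_encrypt(frame: bytes) -> bytes:
    """Stand-in for a length-preserving cipher: XOR with a fresh random keystream."""
    keystream = secrets.token_bytes(len(frame))
    return bytes(a ^ b for a, b in zip(frame, keystream))

# Hypothetical encoded speech frames; sizes (in bytes) vary with the sound encoded.
encoded_frames = [bytes(n) for n in (12, 30, 30, 18, 41)]

packets = [toy_encrypt(frame) for frame in encoded_frames]

# The content is hidden, but the length pattern is visible to anyone on the wire.
observed_lengths = [len(p) for p in packets]
print(observed_lengths)
```

Padding every packet to the same size would remove this particular signal, at the cost of extra bandwidth.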
The team listens in by splitting the sequence of encrypted VoIP data packets into sequences that correspond to phonemes, the short sounds that form the building blocks of speech. They then apply linguistic rules to turn a string of phonemes into words – for example, the spoken combination of sounds like "zzdr", which occurs in the middle of "eavesdrop" (say it out loud and you’ll hear it), never appears at the start of an English word.
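The "zzdr" rule can be sketched as a simple filter on where word boundaries may fall. The phoneme labels below are a rough, hypothetical transcription of "eavesdrop", and the single forbidden onset is the one mentioned in the article; the researchers' actual segmentation method is not described here, so treat this only as an illustration of the idea.

```python
def allowed_boundaries(phonemes, illegal_onsets):
    """Return positions i where a new word could start at phonemes[i].

    A position is rejected if the phonemes starting there match a cluster
    that never begins a word in the language.
    """
    allowed = []
    for i in range(1, len(phonemes)):
        starts_illegally = any(
            tuple(phonemes[i:i + len(onset)]) == onset
            for onset in illegal_onsets
        )
        if not starts_illegally:
            allowed.append(i)
    return allowed

# Rough, hypothetical phoneme labels for "eavesdrop".
phonemes = ["iy", "v", "z", "d", "r", "aa", "p"]

# From the article: the "z-d-r" cluster never starts an English word.
illegal_onsets = {("z", "d", "r")}

boundaries = allowed_boundaries(phonemes, illegal_onsets)
print(boundaries)  # position 2 (just before "z d r") is ruled out
```

A fuller rule set would rule out many more positions, leaving only a few plausible ways to cut the phoneme stream into words.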
The researchers compare the technique to the way infants learn to understand speech, segmenting the stream of sound coming from an adult’s mouth into words by using linguistic clues such as separating out their own name.
Users don’t need to worry about people listening in on their entire Skype conversation though, as the success of the technique varies widely. The team tested it on 6300 recordings in eight American English dialects and evaluated the performance using METEOR, a widely used scoring system for comparing machine translation techniques. Only 2.3 per cent scored over 0.5, the level at which results are generally considered understandable, though some scores were much higher, with near-perfect recovery of full sentences.


