Markov and You

If you play with the Markov text synthesizer, you’ll quickly find that Markov methods are only as good as their input corpus. Input a bunch
of the same words, or random gibberish, and that’s what you’ll get back.

That’s more a feature than a bug :slight_smile: By using a specific text as input, you can produce output that sounds like the input, e.g. texts in Old English, poems, etc.

As ChrisL suggested, it would be more interesting to try a Markov model on music than on text.
Generated music, where each note is a function of the previous 5-6 notes, could be great, I think.
The generated text doesn’t make much sense anyway.
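
A minimal sketch of that idea, assuming a melody is just a list of note-name strings; the example melody and the order-3 setting below are made up for illustration (order 5-6 would need a much larger input corpus):

```python
import random
from collections import defaultdict

def build_note_model(notes, order=3):
    """Map each run of `order` consecutive notes to the notes observed right after it."""
    model = defaultdict(list)
    for i in range(len(notes) - order):
        state = tuple(notes[i:i + order])
        model[state].append(notes[i + order])
    return model

def generate_melody(model, length=16):
    """Walk the model, picking a random observed successor at each step."""
    state = random.choice(list(model.keys()))
    output = list(state)
    for _ in range(length):
        successors = model.get(state)
        if not successors:  # state only ever appeared at the very end of the corpus
            break
        note = random.choice(successors)
        output.append(note)
        state = tuple(output[-len(state):])
    return output

# Made-up input melody, written as note names.
melody = ["C4", "E4", "G4", "E4", "C4", "D4", "F4", "A4", "F4", "D4",
          "E4", "G4", "B4", "G4", "E4", "C4", "E4", "G4", "E4", "C4"]
model = build_note_model(melody, order=3)
print(" ".join(generate_melody(model)))
```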

One of my pet projects that I have yet to fully realise is building a Markov-model chat bot out of the Web 1T 5-gram corpus: http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2006T13

One barrier to implementation is that it’s way too big to fit in memory on a single (reasonable) machine (a disk-backed workaround is sketched below).

Also related is frac.notdot.net, an amusing free-association game that uses a first-order Markov model to ‘play’ free association with a human player, expanding its corpus as it goes. The corpus is quite sizeable by now.
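
Back to the memory problem: one workaround is to stream the n-gram files into a disk-backed index instead of holding them in RAM. A minimal sketch using SQLite, assuming the usual one-n-gram-per-line, tab-separated-count layout; the file and database paths are hypothetical:

```python
import gzip
import sqlite3

def load_5grams(gz_path, db_path="ngrams.db"):
    """Stream one gzipped 5-gram file into an indexed SQLite table, batch by batch."""
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS grams (context TEXT, word TEXT, freq INTEGER)")
    rows = []
    with gzip.open(gz_path, "rt", encoding="utf-8", errors="replace") as f:
        for line in f:
            ngram, _, count = line.rstrip("\n").partition("\t")
            words = ngram.split(" ")
            if len(words) != 5 or not count.strip().isdigit():
                continue  # skip anything that doesn't match the assumed layout
            rows.append((" ".join(words[:4]), words[4], int(count)))
            if len(rows) >= 100_000:  # flush in batches to keep memory flat
                conn.executemany("INSERT INTO grams VALUES (?, ?, ?)", rows)
                rows = []
    conn.executemany("INSERT INTO grams VALUES (?, ?, ?)", rows)
    conn.execute("CREATE INDEX IF NOT EXISTS idx_context ON grams(context)")
    conn.commit()
    return conn

def continuations(conn, context):
    """Return (word, frequency) pairs observed after a 4-word context."""
    return conn.execute(
        "SELECT word, freq FROM grams WHERE context = ?", (context,)
    ).fetchall()

# Hypothetical usage:
# conn = load_5grams("5gm-0001.gz")
# print(continuations(conn, "i do not know what"))
```

With the index on the four-word context, a chat bot could look up continuations on demand instead of keeping the whole corpus in memory.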

Only because you brought it up (in a way), Jeff: Are you still using POPFile?

No, I’ve completely outsourced all my mail to Google / Microsoft / Yahoo now that they allow hosted email. I figure they’re a lot smarter than I am. Fighting spam is interesting, but thankless and repetitive. Gmail does the best job, though occasionally (about once every two weeks) something will creep through.

The generator always correctly predicts that a carriage return is followed by one of the speaker prefixes, so the generated text still looks like an IM conversation.
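
For anyone wondering why that falls out of the model: if you tokenize the log but keep the newline as its own token, the newline becomes a state whose only observed successors are the speaker prefixes. A tiny sketch with a made-up two-person log:

```python
import re
from collections import defaultdict

log = "alice: hey\nbob: hey, what's up\nalice: not much\nbob: same here\n"

# Tokenize, but keep the newline as its own token so it becomes a state in the chain.
tokens = re.findall(r"\n|\S+", log)

chain = defaultdict(list)
for cur, nxt in zip(tokens, tokens[1:]):
    chain[cur].append(nxt)

# Every observed successor of "\n" is a speaker prefix, which is why
# the generated text keeps the shape of an IM conversation.
print(chain["\n"])  # -> ['bob:', 'alice:', 'bob:']
```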

That’s the amazing thing about Markov chains. As Josh Millard, the author of Garkov, pointed out in this very thread: they’re so ridiculously simple, yet they work almost eerily well.

Also thanks, Josh, for making an appearance and offering such useful insights on your Markov experience.

They’re a reminder that Garfield CANNOT TALK. When you remember that Arbuckle cannot know what Garfield is thinking, the strip gains a new level of ridiculousness.

Ah, yes. You’re absolutely right. That’s what I wasn’t getting out of Arbuckle ( http://www.tailsteak.com/arbuckle/ ).

“That’s more a feature than a bug :slight_smile: By using a specific text as input, you can produce output that sounds like the input, e.g. texts in Old English, poems, etc.”

Yes, that is a fantastic feature. If you start talking to Kooky in other languages (German and Danish are the ones I know about; there are probably others), he will reply in that language. You never see these languages show up in English channels simply because there are almost no transitions linking the foreign words to English ones (apart from a few exceptions: “die” and “bin” in German, etc.).

phatmonkey, there were some requests to get kookybot on Twitter ( http://twitter.com/orph/statuses/832119604 ), though I’m not sure how that would work, as it’s a radically different environment than IRC.

Those top kookybot quotes are genius. I cannot remember the last time I laughed so hard. WARNING, however: do not click on any links the KookyBot provides. You’ll be sorry. :frowning:

t – 1? Is that t + 1?

How about a Markov/Dilbert project - oh wait, it already works that way…

Jeff, that should be simple enough, there’s a Twitter Python API. I’ll try it now.

Is the same technique used for the order of the words?

i.e. once you have used Markov chains to randomly generate a big pool of words, could you then also use a word-level Markov chain to say “this word is typically followed by one of these words”?
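
That is essentially what a word-level chain is: the states are words instead of letters, and the model records which words were seen immediately after each word. A minimal sketch with a made-up toy corpus:

```python
import random
from collections import defaultdict

def word_chain(text):
    """Record, for each word, every word seen immediately after it."""
    words = text.split()
    chain = defaultdict(list)
    for cur, nxt in zip(words, words[1:]):
        chain[cur].append(nxt)
    return chain

def babble(chain, start, length=20):
    """Walk the chain: repeatedly pick one of the observed successors."""
    word, out = start, [start]
    for _ in range(length):
        successors = chain.get(word)
        if not successors:  # the word only ever appeared at the end of the corpus
            break
        word = random.choice(successors)
        out.append(word)
    return " ".join(out)

corpus = "the cat sat on the mat and the dog sat on the rug"
chain = word_chain(corpus)
print(chain["the"])           # -> ['cat', 'mat', 'dog', 'rug']
print(babble(chain, "the"))   # e.g. "the dog sat on the cat sat on the mat and the ..."
```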

@Graham Stuart,
I don’t know. phatmonkey will be able to answer that.
But if it is possible, then we could improve the system by tolerating similar words.

Sorry Graham… I screwed up with the spelling…

If only I could use Markov chains to automatically write code…

“underlie”, not “underly”.

Never mind. Having played with the online Markov generator, I see they are doing it at the word level there.

Like ChrisL I am now wondering what happens if you apply this to music.

Or maybe even analyse adjacent pixels in graphics/photos and see what kind of weird amorphous blob it produces.

Interesting…
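
Picking up the adjacent-pixels idea: a first cut might learn, for each grey value, which values were seen immediately to its right, and then walk that chain row by row. A minimal sketch over a made-up grayscale patch (a real photo would just produce a much bigger table of the same kind):

```python
import random
from collections import defaultdict

# Made-up 4x8 grayscale patch (0-255 values) standing in for real pixel data.
image = [
    [10, 12, 11, 200, 210, 205, 12, 10],
    [11, 10, 13, 198, 212, 207, 11, 12],
    [12, 13, 10, 202, 208, 204, 13, 11],
    [10, 11, 12, 199, 211, 206, 10, 13],
]

# Learn which pixel values tend to follow which, scanning each row left to right.
chain = defaultdict(list)
for row in image:
    for left, right in zip(row, row[1:]):
        chain[left].append(right)

def generate_row(width=8):
    """Produce a new row of pixels by walking the left-neighbour chain."""
    value = random.choice(list(chain.keys()))
    row = [value]
    for _ in range(width - 1):
        successors = chain.get(value)
        value = random.choice(successors) if successors else random.choice(list(chain.keys()))
        row.append(value)
    return row

blob = [generate_row() for _ in range(4)]  # the "weird amorphous blob"
for row in blob:
    print(row)
```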

I’m thinking tweetkov: make a Twitter account, follow as many people as possible, use their random gibberish as input, and see if tweetkov’s tweets make any more sense than the average Twitter user’s. :wink:

…I just might do that.

Any good sources for Markov chain generators that take binary input? It could be fun to see what kind of executables they would produce…

Could this be used to test a platform’s stability? I guess Markov chains would generate something executable much more often than a simple random binary generator, if based on existing binaries.
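
I don’t know of an off-the-shelf one, but a byte-level chain only takes a few lines: treat each short run of bytes as a state and record which byte followed it. A minimal sketch; the input path is hypothetical, and the output is very unlikely to be a valid executable without far more structure than a Markov model captures:

```python
import random
from collections import defaultdict

def build_byte_chain(data, order=2):
    """Map each `order`-byte context to the byte values observed right after it."""
    chain = defaultdict(list)
    for i in range(len(data) - order):
        chain[data[i:i + order]].append(data[i + order])
    return chain

def generate_bytes(chain, length=4096):
    """Emit bytes by walking the chain from a random starting context."""
    state = random.choice(list(chain.keys()))
    out = bytearray(state)
    while len(out) < length:
        successors = chain.get(bytes(out[-len(state):]))
        if not successors:  # context never seen mid-file; stop early
            break
        out.append(random.choice(successors))
    return bytes(out)

# Hypothetical usage: train on an existing binary and dump the result to a file.
with open("/bin/ls", "rb") as f:          # any existing executable would do
    chain = build_byte_chain(f.read())
with open("markov.bin", "wb") as f:
    f.write(generate_bytes(chain))
```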

interesting. i never knew what a markov chain was before… even though i have implemented more than one… after all getting a distribution from data and then using it to produce a model by sampling the distribution isn’t really a big leap of understanding. i’d always just labelled this as “a monte-carlo method” and never accorded it special distinction…

interesting that it is so effective for language too… although still, the sentences are gibberish. however… i wonder if a more comprehensive model of writing could not be made by analysing more than just the constituent letters of words or their order in text.

something for me to mess with coding whilst bored at work anyway… :slight_smile:

"Or maybe even analyse adjacent pixels in graphics/photos and see what kind of weird amorphous blob it produces."
I’m wondering if the new game Spore uses something like Bayesian networks or Markov chains for its procedurally generated artwork.

I would actually really like to see a procedurally generated computer game storyline based around Markov chains. I very much doubt anyone would notice…