Markov and You

In response to your statement that “Lasagna Cat” is the one link we must click on, I only have this to say (with compliments to Adam Sandler)…

“… what you’ve just said is one of the most insanely idiotic things I have ever heard. At no point in your rambling, incoherent response were you even close to anything that could be considered a rational thought. Everyone in this room is now dumber for having listened to it. I award you no points, and may God have mercy on your soul.”

Jeff,
Some day you should try this site in IE8.
It looks weird.

I’m sorry, it can’t be a paul graham essay parody without an excessive number of footnotes.

There is a lot of talk about randomly generated code here…

Genetic algorithms are good for this, and in fact can independently solve or optimise many problems (nothing complicated, sadly). You will probably want to write a language and compiler specially, though…

The basic premise is to copy how evolution works, i.e. introduce random changes to the code and test each variant against some fitness function, keeping the most fit programs and culling the remainder, then repeating the randomisation process (a rough sketch of this loop follows below). It looks like Markov chains could be applied to optimise this (i.e. use that distribution instead of ‘true’ randomness), if someone hasn’t already done this. :)

That might be a way to avoid creating a language just for this purpose… since it could recreate good syntax for an existing language, at least some of the time.
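
For what it’s worth, the mutate/test/cull loop described above fits in a few lines of Python. The sketch below is only a toy: the “program” is just a string and the fitness function is an invented match-the-target score, not real code synthesis, but it shows the shape of the loop.

```python
# Toy genetic algorithm: mutate candidates, score them against a fitness
# function, keep the most fit, cull the rest, and repeat.
import random
import string

TARGET = "print hello"                      # stand-in for a "correct program"
ALPHABET = string.ascii_lowercase + " "

def fitness(candidate):
    # Higher is better: how many positions already match the target.
    return sum(a == b for a, b in zip(candidate, TARGET))

def mutate(candidate, rate=0.1):
    # The "introduce random changes" step.
    return "".join(random.choice(ALPHABET) if random.random() < rate else c
                   for c in candidate)

# Start with a population of completely random candidates.
population = ["".join(random.choice(ALPHABET) for _ in TARGET) for _ in range(100)]

for generation in range(1000):
    # Keep the most fit candidates, cull the remainder, then re-randomise.
    population.sort(key=fitness, reverse=True)
    if fitness(population[0]) == len(TARGET):
        break
    survivors = population[:20]
    population = survivors + [mutate(random.choice(survivors)) for _ in range(80)]

print(generation, repr(population[0]))
```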

Well, this may finally explain the comic Zippy. It is a Markov chain.

The first time I came across Markov Chains was a blog post about creating fake Italian surnames.

http://doubtingtommaso.blogspot.com/2008/03/markov-chains.html

Pretty cool tech! Quite powerful given how broadly it can be applied.

These are those fun comments I see on some of my sites (usually followed by a link to some spam site); I never knew quite how they were generated. If only we could somehow apply them to mad-libs…

Well hi! Thanks for, er, making an example of me, Jeff. Glad you like it.

  • “Is it a coincidence that I wrote a chatbot with MegaHal(another 4th order markov chain) for a game lobby this morning or are their spy cameras in my cereal!?”

I can’t speak for building security, Tom, but MegaHal is unquestionably what first turned me on to Markov theory back in the day.

  • “I’d like to see this applied to music and see what happens if you feed it all the works of Mozart…”

It’s been done. Heck, I did a very small version of it myself for a project in college – analyzing only melodies, and looking at melodic intervals and time intervals as two separate tables rather than at melody+time as a single block. What I produced were melodies that sounded like Mozart teeteringly drunk and standing on one foot before the keyboard; but that’s audibly different from truly randomly generated notes, so the project was considered a success.
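
Purely to illustrate that two-separate-tables idea (the seed melody below is invented, and this is only the gist of the approach, not the actual project code):

```python
# Rough sketch: one Markov table for melodic intervals (semitones) and a
# separate one for note durations (beats), sampled independently.
import random
from collections import defaultdict

intervals = [2, 2, 1, -3, 2, -2, -2, 5, -1, -2, 2]  # semitone steps between notes
durations = [1, 1, 2, 1, 0.5, 0.5, 1, 2, 1, 1, 4]   # note lengths in beats

def build_table(seq):
    # 1-order table: for each value, the values seen following it.
    table = defaultdict(list)
    for a, b in zip(seq, seq[1:]):
        table[a].append(b)
    return table

interval_table = build_table(intervals)
duration_table = build_table(durations)

pitch = 60                 # start on middle C (MIDI note number)
interval = intervals[0]
duration = durations[0]
melody = []
for _ in range(16):
    melody.append((pitch, duration))
    interval = random.choice(interval_table.get(interval, intervals))
    duration = random.choice(duration_table.get(duration, durations))
    pitch += interval

print(melody)
```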

One of the challenges with using Markov theory to synthesize creative output is that you have to decide how far-seeing your criteria are, and that leads to a tradeoff between novelty and coherence.

A 1-order Markov model says: here’s word A; which set of words {B} can follow it? Then you pick a single word B from {B}, see what {C} contains, and so on. It turns out that 1-order chains lead to miserable nonsense. Garkov (and MegaHal, and I think probably most Markov language toys) is a 2-order model: given words A and B in sequence, look at possible followups {C}. The three-words-at-a-time process gets us a lot closer to coherence, but it also means the sentences are less novel – there are fewer places where [A, B] has multiple options in {C}, and even where it does, {C} has fewer options to choose from.
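
To make that 2-order table concrete, here is roughly what the “given [A, B], pick from {C}” walk looks like in Python; the one-line corpus is a stand-in, and the real Garkov code is of course more involved than this:

```python
# 2-order, word-level Markov chain: map each adjacent word pair [A, B] to
# the words {C} seen following them, then walk the table at random.
import random
from collections import defaultdict

corpus = "the cat sat on the mat and the cat ate the lasagna on the table".split()

table = defaultdict(list)
for a, b, c in zip(corpus, corpus[1:], corpus[2:]):
    table[(a, b)].append(c)

a, b = random.choice(list(table))   # start from a random known pair
output = [a, b]
for _ in range(20):
    followups = table.get((a, b))
    if not followups:               # dead end: no word ever followed this pair
        break
    c = random.choice(followups)
    output.append(c)
    a, b = b, c

print(" ".join(output))
```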

A high-order Markov model produces very coherent output but rarely produces novel output. But there’s also a balance of the order of your model against the size of your corpus – the collection of data you’re feeding into it – and as corpus size goes up, coherence creeps down. So choosing the order of your model depends on how much data you’re working with as well.

Which is all to say: as it works for words, so it can work for music, but what the results are like depends on (among other things) how you parse the music on input, how much music you’re processing, and the order of model you’re using to analyze and synthesize new stuff.

  • “That’s my IRC bot which uses a Markovian style algorithm.”

Markov IRC bots are one of the best ideas since sliced bread, yeah.

  • “That’s more a feature than a bug :) By using a specific text as an input you can produce a text sounding like the input. E.g. texts in old English or try poems etc.”

Exactly, David. Try it with arithmetic for some really bizarre stuff; try it with source code as well. Markov is naive, but very willing.

  • “How about Markov/Dilbert project - oh wait, it already works that way…”

Heh. In principle, the Garkov code is actually Garfield-agnostic – it’s just a bidirectional Markov structure and some generic display code, paired up with a Garfield display font, some Garfield background strips, and a bunch of Garfield transcripts. I’d very much like to do some other comics with it in the future – the big trick is those transcripts, so if anybody wants to spend a Saturday plowing through Dilbert (or, better, Mary Worth), let me know.

  • “Or maybe even analyse adjacent pixels in graphics/photos and see what kind of weird amorphous blob it produces.”

Ha! That could be fun. Difficult, but fun. Putting together a big corpus – and preprocessing it somehow to get the image complexity to a nice middle area in terms of total number of colors – would probably be the biggest challenge.

  • “Is the same technique used for the order of the words?”

I know you righted yourself already, but Markov models built at the letter rather than the word level have been done before as well, and are rather fun. If you want to generate some new plausible non-words in the language of your choice, a 2-order model does a great job. There is a nice brief writeup that touches on that line of thinking (with a bilingual twist): http://www.teamten.com/lawrence/projects/markov/ . I know I have seen others, but I can’t google to save my life this morning.
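
As a quick illustration of that letter-level version (the word list below is tiny and made up; a real lexicon for the language of your choice works much better):

```python
# 2-order, letter-level Markov model for generating plausible non-words.
import random
from collections import defaultdict

words = ["lasagna", "garfield", "markov", "monday", "coffee", "jonathan"]

table = defaultdict(list)
for w in words:
    w = "^" + w + "$"                 # mark word start and end
    for a, b, c in zip(w, w[1:], w[2:]):
        table[(a, b)].append(c)

def make_word():
    a, b = "^", random.choice(words)[0]   # begin with a known starting letter
    out = b
    while True:
        c = random.choice(table[(a, b)])
        if c == "$":                      # hit an end-of-word marker
            return out
        out += c
        a, b = b, c

print([make_word() for _ in range(5)])
```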

  • “Well, this may finally explain the comic Zippy. It is a Markov chain.”

Bill Griffith is a genius.

I’m going on at irresponsible length. This is fun stuff, and again, I’m glad you like the Garkov.

-j

Josef became a novel expands on the south west side of energy). She names him as a children’s advocacy group that he moved to the overbridge at the National Urban League; Alan Wolfe of the Kapelle, Reicha was served mainly by Sheffield Midland - Leeds stopping services. The line is markedly virtuosic, reflecting his own time, Miss Fellowes fights the Brookings Institution; Norman J. Ornstein, resident scholar Ludwig Schiedermair in Wallerstein, before his letters to return Timmie to liberate Timmie.
Josef Reicha wrote music publishing firm, who played in ‘Der junge Beethoven’ (Leipzig, 1925) gave specific examples taken from

Only because you brought it up (in a way), Jeff: Are you still using POPFile?

Chapter 3 “Design and Implementation” of Kernighan and Pike’s The Practice of Programming, http://books.google.com/books?id=to6M9_dbjosC, features the Markov Chain algorithm. The chapter also compares the algorithm’s implementation in C, C++, Perl, Java, and Awk. Read it; it is elegant.

I found a svada generator some time ago, which probably works in the same way. It generates grammatically correct sentences without any meaning.

http://www.geekhouse.no/~anders/Svada/

Here is a sample output:
http://www.geekhouse.no/~anders/Svada/wrapper.cgi?lines=42&grammar=oopl

Just to point it out, it looks like the Garfield Randomizer is no longer defunct.

Machine-intelligence music composers are getting better and better. “Vladimir” is one of the latest efforts, and it really does quite well with classical music; less so with more modern music, although I don’t know that you could tell the difference between a 12-tone-row-based composition by a human or a robot. http://www.aaai.org/AITopics/pmwiki/pmwiki.php/AITopics/Music

@phatmonkey - you sir, are a freaking CompSci legend.

Jeff,

You can hear similar nonsense without Markov chains if you interview people for programming jobs on a daily basis.

“Never underestimate the power of human stupidity.” (Robert Heinlein)

@Victor,

“Any good sources for Markov chain generators which take binary input? It could be fun to see what kind of executables it would produce…”

That, my friend, would be wicked cool! Although I would run such a program inside a vault just as a precaution, in case our Markovian baby decides to write anything to the hard disk! :)

@Victor:

It might be easier to randomly generate assembly language.

Marius, owing specifically to its strong grammaticality, the Svada thing looks to be a different beast from a Markov model (at least at the word-by-word level).

Markov’s greatest strength is its simplicity – it is a tiny wisp of a principle from which all kinds of exciting and surprising things can emerge, including, e.g., novel, semi-coherent text. But you can never depend on anything more than semi-coherence on the whole.

The model’s greatest weakness is its lack of context. It knows nothing of sentence structure, or of the logical relationships between words in its tables beyond sheer proximity. Part of what makes Markov language output amusing in many cases is the tendency toward sudden surreal shifts across an unexpected point of transition; that the transition is sometimes grammatical or even semantically plausible is just luck, not a feature of the model.

A pure Markov model is in that sense literally more context-free than even a Context-Free Grammar.

It’s possible that the Svada code depends on a CFG for generating its sentences; or it may rely on some more structurally complex model, or on templates. It’s very neat work, regardless, but I don’t believe at a glance that it is Markovian at all at the level of sentence construction.
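
I have no idea what Svada actually does under the hood, but just to show the contrast: a context-free grammar generator builds sentences by expanding rules rather than by walking adjacency tables. A toy version, with an entirely made-up grammar, looks like this:

```python
# Toy context-free grammar generator: output comes from recursively
# expanding rules, not from tables of which words happened to follow which.
import random

grammar = {
    "S":  [["NP", "VP"]],
    "NP": [["the", "N"], ["a", "N"]],
    "VP": [["V", "NP"], ["V"]],
    "N":  [["cat"], ["lasagna"], ["comic"]],
    "V":  [["eats"], ["draws"], ["sleeps"]],
}

def expand(symbol):
    if symbol not in grammar:                    # terminal: an actual word
        return [symbol]
    production = random.choice(grammar[symbol])  # pick one rule for this symbol
    return [word for part in production for word in expand(part)]

print(" ".join(expand("S")))
```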

Ah, sweet Finnegan:

riverrun, past Eve and Adam’s, from swerve of shore to bend
of bay, brings us by a commodius vicus of recirculation back to
Howth Castle and Environs.
Sir Tristram, violer d’amores, fr’over the short sea, had passen-
core rearrived from North Armorica on this side the scraggy
isthmus of Europe Minor to wielderfight his penisolate war: nor
had topsawyer’s rocks by the stream Oconee exaggerated themselse
to Laurens County’s gorgios while they went doublin their mumper
all the time: nor avoice from afire bellowsed mishe mishe to
tauftauf thuartpeatrick: not yet, though venissoon after, had a
kidscad buttended a bland old isaac: not yet, though all’s fair in
vanessy, were sosie sesthers wroth with twone nathandjoe. Rot a
peck of pa’s malt had Jhem or Shen brewed by arclight and rory
end to the regginbrow was to be seen ringsome on the aquaface.
The fall (bababadalgharaghtakamminarronnkonnbronntonner-
ronntuonnthunntrovarrhounawnskawntoohoohoordenenthur-
nuk!) of a once wallstrait oldparr is retaled early in bed and later
on life down through all christian minstrelsy. The great fall of the
offwall entailed at such short notice the pftjschute of Finnegan,
erse solid man, that the humptyhillhead of humself prumptly sends
an unquiring one well to the west in quest of his tumptytumtoes:
and their upturnpikepointandplace is at the knock out in the park
where oranges have been laid to rust upon the green since dev-
linsfirst loved livvy.