Markov and You

If you play with the Markov text synthesizer, you’ll quickly find that Markov methods are only as good as their input corpus. Input a bunch
of the same words, or random gibberish, and that’s what you’ll get back.

That’s more a feature than a bug :slight_smile: By using a specific text as input, you can produce output that sounds like the input, e.g. texts in Old English, poems, etc.

As ChrisL suggested, it would be more interesting to try a Markov model on music than on text.
Generated music, where each note is a function of the previous 5-6 notes, could be great, I think.
The generated text doesn’t make much sense anyway.
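
A minimal sketch of that idea, assuming a melody is just a list of note-name strings; the example melody and the order-3 setting below are made up for illustration (order 5-6 would need a much larger input corpus):

```python
import random
from collections import defaultdict

def build_note_model(notes, order=3):
    """Map each run of `order` consecutive notes to the notes observed right after it."""
    model = defaultdict(list)
    for i in range(len(notes) - order):
        state = tuple(notes[i:i + order])
        model[state].append(notes[i + order])
    return model

def generate_melody(model, length=16):
    """Walk the model, picking a random observed successor at each step."""
    state = random.choice(list(model.keys()))
    output = list(state)
    for _ in range(length):
        successors = model.get(state)
        if not successors:  # state only ever appeared at the very end of the corpus
            break
        note = random.choice(successors)
        output.append(note)
        state = tuple(output[-len(state):])
    return output

# Made-up input melody, written as note names.
melody = ["C4", "E4", "G4", "E4", "C4", "D4", "F4", "A4", "F4", "D4",
          "E4", "G4", "B4", "G4", "E4", "C4", "E4", "G4", "E4", "C4"]
model = build_note_model(melody, order=3)
print(" ".join(generate_melody(model)))
```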

One of my pet projects that I have yet to fully realise is building a Markov-model chat bot out of the Web 1T 5-gram corpus: http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2006T13

One barrier to implementation is that it’s way too big to fit in memory on a single (reasonable) machine (a disk-backed workaround is sketched below).

Also related is frac.notdot.net, an amusing free-association game that uses a first-order Markov model to ‘play’ free association with a human player, expanding its corpus as it goes. The corpus is quite sizeable by now.
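
Back to the memory problem: one workaround is to stream the n-gram files into a disk-backed index instead of holding them in RAM. A minimal sketch using SQLite, assuming the usual one-n-gram-per-line, tab-separated-count layout; the file and database paths are hypothetical:

```python
import gzip
import sqlite3

def load_5grams(gz_path, db_path="ngrams.db"):
    """Stream one gzipped 5-gram file into an indexed SQLite table, batch by batch."""
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS grams (context TEXT, word TEXT, freq INTEGER)")
    rows = []
    with gzip.open(gz_path, "rt", encoding="utf-8", errors="replace") as f:
        for line in f:
            ngram, _, count = line.rstrip("\n").partition("\t")
            words = ngram.split(" ")
            if len(words) != 5 or not count.strip().isdigit():
                continue  # skip anything that doesn't match the assumed layout
            rows.append((" ".join(words[:4]), words[4], int(count)))
            if len(rows) >= 100_000:  # flush in batches to keep memory flat
                conn.executemany("INSERT INTO grams VALUES (?, ?, ?)", rows)
                rows = []
    conn.executemany("INSERT INTO grams VALUES (?, ?, ?)", rows)
    conn.execute("CREATE INDEX IF NOT EXISTS idx_context ON grams(context)")
    conn.commit()
    return conn

def continuations(conn, context):
    """Return (word, frequency) pairs observed after a 4-word context."""
    return conn.execute(
        "SELECT word, freq FROM grams WHERE context = ?", (context,)
    ).fetchall()

# Hypothetical usage:
# conn = load_5grams("5gm-0001.gz")
# print(continuations(conn, "i do not know what"))
```

With the index on the four-word context, a chat bot could look up continuations on demand instead of keeping the whole corpus in memory.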

Only because you brought it up (in a way), Jeff: Are you still using POPFile?

No, I’ve completely outsourced all my mail to Google / Microsoft / Yahoo now that they allow hosted email. I figure they’re a lot smarter than I am. Fighting spam is interesting, but thankless and repetitive. Gmail does the best job, though occasionally (about once every two weeks) something will creep through.

The generator always correctly predicts that a carriage return is followed by one of the speaker prefixes, so the generated text still looks like an IM conversation.
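
For anyone wondering why that falls out of the model: if you tokenize the log but keep the newline as its own token, the newline becomes a state whose only observed successors are the speaker prefixes. A tiny sketch with a made-up two-person log:

```python
import re
from collections import defaultdict

log = "alice: hey\nbob: hey, what's up\nalice: not much\nbob: same here\n"

# Tokenize, but keep the newline as its own token so it becomes a state in the chain.
tokens = re.findall(r"\n|\S+", log)

chain = defaultdict(list)
for cur, nxt in zip(tokens, tokens[1:]):
    chain[cur].append(nxt)

# Every observed successor of "\n" is a speaker prefix, which is why
# the generated text keeps the shape of an IM conversation.
print(chain["\n"])  # -> ['bob:', 'alice:', 'bob:']
```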

That’s the amazing thing about Markov chains. As Josh Millard, the author of Garkov, pointed out in this very thread: they’re so ridiculously simple, yet they work almost eerily well.

Also thanks, Josh, for making an appearance and offering such useful insights on your Markov experience.

They’re a reminder that Garfield CANNOT TALK. When you remember that Arbuckle cannot know what Garfield is thinking, the strip gains a new level of ridiculousness.

Ah, yes. You’re absolutely right. That’s what I wasn’t getting out of Arbuckle ( http://www.tailsteak.com/arbuckle/ ).

“That’s more a feature than a bug :slight_smile: By using a specific text as input, you can produce output that sounds like the input, e.g. texts in Old English, poems, etc.”

Yes, that is a fantastic feature. If you start talking to Kooky in other languages (German and Danish are the ones I know about; there are probably others), he will reply in that language. You never see these languages show up in English channels simply because there are almost no transitions linking the foreign words to English ones (apart from a few exceptions: “die” and “bin” in German, etc.).

phatmonkey, there were some requests to get kookybot on Twitter ( http://twitter.com/orph/statuses/832119604 ), though I’m not sure how that would work, as it’s a radically different environment than IRC.

Those top kookybot quotes are genius. I cannot remember the last time I laughed so hard. WARNING, however: do not click on any links the KookyBot provides. You’ll be sorry. :frowning:

t – 1? Is that t + 1?

How about a Markov/Dilbert project - oh wait, it already works that way…

Jeff, that should be simple enough, there’s a Twitter Python API. I’ll try it now.

Is the same technique used for the order of the words?

i.e. once you have used Markov chains to randomly generate a big pool of words, could you then also use a word-level Markov chain to say “this word is typically followed by one of these words”?
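
That is essentially what a word-level chain is: the states are words instead of letters, and the model records which words were seen immediately after each word. A minimal sketch with a made-up toy corpus:

```python
import random
from collections import defaultdict

def word_chain(text):
    """Record, for each word, every word seen immediately after it."""
    words = text.split()
    chain = defaultdict(list)
    for cur, nxt in zip(words, words[1:]):
        chain[cur].append(nxt)
    return chain

def babble(chain, start, length=20):
    """Walk the chain: repeatedly pick one of the observed successors."""
    word, out = start, [start]
    for _ in range(length):
        successors = chain.get(word)
        if not successors:  # the word only ever appeared at the end of the corpus
            break
        word = random.choice(successors)
        out.append(word)
    return " ".join(out)

corpus = "the cat sat on the mat and the dog sat on the rug"
chain = word_chain(corpus)
print(chain["the"])           # -> ['cat', 'mat', 'dog', 'rug']
print(babble(chain, "the"))   # e.g. "the dog sat on the cat sat on the mat and the ..."
```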

@Graham Stuart,
I don’t know. phatmonkey will be able to answer that.
But if it is possible, then we could improve the system by tolerating similar words.

Sorry Graham… I screwed up with the spelling…

If only I could use Markov chains to automatically write code…

“underlie”, not “underly”.

Never mind. Having played with the online Markov generator, I see they are doing it at the word level there.

Like ChrisL I am now wondering what happens if you apply this to music.

Or maybe even analyse adjacent pixels in graphics/photos and see what kind of weird amorphous blob it produces.

Interesting…
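
Picking up the adjacent-pixels idea: a first cut might learn, for each grey value, which values were seen immediately to its right, and then walk that chain row by row. A minimal sketch over a made-up grayscale patch (a real photo would just produce a much bigger table of the same kind):

```python
import random
from collections import defaultdict

# Made-up 4x8 grayscale patch (0-255 values) standing in for real pixel data.
image = [
    [10, 12, 11, 200, 210, 205, 12, 10],
    [11, 10, 13, 198, 212, 207, 11, 12],
    [12, 13, 10, 202, 208, 204, 13, 11],
    [10, 11, 12, 199, 211, 206, 10, 13],
]

# Learn which pixel values tend to follow which, scanning each row left to right.
chain = defaultdict(list)
for row in image:
    for left, right in zip(row, row[1:]):
        chain[left].append(right)

def generate_row(width=8):
    """Produce a new row of pixels by walking the left-neighbour chain."""
    value = random.choice(list(chain.keys()))
    row = [value]
    for _ in range(width - 1):
        successors = chain.get(value)
        value = random.choice(successors) if successors else random.choice(list(chain.keys()))
        row.append(value)
    return row

blob = [generate_row() for _ in range(4)]  # the "weird amorphous blob"
for row in blob:
    print(row)
```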

I’m thinking tweetkov: make a Twitter account, follow as many people as possible, use their random gibberish as input, and see if tweetkov’s tweets make any more sense than the average Twitter user’s. :wink:

…I just might do that.

Any good sources for Markov chain generators that take binary input? It could be fun to see what kind of executables they would produce…

Could this be used to test a platform’s stability? I guess Markov chains would generate something executable much more often than a simple random binary generator, if based on existing binaries.
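
I don’t know of an off-the-shelf one, but a byte-level chain only takes a few lines: treat each short run of bytes as a state and record which byte followed it. A minimal sketch; the input path is hypothetical, and the output is very unlikely to be a valid executable without far more structure than a Markov model captures:

```python
import random
from collections import defaultdict

def build_byte_chain(data, order=2):
    """Map each `order`-byte context to the byte values observed right after it."""
    chain = defaultdict(list)
    for i in range(len(data) - order):
        chain[data[i:i + order]].append(data[i + order])
    return chain

def generate_bytes(chain, length=4096):
    """Emit bytes by walking the chain from a random starting context."""
    state = random.choice(list(chain.keys()))
    out = bytearray(state)
    while len(out) < length:
        successors = chain.get(bytes(out[-len(state):]))
        if not successors:  # context never seen mid-file; stop early
            break
        out.append(random.choice(successors))
    return bytes(out)

# Hypothetical usage: train on an existing binary and dump the result to a file.
with open("/bin/ls", "rb") as f:          # any existing executable would do
    chain = build_byte_chain(f.read())
with open("markov.bin", "wb") as f:
    f.write(generate_bytes(chain))
```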

interesting. i never knew what a markov chain was before… even though i have implemented more than one… after all getting a distribution from data and then using it to produce a model by sampling the distribution isn’t really a big leap of understanding. i’d always just labelled this as “a monte-carlo method” and never accorded it special distinction…

interesting that it is so effective for language too… although still, the sentences are gibberish. however… i wonder if a more comprehensive model of writing could not be made by analysing more than just the constituent letters of words or their order in text.

something for me to mess with coding whilst bored at work anyway… :slight_smile:

"Or maybe even analyse adjacent pixels in graphics/photos and see what kind of weird amorphous blob it produces."
I’m wondering if the new game Spore uses something like Bayesian networks or Markov chains for its procedurally generated artwork.

I would actually really like to see a procedurally generated computer game storyline based around Markov chains. I very much doubt anyone would notice…