As far as I know, CRM114 uses a combination of techniques (including HMMs) to catch spam.
Regarding hidden Markov models, the number of words the model “remembers” depends on its order: in a first-order HMM, the next state depends only on the current state, so the model effectively conditions on the one word it just saw; in a second-order HMM, the next state depends on the last two states.
Most models I’ve seen are first-order, because the computational and storage cost of an HMM grows exponentially with its order.
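To see why the cost blows up, here’s a rough back-of-the-envelope sketch (my own illustration, not anything from CRM114): the transition table of an order-k HMM over N hidden states must cover every (previous k states → next state) combination, i.e. N^(k+1) entries, while the emission table stays fixed at N × vocabulary size.

```python
def hmm_transition_params(n_states: int, order: int) -> int:
    """Number of transition probabilities an order-`order` HMM over
    `n_states` hidden states must store: one per (k-state history, next
    state) pair, i.e. n_states ** (order + 1)."""
    return n_states ** (order + 1)

# With just 10 hidden states, each extra order multiplies storage by 10:
for order in (1, 2, 3):
    print(order, hmm_transition_params(10, order))
# → 1 100
#   2 1000
#   3 10000
```

So going from first to second order already squares-to-cubes the transition table, which is why first-order models are the usual compromise.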
If you’re curious about HMMs, two great resources are Durbin et al.’s “Biological Sequence Analysis” and Rabiner’s classic tutorial, “A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition”.