Transformers as Markov processes

If one can view transformers as Markov chains, then limiting results for Markov chains apply to transformers. Cosma noted, for example, that "[t]here are finite-state probabilistic languages which cannot be exactly represented by finite-order Markov chains." There are also arguments that finite-state, finite-order Markov processes cannot model human language in principle (Debowski). For Cosma and…

Do LLMs really train themselves?

Recently, Holger Lyre presented his paper "Understanding AI": Semantic Grounding in Large Language Models in our group seminar. And while I generally remain skeptical about his claims of semantic grounding (maybe the occasion for a separate post), here I want to address a misunderstanding in his paper about what he calls "self-learning", "self-supervised learning" or…