Transformers as Markov processes

If one can view transformers as Markov chains, then limiting results for Markov chains apply to transformers. Cosma noted, for example, that “[t]here are finite-state probabilistic languages which cannot be exactly represented by finite-order Markov chains.” There are also arguments that finite-state, finite-order Markov processes cannot model human language in principle (Debowski). For Cosma and… Continue reading Transformers as Markov processes
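To make the claim concrete: a model whose next-token distribution depends only on a fixed window of the last K tokens is, by definition, a K-th order Markov chain, or equivalently a first-order chain whose states are K-grams. The sketch below uses a toy vocabulary, window size, and probability table invented purely for illustration (nothing from the post itself); it builds the induced chain and computes its stationary distribution, the kind of limiting result the post alludes to.

```python
# Minimal sketch: a fixed-window next-token model as a first-order Markov chain
# over K-gram states. Vocabulary, window, and probabilities are invented.
import itertools
import numpy as np

VOCAB = ["a", "b"]   # toy vocabulary (assumption)
K = 2                # fixed context window (assumption)

# Invented next-token probabilities, indexed by the last K tokens only.
NEXT = {
    ("a", "a"): {"a": 0.9, "b": 0.1},
    ("a", "b"): {"a": 0.4, "b": 0.6},
    ("b", "a"): {"a": 0.5, "b": 0.5},
    ("b", "b"): {"a": 0.2, "b": 0.8},
}

# Because the distribution depends only on the last K tokens, the model induces
# a first-order chain over K-gram states with transition matrix P.
states = list(itertools.product(VOCAB, repeat=K))
P = np.zeros((len(states), len(states)))
for i, s in enumerate(states):
    for tok, p in NEXT[s].items():
        j = states.index(s[1:] + (tok,))  # slide the window by one token
        P[i, j] += p

assert np.allclose(P.sum(axis=1), 1.0)  # each row is a probability distribution

# Standard limiting results now apply, e.g. the stationary distribution is the
# left eigenvector of P with eigenvalue 1.
eigvals, eigvecs = np.linalg.eig(P.T)
stationary = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
stationary /= stationary.sum()
print(dict(zip(states, np.round(stationary, 3))))
```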

Do LLMs really train themselves?

Recently, Holger Lyre presented his paper “Understanding AI”: Semantic Grounding in Large Language Models in our group seminar. And while I generally remain skeptical about his claims of semantic grounding (maybe the occasion for a separate post), here I want to address a misunderstanding in his paper about what he calls “self-learning”, “self-supervised learning” or… Continue reading Do LLMs really train themselves?
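For reference, “self-supervised” in the LLM context just means that the training targets are read off the data itself rather than annotated by anyone. A minimal sketch of how such (context, target) pairs are constructed; the toy corpus, character-level “tokens”, and window length are my own illustrative choices, not Lyre’s or the post’s setup:

```python
# Minimal sketch of self-supervised next-token targets. The corpus, the
# character-level "tokens", and the window length are invented for illustration.
corpus = "the cat sat on the mat"
tokens = list(corpus)            # character-level tokens, for simplicity
window = 4                       # invented context length

# The "labels" are not annotated by anyone: each target is simply the token
# that follows its context window in the raw data.
pairs = [
    ("".join(tokens[i:i + window]), tokens[i + window])
    for i in range(len(tokens) - window)
]

for context, target in pairs[:3]:
    print(repr(context), "->", repr(target))
# prints: 'the ' -> 'c', 'he c' -> 'a', 'e ca' -> 't'
```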