Transformers as Markov processes

If one can view transformers as Markov chains, then limiting results for Markov chains apply to transformers. Cosma noted, for example, that "[t]here are finite-state probabilistic languages which cannot be exactly represented by finite-order Markov chains." There are also arguments that finite-state, finite-order Markov processes cannot model human language in principle (Debowski). For Cosma and…

Rota’s rant about mathematical philosophy

Gian-Carlo Rota, who, in case you didn't know, held a professorship in mathematics and philosophy, took the opportunity of a Synthese special edition to condemn mathematical philosophy before it was invented. His basic argument is that mathematics is essentially clear and philosophy is not. Therefore any aspiration for clarity is damaging to philosophy. He also…

Do LLMs really train themselves?

Recently, Holger Lyre presented his paper "Understanding AI": Semantic Grounding in Large Language Models in our group seminar. While I generally remain skeptical about his claims of semantic grounding (maybe the occasion for a separate post), here I want to address a misunderstanding in his paper about what he calls "self-learning", "self-supervised learning" or…

The illusion of generalization

Contrary to optimistic claims in the ML literature, I often cannot help but think that deep neural nets are indeed overfit and do not generalize well. But of course that claim hinges on what one means by generalizing well. About this there has been considerable confusion in the more practical, engineering-oriented ML literature, which at…

A study in LIME

LIME explanations for images are extremely dependent on the choice of tiling. Tiling is one inductive assumption of LIME, but it is hidden from the end user. I know I'm late to the party, but here we go. I was recently asked to say something about philosophical problems in XAI and LIME is an obvious…

IACAP23 recap

Before my memory fails me – I didn't take notes – I wanted to write down whatever I remember about the IACAP conference that took place in Prague at the beginning of July. There was a certain old vs. new guard feeling pervading the whole conference, which played out mainly between traditional philosophy of computation…

ML epistemology workshop – Day 2

Notes for Day 1 here. Meta-Inductive Justification of Universal Generalizations – Gerhard Schurz: In this talk Gerhard defended his account of induction against some objections Tom published in an earlier paper. Unfortunately I am not very well acquainted with meta-induction and Gerhard's presentation was a Word document which he quickly skimmed through. So I had…

ML epistemology workshop – Day 1

I recently attended Tom's closing workshop on his Philosophy of statistical learning theory project. It was a great workshop and I learned a great deal from the talks. I provide a streamlined version of notes I took, for all those who were interested but couldn't attend. The abstracts of the talks can be found here:…