Scorzato’s *Reliability and Interpretability in Science and Deep Learning* is one of the few papers in the epistemology of ML that shows a strong grasp of recent technical results in ML. It succeeds in connecting these results to a wider philosophical discussion without inventing *artificial philosophical problems in ML*.

Scorzato starts by discussing recent technical approaches to error estimation in ML. The epistemological problem lurking behind them is that we would like to estimate the errors that ML methods such as DNNs make with as few additional assumptions as possible. Invoking Quine-Duhem, he argues that error estimation from the data alone is, in general, impossible. He then goes on to show that this philosophical theorem is realized in both Bayesian and frequentist approaches to error analysis: each must make assumptions to estimate error.

In any case, the assumption-ladenness of error analysis does not distinguish ML models from traditional scientific models – a distinction Scorzato wants to draw. So he goes on to say that ML models make **more** model assumptions than traditional scientific models. This statement could be quantified with a measure he proposes and calls epistemic model complexity. The argument for why ML models necessarily have high complexity, that is, a high number of “measurable” assumptions^{1}, crucially relies on the existence of adversarials. To make the model’s assumptions “measurable”, adversarials basically have to be treated as individual assumptions, which inflates the complexity of the ML model. According to Scorzato, we ought to prefer models that are, ceteris paribus, less epistemically complex than others. But this is not all: high epistemic complexity goes hand in hand with a loss of overall reliability. The impossibility of assumption-free error analysis prevents us from defining any reliability measure that is agnostic of model assumptions. It therefore seems natural to assume that more complex models are, ceteris paribus, less reliable. Thus I take Scorzato’s bottom line to be that we should prefer traditional scientific models over ML models whenever possible. He remains silent, though, on what we should do when this is not possible.

Even if you don’t subscribe to Scorzato’s thoughts on epistemic complexity and model selection, he has to be commended for hammering home the point that error analysis cannot be had without additional assumptions (although I would prefer to call them inductive assumptions). Surprisingly, he doesn’t say anything about error estimation using one of the many cross-validation techniques prevalent in ML. Maybe this is because he thinks it is covered by his indictment of frequentist techniques – but alas, it is used by Bayesians as well. Arguably, estimation with cross-validation is the paradigmatic form of error analysis in ML (Recht defends this point forcefully). Viewed superficially, cross-validation (CV) seems to give an assumption-free (or at least minimal-assumption) unbiased estimate of the generalization error^{2}. But as Malik pointed out, most often we are not in a position to ensure even the minimal assumptions for the validity of CV (i.e. that the data are iid), and this turns the whole thing into a game of meta-prediction or model-checking. Such epistemic issues with error estimation techniques have rarely been given the space they deserve in the philosophical discussion, even though the specific assumptions of CV have long been widely known in (ancient) ML and (even more ancient) statistics.
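
To make the iid worry concrete, here is a minimal sketch of my own (it is not taken from Scorzato, Recht or Malik). It assumes synthetic data in which points come in tightly clustered groups that share a label offset, and a 1-nearest-neighbour predictor; the names `make_groups`, `one_nn_predict` and `cv_mse` are hypothetical helpers. The same predictor is scored once with folds that ignore the group structure and once with folds that hold out whole groups.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_groups(n_groups, per_group, rng):
    # Assumption: each group sits at its own location in feature space and
    # carries its own label offset, so points within a group are dependent.
    centers = np.arange(n_groups, dtype=float)[:, None]
    offsets = rng.normal(size=n_groups)
    n = n_groups * per_group
    x = np.repeat(centers, per_group, axis=0) + 0.01 * rng.normal(size=(n, 1))
    y = np.repeat(offsets, per_group) + 0.1 * rng.normal(size=n)
    g = np.repeat(np.arange(n_groups), per_group)
    return x, y, g

def one_nn_predict(x_train, y_train, x_test):
    # Predict the label of the closest training point (1-nearest neighbour).
    d = np.abs(x_test[:, None, 0] - x_train[None, :, 0])
    return y_train[d.argmin(axis=1)]

def cv_mse(x, y, fold_ids):
    # Average held-out squared error over the folds given by fold_ids.
    errs = []
    for f in np.unique(fold_ids):
        test = fold_ids == f
        pred = one_nn_predict(x[~test], y[~test], x[test])
        errs.append(np.mean((pred - y[test]) ** 2))
    return float(np.mean(errs))

x, y, g = make_groups(n_groups=50, per_group=10, rng=rng)

random_folds = rng.integers(0, 5, size=len(y))  # ignores the group structure
group_folds = g % 5                             # holds out whole groups

mse_random = cv_mse(x, y, random_folds)
mse_grouped = cv_mse(x, y, group_folds)
print(f"random folds: {mse_random:.3f}   grouped folds: {mse_grouped:.3f}")
```

Under these assumptions the random-fold estimate should come out far smaller than the grouped one, because every held-out point has near-duplicates from its own group in the training set. Which of the two numbers counts as "the" generalization error depends on whether future data come from known or from new groups, and that is exactly the kind of assumption the data alone cannot settle.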

I do wish that Scorzato, in discussing error analysis in a probabilistic framework, had said more about the shortcomings of minimizing estimated prediction error for model selection. I obviously agree with him that this is not enough, but I would have expected a more detailed discussion of how misspecified models can minimize the expected prediction error and thereby diverge arbitrarily from the truth. It is also unclear to me why his central definitions of epistemic complexity and measurable quantities, along with the argument that the standard formulation of a DNN (whatever a standard formulation might be) gives the best estimate of its epistemic complexity, are relegated to appendices. Superficial readers will almost certainly skip these and so miss the core of his argument.
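
One way to picture the misspecification point is the following toy sketch. It is my own illustration under stated assumptions, not an example from the paper: the true relationship is assumed to be quadratic, the inputs are assumed to be uniform on [0, 1], and the model class is restricted to straight lines. The least-squares line is then the best predictor within its class on the data distribution, yet it says something very different from the truth away from that distribution.

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed ground truth: y = x^2 plus a little noise, with x uniform on [0, 1].
x = rng.uniform(0.0, 1.0, size=10_000)
y = x ** 2 + 0.05 * rng.normal(size=x.size)

# Misspecified model class: straight lines, fitted by least squares.
# np.polyfit returns coefficients from highest to lowest degree.
slope, intercept = np.polyfit(x, y, deg=1)

def linear_model(t):
    return slope * t + intercept

# Error against the noise-free truth on fresh inputs from the same distribution.
x_test = rng.uniform(0.0, 1.0, size=10_000)
in_dist_mse = float(np.mean((linear_model(x_test) - x_test ** 2) ** 2))

# Gap to the truth at a point far outside the training range.
gap_far_away = float(abs(linear_model(10.0) - 10.0 ** 2))

print(f"slope={slope:.3f} intercept={intercept:.3f}")
print(f"in-distribution MSE={in_dist_mse:.4f}  gap at x=10: {gap_far_away:.1f}")
```

In this setup the population least-squares line is x - 1/6, so the fitted coefficients should land close to a slope of 1 and an intercept of -1/6. The in-distribution error is small, while the gap at x = 10 is large and keeps growing the further out one evaluates.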

One last point: as noted in footnote 1, Scorzato defines measurability in a way that excludes adversarials. He writes: “Here we have assumed that measurable quantities cannot assume different values, with high confidence, for imperceptibly different data points.” Critics might take this definition to be bordering on circularity. Why should we exclude things from measurability whose difference *we* cannot perceive?

A better way to think about adversarials is that, no matter how you specify your model, they are always a consequence of the mathematical phenomenon known as concentration of measure^{3}. That means that, as long as our models are defined on concentrated probability spaces, there will always be “imperceptibly different data points” that are classified differently. And obviously, using only our ML model, we cannot predict them in advance.
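
A small numerical sketch can give a feel for the phenomenon. It rests on simplifying assumptions of my own: data drawn from a standard Gaussian in `dim` dimensions and a fixed linear classifier whose decision boundary is the hyperplane where the first coordinate is zero. The hypothetical helper `median_relative_margin` measures how far a typical point is from the boundary relative to its own norm.

```python
import numpy as np

rng = np.random.default_rng(2)

def median_relative_margin(dim, n_points=200):
    # Assumption: points are standard Gaussian in `dim` dimensions.
    x = rng.normal(size=(n_points, dim))
    # Distance to the hyperplane {x : x[0] = 0} is just |x[0]|.
    distance_to_boundary = np.abs(x[:, 0])
    norms = np.linalg.norm(x, axis=1)
    # Size of the smallest label-flipping perturbation, relative to the point.
    return float(np.median(distance_to_boundary / norms))

dims = [10, 100, 1_000, 10_000]
margins = {d: median_relative_margin(d) for d in dims}

for d in dims:
    print(f"dim={d:>6}  median relative margin={margins[d]:.4f}")
```

Under these assumptions the relative margin shrinks roughly like one over the square root of the dimension: the norm of a typical point grows with the dimension while its distance to the boundary stays of order one. In high dimensions almost every point therefore sits, in relative terms, right next to the decision boundary. This toy case only covers a Gaussian and a linear boundary; it is meant as an intuition pump, not as a proof of the general claim.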

- Scorzato defines measurable quantities as “quantities [that] cannot assume different values, with high confidence, for imperceptibly different data points.” Note: This is essentially the definition of adversarials. ↩︎
- This is probably the reason why Corfield promoted CV to one of the allowed justifications in ML. ↩︎
- In the context of concentration of measure, “measure” means *probability measure*. This should not be confused with Scorzato’s concept of measurability. ↩︎