The strangest error

If the problem of induction is unsolvable then we won’t have a theory of strange errors. If the new problem of induction is unsolvable then we won’t have a theory of artifacts in ML.

A problem in ML that has not received much attention in philosophical circles so far is the problem of strange errors, or adversarials. Rathkopf and Heinrichs have a nice short piece in which they essentially define a strange error as an error in ML classification that completely runs counter to human classification. It is an error so far out that it would invite questions about your sanity if you made it.
They then voice the hope that strange errors might be predicted by a theory of ML artifacts as envisioned by Buckner.
Now, Buckner has a somewhat more involved definition of artifact – he does not, for example, treat artifacts as necessarily being errors – but for what follows it is enough that strange errors are a subclass of artifacts.
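To get a concrete feel for how such errors arise, here is a minimal sketch of the standard fast gradient sign method (FGSM) in PyTorch. The model, input and epsilon below are placeholders – a randomly initialised network and random pixels rather than a trained classifier – so the sketch only shows the mechanics: one gradient step on the input, bounded by a small epsilon, is often all it takes to push a real model into a classification no human would entertain.

```python
# Toy sketch of FGSM. Model and input are placeholders, not a trained
# classifier; the point is only the mechanics of crafting an adversarial input.
import torch
import torch.nn as nn

torch.manual_seed(0)

model = nn.Sequential(            # stand-in for a trained image classifier
    nn.Flatten(),
    nn.Linear(3 * 32 * 32, 64),
    nn.ReLU(),
    nn.Linear(64, 10),
)
model.eval()

x = torch.rand(1, 3, 32, 32)      # stand-in for an input image in [0, 1]
y = model(x).argmax(dim=1)        # the class the model itself assigns to x

# Compute the gradient of the loss with respect to the input, not the weights.
x_adv = x.clone().requires_grad_(True)
loss = nn.functional.cross_entropy(model(x_adv), y)
loss.backward()

# FGSM step: nudge each pixel by at most epsilon in the direction that
# increases the loss, then clamp back to the valid pixel range. On a trained
# model this small, human-imperceptible change frequently flips the label.
epsilon = 0.05
x_adv = (x + epsilon * x_adv.grad.sign()).clamp(0.0, 1.0)

print("original prediction: ", y.item())
print("perturbed prediction:", model(x_adv).argmax(dim=1).item())
```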
Rathkopf and Heinrichs argue, echoing an earlier consideration in Ramón’s infamous pigeon paper, that the presence of strange errors gives rise to worries about the trustworthiness of ML classifications, especially in high-stakes environments such as medicine. There might be ML misclassifications that are far more damaging than human misclassifications, and we cannot do anything about that unless we have a theory of strange errors. I am not sure whether they believe we can ever have such a theory, but Buckner certainly believes we can have a theory of artifacts in ML; he even gives an outline of one. Whatever he means by theory, it is clear that it should allow one to predict which artifacts will occur without having to run the ML system.

I believe this requirement is precisely why we cannot have such a theory. Even if we accept that artifacts can be predictively useful, those artifacts are latched onto only by the trained ML model, and there is no reason to expect that any other method will predict exactly these artifacts and only these. Nor do I think a theory that predicts artifacts within some pre-specified similarity distance is possible: having such a theory would amount to having a justified general inductive principle. As Buckner himself notes, the only reasons to favour green over grue are pragmatic. If you consider the new problem of induction unsolvable, then you cannot have a theory of artifacts.
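The grue point is easy to make concrete. In the toy sketch below the cut-off year and the observation record are made up; it just shows that Goodman’s two predicates agree on every observation made so far and come apart only on unobserved cases, so nothing in the data itself favours projecting one over the other.

```python
# Toy illustration of Goodman's new riddle of induction. The cut-off year T
# and the observation record are invented for the example.
T = 2030  # arbitrary cut-off, not yet reached

def is_green(colour: str, year: int) -> bool:
    return colour == "green"

def is_grue(colour: str, year: int) -> bool:
    # Grue: green if examined before T, blue otherwise.
    return colour == "green" if year < T else colour == "blue"

# Every emerald examined so far (all before T, all green).
observations = [("green", y) for y in range(2000, 2026)]

# Both hypotheses fit the record perfectly...
assert all(is_green(c, y) for c, y in observations)
assert all(is_grue(c, y) for c, y in observations)

# ...but they make opposite predictions about an emerald examined after T.
print(is_green("green", 2031))  # True
print(is_grue("green", 2031))   # False
```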

I still think adversarials and strange errors pose a challenge for certain flavours of epistemology – reliabilism, for example. Strange errors in particular highlight the difference between epistemology in a human-machine environment and a purely human one. But I don’t think they can be theorized about. They lurk in the cracks of our inductive generalizations, and once you eliminate one, another will develop. Rathkopf and Heinrichs might be right that we had better learn to live with them.
