Can we evaluate “Everything in the Whole Wide World”? Raji et al., 2021

LLMs can do a lot, so it’s hard to evaluate everything they can do.

Users now work with LLMs directly, and users are actors: they act with the tool, and in doing so they transform its predicted uses. We cannot predict all the uses of these tools.

We are mainly studying English, and it is so dominant that 50% of ACL papers don’t even say that they work on English; it’s taken as a given.

These systems work best in English, and they have low coverage of other languages.