Discovered: Oct 4, 2023 09:41 Evaluating LLMs is a minefield (via https://www.aisnakeoil.com/p/evaluating-llms-is-a-minefield) <- QUOTE: i) No evidence of capability degradation. ii) But behavior changed in response to certain prompts. iii) Slightly different prompts needed to elicit capability.
