Model Performance on R Code Generation
Model Performance vs. Cost
Model Pricing and Token Usage

About This Evaluation

Overview

This app displays evaluation results comparing how well various LLMs generate R code.

Methodology

  • We used the ellmer package to create connections to various models and the vitals package to evaluate model performance.

  • Models were evaluated on are (short for An R Eval), a dataset of challenging R coding problems and their solutions that ships with the vitals package.

  • Each model’s solution was graded by Claude 3.7 Sonnet as Incorrect, Partially Correct, or Correct.
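
The setup described above could be sketched roughly as follows, using the pattern shown in the vitals documentation. This is an illustration under stated assumptions, not the script behind this app: the solver model is a placeholder, the exact model identifier for the scorer is assumed, and running it needs API keys for both providers.

```r
library(ellmer)
library(vitals)

# Hypothetical model under test: any ellmer chat object can be used here.
solver_chat <- chat_openai(model = "gpt-4o")

# Grader: Claude 3.7 Sonnet. The exact model identifier is an assumption.
scorer_chat <- chat_anthropic(model = "claude-3-7-sonnet-latest")

# A task combines the dataset, a solver that produces answers,
# and a scorer that grades them.
are_task <- Task$new(
  dataset = are,                      # An R Eval, bundled with vitals
  solver = generate(solver_chat),     # send each problem to the model
  scorer = model_graded_qa(
    partial_credit = TRUE,            # allow a Partially Correct grade
    scorer_chat = scorer_chat
  ),
  name = "An R Eval"
)

# Calls both APIs; expects OPENAI_API_KEY and ANTHROPIC_API_KEY to be set.
are_task$eval()
```

After the run, are_task$get_samples() should return one row per problem, including the grade assigned by the scorer. Repeating the task with a different solver_chat is one way to compare models.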

Resources