2024 LLMs/genAI + R roundup

All the contents of this post (and much more) now live in this standalone Quarto Guide

Please note: This post has been updated ~~twice~~ thrice in January 2025 to keep up with new developments in the field

This year has seen significant progress in the genAI/LLM space, and lately a lot of these tools have been integrated with R in various ways. This is nice, because not everything needs to be a standalone chat box in the web browser - which seems prone to misuse (e.g., using things like ChatGPT as a search engine for no valid reason).

To help me keep track of what’s happening, I’ve put together this (potentially incomplete) list of relevant LLM+R resources.

First, it’s worth mentioning this perspective in Methods in Ecology and Evolution: Harnessing large language models for coding, teaching and inclusion to empower research in ecology and evolution, led by Natalie Cooper, and part of a special issue on the use of LLMs in ecology and evolution. These papers summarize the pros and cons of genAI/LLMs in the context of research and teaching, and their conclusions go beyond research and biological sciences.

Now, in no particular order, here is a roundup of some notable developments that I’ve come across online.

pal

pal by Simon Couch provides easy-to-use assistants that can edit, document, or explain code. The package provides an addin that works in both RStudio and Positron. Very cute package logo, and pal seems like a good option for writing boilerplate code and automating some of the more tedious, repetitive tasks. Works with multiple underlying models. Haven’t tried it yet.

Here’s a tutorial in Spanish for using pal with custom assistants that can have different roles.

kuzco

kuzco by Frank Hull & Johannes Breuer leverages ollamar and ellmer to provide a computer vision assistant right inside R, using LLMs as an alternative to torch for tasks including image classification, recognition, sentiment, and text extraction. The output provided comes in very nice and usable tibbles.

Cool hex logo and the example in the readme features a very cute puppy :3

gander

Simon Couch is on a roll, and released the cool gander package in early 2025.

gander leverages ellmer to go beyond a chat box and actually incorporate a context-aware assistant into our IDE of choice. Works well in both RStudio and Positron, and can actually look for context in files and inside the R environment (variable names, objects, function definitions, etc.).

This short screen recording comes from the package readme and shows the tool in action:

This feels similar to using the continue extension in Positron but more oriented to R, and I’ll likely be using this going forward.

ellmer

Note that ellmer was previously called elmer, and was renamed in December 2024 to avoid a case-insensitive clash with ELMER on Bioconductor.

ellmer is a new tidyverse-adjacent package by Hadley Wickham and Joe Cheng for interacting in R with various models, either programmatically or interactively. ellmer creates R6 chat objects that remember context, and we can interact with models with a console or browser-based chat box, or programmatically within R functions.

Seems promising, quite flexible, and all the activity in the GitHub repo suggests a very active development process, a solid dev team, and lots of community input.

mall

mall is part of the mlverse ecosystem of open source Data Science and Machine Learning libraries. Rather than a chat-based approach, mall applies LLMs rowwise to the columns of a data frame. Built-in prompts include translation, summarization, extraction, and sentiment analysis of text strings.

mall uses Ollama and is implemented for both R and Python. I will likely be using it to analyze the comments I collected about loaded packages, which I talked about in my posit::conf(2024) talk.
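To make the rowwise idea concrete, here is a minimal base R sketch of the general pattern. It does not use mall or any real model: `fake_sentiment()` is a hypothetical keyword-based stand-in for the model call, and the comments are made-up data.

```r
# Made-up comments for illustration only.
comments <- data.frame(
  id = 1:3,
  text = c(
    "I love this package",
    "The install was terrible",
    "It reads a csv file"
  ),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for a model call. A real workflow would send each
# string together with a fixed prompt to an LLM and return its answer.
fake_sentiment <- function(text) {
  if (grepl("love|great", text)) {
    "positive"
  } else if (grepl("terrible|awful", text)) {
    "negative"
  } else {
    "neutral"
  }
}

# Apply the stand-in once per row and store the result as a new column.
comments$sentiment <- vapply(
  comments$text, fake_sentiment, character(1), USE.NAMES = FALSE
)

print(comments)
```

The point is only the shape of the workflow: one text column in, one new column of per-row answers out, with the data frame staying intact for the rest of the analysis.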

tidyllm

tidyllm by Eduard Brüll provides pipe-friendly access to multiple LLM APIs in a smooth and readable way. We can pass images, documents, videos, and plots from the plot pane to the package functions, and the package has very thorough documentation. Similar in some ways to ellmer, but looks very useful for both batch processing and interactive work.

gemini.R

gemini.R by Jinhwan Kim connects R with Google’s Gemini models via the Gemini API. With a valid API key, the gemini() function takes text prompts and the gemini_image() function can work with images and text prompts.

The package also provides an RStudio addin for creating Roxygen documentation.

GitHub Copilot

GitHub Copilot is a widely-used and well-documented coding assistant that offers code completions and can turn natural language prompts into code suggestions. Works on GitHub.com and inside most IDEs.

Subscription-based, although as of a few weeks ago there is a free tier available. I have not tried it yet.

This recent talk by Yanina Bellini for RLadies Rome provides a good overview, and Yani’s slides also mention important considerations about the AI skill threat.

continue

A VS Code extension that works nicely in Positron. Supports multiple models for chat and code completion. Easy to provide local files and folders for context and nice integration with the source editor.

Tried it out after Julia Silge mentioned it in the Super Data Science podcast. Works nicely, and I used it in Positron with Anthropic’s Claude 3.5 Sonnet model to write and edit the repetitive CSS, HTML, and JavaScript code that powers the hexsession package. Also helped me with some of the nested for loops that play a big role in forgts.

ensure

ensure by Simon Couch helps write code for unit tests using the testthat package. Works through an RStudio addin, and the documentation mentions that the model has been made aware of testthat syntax and the tidy style guide for code. Will be trying it out for my more recent packages that still have poor test coverage.

codeium

Another cool VS Code extension that works well in Positron. Provides autocomplete and chat. Free for individuals. The base model uses Llama 3.1 70B, but paid tiers can choose other models. I’ve tried this out in Positron with good results for code refactoring, at least when playing around with small scripts.

llmR

llmR by Angelo D’Ambrosio provides a unified API to interact with various LLMs and providers through functions with consistent grammar and syntax. Provides easy switching between models, plus logging. Seems similar to ellmer. Will try it soon.

A similar package called LLMR recently appeared on CRAN but I could not find much material about it.

lang

lang is also part of the mlverse. This package uses LLMs to translate R documentation and display it in the help pane of RStudio or Positron. Having participated in various translation initiatives, I am very wary of machine-translated function documentation and how it may affect new learners.

The best part of this package (in my opinion) is the infrastructure for package developers to help translate documentation that, after editing, can be shipped as part of a package with multilingual help files.

ollamar

Developed by Hause Lin and Tawab Safi, ollamar integrates R with Ollama for running language models locally.

With ollamar we can pull different models, interact with objects that store chat histories, and handle the outputs as data frames, lists, or vectors.

Works nicely with httr2, which is a big advantage.

ragnar

As the name implies, ragnar by Tomasz Kalinowski helps implement Retrieval-Augmented Generation (RAG) workflows in R with transparency and efficiency. Uses duckdb by default for efficient work with big data and currently supports Ollama (more models likely to come soon). RAG is a way to link generative models with external resources rich in technical details, so that the model's answers can draw on real, citable facts and end up more accurate and reliable.

I liked the example in the readme, showing how the R for Data Science book was ingested and queried using the package.
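For anyone new to RAG, the core loop (retrieve relevant text, then add it to the prompt) can be sketched in a few lines of base R. This is a toy illustration and not how ragnar works internally: the three chunks are made-up text, and plain word overlap stands in for the embedding-based search a real workflow would use. The function names `tokenize()`, `retrieve()`, and `build_prompt()` are hypothetical.

```r
# Toy corpus standing in for ingested document chunks (made-up text).
chunks <- c(
  "dplyr filter() keeps rows that match a condition",
  "ggplot2 builds plots from layers such as geoms and scales",
  "readr read_csv() reads a comma separated file into a tibble"
)

# Split a string into unique lowercase word tokens.
tokenize <- function(x) {
  words <- strsplit(tolower(x), "[^a-z0-9_]+")[[1]]
  unique(words[nzchar(words)])
}

# Score each chunk by the number of words it shares with the question and
# return the top_k best matches (a crude stand-in for embedding search).
retrieve <- function(question, chunks, top_k = 1) {
  q <- tokenize(question)
  scores <- vapply(
    chunks,
    function(ch) length(intersect(q, tokenize(ch))),
    integer(1),
    USE.NAMES = FALSE
  )
  best <- order(scores, decreasing = TRUE)
  chunks[best[seq_len(min(top_k, length(chunks)))]]
}

# Augment the prompt with the retrieved context before it goes to a model.
build_prompt <- function(question, chunks, top_k = 1) {
  context <- retrieve(question, chunks, top_k)
  paste0(
    "Answer using only this context:\n",
    paste0("- ", context, collapse = "\n"),
    "\n\nQuestion: ", question
  )
}

question <- "How do I keep rows that match a condition?"
prompt <- build_prompt(question, chunks)
cat(prompt)
```

The generation step is left out on purpose: the resulting prompt is what would be handed to whichever model is in use, and the retrieved chunk is what makes the answer traceable to a source.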

Special mention here for Shiny Assistant, a stand-alone AI tool from the Shiny team that can answer questions about Shiny and also build apps. Shiny Assistant works with both R and Python, and the tool takes advantage of Shinylive to optionally forego the need for a server running R or Python.

If I missed anything let me know and I’ll add it here!


Luis D. Verde Arregoitia