R-Ladies Gaborone
2025-06-28
Language Models?
Large Language Models?
Generative AI?
After drinking some water, the dog went back to the bed to feed her ___________.
puppies: 0.12, babies: 0.10, family: 0.07, offspring: 0.042, kittens: 0.02, etc.
Predict and generate text by estimating the probability of tokens occurring in a given sequence.
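A minimal sketch of what "estimating the probability of tokens" can look like, using only the toy numbers from the dog example. The listed probabilities sum to 0.352; the rest belongs to the "etc." tokens, so renormalising over the five listed candidates is an assumption made purely for illustration.

```r
# Toy next-token probabilities copied from the example sentence
next_token <- c(puppies = 0.12, babies = 0.10, family = 0.07,
                offspring = 0.042, kittens = 0.02)

# Greedy choice: always take the most probable token
greedy <- names(next_token)[which.max(next_token)]

# Sampling: renormalise over the listed candidates (illustrative assumption),
# then draw one token at random according to those probabilities
probs <- next_token / sum(next_token)
set.seed(42)
sampled <- sample(names(probs), size = 1, prob = probs)

cat("Greedy pick:", greedy, "\n")
cat("Sampled pick:", sampled, "\n")
```

Greedy selection always returns the same word, while sampling lets less likely words appear now and then, which is one reason the same prompt can give different answers.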
Add more parameters and massive datasets, enabling more complex tasks
A novel (2017) algorithmic architecture to help models focus on important parts of their input for more efficiency
Text generation, translation, sentiment analysis, writing and debugging code, generating images and videos, etc.
(hint: disregard the hype and use what makes you more efficient)
“If you don’t become an AI/prompt engineer by tomorrow you’re a loser with no future”
“Your product needs AI or it will be worthless”
“All your training in data and coding was a waste of time”
I believe everyone deserves the ability to automate tedious tasks in their lives with computers - Simon Willison
...providing a way for people to talk with machines in plain language constitutes a dramatic step forward in making computing accessible to everyone - Carl Bergstrom (paraphrased)
Many popular LLMs are hosted by cloud providers
Many more! (let’s have a look)
Can be costly
Need internet access
We send our data to the API
Download smaller models on our own hardware
Ollama
(Simplifies running open-source LLMs locally)
Hugging Face transformers
(lots of open-source models)
Nothing is more costly than something given free of charge
- Japanese proverb.
Nothing is offered for free unless the party offering it has something to gain
If we are not paying for the product, then we are the product
Web-based chat interfaces, often with conversation histories and file upload features.
Application Programming Interfaces (APIs) are the rules and protocols that determine how one application can request data from another.
They let different software systems talk to each other and exchange data without having to understand each other’s inner workings.
Allow your applications (R scripts, browser-based prompts, mobile apps) to:
- Send input text (prompts)
- Specify which model to use
- Receive generated text, code, or structured data back
- Handle authentication (e.g., using API keys)
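The four items above can be sketched as the pieces of a single request. This is only an illustration: the URL, the model name, and the helper `build_chat_request()` are hypothetical, and the `role`/`content` message layout is a convention many providers use, not a universal rule. Nothing is sent over the network here.

```r
# Hypothetical helper: assemble (but do not send) a chat request
build_chat_request <- function(prompt, model, api_key) {
  stopifnot(is.character(prompt), length(prompt) == 1, nzchar(api_key))
  list(
    url = "https://api.example.com/v1/chat",          # placeholder endpoint
    headers = c(
      Authorization = paste("Bearer", api_key),       # authentication
      `Content-Type` = "application/json"
    ),
    body = list(
      model = model,                                   # which model to use
      messages = list(list(role = "user", content = prompt))  # input text
    )
  )
}

req <- build_chat_request(
  prompt  = "How do I add a subtitle to my ggplot figure?",
  model   = "example-model",     # placeholder model name
  api_key = "not-a-real-key"     # never hard-code a real key
)
str(req$body)
```

In practice, packages such as ellmer build and send this kind of request for us, so we rarely need to write it by hand.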
Working with APIs means we need to identify ourselves to the model provider and authenticate our encrypted connection to their servers.
In R, most of the relevant packages can work with keys stored in environment variables.
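A short sketch of reading a key from an environment variable with base R. The variable name `MY_LLM_API_KEY` and the helper `get_api_key()` are made up for this example; each provider and package expects its own variable name.

```r
# Hypothetical helper: read a key from an environment variable and fail
# early with a clear message if it has not been set
get_api_key <- function(var = "MY_LLM_API_KEY") {
  key <- Sys.getenv(var, unset = "")
  if (!nzchar(key)) {
    stop("Environment variable ", var, " is not set.", call. = FALSE)
  }
  key
}

# For this demo only: set a fake key for the current session
Sys.setenv(MY_LLM_API_KEY = "fake-key-for-demo")
key <- get_api_key()
cat("Key found with", nchar(key), "characters\n")
```

For real keys, a common approach is to store them in the `.Renviron` file (for example via `usethis::edit_r_environ()`) so they never appear in scripts or version control.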
H. Wickham - Managing secrets
Ted Laderas - A gRadual introduction to web APIs and JSON
AI-generated image created with DALL·E 3 via Bing Image Generator
Shifting our attention between different tasks or programs can be tiring and make us less productive and efficient.
Browser ↩︎️↪️ IDE ↩︎️↪️ Word Processor ↩︎️↪️ Cellphone
Lots of copying and pasting may introduce errors.
Learn about Quarto books
Show off 📦 hexsession
Contribute in Spanish to improve access to these tools
Keep up with developments and learn to use the tools I include
> How do I add a subtitle to my ggplot figure?
> What are the arguments for pivot_wider()?
> I can’t join my table_1 object with my dat3 data frame, help!
Trying to help someone over the phone vs. helping someone at the computer
Young Thug and Lil Durk Troubleshooting meme - imgflip
“an open-source AI code assistant”
Cool IDE features:
ellmer and friends
Robust and designed specifically for easy interaction with LLMs directly from R
Broad provider support and meant for enterprise/production use
Allows models to extend their capabilities by executing R code
Can produce output that is immediately usable in R
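A hedged sketch of a first conversation with ellmer, assuming the package is installed and a Groq key is available in the `GROQ_API_KEY` environment variable. The wrapper `ask_groq()` and the system prompt are made up for this example, the provider's default model is used, and the call only runs in an interactive session because it needs internet access.

```r
# Hypothetical wrapper around ellmer; checks prerequisites before calling the API
ask_groq <- function(question) {
  if (!requireNamespace("ellmer", quietly = TRUE)) {
    stop("Please install ellmer first: install.packages('ellmer')", call. = FALSE)
  }
  if (!nzchar(Sys.getenv("GROQ_API_KEY"))) {
    stop("Set the GROQ_API_KEY environment variable first.", call. = FALSE)
  }
  # Create a chat object, then send the question to the hosted model
  chat <- ellmer::chat_groq(
    system_prompt = "You are a concise assistant for R users."
  )
  chat$chat(question)
}

# Only contact the provider when running interactively
if (interactive()) {
  ask_groq("What are the arguments for pivot_wider()?")
}
```

Other providers follow the same pattern with a different `chat_*()` function, so switching models mostly means changing one line.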
ellmer demos
chatGroqDemo.R
extractPDFdemo.R
gander
ellmer
gander receives a snapshot of our environment with every request and recognizes what we are talking about, including object and variable names
ganderDemo.R
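To illustrate why a snapshot of the environment helps, here is a base R sketch that summarises the objects in an environment and places that summary ahead of a question. This is not gander's actual implementation; `describe_environment()` and the demo objects are invented for the example.

```r
# Hypothetical helper: describe every object in an environment as plain text
describe_environment <- function(env = globalenv()) {
  object_names <- ls(env)
  if (length(object_names) == 0) return("No objects found.")
  descriptions <- vapply(object_names, function(name) {
    obj <- get(name, envir = env)
    if (is.data.frame(obj)) {
      sprintf("%s: data frame, %d rows, columns: %s",
              name, nrow(obj), paste(names(obj), collapse = ", "))
    } else {
      sprintf("%s: %s of length %d", name, class(obj)[1], length(obj))
    }
  }, character(1))
  paste(descriptions, collapse = "\n")
}

# Demo environment with two small data frames
demo_env <- new.env()
assign("dat3", data.frame(id = 1:3, score = c(2.5, 3.1, 4.0)), envir = demo_env)
assign("table_1", data.frame(id = 1:3, group = c("a", "b", "a")), envir = demo_env)

context <- describe_environment(demo_env)
prompt <- paste(
  "Objects in my R session:",
  context,
  "Question: how do I join table_1 with dat3?",
  sep = "\n"
)
cat(prompt, "\n")
```

With the object names and column names in the prompt, the model can answer in terms of `table_1` and `dat3` instead of guessing, which is the difference between helping over the phone and helping at the computer.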
(and somewhat covered in the Quarto Guide)
Model Context Protocol (MCP): the “USB-C port for AI applications”
Retrieval-Augmented Generation (RAG): enhance the accuracy and reliability of LLMs with additional information from relevant data sources.
AI agents: Leverage MCP to do many things with some degree of ‘autonomy’
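The idea behind RAG can be sketched in a few lines of base R: pick the most relevant snippet from a small collection, then add it to the prompt. This toy version scores snippets by shared words; real systems typically use embeddings and a vector store, and the snippets and the `tokenize()` helper below are invented for illustration.

```r
# Hypothetical helper: lower-case a text and split it into unique words
tokenize <- function(text) {
  words <- strsplit(tolower(gsub("[^[:alnum:]_ ]", " ", text)), "\\s+")[[1]]
  unique(words[nzchar(words)])
}

# A tiny "knowledge base" of made-up help snippets
docs <- c(
  pivot = "pivot_wider() spreads names and values across columns using names_from and values_from.",
  labs  = "labs() sets the title, subtitle, and caption of a ggplot figure.",
  join  = "left_join() combines two data frames by matching key columns."
)

question <- "How do I add a subtitle to my ggplot figure?"

# Retrieval step: count the words each snippet shares with the question
question_tokens <- tokenize(question)
scores <- vapply(docs, function(d) {
  length(intersect(tokenize(d), question_tokens))
}, integer(1))
best <- names(scores)[which.max(scores)]

# Augmentation step: put the retrieved snippet in front of the question
augmented_prompt <- paste(
  "Use this reference when answering:",
  docs[[best]],
  paste("Question:", question),
  sep = "\n"
)
cat(augmented_prompt, "\n")
```

The model then answers with the retrieved text in front of it, rather than relying only on what it absorbed during training.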
Team at Posit making, maintaining, and improving several of these tools: Simon Couch, Garrick Aden-Buie, Hadley Wickham, Joe Cheng, Tomasz Kalinowski, Winston Chang, and many others.
MLverse team.
Albert Rapp and Chris Brown for sharing tutorials early in the life cycles of many of these tools.
Other developers and maintainers: Frank Hull, Gabriel Kaiser
Everyone who shares, gives me feedback on the guide, or points me to tools I missed.
🤔🤔🤔
(these slides will be linked in the LLMs Guide by Monday)