Jekyll2024-01-30T21:55:21+00:00https://luisdva.github.io/feed.xmlLuis D. Verde ArregoitiaEcology.Evolution.Conservation.DataLuis D. Verde ArregoitiaYou ‘tidyr::complete()’ me2024-01-30T00:00:00+00:002024-01-30T00:00:00+00:00https://luisdva.github.io/rstats/complete<p>Here’s a quick example of how we can use <code class="language-plaintext highlighter-rouge">dplyr</code> and <code class="language-plaintext highlighter-rouge">tidyr</code> functions to complete sequences in a data frame given start and end values stored in separate columns. This was originally asked in a Spanish-Language R Facebook group and makes for a good use case of pivoting plus the <code class="language-plaintext highlighter-rouge">complete()</code> and <code class="language-plaintext highlighter-rouge">full_seq()</code> functions from <code class="language-plaintext highlighter-rouge">tidyr</code> which I had never written about nor used in my teaching materials.</p>
<p>For a tabular object like this one below, we want to expand the sequence (days) for each category, repeating the values of longitude as needed.</p>
<table>
<thead>
<tr>
<th style="text-align: left">category</th>
<th style="text-align: right">start_day</th>
<th style="text-align: right">end_day</th>
<th style="text-align: right">longitude</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: left">A</td>
<td style="text-align: right">6</td>
<td style="text-align: right">8</td>
<td style="text-align: right">12</td>
</tr>
<tr>
<td style="text-align: left">B</td>
<td style="text-align: right">18</td>
<td style="text-align: right">23</td>
<td style="text-align: right">15</td>
</tr>
<tr>
<td style="text-align: left">C</td>
<td style="text-align: right">19</td>
<td style="text-align: right">21</td>
<td style="text-align: right">11</td>
</tr>
<tr>
<td style="text-align: left">D</td>
<td style="text-align: right">2</td>
<td style="text-align: right">6</td>
<td style="text-align: right">13</td>
</tr>
</tbody>
</table>
<p>We want to end up with this:</p>
<table>
<thead>
<tr>
<th style="text-align: left">category</th>
<th style="text-align: right">day</th>
<th style="text-align: right">longitude</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: left">A</td>
<td style="text-align: right">6</td>
<td style="text-align: right">12</td>
</tr>
<tr>
<td style="text-align: left">A</td>
<td style="text-align: right">7</td>
<td style="text-align: right">12</td>
</tr>
<tr>
<td style="text-align: left">A</td>
<td style="text-align: right">8</td>
<td style="text-align: right">12</td>
</tr>
<tr>
<td style="text-align: left">B</td>
<td style="text-align: right">18</td>
<td style="text-align: right">15</td>
</tr>
<tr>
<td style="text-align: left">B</td>
<td style="text-align: right">19</td>
<td style="text-align: right">15</td>
</tr>
<tr>
<td style="text-align: left">B</td>
<td style="text-align: right">20</td>
<td style="text-align: right">15</td>
</tr>
<tr>
<td style="text-align: left">B</td>
<td style="text-align: right">21</td>
<td style="text-align: right">15</td>
</tr>
<tr>
<td style="text-align: left">B</td>
<td style="text-align: right">22</td>
<td style="text-align: right">15</td>
</tr>
<tr>
<td style="text-align: left">B</td>
<td style="text-align: right">23</td>
<td style="text-align: right">15</td>
</tr>
<tr>
<td style="text-align: left">C</td>
<td style="text-align: right">19</td>
<td style="text-align: right">11</td>
</tr>
<tr>
<td style="text-align: left">C</td>
<td style="text-align: right">20</td>
<td style="text-align: right">11</td>
</tr>
<tr>
<td style="text-align: left">C</td>
<td style="text-align: right">21</td>
<td style="text-align: right">11</td>
</tr>
<tr>
<td style="text-align: left">D</td>
<td style="text-align: right">2</td>
<td style="text-align: right">13</td>
</tr>
<tr>
<td style="text-align: left">D</td>
<td style="text-align: right">3</td>
<td style="text-align: right">13</td>
</tr>
<tr>
<td style="text-align: left">D</td>
<td style="text-align: right">4</td>
<td style="text-align: right">13</td>
</tr>
<tr>
<td style="text-align: left">D</td>
<td style="text-align: right">5</td>
<td style="text-align: right">13</td>
</tr>
<tr>
<td style="text-align: left">D</td>
<td style="text-align: right">6</td>
<td style="text-align: right">13</td>
</tr>
</tbody>
</table>
<p>Here’s how to complete the data frame with the missing combinations of data. Let’s set up our data first.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">library</span><span class="p">(</span><span class="n">dplyr</span><span class="p">)</span><span class="w"> </span><span class="c1"># CRAN v1.1.4</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">tidyr</span><span class="p">)</span><span class="w"> </span><span class="c1"># CRAN v1.3.1</span><span class="w">
</span><span class="n">mydat</span><span class="w"> </span><span class="o"><-</span><span class="w">
</span><span class="n">tibble</span><span class="p">(</span><span class="n">category</span><span class="o">=</span><span class="nf">c</span><span class="p">(</span><span class="s2">"A"</span><span class="p">,</span><span class="s2">"B"</span><span class="p">,</span><span class="s2">"C"</span><span class="p">,</span><span class="s2">"D"</span><span class="p">),</span><span class="w">
</span><span class="n">start_day</span><span class="o">=</span><span class="nf">c</span><span class="p">(</span><span class="m">6</span><span class="p">,</span><span class="m">18</span><span class="p">,</span><span class="m">19</span><span class="p">,</span><span class="m">2</span><span class="p">),</span><span class="w">
</span><span class="n">end_day</span><span class="o">=</span><span class="nf">c</span><span class="p">(</span><span class="m">8</span><span class="p">,</span><span class="m">23</span><span class="p">,</span><span class="m">21</span><span class="p">,</span><span class="m">6</span><span class="p">),</span><span class="w">
</span><span class="n">longitude</span><span class="o">=</span><span class="nf">c</span><span class="p">(</span><span class="m">12</span><span class="p">,</span><span class="m">15</span><span class="p">,</span><span class="m">11</span><span class="p">,</span><span class="m">13</span><span class="p">))</span></code></pre></figure>
<p>Our initial data looks like this:</p>
<figure class="highlight"><pre><code class="language-text" data-lang="text">> mydat
# A tibble: 4 × 4
category start_day end_day longitude
<chr> <dbl> <dbl> <dbl>
1 A 6 8 12
2 B 18 23 15
3 C 19 21 11
4 D 2 6 13</code></pre></figure>
<p>In this initial structure the data is in wide format, so we need to pivot it so that the start and end days for each category end up together in the same variable.</p>
<p>Pivot longer like so:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">mydat</span><span class="w"> </span><span class="o">%>%</span><span class="w"> </span><span class="n">pivot_longer</span><span class="p">(</span><span class="n">cols</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="n">start_day</span><span class="p">,</span><span class="n">end_day</span><span class="p">),</span><span class="w">
</span><span class="n">names_to</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"day_type"</span><span class="p">,</span><span class="n">values_to</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"day"</span><span class="p">)</span><span class="w"> </span></code></pre></figure>
<p>We now have the key-value pairs and the data in long format.</p>
<figure class="highlight"><pre><code class="language-text" data-lang="text"># A tibble: 8 × 4
category longitude day_type day
<chr> <dbl> <chr> <dbl>
1 A 12 start_day 6
2 A 12 end_day 8
3 B 15 start_day 18
4 B 15 end_day 23
5 C 11 start_day 19
6 C 11 end_day 21
7 D 13 start_day 2
8 D 13 end_day 6</code></pre></figure>
<p>Now we need to complete the sequence. For example, for category A we need separate rows for days 6, 7, and 8. For category B we need rows for days 18, 19, 20, 21, 22, and 23, and so on.</p>
<p>To create the full sequence of values in a vector, we can use <code class="language-plaintext highlighter-rouge">tidyr::full_seq()</code>. This function takes a numeric vector and a value describing the gaps or increments that need to be filled in.</p>
<p><code class="language-plaintext highlighter-rouge">full_seq(c(1,3,7),1)</code> will return the same values as <code class="language-plaintext highlighter-rouge">1:7</code>.</p>
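<p>As a small sketch of how <code class="language-plaintext highlighter-rouge">full_seq()</code> behaves, here it is on two made-up vectors. The second call assumes a period of 2, which is not something the original question needed.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r">library(tidyr)

# fill the gaps between the smallest and largest value, in steps of 1
by_one <- full_seq(c(1, 3, 7), 1)
by_one
# 1 2 3 4 5 6 7

# the input does not need to be sorted, and the period can be larger than 1
by_two <- full_seq(c(10, 2, 6), 2)
by_two
# 2 4 6 8 10</code></pre></figure>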
<p><code class="language-plaintext highlighter-rouge">complete()</code> is a nice wrapper for another <code class="language-plaintext highlighter-rouge">tidyr</code> function: <code class="language-plaintext highlighter-rouge">expand()</code>. <code class="language-plaintext highlighter-rouge">complete</code> is pipe-friendly and operates on data frames (including grouped data frames).</p>
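<p>To see what completing a data frame means before applying it to our data, here is a minimal sketch on a made-up table (the <code class="language-plaintext highlighter-rouge">obs</code> object and its columns are hypothetical). Without grouping, <code class="language-plaintext highlighter-rouge">complete()</code> builds every combination of site and year, and the combinations that were not in the data get <code class="language-plaintext highlighter-rouge">NA</code> in the remaining columns.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r">library(dplyr)
library(tidyr)

# hypothetical counts: site x has no record for 2021, site y only has 2021
obs <- tibble(
  site  = c("x", "x", "y"),
  year  = c(2020, 2022, 2021),
  count = c(3, 5, 2)
)

# every site gets a row for each year in the full 2020-2022 sequence
obs_complete <- obs %>%
  complete(site, year = full_seq(year, 1))

obs_complete</code></pre></figure>
<p>The result has six rows (two sites by three years), three of them with a missing count.</p>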
<p>On the grouped data frame, let’s complete the new day variable in increments of one. Then, to get the desired result, let’s <code class="language-plaintext highlighter-rouge">fill()</code> in the missing values in the longitude column and clean up.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">mydat</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">pivot_longer</span><span class="p">(</span><span class="w">
</span><span class="n">cols</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="n">start_day</span><span class="p">,</span><span class="w"> </span><span class="n">end_day</span><span class="p">),</span><span class="w">
</span><span class="n">names_to</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"day_type"</span><span class="p">,</span><span class="w"> </span><span class="n">values_to</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"day"</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">group_by</span><span class="p">(</span><span class="n">category</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">complete</span><span class="p">(</span><span class="n">day</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">full_seq</span><span class="p">(</span><span class="n">day</span><span class="p">,</span><span class="w"> </span><span class="m">1</span><span class="p">))</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">fill</span><span class="p">(</span><span class="n">longitude</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">select</span><span class="p">(</span><span class="o">-</span><span class="n">day_type</span><span class="p">)</span></code></pre></figure>
<p>The result:</p>
<figure class="highlight"><pre><code class="language-text" data-lang="text"># A tibble: 17 × 3
# Groups: category [4]
category day longitude
<chr> <dbl> <dbl>
1 A 6 12
2 A 7 12
3 A 8 12
4 B 18 15
5 B 19 15
6 B 20 15
7 B 21 15
8 B 22 15
9 B 23 15
10 C 19 11
11 C 20 11
12 C 21 11
13 D 2 13
14 D 3 13
15 D 4 13
16 D 5 13
17 D 6 13</code></pre></figure>
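<p>A quick way to sanity-check a result like this is to work out how many rows each category should contribute. This is only a sketch: it rebuilds the start and end values in a separate object (named <code class="language-plaintext highlighter-rouge">ranges</code> here for illustration) and assumes both ends of each range are included.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r">library(dplyr)

# same start and end days as in the example data
ranges <- tibble(
  category  = c("A", "B", "C", "D"),
  start_day = c(6, 18, 19, 2),
  end_day   = c(8, 23, 21, 6)
)

# an inclusive range from start to end has end - start + 1 days
expected <- ranges %>%
  mutate(n_days = end_day - start_day + 1)

expected$n_days
# 3 6 3 5

sum(expected$n_days)
# 17</code></pre></figure>
<p>The total agrees with the 17 rows in the completed tibble.</p>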
<p>Pretty cool! As always feel free to contact me with feedback or questions.</p>Luis D. Verde ArregoitiaComplete sequences from start and end values stored in separate columnsWhat are people commenting about their loaded packages?2023-04-06T00:00:00+00:002023-04-06T00:00:00+00:00https://luisdva.github.io/rstats/package-comments<blockquote>
<p>This post was discussed in the Rweekly Highlights <a href="https://rweekly.fireside.fm/">Podcast</a> (Episode 118). Listen <a href="https://share.fireside.fm/episode/87RSVeFz+jyb2gyBW" target="_blank">here</a>.</p>
</blockquote>
<hr />
<p>From small one-off scripts to massive interactive apps, most workflows use packages that help us extend the capabilities of a programming language. Sometimes we only need some example data or a few additional functions. Other times we want to add grammars, data structures, printing methods, graphical devices, or OOP systems.</p>
<p>Rather than only working interactively, it helps to use scripts and save our source code for repeatability and transparency. When we need something from a package that fits our needs, we generally call this package from a script, so properly documenting how we use packages within scripts can make code easier to understand, debug, and share.</p>
<p><em>Can you figure out which packages here don’t actually exist?</em></p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1"># packages I may or may not need for this task</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">e1071</span><span class="p">)</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">yulab.utils</span><span class="p">)</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">wk</span><span class="p">)</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">windex</span><span class="p">)</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">xonkicks</span><span class="p">)</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">whisker</span><span class="p">)</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">labbo</span><span class="p">)</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">stringx</span><span class="p">)</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">castor</span><span class="p">)</span></code></pre></figure>
<p>One easy way to add information about the packages we use is with code comments for package load calls. In R, anything after the commenting symbol “#” will be ignored during evaluation. It’s best practice to not overuse comments and to write self-explanatory code, but recording certain details about the packages we’re calling in a script can be useful.</p>
<p>The code above can be easily enhanced with the title and source for each package (as installed on my machine), and this is potentially helpful.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1"># packages with title and source as comments</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">e1071</span><span class="p">)</span><span class="w"> </span><span class="c1"># Misc Functions of the Department of Statistics, Probability Theory Group, CRAN v1.7-9</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">yulab.utils</span><span class="p">)</span><span class="w"> </span><span class="c1"># Supporting Functions for Packages Maintained by 'YuLab-SMU', CRAN v0.0.4</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">wk</span><span class="p">)</span><span class="w"> </span><span class="c1"># Lightweight Well-Known Geometry Parsing, CRAN v0.6.0</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">windex</span><span class="p">)</span><span class="w"> </span><span class="c1"># Analysing Convergent Evolution using the Wheatsheaf Index, CRAN v2.0.3</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">xonkicks</span><span class="p">)</span><span class="w"> </span><span class="c1"># not installed</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">whisker</span><span class="p">)</span><span class="w"> </span><span class="c1"># {{mustache}} for R, Logicless Templating, CRAN v0.4</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">labbo</span><span class="p">)</span><span class="w"> </span><span class="c1"># not installed</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">stringx</span><span class="p">)</span><span class="w"> </span><span class="c1"># Drop-in Replacements for Base String Functions Powered by 'stringi', [github::gagolews/stringx] v0.2.4</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">castor</span><span class="p">)</span><span class="w"> </span><span class="c1"># Efficient Phylogenetics on Large Trees, CRAN v1.7.8</span></code></pre></figure>
<p>The two made-up packages were <code class="language-plaintext highlighter-rouge">xonkicks</code> and <code class="language-plaintext highlighter-rouge">labbo</code>.</p>
<p>To easily create code comments about the package load calls in a script, we can use the annotater package (read more about the package <a href="https://annotater.liomys.mx" target="_blank">here</a>). The functions build code comments using information already supplied by a package in its <strong>DESCRIPTION</strong> file or within its internal lists of functions and bundled datasets. This can be quite helpful for sharing code online, teaching, or making sense of existing scripts.</p>
<p>At present, annotater can add the following details to package load calls in scripts or markdown (Rmd/Qmd) files:</p>
<ul>
<li>package title</li>
<li>package version</li>
<li>package repository source (e.g. CRAN, GitHub, Bioconductor, etc.)</li>
<li>which functions from each package are being used</li>
<li>which datasets from each package are being used</li>
</ul>
<p>The number of possible package annotations has grown thanks to feedback and community contributions, but I still had a pressing question:</p>
<h2 id="what-are-people-commenting-about-their-loaded-packages">What are people commenting about their loaded packages?</h2>
<p>With so much public code available, it is now possible to search for .R, .Rmd and .Qmd files online, look for package load calls (e.g., <code class="language-plaintext highlighter-rouge">library(data.table)</code>), and then check for code comments after these lines (e.g., <code class="language-plaintext highlighter-rouge">library(data.table) # using dev version</code>).</p>
<p>A good source of data for this can be the weekly GitHub snapshots available on Google’s BigQuery cloud platform. This 3TB+ dataset includes the content of 163 million files, all searchable with regular expressions.</p>
<p>This <a href="https://towardsdatascience.com/top-100-most-used-r-functions-on-github-9caf2b81b314" target="_blank">post</a> by <a href="https://github.com/v-kozhevnikov" target="_blank">Vlad Kozhevnikov</a> explains how to write SQL for BigQuery to select the IDs of all relevant files and then select content of R files. After that we can run another query to search the script contents for code comments after <code class="language-plaintext highlighter-rouge">library()</code> calls.</p>
<p>The queries for .R files look like this (replace the placeholder names with your own BigQuery project, dataset, and table names). I repeated this for .Rmd and .Qmd files.</p>
<figure class="highlight"><pre><code class="language-sql" data-lang="sql"><span class="c1">-- Find all files with .R extension (case insensitive)</span>
<span class="k">SELECT</span> <span class="o">*</span>
<span class="k">FROM</span> <span class="nv">`bigquery-public-data.github_repos.files`</span>
<span class="k">WHERE</span> <span class="k">lower</span><span class="p">(</span><span class="k">RIGHT</span><span class="p">(</span><span class="n">path</span><span class="p">,</span> <span class="mi">2</span><span class="p">))</span> <span class="o">=</span> <span class="s1">'.r'</span>
<span class="c1">-- Match to get file contents</span>
<span class="k">SELECT</span> <span class="o">*</span>
<span class="k">FROM</span> <span class="nv">`bigquery-public-data.github_repos.contents`</span>
<span class="k">WHERE</span> <span class="n">id</span> <span class="k">IN</span> <span class="p">(</span><span class="k">select</span> <span class="n">id</span> <span class="k">from</span> <span class="nv">`YOURBQprojectID.yourDATASET.yourTABLE`</span><span class="p">)</span>
<span class="c1">-- Regex match library calls with a comment after whitespace (except for newlines)</span>
<span class="k">SELECT</span> <span class="o">*</span>
<span class="k">FROM</span> <span class="nv">`YOURBQprojectID.yourDATASET.yourTABLE`</span>
<span class="k">WHERE</span> <span class="n">REGEXP_CONTAINS</span><span class="p">(</span><span class="n">content</span><span class="p">,</span> <span class="n">r</span><span class="s1">'library</span><span class="se">\(</span><span class="s1">.*</span><span class="se">\)</span><span class="s1">[ </span><span class="se">\t</span><span class="s1">]?#'</span><span class="p">)</span></code></pre></figure>
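<p>To get a feel for what the regular expression in the last query picks up, here is a rough sketch that applies the same pattern in R to a few made-up lines. It uses R’s PCRE engine rather than the RE2 engine behind BigQuery, which I assume behaves the same way for a pattern this simple. Note that <code class="language-plaintext highlighter-rouge">[ \t]?</code> allows at most one space or tab between the closing bracket and the comment symbol.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"># made-up lines standing in for script contents
script_lines <- c(
  "library(dplyr) # data manipulation",
  "library(tidyr)",
  "library(ggplot2)# plots",
  "library(sf)   # spatial"
)

# same pattern as in the query, written as an R string
pattern <- "library\\(.*\\)[ \t]?#"

has_comment <- grepl(pattern, script_lines, perl = TRUE)
has_comment
# TRUE FALSE TRUE FALSE</code></pre></figure>
<p>The last line is not matched because three spaces separate the call from its comment.</p>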
<blockquote>
<p>Many thanks to fellow instructors Steve Condylios and Amit Kohli for pointing me in the right direction!</p>
</blockquote>
<p>My latest query found <strong>500,641</strong> .R files, <strong>44,839</strong> Rmd, and <strong>242</strong> Qmd files in the <a href="https://console.cloud.google.com/marketplace/product/github/github-repos" target="_blank">GitHub Data</a> snapshot (March 14, 2023) of which <strong>3968</strong> included at least one code comment after a <code class="language-plaintext highlighter-rouge">library()</code> call.</p>
<p>A good proportion of these 545,722 files won’t be scripts that people are meant to see or interact with (for example, .R files in Shiny apps). Still, the <strong>0.7%</strong> of files with comments about the loaded packages will have some interesting information about what we as users are commenting on.</p>
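<p>For reference, the totals and the percentage follow directly from the file counts reported above. A short sketch of the arithmetic (the object names are just for illustration):</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"># files found for each extension in the snapshot
n_files <- c(R = 500641, Rmd = 44839, Qmd = 242)

total_files <- sum(n_files)
total_files
# 545722

# files with at least one comment after a library() call
n_commented <- 3968

pct_commented <- round(100 * n_commented / total_files, 1)
pct_commented
# 0.7</code></pre></figure>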
<p>Also following Vlad’s post, I was able to connect with BigQuery from R using <code class="language-plaintext highlighter-rouge">bigrquery</code> and download the tables to objects in my R environment. Once in R, there is some cleaning and parsing to get each file’s library calls and respective comments (if any) into a tidier structure.</p>
<p>All the R code for this post is in the Gist at the end, in case anyone is interested. This approach uses regex to parse code but also other cool functions to treat code as code rather than strings. In short, I had to:</p>
<ul>
<li>get only the lines with library calls in the field that stored all the script’s content</li>
<li>split each call into a separate row</li>
<li>separate comments into their own variable</li>
<li>clean up spaces and unmatched brackets (from wrapped <code class="language-plaintext highlighter-rouge">library()</code> calls)</li>
<li>parse the calls and extract the first argument of the <code class="language-plaintext highlighter-rouge">library()</code> calls to get package names and not other arguments people sometimes use</li>
<li>calculate various summary statistics</li>
</ul>
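<p>The first few steps above can be sketched with base R alone. This is not the code from the Gist, only a simplified illustration on a made-up script stored as a single string. It assumes each <code class="language-plaintext highlighter-rouge">library()</code> call sits on its own line and that the first “#” on the line starts the comment.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"># a made-up script stored as one string, like a file's content field
script <- "library(dplyr) # data manipulation\nx <- 1:10\nlibrary(ggplot2, quietly = TRUE) # for plotting\nlibrary(tidyr)"

# one element per line, keeping only the lines that start with a library() call
script_lines <- strsplit(script, "\n", fixed = TRUE)[[1]]
calls <- grep("^\\s*library\\(", script_lines, value = TRUE)

# code is everything before the first '#', the comment is everything after it
code <- trimws(sub("#.*$", "", calls))
comment <- ifelse(
  grepl("#", calls, fixed = TRUE),
  trimws(sub("^[^#]*#", "", calls)),
  NA_character_
)

# parse each call as code and take its first argument as the package name
pkgname <- vapply(
  code,
  function(x) as.character(parse(text = x)[[1]][[2]]),
  character(1),
  USE.NAMES = FALSE
)

parsed <- data.frame(pkgname, comment)
parsed</code></pre></figure>
<p>Parsing the call, instead of pulling the name out with another regex, means extra arguments such as <code class="language-plaintext highlighter-rouge">quietly = TRUE</code> do not end up in the package name.</p>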
<h3 id="general-overview">General overview</h3>
<p>For the 3968 files (out of >500,000) that included at least one code comment after a library load call, the mean number of packages loaded was 6 per script (minimum 1, maximum 203!). Across these scripts, over half (55%) of the packages loaded in each one have a code comment.</p>
<p>Let’s see a set of commented calls sampled at random:</p>
<table>
<thead>
<tr>
<th style="text-align: left">pkgname</th>
<th style="text-align: left">comment</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: left">roxygen2</td>
<td style="text-align: left">If not availble install these packages with ‘install.packages(…)’</td>
</tr>
<tr>
<td style="text-align: left">tidyverse</td>
<td style="text-align: left">load c(dplyr, tidyr, stringr, readr) due to system doesn’t work.</td>
</tr>
<tr>
<td style="text-align: left">scales</td>
<td style="text-align: left">install.packages(“scales”)</td>
</tr>
<tr>
<td style="text-align: left">GenomicRanges</td>
<td style="text-align: left">Required for defining regions</td>
</tr>
<tr>
<td style="text-align: left">devtools</td>
<td style="text-align: left">session_info</td>
</tr>
<tr>
<td style="text-align: left">magrittr</td>
<td style="text-align: left">Pipes</td>
</tr>
<tr>
<td style="text-align: left">forecast</td>
<td style="text-align: left">varios metodos para pronóstico</td>
</tr>
<tr>
<td style="text-align: left">gridExtra</td>
<td style="text-align: left">for multiple plots on a page</td>
</tr>
<tr>
<td style="text-align: left">car</td>
<td style="text-align: left">using external libraries</td>
</tr>
<tr>
<td style="text-align: left">rgl</td>
<td style="text-align: left">Need to install this package first</td>
</tr>
<tr>
<td style="text-align: left">reshape2</td>
<td style="text-align: left">For converting wide to long</td>
</tr>
<tr>
<td style="text-align: left">MplusAutomation</td>
<td style="text-align: left">for extracting gh5</td>
</tr>
<tr>
<td style="text-align: left">ggplot2</td>
<td style="text-align: left">for plotting color plots with legend</td>
</tr>
<tr>
<td style="text-align: left">ggplot2</td>
<td style="text-align: left">For plotting</td>
</tr>
<tr>
<td style="text-align: left">DataCombine</td>
<td style="text-align: left">for the slide function</td>
</tr>
<tr>
<td style="text-align: left">irr</td>
<td style="text-align: left">fleiss’ kappa</td>
</tr>
<tr>
<td style="text-align: left">corpcor</td>
<td style="text-align: left">for fast computation of pseudoinverse</td>
</tr>
<tr>
<td style="text-align: left">ggplot2</td>
<td style="text-align: left">For plotting</td>
</tr>
<tr>
<td style="text-align: left">dplyr</td>
<td style="text-align: left">alternative, this also loads %>%</td>
</tr>
<tr>
<td style="text-align: left">scales</td>
<td style="text-align: left">For formating values in graphs</td>
</tr>
</tbody>
</table>
<p>Now let’s see the comments that repeat the most regardless of which package they’re meant for:</p>
<table>
<thead>
<tr>
<th style="text-align: left">comment</th>
<th style="text-align: right">n</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: left">Pipes</td>
<td style="text-align: right">206</td>
</tr>
<tr>
<td style="text-align: left">enables piping : %>%</td>
<td style="text-align: right">93</td>
</tr>
<tr>
<td style="text-align: left">For graphing</td>
<td style="text-align: right">84</td>
</tr>
<tr>
<td style="text-align: left">data manipulation</td>
<td style="text-align: right">49</td>
</tr>
<tr>
<td style="text-align: left">for data manipulation</td>
<td style="text-align: right">38</td>
</tr>
</tbody>
</table>
<p>Even with this small sample we can already see some patterns and shared themes in these comments. For example:</p>
<ul>
<li>Notes about installation</li>
<li>What the package is being called for</li>
<li>Which function or functions from the package are being used</li>
<li>Pipes</li>
</ul>
<p>Skimming the comments, a decent proportion of them describe what a package was used for in general or mention the functions or datasets of interest. 22% of all comments start with the words “<em>for</em>”, “<em>para</em>”, or “<em>pour</em>” (case insensitive). Here’s another random sample:</p>
<table>
<thead>
<tr>
<th style="text-align: left">pkgname</th>
<th style="text-align: left">comment</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: left">bcmaps</td>
<td style="text-align: left">for BC regional district map</td>
</tr>
<tr>
<td style="text-align: left">tmap</td>
<td style="text-align: left">for maps</td>
</tr>
<tr>
<td style="text-align: left">car</td>
<td style="text-align: left">for initial parameter estimate</td>
</tr>
<tr>
<td style="text-align: left">factoextra</td>
<td style="text-align: left">for fviz_cluster(), get_eigenvalue(), eclust()</td>
</tr>
<tr>
<td style="text-align: left">clusterSim</td>
<td style="text-align: left">for Davies-Bouldin’s cluster separation measure</td>
</tr>
<tr>
<td style="text-align: left">e1071</td>
<td style="text-align: left">Para la función svm</td>
</tr>
<tr>
<td style="text-align: left">plotrix</td>
<td style="text-align: left">for plotCI(), cld()</td>
</tr>
<tr>
<td style="text-align: left">edgeR</td>
<td style="text-align: left">for DGE</td>
</tr>
<tr>
<td style="text-align: left">parallel</td>
<td style="text-align: left">for examining machine</td>
</tr>
<tr>
<td style="text-align: left">Boruta</td>
<td style="text-align: left">for feature importance</td>
</tr>
<tr>
<td style="text-align: left">missForest</td>
<td style="text-align: left">for missForest()</td>
</tr>
<tr>
<td style="text-align: left">grid</td>
<td style="text-align: left">For ‘unit’ function in tick length setting</td>
</tr>
<tr>
<td style="text-align: left">Amelia</td>
<td style="text-align: left">for missmap()</td>
</tr>
<tr>
<td style="text-align: left">tidyr</td>
<td style="text-align: left">for data handling</td>
</tr>
<tr>
<td style="text-align: left">MASS</td>
<td style="text-align: left">For the data set # Use smoke as the faceting variable</td>
</tr>
<tr>
<td style="text-align: left">Epi</td>
<td style="text-align: left">for AUC evaluator</td>
</tr>
<tr>
<td style="text-align: left">minpack.lm</td>
<td style="text-align: left">for non-linear regression package</td>
</tr>
<tr>
<td style="text-align: left">plyr</td>
<td style="text-align: left">For the function “each”</td>
</tr>
<tr>
<td style="text-align: left">MASS</td>
<td style="text-align: left">for LDA</td>
</tr>
<tr>
<td style="text-align: left">grid</td>
<td style="text-align: left">for gList for likert plots w/ histograms</td>
</tr>
</tbody>
</table>
<p>~15% of comments had some kind of note about installation instructions or package source (matched the pattern “<em>instal</em>” or ‘<em>CRAN</em>’ or ‘<em>github</em>’).</p>
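<p>Proportions like these come down to simple pattern matching on the comment text. The sketch below runs both checks on a handful of comments, some taken from the tables above and one in French made up for illustration. The exact patterns are my assumption of how such a match could be written (including treating the installation pattern as case insensitive), not the ones from the Gist.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r">comments <- c(
  "for maps",
  "Para la función svm",
  "install.packages(\"scales\")",
  "Pipes",
  "For plotting",
  "Need to install this package first",
  "pour les graphiques" # made-up example
)

# comments that start with the word for, para, or pour (any case)
starts_with_for <- grepl("^(for|para|pour)\\b", comments,
                         ignore.case = TRUE, perl = TRUE)

# comments that mention installation or a package source
about_install <- grepl("instal|CRAN|github", comments, ignore.case = TRUE)

sum(starts_with_for)
# 4
sum(about_install)
# 2

# proportion of comments matching each pattern
mean(starts_with_for)
mean(about_install)</code></pre></figure>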
<p>We can even group the data by package to find the most commented packages. For reference, these are the ten packages with the most comments (excludes duplicate comments):</p>
<table>
<thead>
<tr>
<th style="text-align: left">pkgname</th>
<th style="text-align: right">n</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: left">ggplot2</td>
<td style="text-align: right">208</td>
</tr>
<tr>
<td style="text-align: left">dplyr</td>
<td style="text-align: right">190</td>
</tr>
<tr>
<td style="text-align: left">plyr</td>
<td style="text-align: right">92</td>
</tr>
<tr>
<td style="text-align: left">MASS</td>
<td style="text-align: right">87</td>
</tr>
<tr>
<td style="text-align: left">tidyverse</td>
<td style="text-align: right">74</td>
</tr>
<tr>
<td style="text-align: left">reshape2</td>
<td style="text-align: right">71</td>
</tr>
<tr>
<td style="text-align: left">stringr</td>
<td style="text-align: right">64</td>
</tr>
<tr>
<td style="text-align: left">scales</td>
<td style="text-align: right">64</td>
</tr>
<tr>
<td style="text-align: left">data.table</td>
<td style="text-align: right">60</td>
</tr>
<tr>
<td style="text-align: left">lubridate</td>
<td style="text-align: right">55</td>
</tr>
</tbody>
</table>
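<p>A count like this one can be sketched with <code class="language-plaintext highlighter-rouge">dplyr</code> by dropping repeated package and comment pairs before counting. The <code class="language-plaintext highlighter-rouge">pkg_comments</code> table below is a tiny hypothetical stand-in for the parsed data.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r">library(dplyr)

# hypothetical parsed data: one row per commented library() call
pkg_comments <- tibble(
  pkgname = c("ggplot2", "ggplot2", "ggplot2", "dplyr", "scales"),
  comment = c("For plotting", "For plotting", "for graphs",
              "data manipulation", "For formatting values")
)

# keep each package and comment pair once, then count comments per package
comment_counts <- pkg_comments %>%
  distinct(pkgname, comment) %>%
  count(pkgname, sort = TRUE)

comment_counts</code></pre></figure>
<p>Here the repeated “For plotting” comment is counted only once, so <code class="language-plaintext highlighter-rouge">ggplot2</code> ends up with two comments rather than three.</p>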
<p>Many of these are tidyverse packages and sometimes the comments made note of that. I did not look at when these scripts are from, but they seem to cover and make note of important changes such as the <em>melt/cast-gather/spread-pivot</em> transition. See some comments for <code class="language-plaintext highlighter-rouge">tidyr</code> that mention reshaping in general:</p>
<table>
<thead>
<tr>
<th style="text-align: left">pkgname</th>
<th style="text-align: left">comment</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: left">tidyr</td>
<td style="text-align: left">for reshaping data (e.g., ‘gather’)</td>
</tr>
<tr>
<td style="text-align: left">tidyr</td>
<td style="text-align: left">instead of reshape2::melt</td>
</tr>
<tr>
<td style="text-align: left">tidyr</td>
<td style="text-align: left">key package for reshaping</td>
</tr>
<tr>
<td style="text-align: left">tidyr</td>
<td style="text-align: left">reshaping data</td>
</tr>
<tr>
<td style="text-align: left">tidyr</td>
<td style="text-align: left">library(Hmisc); library(reshape2);</td>
</tr>
<tr>
<td style="text-align: left">tidyr</td>
<td style="text-align: left">data reshaping</td>
</tr>
<tr>
<td style="text-align: left">tidyr</td>
<td style="text-align: left">because we’ll need to reshape the data</td>
</tr>
<tr>
<td style="text-align: left">tidyr</td>
<td style="text-align: left">For reshaping dataframes, specifically nesting</td>
</tr>
<tr>
<td style="text-align: left">tidyr</td>
<td style="text-align: left">this library contains the function that we’ll need in order to reshape the dataframe</td>
</tr>
</tbody>
</table>
<p>I had noted that there are comments in different languages, so we can try to identify them with Google’s Compact Language Detector as implemented in the <code class="language-plaintext highlighter-rouge">cld3</code> package. The function is vectorized and easy to use.</p>
<table>
<thead>
<tr>
<th style="text-align: left">commentLanguage</th>
<th style="text-align: right">n</th>
<th style="text-align: right">percent</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: left">en</td>
<td style="text-align: right">4820</td>
<td style="text-align: right">0.72</td>
</tr>
<tr>
<td style="text-align: left">no</td>
<td style="text-align: right">324</td>
<td style="text-align: right">0.05</td>
</tr>
<tr>
<td style="text-align: left">pt</td>
<td style="text-align: right">103</td>
<td style="text-align: right">0.02</td>
</tr>
<tr>
<td style="text-align: left">fr</td>
<td style="text-align: right">90</td>
<td style="text-align: right">0.01</td>
</tr>
<tr>
<td style="text-align: left">es</td>
<td style="text-align: right">88</td>
<td style="text-align: right">0.01</td>
</tr>
<tr>
<td style="text-align: left">ja</td>
<td style="text-align: right">85</td>
<td style="text-align: right">0.01</td>
</tr>
</tbody>
</table>
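<p>A summary like the one in this table could be built roughly as follows. This is a sketch under assumptions: the comments are invented, the column names simply mirror the table, and the detected languages are not shown because very short strings can come back as <code class="language-plaintext highlighter-rouge">NA</code> or be misclassified.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r">library(dplyr)
# hypothetical comments, not the real data
pkg_comments <- tibble(
  comment = c(
    "this library contains the function that we need to reshape the data",
    "algunas funciones de interes para describir datos",
    "read_excel"
  )
)
lang_counts <- pkg_comments |>
  mutate(commentLanguage = cld3::detect_language(comment)) |> # vectorized
  count(commentLanguage, sort = TRUE) |> # comments per detected language
  mutate(percent = round(n / sum(n), 2)) # share of all comments
lang_counts</code></pre></figure>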
<p>The vast majority of annotations appear to be in English, followed by Norwegian, Portuguese, French, Spanish, and Japanese. Because the comments are so brief and many include function names that aren’t real words, I highly doubt the accuracy of some of these detected languages. I looked at some of the comments that were supposedly in Norwegian and they were mostly short comments with function names, not words or sentences in any particular language. Still, I was interested in the Spanish-language comments and I found some similar to what I’ve written in the past. It’s pretty interesting to see how people describe packages.</p>
<table>
<thead>
<tr>
<th style="text-align: left">pkgname</th>
<th style="text-align: left">comment</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: left">dplyr</td>
<td style="text-align: left">paquete que veremos mañana y que por ahota lo necesitan otras funciones</td>
</tr>
<tr>
<td style="text-align: left">dplyr</td>
<td style="text-align: left">ya incluido en tidyverse</td>
</tr>
<tr>
<td style="text-align: left">corrplot</td>
<td style="text-align: left">grafico de correlacion</td>
</tr>
<tr>
<td style="text-align: left">data.table</td>
<td style="text-align: left">manipulacion de datos mayor rapidez que dplyr</td>
</tr>
<tr>
<td style="text-align: left">NombreDelPaquete</td>
<td style="text-align: left">Para usar el paquete</td>
</tr>
<tr>
<td style="text-align: left">dplyr</td>
<td style="text-align: left">Para cargar el paquete dplyr</td>
</tr>
<tr>
<td style="text-align: left">stringr</td>
<td style="text-align: left">Para la conversión de tipos numéricos a cadenas</td>
</tr>
<tr>
<td style="text-align: left">psych</td>
<td style="text-align: left">incluye describe()</td>
</tr>
<tr>
<td style="text-align: left">IRdisplay</td>
<td style="text-align: left">despliegue de resultados en jupyter para R</td>
</tr>
<tr>
<td style="text-align: left">fdrtool</td>
<td style="text-align: left">para half normal</td>
</tr>
<tr>
<td style="text-align: left">Hmisc</td>
<td style="text-align: left">algunas funciones de interes para describir datos</td>
</tr>
<tr>
<td style="text-align: left">randomForest</td>
<td style="text-align: left">construir enms</td>
</tr>
<tr>
<td style="text-align: left">readxl</td>
<td style="text-align: left">read_excel</td>
</tr>
<tr>
<td style="text-align: left">stringr</td>
<td style="text-align: left">creo que ni la usé</td>
</tr>
<tr>
<td style="text-align: left">foreign</td>
<td style="text-align: left">Solo para cargar el conjunto de datos de prueba en formato arff</td>
</tr>
<tr>
<td style="text-align: left">ggplot2</td>
<td style="text-align: left">manipulacion de graficos - tidyverse</td>
</tr>
<tr>
<td style="text-align: left">curl</td>
<td style="text-align: left">Descargar desde internet</td>
</tr>
</tbody>
</table>
<p>I particularly like the comment “<em>creo que ni la usé</em>” or “<em>I think I didn’t even use this package</em>” because it makes the tools from <code class="language-plaintext highlighter-rouge">annotater</code> seem helpful. If I have a big script and am unsure whether or not I used a function or dataset from a package, the addins can check that for me.</p>
<p>As a quick exercise, here are seven comments sampled randomly for five of the packages in the data drawn as network graphs. I think that even without labeling the package name we could figure out what each one is based on the comments.</p>
<figure>
<a href="/assets/images/pkggraph.png"><img src="/assets/images/pkggraph.png" /></a>
<figcaption>nice</figcaption>
</figure>
<p>There is a lot to explore in these data (TidyTuesday potential??), but I’m quite happy with the overlap between what I found and the comments that can be created using the tools that I’ve been working on.</p>
<p>Exploration code:
<script src="https://gist.github.com/luisDVA/574c849e4956c385c56d10fefc4b60a0.js"> </script></p>
<p>Luis D. Verde Arregoitia</p>
<p>Exploring code comments about R package calls on GitHub</p>
<p>Fixing broken and irregular column headers</p>
<p>2022-11-07T00:00:00+00:00</p>
<p>https://luisdva.github.io/rstats/mash-colnames</p>
<p>Last week I saw <a href="https://linktr.ee/rivaquiroga" target="_blank">Riva Quiroga</a> masterfully import and clean a scary spreadsheet during an <a href="https://vimeo.com/766089428" target="_blank">RLadies meetup</a>, and I recognized some common issues with other people’s data for which I have already written relevant functions.</p>
<p>I also realized that I never wrote a dedicated post for the <code class="language-plaintext highlighter-rouge">mash_colnames</code> function in the <a href="https://unheadr.liomys.mx" target="_blank">unheadr</a> package. This functionality has been around for a while now, but is worth describing in detail (plus I get to create some new teaching materials). Credit goes to <a href="http://byrneslab.net" target="_blank">Jarrett Byrnes</a>, who contributed the initial version of the function in this <a href="https://github.com/luisDVA/unheadr/issues/4" target="_blank">GitHub issue</a>. I just added some tidyeval and a few enhancements for a better fit within the package. Internally, <code class="language-plaintext highlighter-rouge">mash_colnames</code> pivots the column headers and the first <em>n</em> rows of the data and ‘mashes’ whatever belongs together, column by column.</p>
<p><code class="language-plaintext highlighter-rouge">mash_colnames()</code> has two main uses. We can apply this nifty function to data with the following issues:</p>
<h2 id="data-with-the-variable-names-split-across--1-rows-ie-there-are-fragments-of-the-headers-in-the-first-few-data-rows">Data with the variable names split across > 1 rows (i.e. there are fragments of the headers in the first few data rows)</h2>
<figure>
<a href="/assets/images/mashnolabs.png"><img src="/assets/images/mashnolabs.png" /></a>
<figcaption>oh no</figcaption>
</figure>
<p>This little example with rodent data has different bits of the column headers spread out ‘vertically’ for some reason. This is quite common, as far as I’ve seen. The next image explains the problem a bit better. Notice that if there were any separators between the pieces of column headers, these are now implicit.</p>
<figure>
<a href="/assets/images/mash.png"><img src="/assets/images/mash.png" /></a>
<figcaption>oh my</figcaption>
</figure>
<p>Ultimately, we most likely are interested in something like this:</p>
<figure>
<a href="/assets/images/mashed.png"><img src="/assets/images/mashed.png" /></a>
<figcaption>much better</figcaption>
</figure>
<h2 id="names-split-across-1-rows-but-with-gaps-in-the-headers-at-the-very-top">Names split across >1 rows but with gaps in the headers at the very top</h2>
<p>This happens when cells were originally merged in a spreadsheet or formatted table, or maybe the gaps are there intentionally to imply that the values along this row are the same until a new one appears. I’m not sure about the correct terms for this but <a href="https://www.visibledata.co.uk/about.html" target="_blank">Charlie Hadley</a> referred to this as “non-regular spanning of column headers”.</p>
<figure>
<a href="/assets/images/ragged.png"><img src="/assets/images/ragged.png" /></a>
<figcaption>ragged</figcaption>
</figure>
<p>With colors to show which columns are meant to share a piece of header, the data look like this:</p>
<figure>
<a href="/assets/images/multirow.png"><img src="/assets/images/multirow.png" /></a>
<figcaption>messy</figcaption>
</figure>
<p>A more usable version of the data would look like this:</p>
<figure>
<a href="/assets/images/multirow_fixed.png"><img src="/assets/images/multirow_fixed.png" /></a>
<figcaption>nicer</figcaption>
</figure>
<p>Having shown the two common issues that we can address with <code class="language-plaintext highlighter-rouge">unheadr</code>, let’s work through the same examples using code.</p>
<h1 id="working-with-code">Working with code</h1>
<p>Let’s set up the same example data from the images above and fix the issues.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">library</span><span class="p">(</span><span class="n">unheadr</span><span class="p">)</span><span class="w"> </span><span class="c1"># CRAN v0.3.3</span><span class="w">
</span><span class="n">rodents</span><span class="w"> </span><span class="o"><-</span><span class="w">
</span><span class="n">tibble</span><span class="o">::</span><span class="n">tribble</span><span class="p">(</span><span class="w">
</span><span class="o">~</span><span class="n">critter</span><span class="p">,</span><span class="w"> </span><span class="o">~</span><span class="n">tail</span><span class="p">,</span><span class="w"> </span><span class="o">~</span><span class="n">whisker</span><span class="p">,</span><span class="w"> </span><span class="o">~</span><span class="n">mass</span><span class="p">,</span><span class="w">
</span><span class="kc">NA</span><span class="p">,</span><span class="w"> </span><span class="s2">"length"</span><span class="p">,</span><span class="w"> </span><span class="s2">"length"</span><span class="p">,</span><span class="w"> </span><span class="s2">"grams"</span><span class="p">,</span><span class="w">
</span><span class="kc">NA</span><span class="p">,</span><span class="w"> </span><span class="s2">"mm"</span><span class="p">,</span><span class="w"> </span><span class="s2">"mm"</span><span class="p">,</span><span class="w"> </span><span class="kc">NA</span><span class="p">,</span><span class="w">
</span><span class="s2">"rat"</span><span class="p">,</span><span class="w"> </span><span class="s2">"71"</span><span class="p">,</span><span class="w"> </span><span class="s2">"12"</span><span class="p">,</span><span class="w"> </span><span class="s2">"91"</span><span class="p">,</span><span class="w">
</span><span class="s2">"mouse"</span><span class="p">,</span><span class="w"> </span><span class="s2">"58"</span><span class="p">,</span><span class="w"> </span><span class="s2">"8"</span><span class="p">,</span><span class="w"> </span><span class="s2">"47"</span><span class="p">,</span><span class="w">
</span><span class="s2">"vole"</span><span class="p">,</span><span class="w"> </span><span class="s2">"12"</span><span class="p">,</span><span class="w"> </span><span class="s2">"5"</span><span class="p">,</span><span class="w"> </span><span class="s2">"43"</span><span class="w">
</span><span class="p">)</span></code></pre></figure>
<p>The data in tibble form:</p>
<figure class="highlight"><pre><code class="language-text" data-lang="text">> rodents
# A tibble: 5 × 4
critter tail whisker mass
<chr> <chr> <chr> <chr>
1 NA length length grams
2 NA mm mm NA
3 rat 71 12 91
4 mouse 58 8 47
5 vole 12 5 43 </code></pre></figure>
<p>Note the NAs padding the empty spaces.</p>
<p>To fix messy names broken across rows, we tell <code class="language-plaintext highlighter-rouge">mash_colnames()</code> how many data rows have header fragments in them. In this case, it’s <strong>two</strong>; the names themselves don’t count as data rows. The default separator in the function is the underscore, but we can change it to spaces or dots or whatever.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">rodents</span><span class="w"> </span><span class="o">|></span><span class="w">
</span><span class="n">mash_colnames</span><span class="p">(</span><span class="n">n_name_rows</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">2</span><span class="p">)</span><span class="w"> </span><span class="c1"># underscore is the default separator </span></code></pre></figure>
<p>The data with fixed names:</p>
<figure class="highlight"><pre><code class="language-text" data-lang="text"># A tibble: 3 × 4
critter tail_length_mm whisker_length_mm mass_grams
<chr> <chr> <chr> <chr>
1 rat 71 12 91
2 mouse 58 8 47
3 vole 12 5 43 </code></pre></figure>
<p>Pretty nice! No more unnecessary NAs.</p>
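<p>To make the mashing step less of a black box, here is a rough base R sketch of the idea. It is not the actual code inside <code class="language-plaintext highlighter-rouge">unheadr</code>: <code class="language-plaintext highlighter-rouge">mash_sketch()</code> and <code class="language-plaintext highlighter-rouge">rodents_df</code> are hypothetical names made up for this illustration. It glues each column name to the header fragments below it, skips the NAs, and then drops the rows that held the fragments.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"># hypothetical helper for illustration, not part of unheadr
mash_sketch <- function(df, n_name_rows, sep = "_") {
  header_rows <- df[seq_len(n_name_rows), , drop = FALSE]
  new_names <- vapply(seq_along(df), function(i) {
    # current name plus the fragments stored in the first rows
    pieces <- c(names(df)[i], as.character(header_rows[[i]]))
    paste(pieces[!is.na(pieces)], collapse = sep)
  }, character(1))
  out <- df[-seq_len(n_name_rows), , drop = FALSE] # drop the fragment rows
  names(out) <- new_names
  rownames(out) <- NULL
  out
}

# same values as the rodents example, as a plain data frame
rodents_df <- data.frame(
  critter = c(NA, NA, "rat", "mouse", "vole"),
  tail = c("length", "mm", "71", "58", "12"),
  whisker = c("length", "mm", "12", "8", "5"),
  mass = c("grams", NA, "91", "47", "43")
)
mashed <- mash_sketch(rodents_df, n_name_rows = 2)
names(mashed)</code></pre></figure>
<p>The <code class="language-plaintext highlighter-rouge">sep</code> argument of the sketch plays the role of the separator mentioned above, so passing <code class="language-plaintext highlighter-rouge">sep = "."</code> would join the pieces with dots instead.</p>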
<p>In some cases we may recognize messy names or similar issues in a dataset ahead of time, so we skip the header row when importing. The data may then look like this, with automatically generated names that do not mean anything.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">rodents_skip</span><span class="w"> </span><span class="o"><-</span><span class="w">
</span><span class="n">tibble</span><span class="o">::</span><span class="n">tribble</span><span class="p">(</span><span class="w">
</span><span class="o">~</span><span class="n">X1</span><span class="p">,</span><span class="w"> </span><span class="o">~</span><span class="n">X2</span><span class="p">,</span><span class="w"> </span><span class="o">~</span><span class="n">X3</span><span class="p">,</span><span class="w"> </span><span class="o">~</span><span class="n">X4</span><span class="p">,</span><span class="w">
</span><span class="s2">"critter"</span><span class="p">,</span><span class="w"> </span><span class="s2">"tail"</span><span class="p">,</span><span class="w"> </span><span class="s2">"whisker"</span><span class="p">,</span><span class="w"> </span><span class="s2">"mass"</span><span class="p">,</span><span class="w">
</span><span class="kc">NA</span><span class="p">,</span><span class="w"> </span><span class="s2">"length"</span><span class="p">,</span><span class="w"> </span><span class="s2">"length"</span><span class="p">,</span><span class="w"> </span><span class="s2">"grams"</span><span class="p">,</span><span class="w">
</span><span class="kc">NA</span><span class="p">,</span><span class="w"> </span><span class="s2">"mm"</span><span class="p">,</span><span class="w"> </span><span class="s2">"mm"</span><span class="p">,</span><span class="w"> </span><span class="kc">NA</span><span class="p">,</span><span class="w">
</span><span class="s2">"rat"</span><span class="p">,</span><span class="w"> </span><span class="s2">"71"</span><span class="p">,</span><span class="w"> </span><span class="s2">"12"</span><span class="p">,</span><span class="w"> </span><span class="s2">"91"</span><span class="p">,</span><span class="w">
</span><span class="s2">"mouse"</span><span class="p">,</span><span class="w"> </span><span class="s2">"58"</span><span class="p">,</span><span class="w"> </span><span class="s2">"8"</span><span class="p">,</span><span class="w"> </span><span class="s2">"47"</span><span class="p">,</span><span class="w">
</span><span class="s2">"vole"</span><span class="p">,</span><span class="w"> </span><span class="s2">"12"</span><span class="p">,</span><span class="w"> </span><span class="s2">"5"</span><span class="p">,</span><span class="w"> </span><span class="s2">"43"</span><span class="w">
</span><span class="p">)</span></code></pre></figure>
<figure class="highlight"><pre><code class="language-text" data-lang="text">> rodents_skip
# A tibble: 6 × 4
X1 X2 X3 X4
<chr> <chr> <chr> <chr>
1 critter tail whisker mass
2 NA length length grams
3 NA mm mm NA
4 rat 71 12 91
5 mouse 58 8 47
6 vole 12 5 43 </code></pre></figure>
<p>For cases like these, we can use the <code class="language-plaintext highlighter-rouge">keep_names</code> argument to ignore the names when we’re mashing. In this case we work with <strong>three</strong> data rows, which hold all the pieces of the names.</p>
<p>Like so:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">rodents_skip</span><span class="w"> </span><span class="o">|></span><span class="w">
</span><span class="n">mash_colnames</span><span class="p">(</span><span class="n">n_name_rows</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">3</span><span class="p">,</span><span class="w"> </span><span class="n">keep_names</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">FALSE</span><span class="p">)</span></code></pre></figure>
<p>The result is the same as before.</p>
<figure class="highlight"><pre><code class="language-text" data-lang="text"># A tibble: 3 × 4
critter tail_length_mm whisker_length_mm mass_grams
<chr> <chr> <chr> <chr>
1 rat 71 12 91
2 mouse 58 8 47
3 vole 12 5 43 </code></pre></figure>
<h3 id="ragged-names">Ragged names</h3>
<p>Let’s set up the example data with the gaps in the column names. When there are gaps in the column names we tend to skip the names during the import step. The <code class="language-plaintext highlighter-rouge">keep_names</code> argument really comes in handy here.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">surveys</span><span class="w"> </span><span class="o"><-</span><span class="w">
</span><span class="n">tibble</span><span class="o">::</span><span class="n">tribble</span><span class="p">(</span><span class="w">
</span><span class="o">~</span><span class="n">X1</span><span class="p">,</span><span class="w"> </span><span class="o">~</span><span class="n">X2</span><span class="p">,</span><span class="w"> </span><span class="o">~</span><span class="n">X3</span><span class="p">,</span><span class="w"> </span><span class="o">~</span><span class="n">X4</span><span class="p">,</span><span class="w"> </span><span class="o">~</span><span class="n">X5</span><span class="p">,</span><span class="w"> </span><span class="o">~</span><span class="n">X6</span><span class="p">,</span><span class="w">
</span><span class="s2">"opinion"</span><span class="p">,</span><span class="w"> </span><span class="s2">"age"</span><span class="p">,</span><span class="w"> </span><span class="kc">NA</span><span class="p">,</span><span class="w"> </span><span class="kc">NA</span><span class="p">,</span><span class="w"> </span><span class="s2">"source"</span><span class="p">,</span><span class="w"> </span><span class="kc">NA</span><span class="p">,</span><span class="w">
</span><span class="kc">NA</span><span class="p">,</span><span class="w"> </span><span class="s2">"15 to 19"</span><span class="p">,</span><span class="w"> </span><span class="s2">"20 to 24"</span><span class="p">,</span><span class="w"> </span><span class="s2">"25 to 29"</span><span class="p">,</span><span class="w"> </span><span class="s2">"local"</span><span class="p">,</span><span class="w"> </span><span class="s2">"visitor"</span><span class="p">,</span><span class="w">
</span><span class="s2">"disliked"</span><span class="p">,</span><span class="w"> </span><span class="s2">"8"</span><span class="p">,</span><span class="w"> </span><span class="s2">"8"</span><span class="p">,</span><span class="w"> </span><span class="s2">"7"</span><span class="p">,</span><span class="w"> </span><span class="s2">"30"</span><span class="p">,</span><span class="w"> </span><span class="s2">"30"</span><span class="p">,</span><span class="w">
</span><span class="s2">"neutral"</span><span class="p">,</span><span class="w"> </span><span class="s2">"1"</span><span class="p">,</span><span class="w"> </span><span class="s2">"6"</span><span class="p">,</span><span class="w"> </span><span class="s2">"10"</span><span class="p">,</span><span class="w"> </span><span class="s2">"26"</span><span class="p">,</span><span class="w"> </span><span class="s2">"34"</span><span class="p">,</span><span class="w">
</span><span class="s2">"liked"</span><span class="p">,</span><span class="w"> </span><span class="s2">"11"</span><span class="p">,</span><span class="w"> </span><span class="s2">"6"</span><span class="p">,</span><span class="w"> </span><span class="s2">"3"</span><span class="p">,</span><span class="w"> </span><span class="s2">"15"</span><span class="p">,</span><span class="w"> </span><span class="s2">"45"</span><span class="w">
</span><span class="p">)</span></code></pre></figure>
<figure class="highlight"><pre><code class="language-text" data-lang="text">> surveys
# A tibble: 5 × 6
X1 X2 X3 X4 X5 X6
<chr> <chr> <chr> <chr> <chr> <chr>
1 opinion age NA NA source NA
2 NA 15 to 19 20 to 24 25 to 29 local visitor
3 disliked 8 8 7 30 30
4 neutral 1 6 10 26 34
5 liked 11 6 3 15 45 </code></pre></figure>
<p>The approach is similar, and to deal with the gaps that imply a repeated value across columns, we use the <code class="language-plaintext highlighter-rouge">sliding_headers</code> argument. By setting it to <code class="language-plaintext highlighter-rouge">TRUE</code> we fill the gaps from left to right.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">surveys</span><span class="w"> </span><span class="o">|></span><span class="w">
</span><span class="n">mash_colnames</span><span class="p">(</span><span class="m">2</span><span class="p">,</span><span class="w"> </span><span class="n">keep_names</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">FALSE</span><span class="p">,</span><span class="w"> </span><span class="n">sliding_headers</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">TRUE</span><span class="p">)</span></code></pre></figure>
<p>The cleaned-up version is ready for further analysis.</p>
<figure class="highlight"><pre><code class="language-text" data-lang="text"># A tibble: 3 × 6
opinion `age_15 to 19` `age_20 to 24` `age_25 to 29` source_local source_visitor
<chr> <chr> <chr> <chr> <chr> <chr>
1 disliked 8 8 7 30 30
2 neutral 1 6 10 26 34
3 liked 11 6 3 15 45 </code></pre></figure>
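<p>The left-to-right filling can be pictured with a few lines of base R. This is only a sketch of the idea, not how <code class="language-plaintext highlighter-rouge">unheadr</code> does it internally, and <code class="language-plaintext highlighter-rouge">fill_right()</code> is a hypothetical helper: each gap takes the last value seen to its left, and the filled row is then pasted onto the row below it.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"># hypothetical helper: carry the last non-missing value to the right
fill_right <- function(x) {
  for (i in seq_along(x)[-1]) {
    if (is.na(x[i])) x[i] <- x[i - 1]
  }
  x
}
top_row <- c("opinion", "age", NA, NA, "source", NA)
second_row <- c(NA, "15 to 19", "20 to 24", "25 to 29", "local", "visitor")
filled <- fill_right(top_row)
# join the two header rows, leaving out the missing piece
new_names <- ifelse(is.na(second_row), filled, paste(filled, second_row, sep = "_"))
new_names</code></pre></figure>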
<p>That’s it! Feel free to reach out with any questions/feedback or if I was using the wrong data-structuring terms for this.</p>
<p>Luis D. Verde Arregoitia</p>
<p>Useful function from unheadr for data cleaning</p>
<p>Waffle charts with svg images</p>
<p>2022-09-23T00:00:00+00:00</p>
<p>https://luisdva.github.io/rstats/wafflechart</p>
<p>Earlier this month I saw a couple of waffle charts on Twitter used as a nice alternative to showing counts and proportions with bar graphs.</p>
<blockquote class="twitter-tweet" data-dnt="true"><p lang="en" dir="ltr">Bar charts are easy and precise. But they're boring as hell. If you need your chart to stand out, that's bad.<br /><br />Waffle charts can be an eye-catching alternative. Especially, if you use icons. Use waffle charts to show counts or parts of a whole.<br /><br />Here's how you build them. <a href="https://twitter.com/hashtag/rstats?src=hash&ref_src=twsrc%5Etfw">#rstats</a> <a href="https://t.co/RqhaxTeOLy">pic.twitter.com/RqhaxTeOLy</a></p>— Albert Rapp (@rappa753) <a href="https://twitter.com/rappa753/status/1567226189135364096?ref_src=twsrc%5Etfw">September 6, 2022</a></blockquote>
<script async="" src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>
<blockquote class="twitter-tweet" data-dnt="true"><p lang="en" dir="ltr"><a href="https://twitter.com/hashtag/TidyTuesday?src=hash&ref_src=twsrc%5Etfw">#TidyTuesday</a> week 36 Lego Bricks, data from <a href="https://t.co/1l8zxz3rpU">https://t.co/1l8zxz3rpU</a>, courtesy of <a href="https://twitter.com/geokaramanis?ref_src=twsrc%5Etfw">@geokaramanis</a>. <br /><br />Waffle plot inspired by <a href="https://twitter.com/issa_madjid?ref_src=twsrc%5Etfw">@issa_madjid</a> <a href="https://twitter.com/hashtag/rstats?src=hash&ref_src=twsrc%5Etfw">#rstats</a> code: <a href="https://t.co/7TLlBIL2gu">https://t.co/7TLlBIL2gu</a> <a href="https://t.co/N6SazCysof">pic.twitter.com/N6SazCysof</a></p>— Lee Olney (@leeolney3) <a href="https://twitter.com/leeolney3/status/1567009220389703680?ref_src=twsrc%5Etfw">September 6, 2022</a></blockquote>
<script async="" src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>
<p>Coincidentally, I had struggled with this a few months back for some freelance work and was meaning to document this alternative approach that uses <a href="https://coolbutuseless.github.io/package/ggsvg/" target="_blank"><code class="language-plaintext highlighter-rouge">ggsvg</code></a> to draw svg (Scalable Vector Graphics) image files arranged in a waffle-like grid. Using svg images has the advantage that we can map aesthetics such as size, fill, or color to different elements of the svg. This way we don’t need separate image files or icons if we only need them to vary in size or color.</p>
<h2 id="a-quick-example">A quick example:</h2>
<p>Let’s load some libraries and set up some example data in long format. In this case we have different regions and varying numbers of dogs of different age classes for each region.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">library</span><span class="p">(</span><span class="n">ggplot2</span><span class="p">)</span><span class="w"> </span><span class="c1"># CRAN v3.3.6</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">ggsvg</span><span class="p">)</span><span class="w"> </span><span class="c1"># [github::coolbutuseless/ggsvg] v0.1.11</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">forcats</span><span class="p">)</span><span class="w"> </span><span class="c1"># CRAN v0.5.1</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">dplyr</span><span class="p">)</span><span class="w"> </span><span class="c1"># CRAN v1.0.10</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">tidyr</span><span class="p">)</span><span class="w"> </span><span class="c1"># CRAN v1.2.1</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">purrr</span><span class="p">)</span><span class="w"> </span><span class="c1"># CRAN v0.3.4</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">glue</span><span class="p">)</span><span class="w"> </span><span class="c1"># CRAN v1.6.2</span><span class="w">
</span><span class="n">dogs_long</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">tibble</span><span class="p">(</span><span class="w">
</span><span class="n">Region</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">sample</span><span class="p">(</span><span class="nf">c</span><span class="p">(</span><span class="s2">"South"</span><span class="p">,</span><span class="w"> </span><span class="s2">"East"</span><span class="p">,</span><span class="w"> </span><span class="s2">"Northwest"</span><span class="p">,</span><span class="w"> </span><span class="s2">"West"</span><span class="p">,</span><span class="w"> </span><span class="s2">"Unknown"</span><span class="p">,</span><span class="w"> </span><span class="s2">"Southeast"</span><span class="p">),</span><span class="w">
</span><span class="n">size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">80</span><span class="p">,</span><span class="w">
</span><span class="n">replace</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">TRUE</span><span class="p">,</span><span class="w"> </span><span class="n">prob</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="m">0.3</span><span class="p">,</span><span class="w"> </span><span class="m">0.4</span><span class="p">,</span><span class="w"> </span><span class="m">0.1</span><span class="p">,</span><span class="w"> </span><span class="m">0.2</span><span class="p">,</span><span class="w"> </span><span class="m">0.2</span><span class="p">,</span><span class="w"> </span><span class="m">0.3</span><span class="p">)</span><span class="w">
</span><span class="p">),</span><span class="w">
</span><span class="n">Age_class</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">sample</span><span class="p">(</span><span class="nf">c</span><span class="p">(</span><span class="s2">"Adult"</span><span class="p">,</span><span class="w"> </span><span class="s2">"Puppy"</span><span class="p">,</span><span class="w"> </span><span class="s2">"Not reported"</span><span class="p">),</span><span class="w"> </span><span class="m">80</span><span class="p">,</span><span class="w"> </span><span class="n">replace</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">TRUE</span><span class="p">,</span><span class="w"> </span><span class="n">prob</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="m">0.5</span><span class="p">,</span><span class="w"> </span><span class="m">0.3</span><span class="p">,</span><span class="w"> </span><span class="m">0.1</span><span class="p">))</span><span class="w">
</span><span class="p">)</span><span class="w">
</span><span class="n">dogs_long</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">dogs_long</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">slice_sample</span><span class="p">(</span><span class="n">prop</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">0.8</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">group_by</span><span class="p">(</span><span class="n">Region</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">mutate</span><span class="p">(</span><span class="n">group_length</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">n</span><span class="p">())</span></code></pre></figure>
<p>If we want a multi-panel plot with the total number of dogs per region arranged in a waffle grid with one icon per observation, we can group the data by <em>Region</em>, calculate the group sizes, and create a named vector with this bit of information.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">group_lengths</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">dogs_long</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">distinct</span><span class="p">(</span><span class="n">Region</span><span class="p">,</span><span class="w"> </span><span class="n">group_length</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">pull</span><span class="p">(</span><span class="n">group_length</span><span class="p">)</span><span class="w">
</span><span class="nf">names</span><span class="p">(</span><span class="n">group_lengths</span><span class="p">)</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">dogs_long</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">distinct</span><span class="p">(</span><span class="n">Region</span><span class="p">,</span><span class="w"> </span><span class="n">group_length</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">pull</span><span class="p">(</span><span class="n">Region</span><span class="p">)</span></code></pre></figure>
<p>To arrange the points along an xy grid using <code class="language-plaintext highlighter-rouge">expand.grid()</code>, I borrowed some logic from the <a href="https://git.rud.is/hrbrmstr/waffle.git" target="_blank"><code class="language-plaintext highlighter-rouge">waffle</code></a> package and enforced some simple cutoff points for how many rows I wanted depending on the number of observations. I’m sure this can be improved upon.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1"># fn to arrange the points</span><span class="w">
</span><span class="n">waff_arrange</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="k">function</span><span class="p">(</span><span class="n">glength</span><span class="p">,</span><span class="w"> </span><span class="n">gname</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">glength</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="m">5</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="n">rows</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="m">1</span><span class="w">
</span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">glength</span><span class="w"> </span><span class="o">>=</span><span class="w"> </span><span class="m">5</span><span class="w"> </span><span class="o">&</span><span class="w"> </span><span class="n">glength</span><span class="w"> </span><span class="o"><=</span><span class="w"> </span><span class="m">10</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="n">rows</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="m">2</span><span class="w">
</span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="n">rows</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="m">4</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="n">dat</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">expand.grid</span><span class="p">(</span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">1</span><span class="o">:</span><span class="n">rows</span><span class="p">,</span><span class="w"> </span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">seq_len</span><span class="p">(</span><span class="nf">ceiling</span><span class="p">(</span><span class="n">glength</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="n">rows</span><span class="p">)))[</span><span class="m">1</span><span class="o">:</span><span class="n">glength</span><span class="p">,</span><span class="w"> </span><span class="p">]</span><span class="w">
</span><span class="n">dat</span><span class="o">$</span><span class="n">glength</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">glength</span><span class="w">
</span><span class="n">dat</span><span class="o">$</span><span class="n">category</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">gname</span><span class="w">
</span><span class="n">dat</span><span class="w">
</span><span class="c1"># create the grid for these groups</span><span class="w">
</span><span class="n">gridsxy</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">map2_df</span><span class="p">(</span><span class="n">group_lengths</span><span class="p">,</span><span class="w"> </span><span class="nf">names</span><span class="p">(</span><span class="n">group_lengths</span><span class="p">),</span><span class="w"> </span><span class="n">waff_arrange</span><span class="p">)</span><span class="w">
</span><span class="p">}</span></code></pre></figure>
<p>With the grid set up, it can be bound to the original data (arranging first to keep the groups together). Then some minor preparation for the plotting can help, such as re-leveling factors and setting up labels.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">dogsW</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">dogs_long</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">arrange</span><span class="p">(</span><span class="n">Region</span><span class="p">,</span><span class="w"> </span><span class="n">group_length</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">bind_cols</span><span class="p">(</span><span class="n">arrange</span><span class="p">(</span><span class="n">gridsxy</span><span class="p">,</span><span class="w"> </span><span class="n">category</span><span class="p">,</span><span class="w"> </span><span class="n">glength</span><span class="p">))</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">ungroup</span><span class="p">()</span><span class="w">
</span><span class="n">dogsW</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">dogsW</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">mutate</span><span class="p">(</span><span class="n">Age_class</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">fct_relevel</span><span class="p">(</span><span class="n">Age_class</span><span class="p">,</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="s2">"Puppy"</span><span class="p">,</span><span class="w"> </span><span class="s2">"Adult"</span><span class="p">,</span><span class="w"> </span><span class="s2">"Not reported"</span><span class="p">)))</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">mutate</span><span class="p">(</span><span class="n">Region_n</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">glue</span><span class="p">(</span><span class="s2">"{Region} ({glength})"</span><span class="p">))</span></code></pre></figure>
<p>To plot the xy data as points with a nicer distribution of space, we can draw them on top of a transparent grid from <code class="language-plaintext highlighter-rouge">geom_raster()</code>. This would be the basis of the waffle chart. This already shows the total number of dogs per region and the overall distribution by age class within each one.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">ggplot</span><span class="p">(</span><span class="n">dogsW</span><span class="p">)</span><span class="o">+</span><span class="w">
</span><span class="n">geom_raster</span><span class="p">(</span><span class="n">aes</span><span class="p">(</span><span class="n">x</span><span class="p">,</span><span class="n">y</span><span class="p">),</span><span class="n">fill</span><span class="o">=</span><span class="s2">"transparent"</span><span class="p">)</span><span class="o">+</span><span class="w">
</span><span class="n">geom_point</span><span class="p">(</span><span class="n">aes</span><span class="p">(</span><span class="n">x</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="p">,</span><span class="n">fill</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Age_class</span><span class="p">,</span><span class="n">size</span><span class="o">=</span><span class="n">Age_class</span><span class="p">),</span><span class="n">pch</span><span class="o">=</span><span class="m">21</span><span class="p">)</span><span class="o">+</span><span class="w">
</span><span class="n">facet_wrap</span><span class="p">(</span><span class="o">~</span><span class="n">Region_n</span><span class="p">,</span><span class="n">scales</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"free"</span><span class="p">)</span></code></pre></figure>
<figure>
<a href="/assets/images/waffpoints.png"><img src="/assets/images/waffpoints.png" /></a>
<figcaption>basic</figcaption>
</figure>
<p>Next, we need an svg image. I downloaded this jumping dog (below) from <a href="https://freesvg.org/happy-running-dog-vector-image" target="_blank">freesvg.org</a> into my working directory. <code class="language-plaintext highlighter-rouge">ggsvg</code> needs the image as text, so we read the file contents into a single string.</p>
<figure>
<a href="/assets/images/dog.png"><img src="/assets/images/dog.png" /></a>
<figcaption>cute</figcaption>
</figure>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">dog_svg</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">paste</span><span class="p">(</span><span class="n">readLines</span><span class="p">(</span><span class="s2">"dognoclp.svg"</span><span class="p">),</span><span class="w"> </span><span class="n">collapse</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"\n"</span><span class="p">)</span></code></pre></figure>
<p>For fun, let’s map the age class to the color of the dog’s collar. The <code class="language-plaintext highlighter-rouge">ggsvg</code> documentation explains that “we can use the <code class="language-plaintext highlighter-rouge">css()</code> helper function to target aesthetics at selected elements within an SVG using <code class="language-plaintext highlighter-rouge">css(selector, property = value)</code>”.</p>
<p>To identify the dog collar in the image, I opened the svg in a browser, used the Inspect Element tool, and copied the css selector that I needed.</p>
<figure>
<a href="/assets/images/dogcss.png"><img src="/assets/images/dogcss.png" /></a>
<figcaption>Inspecting elements</figcaption>
</figure>
<p>This selector is used in <code class="language-plaintext highlighter-rouge">geom_point_svg()</code> and again in the <code class="language-plaintext highlighter-rouge">aesthetics</code> argument of <code class="language-plaintext highlighter-rouge">scale_svg_fill_manual()</code>. Point size can be mapped to age class, so that puppies appear smaller than adults, and the rest is some minor tweaking to get a nice look. For reference, this includes putting the legend at the bottom with the title above and the labels below.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">ggplot</span><span class="p">(</span><span class="n">dogsW</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">geom_raster</span><span class="p">(</span><span class="n">aes</span><span class="p">(</span><span class="n">x</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="p">),</span><span class="w"> </span><span class="n">fill</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"transparent"</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">geom_point_svg</span><span class="p">(</span><span class="n">aes</span><span class="p">(</span><span class="n">x</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="p">,</span><span class="w"> </span><span class="n">css</span><span class="p">(</span><span class="w">
</span><span class="n">selector</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"svg:nth-child(1) > path:nth-child(4)"</span><span class="p">,</span><span class="w">
</span><span class="n">fill</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Age_class</span><span class="w">
</span><span class="p">),</span><span class="w"> </span><span class="n">size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Age_class</span><span class="p">),</span><span class="w"> </span><span class="n">svg</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">dog_svg</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">facet_wrap</span><span class="p">(</span><span class="o">~</span><span class="n">Region_n</span><span class="p">,</span><span class="w"> </span><span class="n">scales</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"free"</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">scale_size_manual</span><span class="p">(</span><span class="n">values</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="m">12</span><span class="p">,</span><span class="w"> </span><span class="m">16</span><span class="p">,</span><span class="w"> </span><span class="m">14</span><span class="p">),</span><span class="w"> </span><span class="n">name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Age"</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">scale_svg_fill_manual</span><span class="p">(</span><span class="w">
</span><span class="n">aesthetics</span><span class="w"> </span><span class="o">=</span><span class="w">
</span><span class="n">css</span><span class="p">(</span><span class="n">selector</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"svg:nth-child(1) > path:nth-child(4)"</span><span class="p">,</span><span class="w"> </span><span class="n">fill</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Age_class</span><span class="p">),</span><span class="w">
</span><span class="n">values</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="s2">"#806cff"</span><span class="p">,</span><span class="w"> </span><span class="s2">"#fdb731"</span><span class="p">,</span><span class="w"> </span><span class="s2">"#194162"</span><span class="p">),</span><span class="w"> </span><span class="n">name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Age"</span><span class="w">
</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">scale_x_continuous</span><span class="p">(</span><span class="n">expand</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="m">0</span><span class="p">,</span><span class="w"> </span><span class="m">0</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">scale_y_continuous</span><span class="p">(</span><span class="n">expand</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="m">0</span><span class="p">,</span><span class="w"> </span><span class="m">0</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">ggthemes</span><span class="o">::</span><span class="n">theme_few</span><span class="p">(</span><span class="n">base_family</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Lato"</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">theme</span><span class="p">(</span><span class="w">
</span><span class="n">axis.text</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">element_blank</span><span class="p">(),</span><span class="w">
</span><span class="n">axis.ticks</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">element_blank</span><span class="p">(),</span><span class="w">
</span><span class="n">axis.title</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">element_blank</span><span class="p">(),</span><span class="w">
</span><span class="n">strip.text</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">element_text</span><span class="p">(</span><span class="n">face</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"bold"</span><span class="p">,</span><span class="w"> </span><span class="n">size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">18</span><span class="p">),</span><span class="w">
</span><span class="n">legend.position</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"bottom"</span><span class="p">,</span><span class="w"> </span><span class="n">legend.title</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">element_text</span><span class="p">(</span><span class="n">size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">16</span><span class="p">)</span><span class="w">
</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">guides</span><span class="p">(</span><span class="n">size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">guide_legend</span><span class="p">(</span><span class="w">
</span><span class="n">title.position</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"top"</span><span class="p">,</span><span class="w"> </span><span class="n">label.position</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"bottom"</span><span class="p">,</span><span class="w">
</span><span class="n">title.hjust</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">0.5</span><span class="w">
</span><span class="p">))</span></code></pre></figure>
<figure>
<a href="/assets/images/dogwaffles.png"><img src="/assets/images/dogwaffles.png" /></a>
<figcaption>Look at the little collars!</figcaption>
</figure>
<p>Looks nice! Thanks to Mike FC for developing <code class="language-plaintext highlighter-rouge">ggsvg</code> and for help with understanding the <code class="language-plaintext highlighter-rouge">css()</code> function.</p>
<p>All feedback welcome.</p>
Luis D. Verde Arregoitia
Creating waffle charts with ggsvg
annotater 0.2.0
2022-04-25T00:00:00+00:00
https://luisdva.github.io/rstats/annotaterv020
<p>Roughly two years after the first release of a working version of <code class="language-plaintext highlighter-rouge">annotater</code>, I’m happy to announce the release of version 0.2.0.</p>
<p>The backstory of the package is <a href="https://luisdva.github.io/rstats/annotater/" target="_blank">here</a>, and since developing these functions I use them pretty much every time I share code.</p>
<p>In my opinion, this:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">library</span><span class="p">(</span><span class="n">MCMCglmm</span><span class="p">)</span><span class="w"> </span><span class="c1"># CRAN v2.33</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">purrr</span><span class="p">)</span><span class="w"> </span><span class="c1"># CRAN v0.3.4</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">scico</span><span class="p">)</span><span class="w"> </span><span class="c1"># CRAN v1.3.0</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">forcats</span><span class="p">)</span><span class="w"> </span><span class="c1"># CRAN v0.5.1</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">ggplot2</span><span class="p">)</span><span class="w"> </span><span class="c1"># CRAN v3.3.5</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">ggridges</span><span class="p">)</span><span class="w"> </span><span class="c1"># CRAN v0.5.3</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">patchwork</span><span class="p">)</span><span class="w"> </span><span class="c1"># CRAN v1.1.1</span></code></pre></figure>
<p>is preferable to this:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">library</span><span class="p">(</span><span class="n">MCMCglmm</span><span class="p">)</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">purrr</span><span class="p">)</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">scico</span><span class="p">)</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">forcats</span><span class="p">)</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">ggplot2</span><span class="p">)</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">ggridges</span><span class="p">)</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">patchwork</span><span class="p">)</span><span class="w"> </span></code></pre></figure>
<p>Also, the function annotation tools have been quite useful for teaching, collaborative work, and dealing with other people’s code.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">library</span><span class="p">(</span><span class="n">readr</span><span class="p">)</span><span class="w"> </span><span class="c1"># read_csv</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">dplyr</span><span class="p">)</span><span class="w"> </span><span class="c1"># %>% select filter</span><span class="w">
</span><span class="n">dat</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">read_csv</span><span class="p">(</span><span class="s2">"myfile.csv"</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w"> </span><span class="n">select</span><span class="p">(</span><span class="n">X1</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w"> </span><span class="n">filter</span><span class="p">(</span><span class="n">X1</span><span class="o">></span><span class="m">1</span><span class="p">)</span></code></pre></figure>
<h2 id="annotater-v020">annotater v0.2.0</h2>
<p>Version 0.2.0 comes with support for package load calls that use the <code class="language-plaintext highlighter-rouge">p_load</code> function from <a href="https://github.com/trinker/pacman" target="_blank"><code class="language-plaintext highlighter-rouge">pacman</code></a>.</p>
<figure>
<a href="/assets/images/annotpacmanRepos.gif"><img src="/assets/images/annotpacmanRepos.gif" /></a>
<figcaption>click for high res</figcaption>
</figure>
<figure>
<a href="/assets/images/annotpacmanFns.gif"><img src="/assets/images/annotpacmanFns.gif" width="900" /></a>
<figcaption>click for high res</figcaption>
</figure>
<p>Other minor fixes include support for indented library load calls (e.g., when they are inside a function definition; see below) and more unit tests.</p>
<figure>
<a href="/assets/images/annotateindented.gif"><img src="/assets/images/annotateindented.gif" /></a>
<figcaption>click for high res</figcaption>
</figure>
<p>Read more about <code class="language-plaintext highlighter-rouge">annotater</code> on the dedicated pkgdown <a href="https://annotater.liomys.mx" target="_blank">site</a>.</p>
<p>Check out the code <a href="https://github.com/luisDVA/annotater" target="_blank">here</a></p>
<p>… and install from GitHub like so:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1"># install.packages("remotes")</span><span class="w">
</span><span class="n">remotes</span><span class="o">::</span><span class="n">install_github</span><span class="p">(</span><span class="s2">"luisdva/annotater"</span><span class="p">)</span></code></pre></figure>
<p>I’m gearing up for a CRAN submission, so all testing and feedback is welcome.</p>
Luis D. Verde Arregoitia
Annotate R code with details on the packages being loaded
NBA player names in hip hop lyrics
2022-04-12T00:00:00+00:00
https://luisdva.github.io/rstats/NBA-rap
<p>Hip hop and basketball have always had a unique and close relationship, so it is not surprising when player names and various terms from the sport appear in rap lyrics. This post works through the R code needed to find NBA player names in the lyrics of ~4000 songs from 20 hip hop artists and to plot the resulting patterns.</p>
<p>I’m writing this mainly to document several tricks and hacks for working with and ultimately plotting large(ish) volumes of text-based data. The choice of artists is a mix of rappers that I listen to, and others that seemed well-represented in the Genius lyrics data.</p>
<p>The overall workflow has two parts: getting hold of the lyrics and the NBA player names, and then finding out which artists mention which players in their songs and which players get mentioned most.</p>
<h2 id="getting-the-lyrics">Getting the lyrics</h2>
<p>Until last year, lyrics could be accessed from the <a href="https://genius.com/" target="_blank">Genius</a> website (a massive collection of song lyrics and crowd-sourced annotations) via the Genius API with the <a href="https://github.com/ewenme/geniusr" target="_blank"><code class="language-plaintext highlighter-rouge">geniusr</code></a> package. At present, because of changes to the Genius API and legal terms, we can no longer fetch lyrics directly with <code class="language-plaintext highlighter-rouge">geniusr</code> without legally-gray webscraping (if you must, the relevant <code class="language-plaintext highlighter-rouge">geniusr</code> functions can be patched with the advice in this <a href="https://github.com/ewenme/geniusr/issues/17" target="_blank">issue</a> and they’ll work fine, but see <a href="https://github.com/JosiahParry/genius" target="_blank">here</a> for more information).</p>
<p>This little example uses many cool packages, so let’s load them first.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1"># bball player names in hip hop</span><span class="w">
</span><span class="c1"># Load libraries ----</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">geniusr</span><span class="p">)</span><span class="w"> </span><span class="c1"># [github::ewenme/geniusr] v1.2.0.9000</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">nbastatR</span><span class="p">)</span><span class="w"> </span><span class="c1"># [github::abresler/nbastatR] v0.1.1506</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">rvest</span><span class="p">)</span><span class="w"> </span><span class="c1"># CRAN v1.0.2</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">xml2</span><span class="p">)</span><span class="w"> </span><span class="c1"># CRAN v1.3.3</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">rlang</span><span class="p">)</span><span class="w"> </span><span class="c1"># CRAN v1.0.1</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">tibble</span><span class="p">)</span><span class="w"> </span><span class="c1"># CRAN v3.1.6</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">dplyr</span><span class="p">)</span><span class="w"> </span><span class="c1"># CRAN v1.0.8</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">purrr</span><span class="p">)</span><span class="w"> </span><span class="c1"># CRAN v0.3.4</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">stringr</span><span class="p">)</span><span class="w"> </span><span class="c1"># CRAN v1.4.0</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">tidytext</span><span class="p">)</span><span class="w"> </span><span class="c1"># CRAN v0.3.2</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">tidyr</span><span class="p">)</span><span class="w"> </span><span class="c1"># CRAN v1.2.0</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">forcats</span><span class="p">)</span><span class="w"> </span><span class="c1"># CRAN v0.5.1</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">ggplot2</span><span class="p">)</span><span class="w"> </span><span class="c1"># CRAN v3.3.5</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">artyfarty</span><span class="p">)</span><span class="w"> </span><span class="c1"># [github::datarootsio/artyfarty] v0.0.1</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">ggfittext</span><span class="p">)</span><span class="w"> </span><span class="c1"># CRAN v0.9.1</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">ggtext</span><span class="p">)</span><span class="w"> </span><span class="c1"># CRAN v0.1.1</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">scico</span><span class="p">)</span><span class="w"> </span><span class="c1"># CRAN v1.3.0</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">extrafont</span><span class="p">)</span><span class="w"> </span><span class="c1"># CRAN v0.17</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">fuzzyjoin</span><span class="p">)</span><span class="w"> </span><span class="c1"># CRAN v0.1.6</span></code></pre></figure>
<p>Back when the API still produced tibbles with lyrics, I searched for the ID of each of the artists listed below, and used each artist ID to obtain a vector with all the unique song IDs for that artist.</p>
<p>The artist names:</p>
<figure class="highlight"><pre><code class="language-text" data-lang="text">c("Migos", "Ghostface Killah", "2 Chainz", "G-Unit", "Beastie Boys", "Army of the Pharaohs", "Flatbush Zombies",
"R.A. The Rugged Man", "Das EFX", "Jedi Mind Tricks", "Gang Starr", "Mobb Deep", "Wu-Tang Clan", "Kool G Rap",
"DMX", "MF DOOM", "213", "Goodie Mob", "Sage Francis") </code></pre></figure>
<p>Here are examples for two of the artists. Each vector of song IDs is named consistently so that we can combine them all later (I didn’t iterate here so that I could check that I was getting the correct match for each search term).</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">search_artist</span><span class="p">(</span><span class="s2">"outerspace"</span><span class="p">)</span><span class="w"> </span><span class="c1"># 1836</span><span class="w">
</span><span class="n">OSall</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">geniusr</span><span class="o">::</span><span class="n">get_artist_songs_df</span><span class="p">(</span><span class="m">1836</span><span class="p">)</span><span class="w">
</span><span class="n">allOSsong_ids</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">OSall</span><span class="w"> </span><span class="o">%>%</span><span class="w"> </span><span class="n">pull</span><span class="p">(</span><span class="n">song_id</span><span class="p">)</span><span class="w">
</span><span class="n">geniusr</span><span class="o">::</span><span class="n">search_artist</span><span class="p">(</span><span class="s2">"gang starr"</span><span class="p">)</span><span class="w"> </span><span class="c1"># 220</span><span class="w">
</span><span class="n">GSTall</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">geniusr</span><span class="o">::</span><span class="n">get_artist_songs_df</span><span class="p">(</span><span class="m">220</span><span class="p">)</span><span class="w">
</span><span class="n">allGSTsong_ids</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">GSTall</span><span class="w"> </span><span class="o">%>%</span><span class="w"> </span><span class="n">pull</span><span class="p">(</span><span class="n">song_id</span><span class="p">)</span></code></pre></figure>
<p>Once we have found all the artists we want, we can get the vector objects from our environment with <code class="language-plaintext highlighter-rouge">mget</code>.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1">### combine all artists</span><span class="w">
</span><span class="n">all_song_ids</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">flatten_chr</span><span class="p">(</span><span class="n">mget</span><span class="p">(</span><span class="n">ls</span><span class="p">(</span><span class="n">pattern</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"^all"</span><span class="p">)))</span></code></pre></figure>
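<p>As a minimal sketch of how this name-based lookup behaves, the toy example below stores a few made-up ID vectors in a separate environment (an assumption made only to keep the example self-contained) and collects the ones whose names start with “all”. Base <code class="language-plaintext highlighter-rouge">unlist()</code> stands in for <code class="language-plaintext highlighter-rouge">flatten_chr()</code> so that no packages are needed.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"># toy stand-ins for the per-artist vectors of song IDs (made-up values)
lookup_env <- new.env()
assign("allOSsong_ids", c("101", "102"), envir = lookup_env)
assign("allGSTsong_ids", c("201", "202", "203"), envir = lookup_env)
assign("unrelated", "ignore me", envir = lookup_env)

# ls() returns the matching object names, mget() fetches them as a named list
id_list <- mget(ls(lookup_env, pattern = "^all"), envir = lookup_env)

# collapse the list into a single character vector
combined_ids <- unlist(id_list, use.names = FALSE)
combined_ids</code></pre></figure>
<p>Only the two objects whose names match the pattern are picked up, which is why naming the vectors consistently matters.</p>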
<p>To get all the lyrics, we can iterate over the final vector of song IDs (with a multi-statement lambda function to add some time between requests and not spam the server).</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1"># iterate and get all song lyrics</span><span class="w">
</span><span class="n">all_lyrsdf</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">map_df</span><span class="p">(</span><span class="n">all_song_ids</span><span class="p">,</span><span class="o">~</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="n">Sys.sleep</span><span class="p">(</span><span class="n">sample</span><span class="p">(</span><span class="n">seq</span><span class="p">(</span><span class="m">0.5</span><span class="p">,</span><span class="m">1</span><span class="p">,</span><span class="m">0.25</span><span class="p">),</span><span class="m">1</span><span class="p">))</span><span class="w">
</span><span class="n">get_lyrics_id</span><span class="p">(</span><span class="n">song_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">.x</span><span class="p">)})</span></code></pre></figure>
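<p>The same pattern can be tried without touching any API. In the sketch below, <code class="language-plaintext highlighter-rouge">fetch_one()</code> is a hypothetical stand-in for the real request function, and wrapping it with <code class="language-plaintext highlighter-rouge">purrr::possibly()</code> is an optional extra (not part of the original workflow) so that one failed request does not stop the whole loop.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r">library(purrr)
library(tibble)

# hypothetical stand-in for a lyrics request; fails for one ID on purpose
fetch_one <- function(id) {
  if (id == "102") stop("request failed")
  tibble(song_id = id, line = paste("line from song", id))
}

# possibly() returns NULL instead of raising an error
safe_fetch <- possibly(fetch_one, otherwise = NULL)

toy_ids <- c("101", "102", "103")

toy_lyrics <- map_df(toy_ids, ~ {
  # short random pause between requests
  Sys.sleep(sample(c(0.1, 0.2), 1))
  safe_fetch(.x)
})
toy_lyrics</code></pre></figure>
<p>The failed ID simply contributes no rows, so the result here holds the lines for the other two songs.</p>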
<p>The resulting tibble has more than 200,000 rows, one per lyric line, for roughly 4000 songs. Here’s a random sample of 10 rows.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">all_lyrsdf</span><span class="w"> </span><span class="o">%>%</span><span class="w"> </span><span class="n">slice_sample</span><span class="p">(</span><span class="n">n</span><span class="o">=</span><span class="m">10</span><span class="p">)</span></code></pre></figure>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1"># A tibble: 10 × 6</span><span class="w">
</span><span class="n">line</span><span class="w"> </span><span class="n">section_name</span><span class="w"> </span><span class="n">section_artist</span><span class="w"> </span><span class="n">song_name</span><span class="w"> </span><span class="n">artist_name</span><span class="w"> </span><span class="n">song_id</span><span class="w">
</span><span class="o"><</span><span class="n">chr</span><span class="o">></span><span class="w"> </span><span class="o"><</span><span class="n">chr</span><span class="o">></span><span class="w"> </span><span class="o"><</span><span class="n">chr</span><span class="o">></span><span class="w"> </span><span class="o"><</span><span class="n">chr</span><span class="o">></span><span class="w"> </span><span class="o"><</span><span class="n">chr</span><span class="o">></span><span class="w"> </span><span class="o"><</span><span class="n">chr</span><span class="o">></span><span class="w">
</span><span class="m">1</span><span class="w"> </span><span class="n">I</span><span class="w"> </span><span class="n">know</span><span class="w"> </span><span class="n">what</span><span class="w"> </span><span class="n">it</span><span class="err">…</span><span class="w"> </span><span class="n">Verse</span><span class="w"> </span><span class="m">3</span><span class="w"> </span><span class="n">Young</span><span class="w"> </span><span class="n">Buck</span><span class="w"> </span><span class="n">Footprin</span><span class="err">…</span><span class="w"> </span><span class="n">G</span><span class="o">-</span><span class="n">Unit</span><span class="w"> </span><span class="m">20762</span><span class="w">
</span><span class="m">2</span><span class="w"> </span><span class="n">Dracos</span><span class="w"> </span><span class="n">and</span><span class="w"> </span><span class="n">MAC</span><span class="err">…</span><span class="w"> </span><span class="n">Verse</span><span class="w"> </span><span class="m">3</span><span class="w"> </span><span class="n">Offset</span><span class="w"> </span><span class="o">&</span><span class="w"> </span><span class="n">Quavo</span><span class="w"> </span><span class="n">Racks</span><span class="w"> </span><span class="m">2</span><span class="w"> </span><span class="err">…</span><span class="w"> </span><span class="n">Migos</span><span class="w"> </span><span class="m">5555420</span><span class="w">
</span><span class="m">3</span><span class="w"> </span><span class="n">I</span><span class="w"> </span><span class="n">don</span><span class="s1">'t trust … Hook Quavo Trust No… Migos 2340835
4 Leaving my dre… Verse 3 Offset White Sa… Migos 3468920
5 All you know i… Verse 3 2 Chainz I Said Me 2 Chainz 4346684
6 Pickin my mark… Havoc Mobb Deep It’s Over Mobb Deep 33514
7 You should see… Chorus The Notorious… The Dang… R.A. The R… 136500
8 Close your ear… Intro Havoc So Long Mobb Deep 33536
9 I'</span><span class="n">m</span><span class="w"> </span><span class="n">off</span><span class="w"> </span><span class="n">style</span><span class="w"> </span><span class="err">…</span><span class="w"> </span><span class="n">Verse</span><span class="w"> </span><span class="m">3</span><span class="w"> </span><span class="n">Ghostface</span><span class="w"> </span><span class="n">Kil</span><span class="err">…</span><span class="w"> </span><span class="n">Ron</span><span class="w"> </span><span class="n">O</span><span class="err">’</span><span class="n">Ne</span><span class="err">…</span><span class="w"> </span><span class="n">Wu</span><span class="o">-</span><span class="n">Tang</span><span class="w"> </span><span class="n">Cl</span><span class="err">…</span><span class="w"> </span><span class="m">491657</span><span class="w">
</span><span class="m">10</span><span class="w"> </span><span class="n">A</span><span class="w"> </span><span class="n">lousy</span><span class="w"> </span><span class="n">condit</span><span class="err">…</span><span class="w"> </span><span class="n">Yeeeaah</span><span class="p">,</span><span class="w"> </span><span class="n">yo</span><span class="err">…</span><span class="w"> </span><span class="n">Louis</span><span class="w"> </span><span class="n">Logic</span><span class="w"> </span><span class="n">Over</span><span class="w"> </span><span class="n">the</span><span class="err">…</span><span class="w"> </span><span class="n">Louis</span><span class="w"> </span><span class="n">Logic</span><span class="w"> </span><span class="m">29385</span><span class="w"> </span></code></pre></figure>
<p>For reference, this <a href="https://statnamara.wordpress.com/2021/01/26/scraping-analysing-and-visualising-lyrics-in-r/" target="_blank">post</a> by Tom MacNamara also shows how to get and visualize lyrics from the same data source.</p>
<h3 id="text-analysis">Text analysis</h3>
<p>For this exercise, let’s split up (tokenize) all the lines into bigrams (consecutive sequences of two words) with <code class="language-plaintext highlighter-rouge">unnest_tokens</code> from <code class="language-plaintext highlighter-rouge">tidytext</code>. For the next step, it’s also convenient to split each bigram into two new columns, one per word.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1"># tokenize </span><span class="w">
</span><span class="n">lyric_bigrams</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">all_lyrsdf</span><span class="w"> </span><span class="o">%>%</span><span class="w"> </span><span class="n">unnest_tokens</span><span class="p">(</span><span class="n">BGlyric</span><span class="p">,</span><span class="n">line</span><span class="p">,</span><span class="w"> </span><span class="n">token</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"ngrams"</span><span class="p">,</span><span class="w"> </span><span class="n">n</span><span class="o">=</span><span class="m">2</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">filter</span><span class="p">(</span><span class="o">!</span><span class="nf">is.na</span><span class="p">(</span><span class="n">BGlyric</span><span class="p">))</span><span class="w"> </span><span class="o">%>%</span><span class="w"> </span><span class="n">separate</span><span class="p">(</span><span class="n">BGlyric</span><span class="p">,</span><span class="n">into</span><span class="o">=</span><span class="nf">c</span><span class="p">(</span><span class="s2">"w1"</span><span class="p">,</span><span class="s2">"w2"</span><span class="p">),</span><span class="n">sep</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">" "</span><span class="p">,</span><span class="n">remove</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">FALSE</span><span class="p">)</span></code></pre></figure>
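<p>To see what the tokenizing step produces, here is a small sketch on two made-up lines (the toy tibble and its values are assumptions for illustration only). Note that the tokens are lowercased by default, and that a line with a single word cannot form a bigram.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r">library(dplyr)
library(tidyr)
library(tidytext)

# two made-up lyric lines; the second one is too short to form a bigram
toy_lines <- tibble::tibble(
  song_id = c("1", "1"),
  line = c("I wanna be like Mike", "Word")
)

toy_bigrams <- toy_lines %>%
  # one row per pair of consecutive words
  unnest_tokens(BGlyric, line, token = "ngrams", n = 2) %>%
  filter(!is.na(BGlyric)) %>%
  # keep the bigram and also store each word in its own column
  separate(BGlyric, into = c("w1", "w2"), sep = " ", remove = FALSE)
toy_bigrams</code></pre></figure>
<p>The five-word line yields four bigrams, from “i wanna” to “like mike”.</p>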
<p>To reduce the number of comparisons, and because people’s names are the whole point of this exercise, we can drop every bigram in which either word appears in a custom list of stop words (extremely common words not useful for analysis, such as “the”, “of”, “to”, etc.) read from a generic text file.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1"># stopwords</span><span class="w">
</span><span class="n">stopWords</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">tibble</span><span class="p">(</span><span class="n">word</span><span class="o">=</span><span class="n">readr</span><span class="o">::</span><span class="n">read_lines</span><span class="p">(</span><span class="s2">"minimal-stop.txt"</span><span class="p">))</span><span class="w">
</span><span class="n">lir</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">lyric_bigrams</span><span class="w"> </span><span class="o">%>%</span><span class="w"> </span><span class="n">filter</span><span class="p">(</span><span class="o">!</span><span class="n">w1</span><span class="w"> </span><span class="o">%in%</span><span class="w"> </span><span class="n">stopWords</span><span class="o">$</span><span class="n">word</span><span class="w"> </span><span class="o">&</span><span class="w"> </span><span class="o">!</span><span class="n">w2</span><span class="w"> </span><span class="o">%in%</span><span class="w"> </span><span class="n">stopWords</span><span class="o">$</span><span class="n">word</span><span class="p">)</span></code></pre></figure>
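<p>The filter keeps a bigram only when neither of its words is a stop word. A quick sketch with a made-up stop word vector and three toy bigrams (all values are assumptions for illustration):</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r">library(dplyr)

# made-up stop words and bigrams, already split into two columns
toy_stop <- c("the", "of", "to")
toy_split <- tibble::tibble(
  w1 = c("like", "the", "michael"),
  w2 = c("mike", "game", "jordan")
)

# a row survives only if both words are absent from the stop word list
toy_kept <- toy_split %>%
  filter(!w1 %in% toy_stop & !w2 %in% toy_stop)
toy_kept</code></pre></figure>
<p>Here “the game” is dropped and the other two bigrams are kept.</p>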
<h2 id="player-names">Player names</h2>
<p>The <code class="language-plaintext highlighter-rouge">dictionary_bref_players()</code> function from <code class="language-plaintext highlighter-rouge">nbastatR</code> will get us a dictionary of NBA player names (from the <a href="https://www.basketball-reference.com/players/">Basketball Reference</a> website) in tibble form. The function takes no arguments; to simplify things, we can drop the names from the BAA league.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1"># NBA player names</span><span class="w">
</span><span class="n">playerNames</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">nbastatR</span><span class="o">::</span><span class="n">dictionary_bref_players</span><span class="p">()</span><span class="w">
</span><span class="n">playerNames</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">playerNames</span><span class="w"> </span><span class="o">%>%</span><span class="w"> </span><span class="n">filter</span><span class="p">(</span><span class="o">!</span><span class="nf">is.na</span><span class="p">(</span><span class="n">countSeasons</span><span class="p">))</span></code></pre></figure>
<h2 id="fuzzy-joining">Fuzzy joining</h2>
<p>For a more flexible merge of the two columns (the player names and the lyric bigrams), the <code class="language-plaintext highlighter-rouge">fuzzyjoin</code> package implements various methods for fuzzy string matching, to allow for minor variations in spelling. For no particular reason, I matched the columns with Levenshtein distance and a maximum distance of one. Be aware that all these comparisons consume a lot of memory.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">joineddfs</span><span class="w"> </span><span class="o"><-</span><span class="w">
</span><span class="n">stringdist_left_join</span><span class="p">(</span><span class="n">playerNames</span><span class="p">,</span><span class="w">
</span><span class="n">lir</span><span class="p">,</span><span class="n">by</span><span class="o">=</span><span class="nf">c</span><span class="p">(</span><span class="s2">"namePlayerBREF"</span><span class="o">=</span><span class="s2">"BGlyric"</span><span class="p">),</span><span class="n">max_dist</span><span class="o">=</span><span class="m">1</span><span class="p">,</span><span class="w">
</span><span class="n">method</span><span class="o">=</span><span class="nf">c</span><span class="p">(</span><span class="s2">"lv"</span><span class="p">),</span><span class="w">
</span><span class="n">ignore_case</span><span class="o">=</span><span class="kc">TRUE</span><span class="p">)</span><span class="w"> </span></code></pre></figure>
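<p>A toy version of the join makes the matching rule easier to see. The sketch below uses three made-up player rows and three made-up bigrams (assumptions for illustration), with the same method, maximum distance, and case handling as above.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r">library(tibble)
library(fuzzyjoin)

# made-up inputs: one exact match, one single-letter typo, one non-match
toy_players <- tibble(namePlayerBREF = c("Michael Jordan", "Scottie Pippen", "Yao Ming"))
toy_lyrics <- tibble(BGlyric = c("michael jordan", "scottie pipen", "ya mean"))

toy_matches <- stringdist_left_join(
  toy_players, toy_lyrics,
  by = c("namePlayerBREF" = "BGlyric"),
  max_dist = 1,        # allow at most one edit
  method = "lv",       # Levenshtein distance
  ignore_case = TRUE   # compare without regard to capitalization
)
toy_matches</code></pre></figure>
<p>Because this is a left join, every player is kept: the first two find a matching bigram, while the unmatched player gets <code class="language-plaintext highlighter-rouge">NA</code> in the bigram column. A distance of one is also enough to pair up genuinely different names that differ by a single letter, which is one reason the results need cleaning afterwards.</p>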
<p>After some cleaning and deduplicating, for the main visualization we can count how many times each player was mentioned. I manually filtered out eight homonyms (I doubt that lyrics mentioning Michael Jackson or Mel Gibson referred to a Knicks point guard (1987-1990) or a guard for the Lakers in 1964, respectively). I most likely missed some.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1"># clean and deduplicate</span><span class="w">
</span><span class="n">playermentions</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">joineddfs</span><span class="w"> </span><span class="o">%>%</span><span class="w"> </span><span class="n">filter</span><span class="p">(</span><span class="o">!</span><span class="nf">is.na</span><span class="p">(</span><span class="n">BGlyric</span><span class="p">))</span><span class="w"> </span><span class="o">%>%</span><span class="w">
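</span><span class="n">distinct</span><span class="p">(</span><span class="n">song_id</span><span class="p">,</span><span class="n">BGlyric</span><span class="p">,</span><span class="n">.keep_all</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">TRUE</span><span class="p">)</span><span class="w">
</span><span class="c1"># count mentions per player (this step is not shown in the original; a sorted count is assumed)</span><span class="w">
</span><span class="n">mentions_count</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">playermentions</span><span class="w"> </span><span class="o">%>%</span><span class="w"> </span><span class="n">count</span><span class="p">(</span><span class="n">namePlayerBREF</span><span class="p">,</span><span class="w"> </span><span class="n">sort</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">TRUE</span><span class="p">)</span><span class="w">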
</span><span class="c1"># remove homonyms</span><span class="w">
</span><span class="n">mentions_countF</span><span class="w"> </span><span class="o"><-</span><span class="w">
</span><span class="n">mentions_count</span><span class="w"> </span><span class="o">%>%</span><span class="w"> </span><span class="n">filter</span><span class="p">(</span><span class="o">!</span><span class="n">namePlayerBREF</span><span class="w"> </span><span class="o">%in%</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="s2">"Michael Jackson"</span><span class="p">,</span><span class="s2">"Dan King"</span><span class="p">,</span><span class="w">
</span><span class="s2">"Bill Smith"</span><span class="p">,</span><span class="s2">"Ed Horton"</span><span class="p">,</span><span class="s2">"Larry Sanders"</span><span class="p">,</span><span class="w">
</span><span class="s2">"Bobby Brown"</span><span class="p">,</span><span class="s2">"Mel Gibson"</span><span class="p">,</span><span class="s2">"Harry Davis"</span><span class="p">))</span></code></pre></figure>
<p>The final data for plotting results from a series of hacks to rank and arrange the names according to their number of mentions (<code class="language-plaintext highlighter-rouge">cur_group_id()</code> is our friend here), and then to ‘conditionally’ wrap some of the names so they fit nicely in the plot. My improvised approach for categories with many names was to stack them side by side with <code class="language-plaintext highlighter-rouge">lead</code>, slice out every other row (note the <code class="language-plaintext highlighter-rouge">%%</code> operator), then put things back together.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1"># prepare data for plotting</span><span class="w">
</span><span class="n">freqsPL</span><span class="w"> </span><span class="o"><-</span><span class="w">
</span><span class="n">mentions_countF</span><span class="w"> </span><span class="o">%>%</span><span class="w"> </span><span class="n">add_count</span><span class="p">(</span><span class="n">n</span><span class="p">,</span><span class="n">name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"ncat"</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">mutate</span><span class="p">(</span><span class="n">nr</span><span class="o">=</span><span class="n">forcats</span><span class="o">::</span><span class="n">fct_inorder</span><span class="p">(</span><span class="n">as.factor</span><span class="p">(</span><span class="n">n</span><span class="p">)))</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">group_by</span><span class="p">(</span><span class="n">nr</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">arrange</span><span class="p">(</span><span class="n">desc</span><span class="p">(</span><span class="n">namePlayerBREF</span><span class="p">),</span><span class="n">.by_group</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">TRUE</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">mutate</span><span class="p">(</span><span class="n">rank</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">cur_group_id</span><span class="p">())</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">mutate</span><span class="p">(</span><span class="n">forticks</span><span class="o">=</span><span class="n">paste0</span><span class="p">(</span><span class="n">n</span><span class="p">))</span><span class="w">
</span><span class="n">freqsPLtop</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">freqsPL</span><span class="w"> </span><span class="o">%>%</span><span class="w"> </span><span class="n">filter</span><span class="p">(</span><span class="n">n</span><span class="o">></span><span class="m">2</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">mutate</span><span class="p">(</span><span class="n">namePlayerBREF</span><span class="o">=</span><span class="n">str_wrap</span><span class="p">(</span><span class="n">namePlayerBREF</span><span class="p">,</span><span class="n">width</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">12</span><span class="p">))</span><span class="w">
</span><span class="n">freqsPLmid</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">freqsPL</span><span class="w"> </span><span class="o">%>%</span><span class="w"> </span><span class="n">filter</span><span class="p">(</span><span class="n">n</span><span class="o">==</span><span class="m">2</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">mutate</span><span class="p">(</span><span class="n">namePlayerBREF</span><span class="o">=</span><span class="n">paste</span><span class="p">(</span><span class="n">namePlayerBREF</span><span class="p">,</span><span class="s2">" "</span><span class="p">,</span><span class="w"> </span><span class="n">lead</span><span class="p">(</span><span class="n">namePlayerBREF</span><span class="p">)))</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">slice</span><span class="p">(</span><span class="n">which</span><span class="p">(</span><span class="n">row_number</span><span class="p">()</span><span class="w"> </span><span class="o">%%</span><span class="w"> </span><span class="m">2</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="m">1</span><span class="p">))</span><span class="w">
</span><span class="n">freqsPLbottom</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">freqsPL</span><span class="w"> </span><span class="o">%>%</span><span class="w"> </span><span class="n">filter</span><span class="p">(</span><span class="n">n</span><span class="o">==</span><span class="m">1</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">mutate</span><span class="p">(</span><span class="n">namePlayerBREF</span><span class="o">=</span><span class="n">paste</span><span class="p">(</span><span class="n">namePlayerBREF</span><span class="p">,</span><span class="s2">" "</span><span class="p">,</span><span class="w"> </span><span class="n">lead</span><span class="p">(</span><span class="n">namePlayerBREF</span><span class="p">)))</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">slice</span><span class="p">(</span><span class="n">which</span><span class="p">(</span><span class="n">row_number</span><span class="p">()</span><span class="w"> </span><span class="o">%%</span><span class="w"> </span><span class="m">2</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="m">1</span><span class="p">))</span><span class="w">
</span><span class="n">allfreqsPL</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">bind_rows</span><span class="p">(</span><span class="n">freqsPLtop</span><span class="p">,</span><span class="n">freqsPLmid</span><span class="p">,</span><span class="n">freqsPLbottom</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">mutate</span><span class="p">(</span><span class="n">namePlayerBREF</span><span class="o">=</span><span class="n">str_remove</span><span class="p">(</span><span class="n">namePlayerBREF</span><span class="p">,</span><span class="s2">" NA"</span><span class="p">))</span><span class="w">
</span></code></pre></figure>
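<p>The pairing trick is easier to follow on a tiny example. This sketch uses three made-up rows with a hypothetical <code class="language-plaintext highlighter-rouge">player</code> column; it follows the same steps as above, except that the leftover “NA” is removed with an anchored pattern and the result is trimmed, which is a small variation of my own.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r">library(dplyr)
library(stringr)

# three made-up names, so the last one has no partner
toy_names <- tibble::tibble(player = c("Yao Ming", "Steve Nash", "Shawn Kemp"))

paired <- toy_names %>%
  # paste each name next to the one in the following row
  mutate(player = paste(player, " ", lead(player))) %>%
  # keep the odd rows only, so that each name appears once
  slice(which(row_number() %% 2 == 1)) %>%
  # the last odd row was pasted to a missing value; clean it up
  mutate(player = str_trim(str_remove(player, " NA$")))
paired</code></pre></figure>
<p>Three names collapse into two rows: one with a pair of names and one with the unpaired name on its own.</p>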
<h2 id="plotting">Plotting</h2>
<p>Before plotting, let’s set up a gradient background for the plot panel following this <a href="https://localcoder.org/r-ggplot-background-gradient-coloring#solution_1" target="_blank">entry</a>, and a separate tibble for a colorful annotation using <code class="language-plaintext highlighter-rouge">ggtext</code>.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="w"> </span><span class="c1"># gradient background grob</span><span class="w">
</span><span class="n">g</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">grid</span><span class="o">::</span><span class="n">rasterGrob</span><span class="p">(</span><span class="nf">c</span><span class="p">(</span><span class="s2">"#272822"</span><span class="p">,</span><span class="s2">"black"</span><span class="p">),</span><span class="w"> </span><span class="n">width</span><span class="o">=</span><span class="n">unit</span><span class="p">(</span><span class="m">1</span><span class="p">,</span><span class="s2">"npc"</span><span class="p">),</span><span class="w"> </span><span class="n">height</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">unit</span><span class="p">(</span><span class="m">1</span><span class="p">,</span><span class="s2">"npc"</span><span class="p">),</span><span class="w">
</span><span class="n">interpolate</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">TRUE</span><span class="p">)</span><span class="w">
</span><span class="c1"># custom text annotation</span><span class="w">
</span><span class="n">forsubtitle</span><span class="w"> </span><span class="o"><-</span><span class="w">
</span><span class="n">tibble</span><span class="p">(</span><span class="n">label</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"<span style = 'color: #0F58FF;'>**3988**</span> songs from <span style = 'color: #0F58FF;'>**21**</span> artists <br> <span style = 'color: #F95C09;'>**5022**</span> player names"</span><span class="p">,</span><span class="w">
</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">0.7</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">35</span><span class="p">)</span></code></pre></figure>
<p>The data now looks like this:</p>
<figure class="highlight"><pre><code class="language-text" data-lang="text">> allfreqsPL
# A tibble: 90 × 6
# Groups: nr [8]
namePlayerBREF n ncat nr rank forticks
<chr> <int> <int> <fct> <int> <chr>
1 "Michael\nJordan" 9 1 9 1 9
2 "Scottie\nPippen" 8 1 8 2 8
3 "Shaquille\nO'Neal" 6 3 6 3 6
4 "Reggie\nMiller" 6 3 6 3 6
5 "LeBron James" 6 3 6 3 6
6 "Yao Ming" 5 6 5 4 5
7 "Steve Nash" 5 6 5 4 5
8 "Shawn Kemp" 5 6 5 4 5
9 "Paul Pierce" 5 6 5 4 5
10 "Gilbert\nArenas" 5 6 5 4 5
# … with 80 more rows </code></pre></figure>
<p>Now we can plot the mentions as text stacked inside bars, using <code class="language-plaintext highlighter-rouge">ggfittext</code> for dynamic resizing. The ‘<em>Rock Salt</em>’ font is from Google Fonts, downloaded to my Linux system using Typecatcher and shown using <code class="language-plaintext highlighter-rouge">extrafont</code>.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">ggplot</span><span class="p">(</span><span class="n">allfreqsPL</span><span class="p">,</span><span class="n">aes</span><span class="p">(</span><span class="n">x</span><span class="o">=</span><span class="n">factor</span><span class="p">(</span><span class="n">rank</span><span class="p">),</span><span class="n">y</span><span class="o">=</span><span class="w"> </span><span class="n">n</span><span class="p">,</span><span class="n">label</span><span class="o">=</span><span class="n">namePlayerBREF</span><span class="p">))</span><span class="o">+</span><span class="w">
</span><span class="n">annotation_custom</span><span class="p">(</span><span class="n">g</span><span class="p">,</span><span class="w"> </span><span class="n">xmin</span><span class="o">=-</span><span class="kc">Inf</span><span class="p">,</span><span class="w"> </span><span class="n">xmax</span><span class="o">=</span><span class="kc">Inf</span><span class="p">,</span><span class="w"> </span><span class="n">ymin</span><span class="o">=-</span><span class="kc">Inf</span><span class="p">,</span><span class="w"> </span><span class="n">ymax</span><span class="o">=</span><span class="kc">Inf</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">geom_bar_text</span><span class="p">(</span><span class="n">position</span><span class="o">=</span><span class="s2">"stack"</span><span class="p">,</span><span class="n">min.size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">2</span><span class="p">,</span><span class="w">
</span><span class="n">family</span><span class="o">=</span><span class="s2">"Rock Salt"</span><span class="p">,</span><span class="n">reflow</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">FALSE</span><span class="p">,</span><span class="w">
</span><span class="n">grow</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">TRUE</span><span class="p">,</span><span class="w"> </span><span class="n">place</span><span class="o">=</span><span class="s2">"left"</span><span class="p">,</span><span class="n">outside</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">TRUE</span><span class="p">)</span><span class="o">+</span><span class="w">
</span><span class="n">scale_x_discrete</span><span class="p">(</span><span class="n">breaks</span><span class="o">=</span><span class="m">1</span><span class="o">:</span><span class="m">8</span><span class="p">,</span><span class="n">labels</span><span class="o">=</span><span class="n">unique</span><span class="p">(</span><span class="n">freqsPL</span><span class="o">$</span><span class="n">forticks</span><span class="p">),</span><span class="w">
</span><span class="n">expand</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">expansion</span><span class="p">(</span><span class="n">add</span><span class="o">=</span><span class="nf">c</span><span class="p">(</span><span class="m">0.01</span><span class="p">,</span><span class="m">0.5</span><span class="p">)))</span><span class="o">+</span><span class="w">
</span><span class="n">scale_y_continuous</span><span class="p">(</span><span class="n">expand</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="m">0.01</span><span class="p">,</span><span class="m">0.01</span><span class="p">))</span><span class="o">+</span><span class="w">
</span><span class="n">geom_richtext</span><span class="p">(</span><span class="n">data</span><span class="o">=</span><span class="n">forsubtitle</span><span class="p">,</span><span class="w"> </span><span class="n">aes</span><span class="p">(</span><span class="n">x</span><span class="o">=</span><span class="n">x</span><span class="p">,</span><span class="n">y</span><span class="o">=</span><span class="n">y</span><span class="p">,</span><span class="n">label</span><span class="o">=</span><span class="n">label</span><span class="p">),</span><span class="w">
</span><span class="n">color</span><span class="o">=</span><span class="s2">"white"</span><span class="p">,</span><span class="n">family</span><span class="o">=</span><span class="s2">"Lato Thin"</span><span class="p">,</span><span class="n">size</span><span class="o">=</span><span class="m">9</span><span class="p">,</span><span class="w">
</span><span class="n">fill</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">NA</span><span class="p">,</span><span class="w"> </span><span class="n">label.color</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">NA</span><span class="p">,</span><span class="n">hjust</span><span class="o">=</span><span class="w"> </span><span class="m">0</span><span class="p">)</span><span class="o">+</span><span class="w">
</span><span class="n">theme</span><span class="p">(</span><span class="n">panel.grid.major</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">element_blank</span><span class="p">(),</span><span class="w">
</span><span class="n">axis.line.y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">element_blank</span><span class="p">(),</span><span class="w">
</span><span class="n">axis.text.y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">element_blank</span><span class="p">(),</span><span class="w">
</span><span class="n">axis.ticks.y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">element_blank</span><span class="p">(),</span><span class="w">
</span><span class="n">panel.background</span><span class="o">=</span><span class="n">element_blank</span><span class="p">(),</span><span class="w">
</span><span class="n">panel.border</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">element_blank</span><span class="p">(),</span><span class="w">
</span><span class="n">plot.background</span><span class="o">=</span><span class="n">element_rect</span><span class="p">(</span><span class="n">fill</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"black"</span><span class="p">),</span><span class="w">
</span><span class="n">plot.title</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">element_text</span><span class="p">(</span><span class="n">vjust</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">-0.5</span><span class="p">,</span><span class="n">size</span><span class="o">=</span><span class="m">32</span><span class="p">,</span><span class="w"> </span><span class="n">hjust</span><span class="o">=</span><span class="m">0.03</span><span class="p">,</span><span class="w">
</span><span class="n">family</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Lato Medium"</span><span class="p">,</span><span class="n">color</span><span class="o">=</span><span class="s2">"white"</span><span class="p">),</span><span class="w">
</span><span class="n">axis.text.x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">element_text</span><span class="p">(</span><span class="n">family</span><span class="o">=</span><span class="s2">"Lato Heavy"</span><span class="p">,</span><span class="n">size</span><span class="o">=</span><span class="m">19</span><span class="p">),</span><span class="w">
</span><span class="n">axis.ticks</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">element_blank</span><span class="p">(),</span><span class="w">
</span><span class="n">axis.title.x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">element_text</span><span class="p">(</span><span class="n">color</span><span class="o">=</span><span class="s2">"white"</span><span class="p">,</span><span class="n">size</span><span class="o">=</span><span class="m">17</span><span class="p">))</span><span class="o">+</span><span class="w">
</span><span class="n">labs</span><span class="p">(</span><span class="n">x</span><span class="o">=</span><span class="s2">"mentions"</span><span class="p">,</span><span class="n">y</span><span class="o">=</span><span class="s2">""</span><span class="p">,</span><span class="w">
</span><span class="n">title</span><span class="o">=</span><span class="s2">"NBA players mentioned in hip hop songs"</span><span class="p">)</span></code></pre></figure>
<figure>
<a href="/assets/images/playernames.png"><img src="/assets/images/playernames.png" width="900" /></a>
<figcaption>click for high res</figcaption>
</figure>
<p>As a complement, we can figure out the number of distinct players mentioned by the different artists.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1"># to get player per artist</span><span class="w">
</span><span class="n">lyrsplyrs</span><span class="w"> </span><span class="o"><-</span><span class="w">
</span><span class="n">joineddfs</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">filter</span><span class="p">(</span><span class="o">!</span><span class="nf">is.na</span><span class="p">(</span><span class="n">BGlyric</span><span class="p">))</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">select</span><span class="p">(</span><span class="n">namePlayerBREF</span><span class="p">,</span><span class="n">song_id</span><span class="p">,</span><span class="n">song_name</span><span class="p">,</span><span class="n">artist_name</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">mutate</span><span class="p">(</span><span class="n">song_id</span><span class="o">=</span><span class="nf">as.character</span><span class="p">(</span><span class="n">song_id</span><span class="p">))</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">left_join</span><span class="p">(</span><span class="n">all_lyrsdf</span><span class="p">)</span><span class="o">%>%</span><span class="w"> </span><span class="n">filter</span><span class="p">(</span><span class="o">!</span><span class="n">namePlayerBREF</span><span class="w"> </span><span class="o">%in%</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="s2">"Michael Jackson"</span><span class="p">,</span><span class="s2">"Dan King"</span><span class="p">,</span><span class="w">
</span><span class="s2">"Bill Smith"</span><span class="p">,</span><span class="s2">"Ed Horton"</span><span class="p">,</span><span class="s2">"Larry Sanders"</span><span class="p">,</span><span class="w">
</span><span class="s2">"Bobby Brown"</span><span class="p">,</span><span class="s2">"Mel Gibson"</span><span class="p">,</span><span class="s2">"Harry Davis"</span><span class="p">))</span><span class="w">
</span><span class="n">artists_nPlayers</span><span class="w"> </span><span class="o"><-</span><span class="w">
</span><span class="n">lyrsplyrs</span><span class="w"> </span><span class="o">%>%</span><span class="w"> </span><span class="n">distinct</span><span class="p">(</span><span class="n">artist_name</span><span class="p">,</span><span class="n">song_id</span><span class="p">,</span><span class="n">namePlayerBREF</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">group_by</span><span class="p">(</span><span class="n">artist_name</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">distinct</span><span class="p">(</span><span class="n">namePlayerBREF</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w"> </span><span class="n">count</span><span class="p">(</span><span class="n">artist_name</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">arrange</span><span class="p">(</span><span class="n">desc</span><span class="p">(</span><span class="n">n</span><span class="p">))</span><span class="w"> </span></code></pre></figure>
<p>After some basic wrangling we get this two-column tibble with artists and the total number of players mentioned.</p>
<figure class="highlight"><pre><code class="language-text" data-lang="text"> artists_nPlayers
# A tibble: 21 × 2
# Groups: artist_name [21]
artist_name n
<chr> <int>
1 Migos 67
2 Ghostface Killah 28
3 2 Chainz 16
4 G-Unit 14
5 Beastie Boys 13
6 Army of the Pharaohs 11
7 Flatbush Zombies 10
8 R.A. The Rugged Man 8
9 Das EFX 7
10 Jedi Mind Tricks 7
# … with 11 more rows </code></pre></figure>
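<p>The counting step can be tried on a small scale. This is only a sketch with made-up rows (the artist names and the <code class="language-plaintext highlighter-rouge">mentions</code> object are hypothetical, not part of the lyrics data). It also shows that calling <code class="language-plaintext highlighter-rouge">distinct()</code> and <code class="language-plaintext highlighter-rouge">count()</code> without a prior <code class="language-plaintext highlighter-rouge">group_by()</code> should give the same counts as an ungrouped tibble.</p>

```r
library(dplyr)

# Hypothetical toy data: one row per (artist, song, player) mention
mentions <- tibble(
  artist_name = c("Artist A", "Artist A", "Artist A", "Artist B"),
  song_id = c("1", "2", "2", "3"),
  namePlayerBREF = c(
    "Michael Jordan", "Michael Jordan", "Scottie Pippen", "Yao Ming"
  )
)

players_per_artist <- mentions %>%
  # a player mentioned in several songs by the same artist counts once
  distinct(artist_name, namePlayerBREF) %>%
  # one row per artist, sorted by the number of distinct players
  count(artist_name, sort = TRUE)

players_per_artist
```

<p>Here the made-up "Artist A" mentions Michael Jordan in two songs, but he is only counted once for that artist.</p>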
<p>To plot these values, we can draw one line per artist with <code class="language-plaintext highlighter-rouge">geom_segment</code>.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">artists_nPlayers</span><span class="w"> </span><span class="o">%>%</span><span class="w"> </span><span class="n">filter</span><span class="p">(</span><span class="n">n</span><span class="o">></span><span class="m">1</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">ggplot</span><span class="p">()</span><span class="o">+</span><span class="w">
</span><span class="n">geom_segment</span><span class="p">(</span><span class="n">aes</span><span class="p">(</span><span class="n">x</span><span class="o">=</span><span class="n">fct_reorder</span><span class="p">(</span><span class="n">artist_name</span><span class="p">,</span><span class="n">n</span><span class="p">),</span><span class="w">
</span><span class="n">xend</span><span class="o">=</span><span class="n">fct_reorder</span><span class="p">(</span><span class="n">artist_name</span><span class="p">,</span><span class="n">n</span><span class="p">),</span><span class="n">y</span><span class="o">=</span><span class="m">0</span><span class="p">,</span><span class="n">yend</span><span class="o">=</span><span class="n">n</span><span class="p">,</span><span class="n">color</span><span class="o">=</span><span class="n">n</span><span class="p">),</span><span class="w">
</span><span class="p">)</span><span class="o">+</span><span class="w">
</span><span class="n">geom_text</span><span class="p">(</span><span class="n">aes</span><span class="p">(</span><span class="n">x</span><span class="o">=</span><span class="n">fct_reorder</span><span class="p">(</span><span class="n">artist_name</span><span class="p">,</span><span class="n">n</span><span class="p">),</span><span class="n">y</span><span class="o">=</span><span class="n">n</span><span class="m">+0.5</span><span class="p">,</span><span class="n">label</span><span class="o">=</span><span class="n">n</span><span class="p">),</span><span class="n">color</span><span class="o">=</span><span class="s2">"gray"</span><span class="p">,</span><span class="w">
</span><span class="n">family</span><span class="o">=</span><span class="s2">"Lato Medium"</span><span class="p">,</span><span class="n">size</span><span class="o">=</span><span class="m">5</span><span class="p">)</span><span class="o">+</span><span class="w">
</span><span class="n">coord_flip</span><span class="p">()</span><span class="o">+</span><span class="w">
</span><span class="n">scale_color_scico</span><span class="p">(</span><span class="n">palette</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'nuuk'</span><span class="p">,</span><span class="n">direction</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">-1</span><span class="p">,</span><span class="n">guide</span><span class="o">=</span><span class="s1">'none'</span><span class="p">)</span><span class="o">+</span><span class="w">
</span><span class="n">scale_y_continuous</span><span class="p">(</span><span class="n">expand</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">expansion</span><span class="p">(</span><span class="n">add</span><span class="o">=</span><span class="nf">c</span><span class="p">(</span><span class="m">0</span><span class="p">,</span><span class="m">1</span><span class="p">)))</span><span class="o">+</span><span class="w">
</span><span class="n">labs</span><span class="p">(</span><span class="n">title</span><span class="o">=</span><span class="s2">"Number of players mentioned"</span><span class="p">)</span><span class="o">+</span><span class="w">
</span><span class="n">theme</span><span class="p">(</span><span class="w">
</span><span class="n">plot.title</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">element_text</span><span class="p">(</span><span class="n">color</span><span class="o">=</span><span class="s2">"white"</span><span class="p">,</span><span class="n">family</span><span class="o">=</span><span class="s2">"Lato Medium"</span><span class="p">),</span><span class="w">
</span><span class="n">axis.ticks</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">element_blank</span><span class="p">(),</span><span class="w">
</span><span class="n">panel.grid</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">element_blank</span><span class="p">(),</span><span class="w">
</span><span class="n">axis.text.y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">element_text</span><span class="p">(</span><span class="n">family</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Rock Salt"</span><span class="p">,</span><span class="n">color</span><span class="o">=</span><span class="s2">"white"</span><span class="p">,</span><span class="n">size</span><span class="o">=</span><span class="m">18</span><span class="p">),</span><span class="w">
</span><span class="n">axis.text.x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">element_blank</span><span class="p">(),</span><span class="w">
</span><span class="n">plot.background</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">element_rect</span><span class="p">(</span><span class="n">fill</span><span class="o">=</span><span class="s2">"black"</span><span class="p">),</span><span class="w">
</span><span class="n">panel.background</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">element_rect</span><span class="p">(</span><span class="n">fill</span><span class="o">=</span><span class="s2">"black"</span><span class="p">))</span><span class="w"> </span></code></pre></figure>
<figure>
<a href="/assets/images/artistsplyrs.png"><img src="/assets/images/artistsplyrs.png" width="660" /></a>
<figcaption>click to enlarge</figcaption>
</figure>
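<p>Stripped of fonts, colors, and theme tweaks, the segment plot above boils down to a few layers. The following is a minimal sketch with hypothetical counts (<code class="language-plaintext highlighter-rouge">toy_counts</code> is made up), and it uses base <code class="language-plaintext highlighter-rouge">reorder()</code> in place of <code class="language-plaintext highlighter-rouge">fct_reorder()</code> so that only <code class="language-plaintext highlighter-rouge">ggplot2</code> is needed.</p>

```r
library(ggplot2)

# Hypothetical counts, standing in for artists_nPlayers
toy_counts <- data.frame(
  artist_name = c("Artist A", "Artist B", "Artist C"),
  n = c(12, 7, 3)
)

lollipop <- ggplot(toy_counts, aes(x = reorder(artist_name, n), y = n)) +
  # one segment per artist, running from zero to the count
  geom_segment(aes(xend = reorder(artist_name, n), y = 0, yend = n)) +
  # the count itself, nudged just past the end of the segment
  geom_text(aes(y = n + 0.5, label = n)) +
  # flip so the artist names read horizontally
  coord_flip()

lollipop
```

<p>Ordering the artists by <code class="language-plaintext highlighter-rouge">n</code> before plotting is what puts the longest segment at the top after flipping.</p>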
<p>Lastly, for those of us interested in the lines that actually contain player names, we can produce a tibble of artists, song names, and lines by cleaning up the special characters in the <em>line</em> variable and keeping only the rows which contain a name.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1"># clean weird characters, filter lines to keep mentions only</span><span class="w">
</span><span class="n">linesplyrs</span><span class="w"> </span><span class="o"><-</span><span class="w">
</span><span class="n">lyrsplyrs</span><span class="w"> </span><span class="o">%>%</span><span class="w"> </span><span class="n">mutate</span><span class="p">(</span><span class="n">linecln</span><span class="o">=</span><span class="n">str_remove_all</span><span class="p">(</span><span class="n">line</span><span class="p">,</span><span class="s2">"[^\\w\\s]"</span><span class="p">))</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">filter</span><span class="p">(</span><span class="n">stringi</span><span class="o">::</span><span class="n">stri_detect_regex</span><span class="p">(</span><span class="n">linecln</span><span class="p">,</span><span class="n">namePlayerBREF</span><span class="p">))</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">distinct</span><span class="p">(</span><span class="n">song_id</span><span class="p">,</span><span class="n">.keep_all</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">TRUE</span><span class="p">)</span></code></pre></figure>
<p>A random sample of the lines with player names:</p>
<figure class="highlight"><pre><code class="language-text" data-lang="text">> linesplyrs %>% select(song_name,artist_name,linecln) %>%
+ slice_sample(n=6) %>% as_tibble()
# A tibble: 6 × 3
song_name artist_name linecln
<chr> <chr> <chr>
1 Put ’Em In The Grave (Funeral Remix) Jedi Mind Tricks I take my Glock and I point god point guard like Brevin Knight
2 The Black Diamonds Ghostface Killah Like Mike Harris
3 Can’t Go Out Sad Migos Quavo Paul Pierce em whip a ball Wilson
4 Drip (Remix) Migos No Vince Carter fifteen with the carbon
5 Look at My Dab Migos Michael Jordan Im perfecting my craft
6 Represent the Real Das EFX This for my block handlin rock like Kenny Anderson</code></pre></figure>
<p>Nice!</p>
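<p>To see what the cleaning and filtering steps do row by row, here is a small sketch with two made-up lines (<code class="language-plaintext highlighter-rouge">toy_lines</code> is hypothetical). One difference from the code above: it matches names literally with <code class="language-plaintext highlighter-rouge">stringr::fixed()</code> rather than treating them as regular expressions, which is my assumption about what is wanted when the pattern is a plain name.</p>

```r
library(stringr)

# Hypothetical lyric lines paired with a player name to look for
toy_lines <- data.frame(
  line = c(
    "Michael Jordan, I'm perfecting my craft!",
    "No players named here"
  ),
  namePlayerBREF = c("Michael Jordan", "Michael Jordan")
)

# drop everything that is not a word character or whitespace
toy_lines$linecln <- str_remove_all(toy_lines$line, "[^\\w\\s]")

# row-wise check: does each cleaned line contain its paired name?
has_name <- str_detect(toy_lines$linecln, fixed(toy_lines$namePlayerBREF))

matched <- toy_lines[has_name, ]
matched$linecln
```

<p>Because the cleaning step also strips apostrophes, a name that contains punctuation in the player dictionary would need the same cleaning applied before it could match.</p>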
<p>This simplistic approach was for a limited number of artists, and by matching full names as they appear in the player dictionary (without considering nicknames, partial matches, or abbreviations) I’m missing out on many more mentions. Still, this code should document a few things I needed to learn such as ranking groups, slicing every other row, text sizing, and colorful annotations.</p>
<p>As usual, feel free to contact me with any questions or comments.</p>Luis D. Verde ArregoitiaTidy text manipulation and plotting.Sharing nice code with addins and IDE tools2021-12-12T00:00:00+00:002021-12-12T00:00:00+00:00https://luisdva.github.io/rstats/cleaner-code<blockquote class="twitter-tweet" data-dnt="true"><p lang="en" dir="ltr">Does anyone have an example of a document outlining "best practices" for how to prepare useful archived code (well annotated, etc) to go with data repositories?</p>— Daniel Bolnick (@DanielBolnick) <a href="https://twitter.com/DanielBolnick/status/1462455323340554240?ref_src=twsrc%5Etfw">November 21, 2021</a></blockquote>
<script async="" src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>
<p>This tweet by Daniel Bolnick received many useful replies with resources and guidelines for sharing code that’s useful and readable. Shortly after that, the editorial team at <em>The American Naturalist</em> published this <a href="https://comments.amnat.org/2021/12/guidelines-for-archiving-code-with-data.html" target="_blank">guide</a> for archiving data and code in ecology, evolution and behavior.</p>
<p>Coincidentally, I taught some relevant tools for this at a workshop last September and never wrote about it for this site. With a few packages and IDE tools, we can clean up our code efficiently, which goes a long way towards meeting the latest guidelines for clean, reproducible code.</p>
<p>Here’s a brief tour through my favorite tools:</p>
<h1 id="packup"><a href="https://github.com/MilesMcBain/packup" target="_blank">packup</a></h1>
<p>In interactive sessions and for less-structured workflows, I often add <code class="language-plaintext highlighter-rouge">library()</code> calls wherever I happen to be in a script when I realize I need a function, instead of scrolling up to keep all the package load calls together at the top. <code class="language-plaintext highlighter-rouge">packup</code> by <a href="https://t.co/A8jjzH92RB" target="_blank">Miles McBain</a> provides an RStudio addin to move all these calls up to the top of the script, remove any duplicates, and sort them alphabetically. <code class="language-plaintext highlighter-rouge">Packup library() calls</code> works for both .R and .Rmd files and I see no downside to calling it before sharing a script. Just make sure that the reordering isn’t causing namespace conflicts.</p>
<p>With <code class="language-plaintext highlighter-rouge">packup</code> we can easily go from this:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1"># random script</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">readr</span><span class="p">)</span><span class="w">
</span><span class="n">dat</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">read_csv</span><span class="p">(</span><span class="s2">"mydata.csv"</span><span class="p">)</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">dplyr</span><span class="p">)</span><span class="w">
</span><span class="n">dat</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">janitor</span><span class="p">)</span><span class="w">
</span><span class="c1"># fix letter case</span><span class="w">
</span><span class="n">dat</span><span class="w"> </span><span class="o">%>%</span><span class="w"> </span><span class="n">clean_names</span><span class="p">()</span></code></pre></figure>
<p>to this:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1"># random script</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">dplyr</span><span class="p">)</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">janitor</span><span class="p">)</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">readr</span><span class="p">)</span><span class="w">
</span><span class="n">dat</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">read_csv</span><span class="p">(</span><span class="s2">"mydata.csv"</span><span class="p">)</span><span class="w">
</span><span class="n">dat</span><span class="w">
</span><span class="c1"># fix letter case</span><span class="w">
</span><span class="n">dat</span><span class="w"> </span><span class="o">%>%</span><span class="w"> </span><span class="n">clean_names</span><span class="p">()</span></code></pre></figure>
<blockquote>
<p>Bonus tip: In RStudio use Alt + arrow keys to move whole lines of code up or down (I did this to nudge the commented line to the very top of the text).</p>
</blockquote>
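<p>To make the idea concrete, the same kind of tidying can be sketched in a few lines of base R. This is not how <code class="language-plaintext highlighter-rouge">packup</code> is implemented, just an illustration on a hypothetical script stored as a character vector: find the <code class="language-plaintext highlighter-rouge">library()</code> lines, drop duplicates, sort them, and put them first.</p>

```r
# Hypothetical script, one element per line
script <- c(
  "# random script",
  "library(readr)",
  "dat <- read_csv(\"mydata.csv\")",
  "library(dplyr)",
  "library(readr)"
)

# lines that are library() calls, allowing for leading whitespace
is_lib <- grepl("^\\s*library\\(", script)

# unique calls in alphabetical order
lib_calls <- sort(unique(trimws(script[is_lib])))

# library calls first, then everything else in its original order
tidied <- c(lib_calls, script[!is_lib])
tidied
```

<p>In this sketch the leading comment ends up below the library calls, so it would still need to be moved back to the top by hand.</p>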
<h1 id="annotater"><a href="https://annotater.liomys.mx" target="_blank">annotater</a></h1>
<p>The functions and RStudio addins in <code class="language-plaintext highlighter-rouge">annotater</code> are only a small step towards reproducibility (for the real deal, I suggest <code class="language-plaintext highlighter-rouge">renv</code>). After some time using them, I think they’ve proven their worth. Whenever I share code (including for the posts on this site), I use the <code class="language-plaintext highlighter-rouge">Annotate package repository sources in active file</code> addin to automatically annotate my library load calls with the source of the packages being loaded (e.g. CRAN, GitHub, BioConductor, etc.) and the version number.</p>
<p><code class="language-plaintext highlighter-rouge">annotater</code> also has a cool function that can make a note of which functions are called from each of the packages being loaded in a file with ‘library’ calls. This can be useful to avoid dependency issues or making others install packages they don’t need (for running the code in that particular file).</p>
<p>Use the <code class="language-plaintext highlighter-rouge">Annotate package repository sources in active file</code> function to turn this:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1"># random script</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">dplyr</span><span class="p">)</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">janitor</span><span class="p">)</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">readr</span><span class="p">)</span></code></pre></figure>
<p>into this:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1"># random script</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">dplyr</span><span class="p">)</span><span class="w"> </span><span class="c1"># CRAN v1.0.7</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">janitor</span><span class="p">)</span><span class="w"> </span><span class="c1"># CRAN v2.1.0</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">readr</span><span class="p">)</span><span class="w"> </span><span class="c1"># CRAN v2.0.2</span></code></pre></figure>
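<p>For a rough idea of where the version numbers come from, the installed version of any package can be looked up with <code class="language-plaintext highlighter-rouge">utils::packageVersion()</code>. The helper below (<code class="language-plaintext highlighter-rouge">annotate_pkg()</code>, a hypothetical name) is a base R sketch and not the <code class="language-plaintext highlighter-rouge">annotater</code> implementation. It leaves out the repository source and uses two packages that ship with R so that it runs anywhere.</p>

```r
# Hypothetical helper: build an annotated library() call for one package
annotate_pkg <- function(pkg) {
  # version of the package as currently installed
  version <- as.character(utils::packageVersion(pkg))
  sprintf("library(%s) # v%s", pkg, version)
}

# packages bundled with R, so no extra installation is assumed
annotated <- vapply(c("stats", "utils"), annotate_pkg, character(1))
cat(annotated, sep = "\n")
```

<p>The versions printed will depend on the R installation running the code, which is exactly why recording them next to the load calls is useful.</p>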
<h1 id="styler"><a href="https://styler.r-lib.org/" target="_blank">styler</a></h1>
<p><code class="language-plaintext highlighter-rouge">styler</code> can help us format our code consistently using a style guide that generally leads to cleaner code that is easier to read (e.g., correct indentation, spaces after commas and around infix operators but not after opening parenthesis for function calls, etc.). We can style a selection or an entire file.</p>
<p>This mangled code has some spacing and style issues:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">ggplot</span><span class="p">(</span><span class="w"> </span><span class="n">cast</span><span class="p">)</span><span class="o">+</span><span class="w"> </span><span class="n">geom_point</span><span class="p">(</span><span class="n">aes</span><span class="w">
</span><span class="p">(</span><span class="n">cost_gold</span><span class="p">,</span><span class="n">build_time</span><span class="p">,</span><span class="w"> </span><span class="n">fill</span><span class="o">=</span><span class="w">
</span><span class="n">hit_points</span><span class="p">),</span><span class="n">pch</span><span class="o">=</span><span class="m">21</span><span class="p">,</span><span class="n">color</span><span class="o">=</span><span class="s2">"black"</span><span class="p">,</span><span class="w"> </span><span class="n">size</span><span class="o">=</span><span class="m">4</span><span class="p">)</span><span class="o">+</span><span class="w">
</span><span class="n">scale_fill_scico</span><span class="p">(</span><span class="w">
</span><span class="n">palette</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"davos"</span><span class="p">,</span><span class="n">direction</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">-1</span><span class="p">)</span><span class="o">+</span><span class="n">theme_clean</span><span class="p">(</span><span class="n">base_family</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Lato"</span><span class="w"> </span><span class="p">)</span></code></pre></figure>
<p><code class="language-plaintext highlighter-rouge">styler</code> functions can help with that, returning the code below. Note that I normally don’t use the built-in tidyverse style guide (derived from the Google Style guide for R code) for ggplot code (too many line breaks after opening parens).</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">ggplot</span><span class="p">(</span><span class="n">cast</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">geom_point</span><span class="p">(</span><span class="n">aes</span><span class="w">
</span><span class="p">(</span><span class="n">cost_gold</span><span class="p">,</span><span class="w"> </span><span class="n">build_time</span><span class="p">,</span><span class="w">
</span><span class="n">fill</span><span class="w"> </span><span class="o">=</span><span class="w">
</span><span class="n">hit_points</span><span class="w">
</span><span class="p">),</span><span class="w"> </span><span class="n">pch</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">21</span><span class="p">,</span><span class="w"> </span><span class="n">color</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"black"</span><span class="p">,</span><span class="w"> </span><span class="n">size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">4</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">scale_fill_scico</span><span class="p">(</span><span class="w">
</span><span class="n">palette</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"davos"</span><span class="p">,</span><span class="w"> </span><span class="n">direction</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">-1</span><span class="w">
</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">theme_clean</span><span class="p">(</span><span class="n">base_family</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Lato"</span><span class="p">)</span></code></pre></figure>
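<p>As a minimal sketch of that step, <code class="language-plaintext highlighter-rouge">styler::style_text()</code> takes code as a string and returns the restyled version, which is handy for trying a style out without touching a file. The one-line input below is a shortened, hypothetical stand-in for the cramped plotting code above. The code is only reformatted, never evaluated, so <code class="language-plaintext highlighter-rouge">cast</code> does not need to exist.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r">library(styler)

# a cramped one-liner (hypothetical, shortened from the example above)
messy <- 'ggplot(cast)+geom_point(aes(cost_gold,build_time,fill=hit_points),pch=21,size=4)+theme_clean(base_family="Lato")'

# restyle with the default tidyverse style; the code is parsed but not run
tidy_code <- style_text(messy)
tidy_code</code></pre></figure>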
<h1 id="code-sections"><a href="https://support.rstudio.com/hc/en-us/articles/200484568-Code-Folding-and-Sections-in-the-RStudio-IDE" target="_blank">Code sections</a></h1>
<p>RStudio (and other IDEs) let us insert foldable sections to split scripts into discrete pieces that can be collapsed and navigated using the small navigation panel that can be toggled on and off in the source pane. In RStudio, sections are created automatically for any comment line with at least four trailing dashes (equal signs or pound signs also work), like so:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1"># My section ---- </span></code></pre></figure>
<p>becomes</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1"># My section ---------------------------------------- </span></code></pre></figure>
<p>We can also insert these sections with Ctrl+Shift+R (Cmd+Shift+R on a Mac), and for longer scripts they bring more sanity.</p>
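<p>For instance, a short script split into sections might look like the sketch below. The object names and values are made up, and the three headers use dashes, equal signs, and pound signs only to show that each marker style is picked up.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"># Setup ----
set.seed(42)

# Simulate data ====
heights <- rnorm(50, mean = 170, sd = 8)

# Summarise ####
height_summary <- c(mean = mean(heights), sd = sd(heights))
height_summary</code></pre></figure>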
<h1 id="littleboxes"><a href="https://github.com/ThinkR-open/littleboxes" target="_blank">littleboxes</a></h1>
<p>Lastly, <code class="language-plaintext highlighter-rouge">littleboxes</code> by the <a href="https://thinkr.fr/" target="_blank">ThinkR</a> squad gives us an addin for creating text titles with fancy ASCII art boxes around them. I like to use these at the beginning of scripts, to include a date, the purpose of the file, and the author.</p>
<p>With the example from earlier, we can add titles, comments, dates or whatever, then select these lines and call <code class="language-plaintext highlighter-rouge">Little Boxes</code> to add a fancy frame.</p>
<p>This:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1"># Random script - Demonstrating IDE tools</span><span class="w">
</span><span class="c1">#Dec 2021. by Luis</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">dplyr</span><span class="p">)</span><span class="w"> </span><span class="c1"># CRAN v1.0.7</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">janitor</span><span class="p">)</span><span class="w"> </span><span class="c1"># CRAN v2.1.0</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">readr</span><span class="p">)</span><span class="w"> </span><span class="c1"># CRAN v2.0.2</span></code></pre></figure>
<p>Becomes this:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1">##%######################################################%##</span><span class="w">
</span><span class="c1"># #</span><span class="w">
</span><span class="c1">#### Random script - Demonstrating IDE ####</span><span class="w">
</span><span class="c1">#### tools Dec 2021. by Luis ####</span><span class="w">
</span><span class="c1"># #</span><span class="w">
</span><span class="c1">##%######################################################%##</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">dplyr</span><span class="p">)</span><span class="w"> </span><span class="c1"># CRAN v1.0.7</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">janitor</span><span class="p">)</span><span class="w"> </span><span class="c1"># CRAN v2.1.0</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">readr</span><span class="p">)</span><span class="w"> </span><span class="c1"># CRAN v2.0.2</span></code></pre></figure>
<p>This animation shows how we can use these tools sequentially to clean up our code for sharing.</p>
<figure>
<a href="/assets/images/bscript.gif"><img src="/assets/images/bscript.gif" /></a>
<figcaption>towards cleaner code</figcaption>
</figure>
<p><br /></p>
<p>Try these tools out!</p>
Luis D. Verde Arregoitia
Five tools to make R code cleaner and more reproducible.
Plotting model-based clusters of population genetic structure
2021-09-17T00:00:00+00:00
https://luisdva.github.io/rstats/dapc-plot
<p>This <a href="https://luisdva.github.io/rstats/model-cluster-plots/" target="_blank">post</a> from 2019 describes an approach for making Structure-style plots for model-based clusters of population genetic structure using <code class="language-plaintext highlighter-rouge">ggplot2</code>. The code still runs fine, but <strong>a)</strong> the post was unrealistic and used made-up data that looks odd given the lack of structure and <strong>b)</strong> we can improve on the plots using new ggplot extensions. (I also wrote the post before learning to use the <code class="language-plaintext highlighter-rouge">tidyr::pivot_</code> functions.)</p>
<p>Here I’ll recreate a Discriminant Analysis of Principal Components (DAPC) from this (Open Access) <a href="https://link.springer.com/article/10.1007/s10340-018-1043-4" target="_blank">publication</a> by Amélie Desvars-Larrive et al. from 2019. The authors used microsatellites to examine the genetic structure of brown rat populations in Eastern France, and ran DAPC in R using the <code class="language-plaintext highlighter-rouge">adegenet</code> package. This is a really cool paper with a very large sample that also examined resistance to rodenticides.</p>
<p>Figure 1 in the publication shows the study sites, the genetic structure in discriminant space, and the cluster assignment in panel C. We’ll focus on panel C.</p>
<figure>
<a href="/assets/images/amelie.png"><img src="/assets/images/amelie.png" /></a>
<figcaption>Figure 1 from Desvars-Larrive et al. (2019)</figcaption>
</figure>
<p><br /></p>
<p>Let’s repeat the analysis but then use <code class="language-plaintext highlighter-rouge">ggplot</code> to show the individual membership assignment of the sampled animals to the genetic clusters identified by DAPC. The underlying data was shared in an xlsx file <a href="https://static-content.springer.com/esm/art%3A10.1007%2Fs10340-018-1043-4/MediaObjects/10340_2018_1043_MOESM1_ESM.xlsx" target="_blank">here</a>, which we can work with once we have it in our working directory.</p>
<h2 id="setup">Setup</h2>
<p>To get going, we need to load a few packages and import the allele data (sources and versions added with <code class="language-plaintext highlighter-rouge">annotater</code>).</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1"># Load libraries. Install first if needed</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">readxl</span><span class="p">)</span><span class="w"> </span><span class="c1"># CRAN v1.3.1</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">janitor</span><span class="p">)</span><span class="w"> </span><span class="c1"># CRAN v2.1.0</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">dplyr</span><span class="p">)</span><span class="w"> </span><span class="c1"># CRAN v1.0.7</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">tidyr</span><span class="p">)</span><span class="w"> </span><span class="c1"># CRAN v1.1.3</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">adegenet</span><span class="p">)</span><span class="w"> </span><span class="c1"># CRAN v2.1.4</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">ggplot2</span><span class="p">)</span><span class="w"> </span><span class="c1"># CRAN v3.3.5</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">forcats</span><span class="p">)</span><span class="w"> </span><span class="c1"># CRAN v0.5.1</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">stringr</span><span class="p">)</span><span class="w"> </span><span class="c1"># CRAN v1.4.0</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">ggh4x</span><span class="p">)</span><span class="w"> </span><span class="c1"># [github::teunbrand/ggh4x] v0.2.0.9000</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">paletteer</span><span class="p">)</span><span class="w"> </span><span class="c1"># CRAN v1.4.0</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">extrafont</span><span class="p">)</span><span class="w"> </span><span class="c1"># CRAN v0.17</span><span class="w">
</span><span class="c1"># Read microsatellite data from spreadsheet</span><span class="w">
</span><span class="n">rats</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">read_excel</span><span class="p">(</span><span class="s2">"10340_2018_1043_MOESM1_ESM.xlsx"</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w"> </span><span class="n">clean_names</span><span class="p">()</span></code></pre></figure>
<h2 id="preparing-the-data">Preparing the data</h2>
<p>At this point, there are a few easy steps to prepare the allele data for the <code class="language-plaintext highlighter-rouge">df2genind()</code> function that converts this input to a <code class="language-plaintext highlighter-rouge">genind</code> object. I used <code class="language-plaintext highlighter-rouge">sprintf</code> to pad the repeats so that the <code class="language-plaintext highlighter-rouge">ncode</code> argument works (<code class="language-plaintext highlighter-rouge">ncode</code> is an optional integer giving the number of characters used for coding one genotype at one locus).</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1"># subset the individual IDs and the loci data and coerce to data.frame</span><span class="w">
</span><span class="n">rat_microsdf</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">rats</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">select</span><span class="p">(</span><span class="n">rat_id</span><span class="p">,</span><span class="w"> </span><span class="n">d19r62</span><span class="o">:</span><span class="n">d3r159</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">as.data.frame</span><span class="p">()</span><span class="w">
</span><span class="n">row.names</span><span class="p">(</span><span class="n">rat_microsdf</span><span class="p">)</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">rat_microsdf</span><span class="o">$</span><span class="n">rat_id</span><span class="w">
</span><span class="n">rat_microsdf</span><span class="o">$</span><span class="n">rat_id</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="kc">NULL</span><span class="w">
</span><span class="c1"># pad the repeats</span><span class="w">
</span><span class="n">rat_microsdf</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">rat_microsdf</span><span class="w"> </span><span class="o">%>%</span><span class="w"> </span><span class="n">mutate</span><span class="p">(</span><span class="n">across</span><span class="p">(</span><span class="n">everything</span><span class="p">(),</span><span class="w"> </span><span class="o">~</span><span class="w"> </span><span class="n">sprintf</span><span class="p">(</span><span class="s2">"%06d"</span><span class="p">,</span><span class="w"> </span><span class="n">.x</span><span class="p">)))</span><span class="w">
</span><span class="c1"># create genind object</span><span class="w">
</span><span class="n">ratgen</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">df2genind</span><span class="p">(</span><span class="n">rat_microsdf</span><span class="p">,</span><span class="w"> </span><span class="n">ncode</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">6</span><span class="p">)</span><span class="w">
</span><span class="n">ratgen</span></code></pre></figure>
<p>The converted object now prints this:</p>
<figure class="highlight"><pre><code class="language-text" data-lang="text">> ratgen
/// GENIND OBJECT /////////
// 355 individuals; 13 loci; 425 alleles; size: 678.2 Kb
// Basic content
@tab: 355 x 425 matrix of allele counts
@loc.n.all: number of alleles per locus (range: 14-73)
@loc.fac: locus factor for the 425 columns of @tab
@all.names: list of allele names for each locus
@ploidy: ploidy of each individual (range: 2-2)
@type: codom
@call: df2genind(X = rat_microsdf, ncode = 6)
// Optional content
- empty -</code></pre></figure>
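<p>To make the padding step concrete, here is a small sketch with made-up genotypes (not values from the rats spreadsheet), assuming each genotype is stored as a single number that joins two three-digit allele sizes. Any value that lost a leading zero on import is restored to six characters, which is the width that <code class="language-plaintext highlighter-rouge">ncode = 6</code> expects.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"># made-up genotypes: two three-digit allele sizes joined into one number
toy_loci <- data.frame(
  locus_a = c(98104, 102102),
  locus_b = c(120124, 96120)
)

# left-pad with zeros to six characters, column by column
toy_padded <- as.data.frame(lapply(toy_loci, function(x) sprintf("%06d", x)))

toy_padded
nchar(as.matrix(toy_padded)) # every genotype is now six characters wide</code></pre></figure>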
<h2 id="run-dapc">Run DAPC</h2>
<p>With the data in <code class="language-plaintext highlighter-rouge">genind</code> format, we can run <code class="language-plaintext highlighter-rouge">find.clusters</code> and then <code class="language-plaintext highlighter-rouge">dapc</code> with the newly identified clusters. For the Structure-style plot, we need the membership probabilities of each individual for each cluster. We pull these from the dapc result, pivot these to long format, and add labels. Then we’ll be ready for plotting.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1"># find clusters (prompts interactively for the number of PCs and clusters)</span><span class="w">
</span><span class="n">grp</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">find.clusters</span><span class="p">(</span><span class="n">ratgen</span><span class="p">,</span><span class="w"> </span><span class="n">max.n.clust</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">20</span><span class="p">)</span><span class="w"> </span><span class="c1"># chose 180 PCs, 7 clusters</span><span class="w">
</span><span class="c1"># Discriminant analysis using the groups identified by find.clusters (also interactive)</span><span class="w">
</span><span class="n">rats_dapc</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">dapc</span><span class="p">(</span><span class="n">ratgen</span><span class="p">,</span><span class="w"> </span><span class="n">grp</span><span class="o">$</span><span class="n">grp</span><span class="p">)</span><span class="w"> </span><span class="c1"># chose 160 PCs, 6 discriminant functions</span><span class="w">
</span><span class="c1"># create an object with membership probabilities</span><span class="w">
</span><span class="n">postprobs</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">as.data.frame</span><span class="p">(</span><span class="nf">round</span><span class="p">(</span><span class="n">rats_dapc</span><span class="o">$</span><span class="n">posterior</span><span class="p">,</span><span class="w"> </span><span class="m">4</span><span class="p">))</span><span class="w">
</span><span class="c1"># put probabilities in a tibble with IDS and labels for sites</span><span class="w">
</span><span class="n">ratclusters</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">tibble</span><span class="o">::</span><span class="n">rownames_to_column</span><span class="p">(</span><span class="n">postprobs</span><span class="p">,</span><span class="w"> </span><span class="n">var</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"ind"</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">mutate</span><span class="p">(</span><span class="n">trapsite</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">rats</span><span class="o">$</span><span class="n">site</span><span class="p">)</span><span class="w">
</span><span class="c1"># melt into long format</span><span class="w">
</span><span class="n">rats_long</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">ratclusters</span><span class="w"> </span><span class="o">%>%</span><span class="w"> </span><span class="n">pivot_longer</span><span class="p">(</span><span class="m">2</span><span class="o">:</span><span class="m">8</span><span class="p">,</span><span class="w"> </span><span class="n">names_to</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"cluster"</span><span class="p">,</span><span class="w"> </span><span class="n">values_to</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"prob"</span><span class="p">)</span><span class="w">
</span><span class="c1"># manual relevel of the sampling sites (to avoid alphabetical ordering)</span><span class="w">
</span><span class="n">rats_long</span><span class="o">$</span><span class="n">trapfact</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">fct_relevel</span><span class="p">(</span><span class="n">as.factor</span><span class="p">(</span><span class="n">rats_long</span><span class="o">$</span><span class="n">trapsite</span><span class="p">),</span><span class="w"> </span><span class="s2">"GIV1"</span><span class="p">,</span><span class="w"> </span><span class="s2">"GIV2"</span><span class="p">,</span><span class="w"> </span><span class="s2">"GIV3"</span><span class="p">,</span><span class="w"> </span><span class="s2">"GIV4"</span><span class="p">,</span><span class="w"> </span><span class="s2">"GIV5"</span><span class="p">,</span><span class="w"> </span><span class="s2">"GIV6"</span><span class="p">,</span><span class="w"> </span><span class="s2">"ROM1"</span><span class="p">,</span><span class="w"> </span><span class="s2">"ROM2"</span><span class="p">,</span><span class="w"> </span><span class="s2">"ROM3"</span><span class="p">,</span><span class="w"> </span><span class="s2">"ROM4"</span><span class="p">,</span><span class="w"> </span><span class="s2">"ROM5"</span><span class="p">,</span><span class="w"> </span><span class="s2">"ROM6"</span><span class="p">,</span><span class="w"> </span><span class="s2">"LYO1"</span><span class="p">,</span><span class="w"> </span><span class="s2">"LYO2"</span><span class="p">,</span><span class="w"> </span><span class="s2">"LYO3"</span><span class="p">,</span><span class="w"> </span><span class="s2">"LYO4"</span><span class="p">,</span><span class="w"> </span><span class="s2">"LYO5"</span><span class="p">)</span><span class="w">
</span><span class="c1"># column for the municipality abbreviation</span><span class="w">
</span><span class="n">rats_long</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">rats_long</span><span class="w"> </span><span class="o">%>%</span><span class="w"> </span><span class="n">mutate</span><span class="p">(</span><span class="n">loc</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">str_remove</span><span class="p">(</span><span class="n">trapsite</span><span class="p">,</span><span class="w"> </span><span class="s2">"[0-9]"</span><span class="p">))</span></code></pre></figure>
<p>For clarity, this is what the probabilities look like raw:</p>
<figure class="highlight"><pre><code class="language-text" data-lang="text">> head(postprobs)
1 2 3 4 5 6 7
GIV0158 0 0 0 0 0 0 1
GIV0159 0 0 0 0 0 0 1
GIV0160 0 0 0 0 0 0 1
GIV0161 0 0 0 0 0 0 1
GIV0162 0 0 0 0 0 0 1
GIV0163 0 0 0 0 0 0 1</code></pre></figure>
<p>Then the long-format probabilities with labels for plotting and faceting look like this:</p>
<figure class="highlight"><pre><code class="language-text" data-lang="text">> head(rats_long)
# A tibble: 6 × 6
ind trapsite cluster prob trapfact loc
<chr> <chr> <chr> <dbl> <fct> <chr>
1 GIV0158 GIV2 1 0 GIV2 GIV
2 GIV0158 GIV2 2 0 GIV2 GIV
3 GIV0158 GIV2 3 0 GIV2 GIV
4 GIV0158 GIV2 4 0 GIV2 GIV
5 GIV0158 GIV2 5 0 GIV2 GIV
6 GIV0158 GIV2 6 0 GIV2 GIV </code></pre></figure>
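<p>The reshaping step can be tried on a toy example. The sketch below uses a made-up posterior matrix for three individuals and two clusters, with invented site labels. Unlike the code above, it selects the cluster columns by excluding the ID and site columns, which is one way (an assumption on my part, not what the original code does) to avoid hard-coding the number of clusters.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r">library(tibble)
library(tidyr)

# made-up posterior probabilities: three individuals, two clusters
toy_post <- matrix(
  c(0.9, 0.1,
    0.2, 0.8,
    1.0, 0.0),
  ncol = 2, byrow = TRUE,
  dimnames = list(c("ind_a", "ind_b", "ind_c"), c("1", "2"))
)

# move the row names into a column and add invented site labels
toy_wide <- rownames_to_column(as.data.frame(toy_post), var = "ind")
toy_wide$trapsite <- c("GIV1", "GIV1", "ROM2")

# pivot every column except the ID and site, so K is not hard-coded
toy_long <- pivot_longer(toy_wide, -c(ind, trapsite),
  names_to = "cluster", values_to = "prob"
)
toy_long</code></pre></figure>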
<h2 id="plotting">Plotting</h2>
<p><code class="language-plaintext highlighter-rouge">facet_nested</code> from the <code class="language-plaintext highlighter-rouge">ggh4x</code> package lets us implement nested facets to show sampling sites in their respective municipalities. Before calling <code class="language-plaintext highlighter-rouge">ggplot</code>, we need to set up some customization parameters for the nested facets via <code class="language-plaintext highlighter-rouge">strip_nested</code>. This lets us set the font size for each strip layer and turn off clipping of the strip text for locations with few samples (and consequently, narrow facets). I was previously unaware of <a href="https://teunbrand.github.io/ggh4x/index.html" target="_blank"><code class="language-plaintext highlighter-rouge">ggh4x</code></a>, but the package comes with lots of cool utility functions for ggplot and has a great hex logo.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="w"> </span><span class="c1"># set up custom facet strips</span><span class="w">
</span><span class="n">facetstrips</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">strip_nested</span><span class="p">(</span><span class="w">
</span><span class="n">text_x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">elem_list_text</span><span class="p">(</span><span class="n">size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="m">12</span><span class="p">,</span><span class="w"> </span><span class="m">4</span><span class="p">)),</span><span class="w">
</span><span class="n">by_layer_x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">TRUE</span><span class="p">,</span><span class="w"> </span><span class="n">clip</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"off"</span><span class="w">
</span><span class="p">)</span></code></pre></figure>
<p>Onto the <code class="language-plaintext highlighter-rouge">ggplot</code> call. The suitable <em>geom</em> here is <code class="language-plaintext highlighter-rouge">geom_col</code>, which stacks each individual’s membership probabilities into a single bar that adds up to 1. We control the spacing between locations with facets, the <code class="language-plaintext highlighter-rouge">expand</code> argument of the scales, and the <code class="language-plaintext highlighter-rouge">panel.spacing</code> theme setting. Note how the <code class="language-plaintext highlighter-rouge">scales</code> and <code class="language-plaintext highlighter-rouge">space</code> arguments to <code class="language-plaintext highlighter-rouge">facet_nested</code> accommodate the different number of individuals per location, while <code class="language-plaintext highlighter-rouge">switch</code> places the facet labels below the plot.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">ggplot</span><span class="p">(</span><span class="n">rats_long</span><span class="p">,</span><span class="w"> </span><span class="n">aes</span><span class="p">(</span><span class="n">factor</span><span class="p">(</span><span class="n">ind</span><span class="p">),</span><span class="w"> </span><span class="n">prob</span><span class="p">,</span><span class="w"> </span><span class="n">fill</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">factor</span><span class="p">(</span><span class="n">cluster</span><span class="p">)))</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">geom_col</span><span class="p">(</span><span class="n">color</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"gray"</span><span class="p">,</span><span class="w"> </span><span class="n">size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">0.01</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">facet_nested</span><span class="p">(</span><span class="o">~</span><span class="w"> </span><span class="n">loc</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">trapfact</span><span class="p">,</span><span class="w">
</span><span class="n">switch</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"x"</span><span class="p">,</span><span class="w">
</span><span class="n">nest_line</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">element_line</span><span class="p">(</span><span class="n">size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">1</span><span class="p">,</span><span class="w"> </span><span class="n">lineend</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"round"</span><span class="p">),</span><span class="w">
</span><span class="n">scales</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"free"</span><span class="p">,</span><span class="w"> </span><span class="n">space</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"free"</span><span class="p">,</span><span class="w"> </span><span class="n">strip</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">facetstrips</span><span class="w">
</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">theme_minimal</span><span class="p">(</span><span class="n">base_family</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Nimbus Sans"</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">labs</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Individuals"</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"membership probability"</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">scale_y_continuous</span><span class="p">(</span><span class="n">expand</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="m">0</span><span class="p">,</span><span class="w"> </span><span class="m">0</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">scale_x_discrete</span><span class="p">(</span><span class="n">expand</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">expansion</span><span class="p">(</span><span class="n">add</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">0.5</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">scale_fill_paletteer_d</span><span class="p">(</span><span class="s2">"ghibli::PonyoMedium"</span><span class="p">,</span><span class="w"> </span><span class="n">guide</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"none"</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">theme</span><span class="p">(</span><span class="w">
</span><span class="n">panel.spacing.x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">unit</span><span class="p">(</span><span class="m">0.18</span><span class="p">,</span><span class="w"> </span><span class="s2">"lines"</span><span class="p">),</span><span class="w">
</span><span class="n">axis.text.x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">element_blank</span><span class="p">(),</span><span class="w">
</span><span class="n">panel.grid</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">element_blank</span><span class="p">()</span><span class="w">
</span><span class="p">)</span></code></pre></figure>
<figure>
<a href="/assets/images/dapcrats.png"><img src="/assets/images/dapcrats.png" /></a>
<figcaption>Look at that structure</figcaption>
</figure>
<p><br /></p>
<p>Individuals are represented by vertical bars, colors correspond to different genetic clusters, and each individual’s color proportion indicates its membership probability for the corresponding cluster. Individuals are faceted by sampling location and a thicker line groups these locations. Compare this plot with the output of <code class="language-plaintext highlighter-rouge">adegenet::compoplot</code>:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="w"> </span><span class="n">compoplot</span><span class="p">(</span><span class="n">rats_dapc</span><span class="p">,</span><span class="w"> </span><span class="n">col</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">funky</span><span class="p">,</span><span class="w"> </span><span class="n">xlab</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"individuals"</span><span class="p">)</span></code></pre></figure>
<figure>
<a href="/assets/images/compoplotout.png"><img src="/assets/images/compoplotout.png" /></a>
<figcaption>Built-in plotting from adegenet</figcaption>
</figure>
<p><br /></p>
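<p>To see how free scales and free panel space handle uneven group sizes without the rats data, here is a small sketch with made-up probabilities. It swaps <code class="language-plaintext highlighter-rouge">facet_nested</code> for <code class="language-plaintext highlighter-rouge">facet_grid</code> from <code class="language-plaintext highlighter-rouge">ggplot2</code>, which takes the same <code class="language-plaintext highlighter-rouge">scales</code>, <code class="language-plaintext highlighter-rouge">space</code>, and <code class="language-plaintext highlighter-rouge">switch</code> arguments but only draws a single layer of facet strips.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r">library(ggplot2)

# made-up membership probabilities for five individuals from two sites
toy_probs <- data.frame(
  ind = rep(c("a1", "a2", "a3", "b1", "b2"), each = 2),
  site = rep(c("A", "A", "A", "B", "B"), each = 2),
  cluster = rep(c("1", "2"), times = 5),
  prob = c(0.9, 0.1, 0.7, 0.3, 1, 0, 0.2, 0.8, 0.05, 0.95)
)

toy_plot <- ggplot(toy_probs, aes(ind, prob, fill = cluster)) +
  geom_col() +
  # free x scales: each panel only shows its own individuals
  # free x space: panel width follows the number of individuals
  facet_grid(~site, scales = "free_x", space = "free_x", switch = "x") +
  scale_y_continuous(expand = c(0, 0)) +
  theme_minimal() +
  theme(axis.text.x = element_blank(), panel.grid = element_blank())

toy_plot</code></pre></figure>
<p>Site A has three individuals and site B has two, so the panel for A should come out wider while the bars keep the same width in both panels.</p>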
<p>The final result looks pretty good and those interested in population genetics can now see the structure and migration. This code will work with any number of clusters (K values) as long as the data are in long format. Try this with your own data and let me know if it works. Special thanks to conservation genomics specialist <a href="https://twitter.com/LillyDParker" target="_blank">Lilly D. Parker</a> for answering all my microsatellite questions :)</p>
Luis D. Verde Arregoitia
Plotting Structure, DAPC, or Admixture results with ggplot2.
Three-dimensional maps in R
2021-08-19T00:00:00+00:00
https://luisdva.github.io/rstats/izt-3d-es
<blockquote>
<p>English-language version <a href="https://luisdva.github.io/rstats/izt-3d/">here</a></p>
</blockquote>
<p>Nowadays it is possible, and relatively easy, to render maps with perspective and lighting using R (rendering being the process of generating digital images from a three-dimensional model). Thanks to the <code class="language-plaintext highlighter-rouge">rayshader</code> and <code class="language-plaintext highlighter-rouge">rayrender</code> packages, developed by <a href="https://www.tylermw.com/" target="_blank">Tyler Morgan-Wall</a>, and <code class="language-plaintext highlighter-rouge">elevatr</code> by <a href="https://jwhollister.com/" target="_blank">Jeff Hollister</a>, we can combine a georeferenced image with a digital elevation model (DEM) to give depth and atmosphere to any map we have in digital format.
<br /></p>
<p>Twitter user <a href="https://twitter.com/researchremora" target="_blank">@flotsam</a> regularly shares very good examples, like this one posted recently:</p>
<blockquote class="twitter-tweet" data-dnt="true"><p lang="en" dir="ltr">A 1972 NRCan (formerly the Department of Energy, Mines, and Resources) map of Montréal, Canada. Spent more time trying to decipher cryptic filenames to find what I want than actually rayshading it. :P<a href="https://twitter.com/hashtag/rayshader?src=hash&ref_src=twsrc%5Etfw">#rayshader</a> adventures, an <a href="https://twitter.com/hashtag/rstats?src=hash&ref_src=twsrc%5Etfw">#rstats</a> tale <a href="https://t.co/O756ddhs9G">pic.twitter.com/O756ddhs9G</a></p>— flotsam (@researchremora) <a href="https://twitter.com/researchremora/status/1422177343645302785?ref_src=twsrc%5Etfw">August 2, 2021</a></blockquote>
<script async="" src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>
<p><br /></p>
<p>The process for <em>rendering</em> georeferenced images is summarized in this <a href="https://gist.github.com/tylermorganwall/cec09392cb7d3e102496e30afe5e0898" target="_blank">guide</a>.</p>
<p>Many georeferenced maps are already available online, and GIS software can tie any digital map to its spatial context using control points. I generally prefer to make my own maps in <code class="language-plaintext highlighter-rouge">ggplot2</code>, and I wanted to process them in 3D with <code class="language-plaintext highlighter-rouge">rayshader</code>.</p>
<p>When we make maps in <code class="language-plaintext highlighter-rouge">ggplot2</code> with the most popular geospatial packages (<code class="language-plaintext highlighter-rouge">sf</code>, <code class="language-plaintext highlighter-rouge">stars</code>, etc.), we already have spatial data, so there should be no need to manually assign real-world coordinates to each pixel of the map raster. Besides, every time I try to georeference images in QGIS I lose patience at having to create at least six control points with the cursor and a shaky hand.</p>
<figure>
<a href="/assets/images/memazorast.jpg"><img src="/assets/images/memazorast.jpg" width="660" /></a>
<figcaption></figcaption>
</figure>
<p><br /></p>
<p>This <a href="https://stackoverflow.com/questions/53771331/writing-a-path-route-plot-as-a-geotiff-in-r" target="_blank">StackOverflow answer</a> explains how to extract the minimum and maximum xy values of the panel of a ggplot object, so that we can assign that extent to the raster version of the plot that we already exported.</p>
<p>For this method to work, the ggplot object must have no borders at all, so that the xy values that define its extent line up with the edges of the image we export for use in <code class="language-plaintext highlighter-rouge">rayshader</code>. To achieve this, we use <code class="language-plaintext highlighter-rouge">theme_nothing()</code> from the <code class="language-plaintext highlighter-rouge">cowplot</code> package by <a href="https://clauswilke.com/" target="_blank">Claus Wilke</a>. In short, the goal is to switch off almost every element of the figure and to have no axis expansion. <a href="https://stackoverflow.com/questions/31254533/when-using-ggplot-in-r-how-do-i-remove-margins-surrounding-the-plot-area/31255629" target="_blank">Here</a> we can see which elements are being modified.</p>
<p>Let's try it. This example is based on spatial data from the geographic data portals of <a href="http://www.conabio.gob.mx/informacion/gis/" target="_blank">CONABIO</a> and <a href="http://sig.conanp.gob.mx/website/pagsig/info_shape.htm" target="_blank">CONANP</a>. I downloaded the shapefiles for protected natural areas (Áreas Naturales Protegidas, ANP) and for land use and vegetation (scale 1:250,000, series VI, national continuum), and focused on the Iztaccíhuatl-Popocatépetl ANP.</p>
<p>These steps load the shp files, crop the data to the ANP of interest, and group (somewhat arbitrarily) some of the land use and vegetation categories. Then we can make a simple but colorful map of the volcanoes. Here, <code class="language-plaintext highlighter-rouge">theme_nothing()</code> removes the margins and axis labels from the figure, and some of the <code class="language-plaintext highlighter-rouge">coord_sf()</code> arguments keep everything inside the main panel of the figure.</p>
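<p>As a rough illustration of why axis expansion matters, the sketch below builds a toy plot twice and compares the panel ranges that <code class="language-plaintext highlighter-rouge">ggplot2</code> computes. The data frame and object names are made up for this example, and it uses <code class="language-plaintext highlighter-rouge">coord_cartesian()</code> rather than <code class="language-plaintext highlighter-rouge">coord_sf()</code>, so the range elements are named <code class="language-plaintext highlighter-rouge">x.range</code> here instead of <code class="language-plaintext highlighter-rouge">x_range</code>.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r">library(ggplot2)

# toy data (hypothetical) spanning x in [0, 10] and y in [0, 5]
df <- data.frame(x = c(0, 10), y = c(0, 5))

# default plot: the continuous scales pad the data range
p_default <- ggplot(df, aes(x, y)) + geom_point()

# same plot with axis expansion switched off
p_tight <- p_default + coord_cartesian(expand = FALSE)

# panel ranges computed when each plot is built
range_default <- ggplot_build(p_default)$layout$panel_params[[1]]$x.range
range_tight <- ggplot_build(p_tight)$layout$panel_params[[1]]$x.range

range_default # wider than the data because of the default padding
range_tight   # equal to the data range</code></pre></figure>
<p>Only the unexpanded range can be used as a raster extent, because the padded version describes space around the data that would shift every pixel.</p>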
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1"># packages</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">sf</span><span class="p">)</span><span class="w"> </span><span class="c1"># CRAN v0.9-8</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">dplyr</span><span class="p">)</span><span class="w"> </span><span class="c1"># CRAN v1.0.7</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">stringr</span><span class="p">)</span><span class="w"> </span><span class="c1"># CRAN v1.4.0</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">ggplot2</span><span class="p">)</span><span class="w"> </span><span class="c1"># CRAN v3.3.3</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">raster</span><span class="p">)</span><span class="w"> </span><span class="c1"># CRAN v3.4-13</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">stars</span><span class="p">)</span><span class="w"> </span><span class="c1"># [github::r-spatial/stars] v0.5-3</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">elevatr</span><span class="p">)</span><span class="w"> </span><span class="c1"># CRAN v0.4.1</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">rayshader</span><span class="p">)</span><span class="w"> </span><span class="c1"># [github::tylermorganwall/rayshader] v0.26.1</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">smoothr</span><span class="p">)</span><span class="w"> </span><span class="c1"># CRAN v0.2.2</span><span class="w">
</span><span class="c1"># import protected areas (ANPs)</span><span class="w">
</span><span class="n">anps</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">st_read</span><span class="p">(</span><span class="s2">"YOUR-PATH-HERE/182ANP_Geo_ITRF08_Agosto_2020.shp"</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">dplyr</span><span class="o">::</span><span class="n">select</span><span class="p">(</span><span class="n">NOMBRE</span><span class="p">)</span><span class="w">
</span><span class="c1"># import vegetation</span><span class="w">
</span><span class="n">veg</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">st_read</span><span class="p">(</span><span class="s2">"YOUR-PATH-HERE/usv250s6gw.shp"</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">st_transform</span><span class="p">(</span><span class="n">st_crs</span><span class="p">(</span><span class="n">anps</span><span class="p">))</span><span class="w">
</span><span class="c1"># study area</span><span class="w">
</span><span class="n">izta</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">anps</span><span class="w"> </span><span class="o">%>%</span><span class="w"> </span><span class="n">filter</span><span class="p">(</span><span class="n">str_detect</span><span class="p">(</span><span class="n">NOMBRE</span><span class="p">,</span><span class="w"> </span><span class="s2">"Izt"</span><span class="p">))</span><span class="w">
</span><span class="c1"># limits (buffered bounding box)</span><span class="w">
</span><span class="n">Iztbbox</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">st_buffer</span><span class="p">(</span><span class="n">izta</span><span class="p">,</span><span class="w"> </span><span class="n">dist</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">0.1</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">st_bbox</span><span class="p">()</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">st_as_sfc</span><span class="p">()</span><span class="w">
</span><span class="c1"># dissolve vegetation</span><span class="w">
</span><span class="n">veggr</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">veg</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">group_by</span><span class="p">(</span><span class="n">CVE_UNION</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">summarize</span><span class="p">(</span><span class="n">.groups</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"keep"</span><span class="p">)</span><span class="w">
</span><span class="c1"># crop to the limits</span><span class="w">
</span><span class="n">Iztveg</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">st_crop</span><span class="p">(</span><span class="n">veggr</span><span class="p">,</span><span class="w"> </span><span class="n">Iztbbox</span><span class="p">)</span><span class="w">
</span><span class="c1"># reclassify vegetation</span><span class="w">
</span><span class="n">Iztvegrc</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">Iztveg</span><span class="w"> </span><span class="o">%>%</span><span class="w"> </span><span class="n">mutate</span><span class="p">(</span><span class="n">vegclass</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">case_when</span><span class="p">(</span><span class="w">
</span><span class="n">str_detect</span><span class="p">(</span><span class="n">CVE_UNION</span><span class="p">,</span><span class="w"> </span><span class="s2">"^B"</span><span class="p">)</span><span class="w"> </span><span class="o">~</span><span class="w"> </span><span class="s2">"Bosque"</span><span class="p">,</span><span class="w">
</span><span class="n">str_detect</span><span class="p">(</span><span class="n">CVE_UNION</span><span class="p">,</span><span class="w"> </span><span class="s2">"^VS"</span><span class="p">)</span><span class="w"> </span><span class="o">~</span><span class="w"> </span><span class="s2">"Vegetación secundaria"</span><span class="p">,</span><span class="w">
</span><span class="n">str_detect</span><span class="p">(</span><span class="n">CVE_UNION</span><span class="p">,</span><span class="w"> </span><span class="s2">"^T"</span><span class="p">)</span><span class="w"> </span><span class="o">~</span><span class="w"> </span><span class="s2">"Agricultura (temporal)"</span><span class="p">,</span><span class="w">
</span><span class="n">str_detect</span><span class="p">(</span><span class="n">CVE_UNION</span><span class="p">,</span><span class="w"> </span><span class="s2">"^R"</span><span class="p">)</span><span class="w"> </span><span class="o">~</span><span class="w"> </span><span class="s2">"Agricultura (Riego)"</span><span class="p">,</span><span class="w">
</span><span class="n">str_detect</span><span class="p">(</span><span class="n">CVE_UNION</span><span class="p">,</span><span class="w"> </span><span class="s2">"DV"</span><span class="p">)</span><span class="w"> </span><span class="o">~</span><span class="w"> </span><span class="s2">"Sin vegetación"</span><span class="p">,</span><span class="w">
</span><span class="n">str_detect</span><span class="p">(</span><span class="n">CVE_UNION</span><span class="p">,</span><span class="w"> </span><span class="s2">"^P"</span><span class="p">)</span><span class="w"> </span><span class="o">~</span><span class="w"> </span><span class="s2">"Pastizal"</span><span class="p">,</span><span class="w">
</span><span class="n">str_detect</span><span class="p">(</span><span class="n">CVE_UNION</span><span class="p">,</span><span class="w"> </span><span class="s2">"VW"</span><span class="p">)</span><span class="w"> </span><span class="o">~</span><span class="w"> </span><span class="s2">"Pradera de alta montaña"</span><span class="p">,</span><span class="w">
</span><span class="kc">TRUE</span><span class="w"> </span><span class="o">~</span><span class="w"> </span><span class="n">CVE_UNION</span><span class="w">
</span><span class="p">))</span><span class="w">
</span><span class="c1"># smooth</span><span class="w">
</span><span class="n">Iztvegrc</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">smooth</span><span class="p">(</span><span class="n">Iztvegrc</span><span class="p">)</span><span class="w">
</span><span class="c1"># export to tiff</span><span class="w">
</span><span class="n">ragg</span><span class="o">::</span><span class="n">agg_tiff</span><span class="p">(</span><span class="w">
</span><span class="n">filename</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"izt.tiff"</span><span class="p">,</span><span class="w">
</span><span class="n">res</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">400</span><span class="p">,</span><span class="w"> </span><span class="n">width</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">3.88</span><span class="p">,</span><span class="w"> </span><span class="n">height</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">5.92</span><span class="p">,</span><span class="w"> </span><span class="n">units</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"in"</span><span class="w">
</span><span class="p">)</span><span class="w">
</span><span class="c1"># figure</span><span class="w">
</span><span class="n">iztgg</span><span class="w"> </span><span class="o"><-</span><span class="w">
</span><span class="n">ggplot</span><span class="p">()</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">geom_sf</span><span class="p">(</span><span class="n">data</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Iztvegrc</span><span class="p">,</span><span class="w"> </span><span class="n">aes</span><span class="p">(</span><span class="n">fill</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">vegclass</span><span class="p">),</span><span class="w"> </span><span class="n">size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">0</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">scale_fill_manual</span><span class="p">(</span><span class="n">values</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="w">
</span><span class="s2">"#b5c99a"</span><span class="p">,</span><span class="w"> </span><span class="s2">"#97a97c"</span><span class="p">,</span><span class="w"> </span><span class="s2">"#8d99ae"</span><span class="p">,</span><span class="w"> </span><span class="s2">"#26432F"</span><span class="p">,</span><span class="w">
</span><span class="s2">"#e8e8e4"</span><span class="p">,</span><span class="w"> </span><span class="s2">"#cdeac0"</span><span class="p">,</span><span class="w"> </span><span class="s2">"#B1D5BB"</span><span class="p">,</span><span class="w">
</span><span class="s2">"#cde5d7"</span><span class="p">,</span><span class="w"> </span><span class="s2">"#adb6c4"</span><span class="p">,</span><span class="w"> </span><span class="s2">"#9cc5a1"</span><span class="p">,</span><span class="w">
</span><span class="s2">"#ffc49b"</span><span class="p">,</span><span class="w"> </span><span class="s2">"#eaeaea"</span><span class="p">,</span><span class="w"> </span><span class="s2">"#f3e7e4"</span><span class="w">
</span><span class="p">),</span><span class="w"> </span><span class="n">guide</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">FALSE</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">ggfx</span><span class="o">::</span><span class="n">with_outer_glow</span><span class="p">(</span><span class="n">geom_sf</span><span class="p">(</span><span class="n">data</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">izta</span><span class="p">,</span><span class="w"> </span><span class="n">fill</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"transparent"</span><span class="p">,</span><span class="w"> </span><span class="n">color</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"white"</span><span class="p">,</span><span class="w"> </span><span class="n">size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">0.1</span><span class="p">),</span><span class="w">
</span><span class="n">expand</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">3</span><span class="p">,</span><span class="w"> </span><span class="n">colour</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"white"</span><span class="w">
</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">coord_sf</span><span class="p">(</span><span class="n">expand</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">FALSE</span><span class="p">,</span><span class="w"> </span><span class="n">clip</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"on"</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">cowplot</span><span class="o">::</span><span class="n">theme_nothing</span><span class="p">()</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">theme</span><span class="p">(</span><span class="w">
</span><span class="n">panel.grid.major</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">element_line</span><span class="p">(</span><span class="n">color</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"#14213d"</span><span class="p">,</span><span class="w"> </span><span class="n">size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">0.1</span><span class="p">,</span><span class="w"> </span><span class="n">linetype</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"dashed"</span><span class="p">),</span><span class="w">
</span><span class="n">panel.ontop</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">TRUE</span><span class="w">
</span><span class="p">)</span><span class="w">
</span><span class="n">iztgg</span><span class="w">
</span><span class="n">dev.off</span><span class="p">()</span></code></pre></figure>
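<p>The reclassification step in the block above tests the patterns in order and falls back to the original code when none of them match. Here is a minimal sketch of that behaviour with a handful of made-up codes (they are placeholders, not necessarily real <code class="language-plaintext highlighter-rouge">CVE_UNION</code> values) and only three of the patterns.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r">library(dplyr)
library(stringr)

# hypothetical land cover codes, for illustration only
codes <- data.frame(
  CVE_UNION = c("BP", "VSa/BP", "TA", "ZU"),
  stringsAsFactors = FALSE
)

reclassified <- codes %>%
  mutate(vegclass = case_when(
    # "^" anchors the pattern to the start of the code
    str_detect(CVE_UNION, "^B") ~ "Bosque",
    str_detect(CVE_UNION, "^VS") ~ "Vegetación secundaria",
    str_detect(CVE_UNION, "^T") ~ "Agricultura (temporal)",
    # anything unmatched keeps its original code
    TRUE ~ CVE_UNION
  ))

reclassified$vegclass</code></pre></figure>
<p>Because the first matching condition wins, the order of the patterns matters whenever a code could match more than one of them.</p>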
<p>It took me a few tries to export the tiff without white space at the margins, but once that works we can import the image using the <code class="language-plaintext highlighter-rouge">raster</code> package.</p>
<figure>
<a href="/assets/images/iztgg.png"><img src="/assets/images/iztgg.png" /></a>
<figcaption>Looks good</figcaption>
</figure>
<p><br /></p>
<p>Let's check that tiff.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1"># create a stacked raster</span><span class="w">
</span><span class="n">stackedRaster</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">raster</span><span class="o">::</span><span class="n">stack</span><span class="p">(</span><span class="s2">"izt.tiff"</span><span class="p">)</span></code></pre></figure>
<figure>
<a href="/assets/images/tiffrgb.png"><img src="/assets/images/tiffrgb.png" width="660" /></a>
<figcaption>RGB raster</figcaption>
</figure>
<p><br /></p>
<p>The geospatial information we need is stored in the ggplot object, and we can extract it with <code class="language-plaintext highlighter-rouge">ggplot_build()</code>. One of the components of the built object is a list with the minimum and maximum values for x and y, which we pass to the <code class="language-plaintext highlighter-rouge">extent</code> function to georeference the map (the projection also needs to be defined). The georeferenced raster is exported with <code class="language-plaintext highlighter-rouge">INT1U</code> so that the values stay between 0 and 255, and with <code class="language-plaintext highlighter-rouge">PHOTOMETRIC=RGB</code> to declare how the colors are interpreted.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1"># extract the geospatial components</span><span class="w">
</span><span class="n">lat_long</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">ggplot_build</span><span class="p">(</span><span class="n">iztgg</span><span class="p">)</span><span class="o">$</span><span class="n">layout</span><span class="o">$</span><span class="n">panel_params</span><span class="p">[[</span><span class="m">1</span><span class="p">]][</span><span class="nf">c</span><span class="p">(</span><span class="s2">"x_range"</span><span class="p">,</span><span class="w"> </span><span class="s2">"y_range"</span><span class="p">)]</span><span class="w">
</span><span class="c1"># assign the values to the stacked raster</span><span class="w">
</span><span class="n">raster</span><span class="o">::</span><span class="n">extent</span><span class="p">(</span><span class="n">stackedRaster</span><span class="p">)</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="n">lat_long</span><span class="o">$</span><span class="n">x_range</span><span class="p">,</span><span class="w"> </span><span class="n">lat_long</span><span class="o">$</span><span class="n">y_range</span><span class="p">)</span><span class="w">
</span><span class="n">raster</span><span class="o">::</span><span class="n">projection</span><span class="p">(</span><span class="n">stackedRaster</span><span class="p">)</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">raster</span><span class="o">::</span><span class="n">crs</span><span class="p">(</span><span class="n">st_crs</span><span class="p">(</span><span class="n">Iztvegrc</span><span class="p">)</span><span class="o">$</span><span class="n">proj4string</span><span class="p">)</span><span class="w">
</span><span class="c1"># export</span><span class="w">
</span><span class="n">writeRaster</span><span class="p">(</span><span class="n">stackedRaster</span><span class="p">,</span><span class="w"> </span><span class="s2">"iztGeoTiff.tif"</span><span class="p">,</span><span class="w"> </span><span class="n">options</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"PHOTOMETRIC=RGB"</span><span class="p">,</span><span class="w"> </span><span class="n">datatype</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"INT1U"</span><span class="p">,</span><span class="w"> </span><span class="n">overwrite</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">TRUE</span><span class="p">)</span></code></pre></figure>
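<p>To make the effect of assigning an extent more concrete, here is a small base R sketch of the arithmetic involved. The pixel counts follow from the 3.88 x 5.92 inch, 400 ppi settings used when exporting the tiff, but the coordinate ranges are placeholder values chosen for this example, not the real panel ranges.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"># image size in pixels, from the export settings (inches * resolution)
n_cols <- round(3.88 * 400)
n_rows <- round(5.92 * 400)

# hypothetical panel ranges in degrees (placeholders, not the real values)
x_range <- c(-98.9, -98.4)
y_range <- c(18.9, 19.6)

# size of one pixel in map units once the extent is assigned
res_x <- diff(x_range) / n_cols
res_y <- diff(y_range) / n_rows

# coordinates of the centre of the top-left pixel
x_first <- x_range[1] + res_x / 2
y_first <- y_range[2] - res_y / 2

c(n_cols = n_cols, n_rows = n_rows)
c(res_x = res_x, res_y = res_y)</code></pre></figure>
<p>If the exported image had any white margin, these pixel sizes would be slightly off and the map would no longer line up with the elevation data.</p>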
<p>We import the GeoTIFF again.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">grRast</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">raster</span><span class="o">::</span><span class="n">brick</span><span class="p">(</span><span class="s2">"iztGeoTiff.tif"</span><span class="p">)</span><span class="w">
</span><span class="n">grRastst</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">raster</span><span class="o">::</span><span class="n">stack</span><span class="p">(</span><span class="n">grRast</span><span class="p">)</span></code></pre></figure>
<p>Now we can follow the existing guides for processing images with <code class="language-plaintext highlighter-rouge">rayshader</code>. First we convert the raster to a numeric array, and use the georeferenced raster to download elevation data with <code class="language-plaintext highlighter-rouge">elevatr</code>. Then we crop the elevation data, reduce its resolution and size, and convert it to a transposed matrix (rasters and numeric arrays are oriented differently in R).</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1"># convert to array</span><span class="w">
</span><span class="n">test_c_arr</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">as.array</span><span class="p">(</span><span class="n">grRastst</span><span class="p">)</span><span class="w">
</span><span class="c1"># download elevation</span><span class="w">
</span><span class="n">Rch_dem</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">elevatr</span><span class="o">::</span><span class="n">get_elev_raster</span><span class="p">(</span><span class="n">grRastst</span><span class="p">,</span><span class="w"> </span><span class="n">z</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">9</span><span class="p">)</span><span class="w">
</span><span class="n">Rch_dem</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">raster</span><span class="o">::</span><span class="n">crop</span><span class="p">(</span><span class="n">Rch_dem</span><span class="p">,</span><span class="w"> </span><span class="n">grRastst</span><span class="p">)</span><span class="w">
</span><span class="n">plot</span><span class="p">(</span><span class="n">Rch_dem</span><span class="p">)</span><span class="w">
</span><span class="c1"># reduce size</span><span class="w">
</span><span class="n">areademMatrix</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">rayshader</span><span class="o">::</span><span class="n">resize_matrix</span><span class="p">(</span><span class="n">rayshader</span><span class="o">::</span><span class="n">raster_to_matrix</span><span class="p">(</span><span class="n">Rch_dem</span><span class="p">),</span><span class="w"> </span><span class="n">scale</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">0.7</span><span class="p">)</span><span class="w">
</span><span class="c1"># transpose: raster_to_matrix() already returns the transposed matrix, so it is kept as is</span><span class="w">
</span><span class="n">demFinal</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="p">(</span><span class="n">areademMatrix</span><span class="p">)</span></code></pre></figure>
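<p>The two array operations in this workflow, rescaling the 0 to 255 color values and transposing a matrix, can be tried out on toy objects in base R. The objects below are invented for this sketch and are far smaller than the real image and DEM.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"># toy RGB image (hypothetical): 2 rows x 3 columns x 3 bands, values in 0-255
img <- array(c(0, 51, 102, 153, 204, 255), dim = c(2, 3, 3))
dim(img) # rows, columns, bands

# rescale to the 0-1 range, the same idea as dividing the map array by 255
img01 <- img / 255
range(img01)

# toy elevation matrix with the same rows x columns layout as the image
elev <- matrix(1:6, nrow = 2, ncol = 3)

# transposing swaps rows and columns
elev_t <- t(elev)
dim(elev_t)</code></pre></figure>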
<p>This is what the DEM looks like.</p>
<figure>
<a href="/assets/images/iztdem.png"><img src="/assets/images/iztdem.png" width="660" /></a>
<figcaption>DEM</figcaption>
</figure>
<p><br /></p>
<p>Now we calculate the shading and combine the array with the elevation data.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1"># shading</span><span class="w">
</span><span class="n">ambient_layer</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">ambient_shade</span><span class="p">(</span><span class="n">demFinal</span><span class="p">,</span><span class="w"> </span><span class="n">zscale</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">10</span><span class="p">,</span><span class="w"> </span><span class="n">multicore</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">TRUE</span><span class="p">,</span><span class="w"> </span><span class="n">maxsearch</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">200</span><span class="p">)</span><span class="w">
</span><span class="n">ray_layer</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">ray_shade</span><span class="p">(</span><span class="n">demFinal</span><span class="p">,</span><span class="w"> </span><span class="n">zscale</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">20</span><span class="p">,</span><span class="w"> </span><span class="n">multicore</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">TRUE</span><span class="p">)</span><span class="w">
</span><span class="c1"># 3D</span><span class="w">
</span><span class="p">(</span><span class="n">test_c_arr</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="m">255</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">add_shadow</span><span class="p">(</span><span class="n">ray_layer</span><span class="p">,</span><span class="w"> </span><span class="m">0.3</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">add_shadow</span><span class="p">(</span><span class="n">ambient_layer</span><span class="p">,</span><span class="w"> </span><span class="m">0</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">plot_3d</span><span class="p">(</span><span class="n">demFinal</span><span class="p">,</span><span class="w">
</span><span class="n">zscale</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">150</span><span class="p">,</span><span class="w"> </span><span class="n">theta</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">-50</span><span class="p">,</span><span class="w"> </span><span class="n">phi</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">28</span><span class="p">,</span><span class="w"> </span><span class="n">zoom</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">0.3</span><span class="p">,</span><span class="w">
</span><span class="n">windowsize</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="m">1200</span><span class="p">,</span><span class="w"> </span><span class="m">800</span><span class="p">),</span><span class="w"> </span><span class="n">solid</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">FALSE</span><span class="p">,</span><span class="w">
</span><span class="n">background</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"#3d314a"</span><span class="w">
</span><span class="p">)</span></code></pre></figure>
<p>This is what the RGL window generated by <code class="language-plaintext highlighter-rouge">plot_3d()</code> looks like. I adjusted some of the camera values. The result is a view that chilangos (Mexico City residents) know well.</p>
<figure>
<a href="/assets/images/iztsnap.png"><img src="/assets/images/iztsnap.png" width="660" /></a>
<figcaption>3D!</figcaption>
</figure>
<p><br /></p>
<p>The top-down view produced by <code class="language-plaintext highlighter-rouge">plot_map()</code> also looks good.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1"># in 2D</span><span class="w">
</span><span class="p">(</span><span class="n">test_c_arr</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="m">255</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">add_shadow</span><span class="p">(</span><span class="n">ray_layer</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">add_shadow</span><span class="p">(</span><span class="n">ambient_layer</span><span class="p">,</span><span class="w"> </span><span class="m">0</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">plot_map</span><span class="p">()</span></code></pre></figure>
<figure>
<a href="/assets/images/2dmap.png"><img src="/assets/images/2dmap.png" /></a>
<figcaption>2D</figcaption>
</figure>
<p><br /></p>
<p>Finally, we can add elements with <code class="language-plaintext highlighter-rouge">rayrender</code> before exporting high-resolution images with <code class="language-plaintext highlighter-rouge">render_highquality()</code>. I liked this alternative for giving the scene more atmosphere: set the argument <code class="language-plaintext highlighter-rouge">light = FALSE</code> and place a sphere of colored light wherever we like best.</p>
<blockquote class="twitter-tweet" data-conversation="none" data-dnt="true"><p lang="en" dir="ltr">Turn off the lights and add your own🤓: <a href="https://twitter.com/hashtag/rstats?src=hash&ref_src=twsrc%5Etfw">#rstats</a> <a href="https://twitter.com/hashtag/rayshader?src=hash&ref_src=twsrc%5Etfw">#rayshader</a> <a href="https://twitter.com/hashtag/rayrender?src=hash&ref_src=twsrc%5Etfw">#rayrender</a><br /><br />render_highquality(light = FALSE, scene_elements = sphere(y = 100, radius = 10, material = diffuse(lightintensity = 250, implicit_sample = TRUE))) <a href="https://t.co/g1QDEphZGh">pic.twitter.com/g1QDEphZGh</a></p>— Tyler Morgan-Wall (@tylermorganwall) <a href="https://twitter.com/tylermorganwall/status/1188505201193488386?ref_src=twsrc%5Etfw">October 27, 2019</a></blockquote>
<script async="" src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>
<p><br /></p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1"># render</span><span class="w">
</span><span class="n">render_highquality</span><span class="p">(</span><span class="w">
</span><span class="n">light</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">FALSE</span><span class="p">,</span><span class="w">
</span><span class="n">scene_elements</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">rayrender</span><span class="o">::</span><span class="n">sphere</span><span class="p">(</span><span class="w">
</span><span class="n">z</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">450</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">100</span><span class="p">,</span><span class="w"> </span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">200</span><span class="p">,</span><span class="w"> </span><span class="n">radius</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">36</span><span class="p">,</span><span class="w">
</span><span class="n">material</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">rayrender</span><span class="o">::</span><span class="n">light</span><span class="p">(</span><span class="w">
</span><span class="n">intensity</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">150</span><span class="p">,</span><span class="w">
</span><span class="n">spotlight_width</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">20</span><span class="p">,</span><span class="w">
</span><span class="n">color</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"#ff773d"</span><span class="w">
</span><span class="p">)</span><span class="w">
</span><span class="p">)</span><span class="w">
</span><span class="p">)</span></code></pre></figure>
<p>Here are some renders I made with different camera and lighting options.</p>
<figure>
<a href="/assets/images/3dfinal.png"><img src="/assets/images/3dfinal.png" width="660" /></a>
<figcaption></figcaption>
</figure>
<figure>
<a href="/assets/images/3dwide.png"><img src="/assets/images/3dwide.png" width="660" /></a>
<figcaption></figcaption>
</figure>
<figure>
<a href="/assets/images/izt3dsh.png"><img src="/assets/images/izt3dsh.png" width="660" /></a>
<figcaption></figcaption>
</figure>
<figure>
<a href="/assets/images/xyplt.png"><img src="/assets/images/xyplt.png" width="660" /></a>
<figcaption></figcaption>
</figure>
<p><br /></p>
<p>I liked the results quite a bit, and with all of this in place we can even make animations. This way of georeferencing images also works for maps made with base R, as long as the panel fills the entire image. I hope this guide is useful. I am happy to hear any questions or comments. Cheers!</p>
Luis D. Verde Arregoitia
Using the rayshader package to generate photorealistic maps from ggplot objects.
Rendering thematic maps in 3D
2021-08-16T00:00:00+00:00
https://luisdva.github.io/rstats/izt-3d
<blockquote>
<p>Spanish version <a href="https://luisdva.github.io/rstats/izt-3d-es/">here</a></p>
</blockquote>
<p>Nowadays, it is not only possible but also straightforward to make cool and informative 3D maps in R. Thanks to <code class="language-plaintext highlighter-rouge">rayshader</code> and <code class="language-plaintext highlighter-rouge">rayrender</code> by <a href="https://www.tylermw.com/" target="_blank">Tyler Morgan-Wall</a> and <code class="language-plaintext highlighter-rouge">elevatr</code> by <a href="https://jwhollister.com/" target="_blank">Jeff Hollister</a>, we can combine a georeferenced image and a Digital Elevation Model (DEM) to really bring an existing map to life.<br />
<br /></p>
<p>Twitter user flotsam (<a href="https://twitter.com/researchremora" target="_blank">@researchremora</a>) has many great examples, such as this recent one:</p>
<blockquote class="twitter-tweet" data-dnt="true"><p lang="en" dir="ltr">A 1972 NRCan (formerly the Department of Energy, Mines, and Resources) map of Montréal, Canada. Spent more time trying to decipher cryptic filenames to find what I want than actually rayshading it. :P<a href="https://twitter.com/hashtag/rayshader?src=hash&ref_src=twsrc%5Etfw">#rayshader</a> adventures, an <a href="https://twitter.com/hashtag/rstats?src=hash&ref_src=twsrc%5Etfw">#rstats</a> tale <a href="https://t.co/O756ddhs9G">pic.twitter.com/O756ddhs9G</a></p>— flotsam (@researchremora) <a href="https://twitter.com/researchremora/status/1422177343645302785?ref_src=twsrc%5Etfw">August 2, 2021</a></blockquote>
<script async="" src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>
<p><br /></p>
<p>The overall process for rayshading a georeferenced image is summarized step-by-step in this <a href="https://gist.github.com/tylermorganwall/cec09392cb7d3e102496e30afe5e0898" target="_blank">gist</a> by Tyler Morgan-Wall.</p>
<p>There are many georeferenced maps online, and we can use GIS software to match up any digital map with its corresponding spatial context by creating ground control points. However, I like making my own thematic maps in <code class="language-plaintext highlighter-rouge">ggplot2</code> and wanted to plot one in 3D with <code class="language-plaintext highlighter-rouge">rayshader</code>.</p>
<p>When we make maps in <code class="language-plaintext highlighter-rouge">ggplot2</code> using popular spatial packages (<code class="language-plaintext highlighter-rouge">sf</code>, <code class="language-plaintext highlighter-rouge">stars</code>, etc.), we are already working with spatial data, so there should be no need to manually assign real-world coordinates to each pixel of the raster in the final map. I also kept losing my patience trying to create control points for a polynomial warp with my shaky hand, using a trackpad or the weird orange nub on my computer.</p>
<figure>
<a href="/assets/images/memazorast.jpg"><img src="/assets/images/memazorast.jpg" width="660" /></a>
<figcaption></figcaption>
</figure>
<p><br /></p>
<p>This <a href="https://stackoverflow.com/questions/53771331/writing-a-path-route-plot-as-a-geotiff-in-r">StackOverflow thread</a> explains how to pull the information that describes the maximum and minimum xy values for the panel in a ggplot object and then assign them to the raster image we exported this object as.</p>
<p>If we want to avoid manual georeferencing and use this approach, the key point here (which is kind of a hack) is to make sure that the ggplot object has no margins so that the xy values that describe the panel extent match up with the overall extent of the image we export and then use for <code class="language-plaintext highlighter-rouge">rayshader</code>. For this, we can use a custom theme (<code class="language-plaintext highlighter-rouge">theme_nothing</code>) that comes with the <code class="language-plaintext highlighter-rouge">cowplot</code> package by <a href="https://clauswilke.com/">Claus Wilke</a>, and the rationale is explained <a href="https://stackoverflow.com/questions/31254533/when-using-ggplot-in-r-how-do-i-remove-margins-surrounding-the-plot-area/31255629.">here</a>. Basically, this theme is equivalent to setting various plot elements to blank with no expansion of the scales.</p>
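<p>To make the idea of "blank elements plus no scale expansion" concrete, here is a minimal sketch on a non-spatial plot. It is not how <code class="language-plaintext highlighter-rouge">cowplot</code> implements <code class="language-plaintext highlighter-rouge">theme_nothing()</code>; it only assumes that <code class="language-plaintext highlighter-rouge">theme_void()</code> with zero plot margins is a close enough stand-in, and it uses the built-in <code class="language-plaintext highlighter-rouge">mtcars</code> data purely for illustration. With no expansion, the panel ranges reported by <code class="language-plaintext highlighter-rouge">ggplot_build()</code> should coincide with the data ranges.</p>

```r
library(ggplot2)

# A plain scatterplot with everything but the panel contents stripped away
p <- ggplot(mtcars, aes(wt, mpg)) +
  geom_point() +
  # no padding added around the data on either axis
  scale_x_continuous(expand = c(0, 0)) +
  scale_y_continuous(expand = c(0, 0)) +
  # stand-in for cowplot::theme_nothing() (an assumption, not its source code)
  theme_void() +
  theme(plot.margin = margin(0, 0, 0, 0), legend.position = "none")

# Panel limits as ggplot2 computed them; for Cartesian coordinates the
# elements are named x.range / y.range (coord_sf uses x_range / y_range)
rng <- ggplot_build(p)$layout$panel_params[[1]][c("x.range", "y.range")]

rng$x.range
range(mtcars$wt)
```

<p>If the two printed ranges agree, the edges of the exported image can be treated as the edges of the data, which is what the georeferencing trick relies on.</p>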
<p>Let me demonstrate. This example uses spatial data from the <a href="http://www.conabio.gob.mx/informacion/gis/">CONABIO Geoinformation Portal</a> and the CONANP <a href="http://sig.conanp.gob.mx/website/pagsig/info_shape.htm">spatial information download area</a>. I downloaded shapefiles for Protected Areas (Áreas Naturales Protegidas) and Landcover types (Uso del suelo y vegetación, escala 1:250000, serie VI (continuo nacional)) and then focused on the Iztaccíhuatl-Popocatépetl protected area.</p>
<p>These steps are for reading the shp data, subsetting the features for the area of interest, lumping together some of the land cover categories (somewhat arbitrarily) with <code class="language-plaintext highlighter-rouge">case_when</code> and making a simple map of the volcanoes with pretty colors that I chose for each land cover type. Note how <code class="language-plaintext highlighter-rouge">theme_nothing()</code> removes all the margins and axis ticks/labels, and then I added extra arguments to <code class="language-plaintext highlighter-rouge">coord_sf()</code> to make sure everything stays within the plot panel.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1"># load packages</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">sf</span><span class="p">)</span><span class="w"> </span><span class="c1"># CRAN v0.9-8</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">dplyr</span><span class="p">)</span><span class="w"> </span><span class="c1"># CRAN v1.0.7</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">stringr</span><span class="p">)</span><span class="w"> </span><span class="c1"># CRAN v1.4.0</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">ggplot2</span><span class="p">)</span><span class="w"> </span><span class="c1"># CRAN v3.3.3</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">raster</span><span class="p">)</span><span class="w"> </span><span class="c1"># CRAN v3.4-13</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">stars</span><span class="p">)</span><span class="w"> </span><span class="c1"># [github::r-spatial/stars] v0.5-3</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">elevatr</span><span class="p">)</span><span class="w"> </span><span class="c1"># CRAN v0.4.1</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">rayshader</span><span class="p">)</span><span class="w"> </span><span class="c1"># [github::tylermorganwall/rayshader] v0.26.1</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">smoothr</span><span class="p">)</span><span class="w"> </span><span class="c1"># CRAN v0.2.2</span><span class="w">
</span><span class="c1"># import ANP data</span><span class="w">
</span><span class="n">anps</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">st_read</span><span class="p">(</span><span class="s2">"YOUR-PATH-HERE/182ANP_Geo_ITRF08_Agosto_2020.shp"</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">dplyr</span><span class="o">::</span><span class="n">select</span><span class="p">(</span><span class="n">NOMBRE</span><span class="p">)</span><span class="w">
</span><span class="c1"># import vegetation</span><span class="w">
</span><span class="n">veg</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">st_read</span><span class="p">(</span><span class="s2">"YOUR-PATH-HERE/usv250s6gw.shp"</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">st_transform</span><span class="p">(</span><span class="n">st_crs</span><span class="p">(</span><span class="n">anps</span><span class="p">))</span><span class="w">
</span><span class="c1"># feature of interest</span><span class="w">
</span><span class="n">izta</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">anps</span><span class="w"> </span><span class="o">%>%</span><span class="w"> </span><span class="n">filter</span><span class="p">(</span><span class="n">str_detect</span><span class="p">(</span><span class="n">NOMBRE</span><span class="p">,</span><span class="w"> </span><span class="s2">"Izt"</span><span class="p">))</span><span class="w">
</span><span class="c1"># map limits</span><span class="w">
</span><span class="n">Iztbbox</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">st_buffer</span><span class="p">(</span><span class="n">izta</span><span class="p">,</span><span class="w"> </span><span class="n">dist</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">0.1</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">st_bbox</span><span class="p">()</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">st_as_sfc</span><span class="p">()</span><span class="w">
</span><span class="c1"># summarize vegetation</span><span class="w">
</span><span class="n">veggr</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">veg</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">group_by</span><span class="p">(</span><span class="n">CVE_UNION</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">summarize</span><span class="p">(</span><span class="n">.groups</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"keep"</span><span class="p">)</span><span class="w">
</span><span class="c1"># crop vegetation to limits</span><span class="w">
</span><span class="n">Iztveg</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">st_crop</span><span class="p">(</span><span class="n">veggr</span><span class="p">,</span><span class="w"> </span><span class="n">Iztbbox</span><span class="p">)</span><span class="w">
</span><span class="c1"># re-classify vegetation types</span><span class="w">
</span><span class="n">Iztvegrc</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">Iztveg</span><span class="w"> </span><span class="o">%>%</span><span class="w"> </span><span class="n">mutate</span><span class="p">(</span><span class="n">vegclass</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">case_when</span><span class="p">(</span><span class="w">
</span><span class="n">str_detect</span><span class="p">(</span><span class="n">CVE_UNION</span><span class="p">,</span><span class="w"> </span><span class="s2">"^B"</span><span class="p">)</span><span class="w"> </span><span class="o">~</span><span class="w"> </span><span class="s2">"Bosque"</span><span class="p">,</span><span class="w">
</span><span class="n">str_detect</span><span class="p">(</span><span class="n">CVE_UNION</span><span class="p">,</span><span class="w"> </span><span class="s2">"^VS"</span><span class="p">)</span><span class="w"> </span><span class="o">~</span><span class="w"> </span><span class="s2">"Vegetación secundaria"</span><span class="p">,</span><span class="w">
</span><span class="n">str_detect</span><span class="p">(</span><span class="n">CVE_UNION</span><span class="p">,</span><span class="w"> </span><span class="s2">"^T"</span><span class="p">)</span><span class="w"> </span><span class="o">~</span><span class="w"> </span><span class="s2">"Agricultura (temporal)"</span><span class="p">,</span><span class="w">
</span><span class="n">str_detect</span><span class="p">(</span><span class="n">CVE_UNION</span><span class="p">,</span><span class="w"> </span><span class="s2">"^R"</span><span class="p">)</span><span class="w"> </span><span class="o">~</span><span class="w"> </span><span class="s2">"Agricultura (Riego)"</span><span class="p">,</span><span class="w">
</span><span class="n">str_detect</span><span class="p">(</span><span class="n">CVE_UNION</span><span class="p">,</span><span class="w"> </span><span class="s2">"DV"</span><span class="p">)</span><span class="w"> </span><span class="o">~</span><span class="w"> </span><span class="s2">"Sin vegetación"</span><span class="p">,</span><span class="w">
</span><span class="n">str_detect</span><span class="p">(</span><span class="n">CVE_UNION</span><span class="p">,</span><span class="w"> </span><span class="s2">"^P"</span><span class="p">)</span><span class="w"> </span><span class="o">~</span><span class="w"> </span><span class="s2">"Pastizal"</span><span class="p">,</span><span class="w">
</span><span class="n">str_detect</span><span class="p">(</span><span class="n">CVE_UNION</span><span class="p">,</span><span class="w"> </span><span class="s2">"VW"</span><span class="p">)</span><span class="w"> </span><span class="o">~</span><span class="w"> </span><span class="s2">"Pradera de alta montaña"</span><span class="p">,</span><span class="w">
</span><span class="kc">TRUE</span><span class="w"> </span><span class="o">~</span><span class="w"> </span><span class="n">CVE_UNION</span><span class="w">
</span><span class="p">))</span><span class="w">
</span><span class="c1"># smooth vegetation features</span><span class="w">
</span><span class="n">Iztvegrc</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">smooth</span><span class="p">(</span><span class="n">Iztvegrc</span><span class="p">)</span><span class="w">
</span><span class="c1"># set up graphics device</span><span class="w">
</span><span class="n">ragg</span><span class="o">::</span><span class="n">agg_tiff</span><span class="p">(</span><span class="w">
</span><span class="n">filename</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"izt.tiff"</span><span class="p">,</span><span class="w">
</span><span class="n">res</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">400</span><span class="p">,</span><span class="w"> </span><span class="n">width</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">3.88</span><span class="p">,</span><span class="w"> </span><span class="n">height</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">5.92</span><span class="p">,</span><span class="w"> </span><span class="n">units</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"in"</span><span class="w">
</span><span class="p">)</span><span class="w">
</span><span class="c1"># plot</span><span class="w">
</span><span class="n">iztgg</span><span class="w"> </span><span class="o"><-</span><span class="w">
</span><span class="n">ggplot</span><span class="p">()</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">geom_sf</span><span class="p">(</span><span class="n">data</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Iztvegrc</span><span class="p">,</span><span class="w"> </span><span class="n">aes</span><span class="p">(</span><span class="n">fill</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">vegclass</span><span class="p">),</span><span class="w"> </span><span class="n">size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">0</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">scale_fill_manual</span><span class="p">(</span><span class="n">values</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="w">
</span><span class="s2">"#b5c99a"</span><span class="p">,</span><span class="w"> </span><span class="s2">"#97a97c"</span><span class="p">,</span><span class="w"> </span><span class="s2">"#8d99ae"</span><span class="p">,</span><span class="w"> </span><span class="s2">"#26432F"</span><span class="p">,</span><span class="w">
</span><span class="s2">"#e8e8e4"</span><span class="p">,</span><span class="w"> </span><span class="s2">"#cdeac0"</span><span class="p">,</span><span class="w"> </span><span class="s2">"#B1D5BB"</span><span class="p">,</span><span class="w">
</span><span class="s2">"#cde5d7"</span><span class="p">,</span><span class="w"> </span><span class="s2">"#adb6c4"</span><span class="p">,</span><span class="w"> </span><span class="s2">"#9cc5a1"</span><span class="p">,</span><span class="w">
</span><span class="s2">"#ffc49b"</span><span class="p">,</span><span class="w"> </span><span class="s2">"#eaeaea"</span><span class="p">,</span><span class="w"> </span><span class="s2">"#f3e7e4"</span><span class="w">
</span><span class="p">),</span><span class="w"> </span><span class="n">guide</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"none"</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">ggfx</span><span class="o">::</span><span class="n">with_outer_glow</span><span class="p">(</span><span class="n">geom_sf</span><span class="p">(</span><span class="n">data</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">izta</span><span class="p">,</span><span class="w"> </span><span class="n">fill</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"transparent"</span><span class="p">,</span><span class="w"> </span><span class="n">color</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"white"</span><span class="p">,</span><span class="w"> </span><span class="n">size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">0.1</span><span class="p">),</span><span class="w">
</span><span class="n">expand</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">3</span><span class="p">,</span><span class="w"> </span><span class="n">colour</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"white"</span><span class="w">
</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">coord_sf</span><span class="p">(</span><span class="n">expand</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">FALSE</span><span class="p">,</span><span class="w"> </span><span class="n">clip</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"on"</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">cowplot</span><span class="o">::</span><span class="n">theme_nothing</span><span class="p">()</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">theme</span><span class="p">(</span><span class="w">
</span><span class="n">panel.grid.major</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">element_line</span><span class="p">(</span><span class="n">color</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"#14213d"</span><span class="p">,</span><span class="w"> </span><span class="n">size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">0.1</span><span class="p">,</span><span class="w"> </span><span class="n">linetype</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"dashed"</span><span class="p">),</span><span class="w">
</span><span class="n">panel.ontop</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">TRUE</span><span class="w">
</span><span class="p">)</span><span class="w">
</span><span class="n">iztgg</span><span class="w">
</span><span class="n">dev.off</span><span class="p">()</span></code></pre></figure>
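<p>The reclassification step in the block above can be tried on a toy table before running it on the full shapefile. The codes below are made up for illustration (they are not taken from the CONABIO data), and only a few of the rules are repeated. The point of the sketch is that <code class="language-plaintext highlighter-rouge">case_when()</code> uses the first condition that matches, and that the final <code class="language-plaintext highlighter-rouge">TRUE ~ CVE_UNION</code> keeps any code that matched nothing.</p>

```r
library(dplyr)
library(stringr)

# Hypothetical land cover codes, invented for this example
toy <- tibble(CVE_UNION = c("BP", "VSa/BP", "TA", "ADV", "PI", "H2O"))

toy_rc <- toy %>%
  mutate(vegclass = case_when(
    # "^" anchors the pattern to the start of the code
    str_detect(CVE_UNION, "^B") ~ "Bosque",
    str_detect(CVE_UNION, "^VS") ~ "Vegetación secundaria",
    str_detect(CVE_UNION, "^T") ~ "Agricultura (temporal)",
    # no anchor: matches "DV" anywhere in the code
    str_detect(CVE_UNION, "DV") ~ "Sin vegetación",
    str_detect(CVE_UNION, "^P") ~ "Pastizal",
    # fallback: leave unmatched codes untouched
    TRUE ~ CVE_UNION
  ))

toy_rc
```

<p>Because the patterns are anchored, a code such as "VSa/BP" is classified by its leading "VS" and not by the "B" further along, so the order and anchoring of the rules decide how mixed codes get lumped.</p>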
<p>It took some trial and error to export the tiff (using a ragg device) and have no blank space around the edges, but once the output looks good we can read this new image with the <code class="language-plaintext highlighter-rouge">raster</code> package.</p>
<figure>
<a href="/assets/images/iztgg.png"><img src="/assets/images/iztgg.png" /></a>
<figcaption>Looks OK</figcaption>
</figure>
<p><br /></p>
<p>Let’s import the tiff and have a look.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1"># Create a RasterStack object from the saved plot</span><span class="w">
</span><span class="n">stackedRaster</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">raster</span><span class="o">::</span><span class="n">stack</span><span class="p">(</span><span class="s2">"izt.tiff"</span><span class="p">)</span></code></pre></figure>
<figure>
<a href="/assets/images/tiffrgb.png"><img src="/assets/images/tiffrgb.png" width="660" /></a>
<figcaption>Raster bands</figcaption>
</figure>
<p><br /></p>
<p>The geospatial data we need is in the ggplot object, and we can pull it out with <code class="language-plaintext highlighter-rouge">ggplot_build</code> and some indexing. This gives us a list with ranges for x and y, which go into the <code class="language-plaintext highlighter-rouge">extent</code> function to finally georeference the map (we also need to define the projection). Next, we export the georeferenced raster. <code class="language-plaintext highlighter-rouge">INT1U</code> means the values are stored as unsigned 1-byte integers, bounded between 0 and 255, and <code class="language-plaintext highlighter-rouge">PHOTOMETRIC=RGB</code> is the color interpretation tag.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1"># Get the GeoSpatial Components</span><span class="w">
</span><span class="n">lat_long</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">ggplot_build</span><span class="p">(</span><span class="n">iztgg</span><span class="p">)</span><span class="o">$</span><span class="n">layout</span><span class="o">$</span><span class="n">panel_params</span><span class="p">[[</span><span class="m">1</span><span class="p">]][</span><span class="nf">c</span><span class="p">(</span><span class="s2">"x_range"</span><span class="p">,</span><span class="w"> </span><span class="s2">"y_range"</span><span class="p">)]</span><span class="w">
</span><span class="c1"># Supply GeoSpatial data to the StackedRaster</span><span class="w">
</span><span class="n">raster</span><span class="o">::</span><span class="n">extent</span><span class="p">(</span><span class="n">stackedRaster</span><span class="p">)</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="n">lat_long</span><span class="o">$</span><span class="n">x_range</span><span class="p">,</span><span class="w"> </span><span class="n">lat_long</span><span class="o">$</span><span class="n">y_range</span><span class="p">)</span><span class="w">
</span><span class="n">raster</span><span class="o">::</span><span class="n">projection</span><span class="p">(</span><span class="n">stackedRaster</span><span class="p">)</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">raster</span><span class="o">::</span><span class="n">crs</span><span class="p">(</span><span class="n">st_crs</span><span class="p">(</span><span class="n">Iztvegrc</span><span class="p">)</span><span class="o">$</span><span class="n">proj4string</span><span class="p">)</span><span class="w">
</span><span class="c1"># write to disk</span><span class="w">
</span><span class="n">writeRaster</span><span class="p">(</span><span class="n">stackedRaster</span><span class="p">,</span><span class="w"> </span><span class="s2">"iztGeoTiff.tif"</span><span class="p">,</span><span class="w"> </span><span class="n">options</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"PHOTOMETRIC=RGB"</span><span class="p">,</span><span class="w"> </span><span class="n">datatype</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"INT1U"</span><span class="p">,</span><span class="w"> </span><span class="n">overwrite</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">TRUE</span><span class="p">)</span></code></pre></figure>
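<p>To see what assigning an extent actually does, the following base R sketch maps pixel positions to map coordinates. The x and y ranges are made-up values standing in for the output of <code class="language-plaintext highlighter-rouge">ggplot_build()</code>, and the pixel dimensions assume that the graphics device produced roughly width × resolution by height × resolution pixels. Treat it as an illustration of the arithmetic, not as what the <code class="language-plaintext highlighter-rouge">raster</code> package runs internally.</p>

```r
# Hypothetical panel ranges in degrees (the real ones come from ggplot_build)
x_range <- c(-98.9, -98.5)
y_range <- c(18.9, 19.5)

# Approximate image size implied by the device settings: inches * resolution
n_cols <- round(3.88 * 400)
n_rows <- round(5.92 * 400)

# Cell size in map units once the extent is stretched over the image
res_x <- diff(x_range) / n_cols
res_y <- diff(y_range) / n_rows

# Coordinates of the centre of a pixel; row 1 is the top edge of the image
pixel_centre <- function(row, col) {
  c(
    x = x_range[1] + (col - 0.5) * res_x,
    y = y_range[2] - (row - 0.5) * res_y
  )
}

pixel_centre(1, 1)            # top-left pixel
pixel_centre(n_rows, n_cols)  # bottom-right pixel
```

<p>This only gives sensible coordinates when the image has no margins, because any blank border would be stretched into the extent along with the map.</p>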
<p>Let’s re-import the GeoTIFF.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1"># reread</span><span class="w">
</span><span class="n">grRast</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">raster</span><span class="o">::</span><span class="n">brick</span><span class="p">(</span><span class="s2">"iztGeoTiff.tif"</span><span class="p">)</span><span class="w">
</span><span class="n">grRastst</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">raster</span><span class="o">::</span><span class="n">stack</span><span class="p">(</span><span class="n">grRast</span><span class="p">)</span></code></pre></figure>
<p>After this, the process follows most guides on rayshading images. First, we convert the raster stack to an array (this is the image that gets draped) and use the same raster to download elevation data with <code class="language-plaintext highlighter-rouge">elevatr</code>. We then crop and resize the DEM and convert it to a matrix. The matrix is transposed relative to the raster because rasters and matrices are oriented differently in R.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1"># to array for draping</span><span class="w">
</span><span class="n">test_c_arr</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">as.array</span><span class="p">(</span><span class="n">grRastst</span><span class="p">)</span><span class="w">
</span><span class="c1"># get elevation data</span><span class="w">
</span><span class="n">Rch_dem</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">elevatr</span><span class="o">::</span><span class="n">get_elev_raster</span><span class="p">(</span><span class="n">grRastst</span><span class="p">,</span><span class="w"> </span><span class="n">z</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">9</span><span class="p">)</span><span class="w">
</span><span class="n">Rch_dem</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">raster</span><span class="o">::</span><span class="n">crop</span><span class="p">(</span><span class="n">Rch_dem</span><span class="p">,</span><span class="w"> </span><span class="n">grRastst</span><span class="p">)</span><span class="w">
</span><span class="n">plot</span><span class="p">(</span><span class="n">Rch_dem</span><span class="p">)</span><span class="w">
</span><span class="c1"># Reduce the size of the elevation data, for speed</span><span class="w">
</span><span class="n">areademMatrix</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">rayshader</span><span class="o">::</span><span class="n">resize_matrix</span><span class="p">(</span><span class="n">rayshader</span><span class="o">::</span><span class="n">raster_to_matrix</span><span class="p">(</span><span class="n">Rch_dem</span><span class="p">),</span><span class="w"> </span><span class="n">scale</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">0.7</span><span class="p">)</span><span class="w">
</span><span class="c1"># transpose</span><span class="w">
</span><span class="n">demFinal</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="p">(</span><span class="n">areademMatrix</span><span class="p">)</span></code></pre></figure>
<p>A quick glimpse of the elevation data:</p>
<figure>
<a href="/assets/images/iztdem.png"><img src="/assets/images/iztdem.png" width="660" /></a>
<figcaption>DEM</figcaption>
</figure>
<p><br /></p>
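<p>The <code class="language-plaintext highlighter-rouge">z = 9</code> argument passed to <code class="language-plaintext highlighter-rouge">get_elev_raster()</code> is a tile zoom level, and the resolution of the DEM depends on it and on latitude. As a rough guide, assuming the usual ground resolution formula for 256-pixel Web Mercator tiles (the helper <code class="language-plaintext highlighter-rouge">ground_resolution()</code> is hypothetical and not part of <code class="language-plaintext highlighter-rouge">elevatr</code>), the approximate pixel size can be sketched as:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"># Approximate ground resolution (metres per pixel) of 256-pixel
# Web Mercator tiles for a given zoom level and latitude.
# Hypothetical helper for illustration, not an elevatr function.
ground_resolution <- function(zoom, lat_deg = 0) {
  equator_res_z0 <- 156543.03392 # metres per pixel at zoom 0 on the equator
  equator_res_z0 * cos(lat_deg * pi / 180) / 2^zoom
}

# Zoom 9 at the equator
ground_resolution(9)

# Zoom 9 at roughly the latitude of Mexico City (19.4 is an assumption)
ground_resolution(9, lat_deg = 19.4)

# Each extra zoom level halves the pixel size
ground_resolution(10) / ground_resolution(9)</code></pre></figure>
<p>Higher zoom levels give finer elevation data, at the cost of larger downloads and slower shadow calculations.</p>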
<p>Now we can compute the shadows and drape the array on the elevation data.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1"># Compute shadows</span><span class="w">
</span><span class="n">ambient_layer</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">ambient_shade</span><span class="p">(</span><span class="n">demFinal</span><span class="p">,</span><span class="w"> </span><span class="n">zscale</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">10</span><span class="p">,</span><span class="w"> </span><span class="n">multicore</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">TRUE</span><span class="p">,</span><span class="w"> </span><span class="n">maxsearch</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">200</span><span class="p">)</span><span class="w">
</span><span class="n">ray_layer</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">ray_shade</span><span class="p">(</span><span class="n">demFinal</span><span class="p">,</span><span class="w"> </span><span class="n">zscale</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">20</span><span class="p">,</span><span class="w"> </span><span class="n">multicore</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">TRUE</span><span class="p">)</span><span class="w">
</span><span class="c1"># Plot in 3D</span><span class="w">
</span><span class="p">(</span><span class="n">test_c_arr</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="m">255</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">add_shadow</span><span class="p">(</span><span class="n">ray_layer</span><span class="p">,</span><span class="w"> </span><span class="m">0.3</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">add_shadow</span><span class="p">(</span><span class="n">ambient_layer</span><span class="p">,</span><span class="w"> </span><span class="m">0</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">plot_3d</span><span class="p">(</span><span class="n">demFinal</span><span class="p">,</span><span class="w">
</span><span class="n">zscale</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">150</span><span class="p">,</span><span class="w"> </span><span class="n">theta</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">-50</span><span class="p">,</span><span class="w"> </span><span class="n">phi</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">28</span><span class="p">,</span><span class="w"> </span><span class="n">zoom</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">0.3</span><span class="p">,</span><span class="w">
</span><span class="n">windowsize</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="m">1200</span><span class="p">,</span><span class="w"> </span><span class="m">800</span><span class="p">),</span><span class="w"> </span><span class="n">solid</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">FALSE</span><span class="p">,</span><span class="w">
</span><span class="n">background</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"#3d314a"</span><span class="w">
</span><span class="p">)</span></code></pre></figure>
<p>Here’s a snapshot of the rgl window produced by <code class="language-plaintext highlighter-rouge">plot_3d()</code>, after fiddling with the camera values. If you are reading this in Mexico City, the area should look familiar.</p>
<figure>
<a href="/assets/images/iztsnap.png"><img src="/assets/images/iztsnap.png" width="660" /></a>
<figcaption>3D!</figcaption>
</figure>
<p><br /></p>
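<p>A note on the division by 255 in the plotting code: the GeoTIFF was written as 8-bit integers (<code class="language-plaintext highlighter-rouge">INT1U</code>), so the array holds values from 0 to 255, while rayshader works with colour values between 0 and 1. A minimal sketch of the rescaling with a made-up two-pixel image (not the map from this post):</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"># Made-up 1 x 2 pixel image with three 8-bit channels (R, G, B)
rgb_8bit <- array(c(0, 255, 128, 64, 255, 0), dim = c(1, 2, 3))

# Rescale to the 0-1 range; the array keeps its dimensions
rgb_unit <- rgb_8bit / 255

range(rgb_8bit)
range(rgb_unit)
dim(rgb_unit)</code></pre></figure>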
<p>An overhead view produced by <code class="language-plaintext highlighter-rouge">plot_map()</code> looks pretty good too.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1"># Plot in 2D</span><span class="w">
</span><span class="p">(</span><span class="n">test_c_arr</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="m">255</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">add_shadow</span><span class="p">(</span><span class="n">ray_layer</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">add_shadow</span><span class="p">(</span><span class="n">ambient_layer</span><span class="p">,</span><span class="w"> </span><span class="m">0</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">plot_map</span><span class="p">()</span></code></pre></figure>
<figure>
<a href="/assets/images/2dmap.png"><img src="/assets/images/2dmap.png" /></a>
<figcaption>Overhead</figcaption>
</figure>
<p><br /></p>
<p>Finally, we can add scene elements with rayrender and render fancier maps with <code class="language-plaintext highlighter-rouge">render_highquality()</code>. To add more atmosphere, I like to turn off the default lighting with <code class="language-plaintext highlighter-rouge">light = FALSE</code> and instead light the scene with a glowing sphere placed at a custom location.</p>
<blockquote class="twitter-tweet" data-conversation="none" data-dnt="true"><p lang="en" dir="ltr">Turn off the lights and add your own🤓: <a href="https://twitter.com/hashtag/rstats?src=hash&ref_src=twsrc%5Etfw">#rstats</a> <a href="https://twitter.com/hashtag/rayshader?src=hash&ref_src=twsrc%5Etfw">#rayshader</a> <a href="https://twitter.com/hashtag/rayrender?src=hash&ref_src=twsrc%5Etfw">#rayrender</a><br /><br />render_highquality(light = FALSE, scene_elements = sphere(y = 100, radius = 10, material = diffuse(lightintensity = 250, implicit_sample = TRUE))) <a href="https://t.co/g1QDEphZGh">pic.twitter.com/g1QDEphZGh</a></p>— Tyler Morgan-Wall (@tylermorganwall) <a href="https://twitter.com/tylermorganwall/status/1188505201193488386?ref_src=twsrc%5Etfw">October 27, 2019</a></blockquote>
<script async="" src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>
<p><br /></p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1"># render with 'golden' light</span><span class="w">
</span><span class="n">render_highquality</span><span class="p">(</span><span class="w">
</span><span class="n">light</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">FALSE</span><span class="p">,</span><span class="w">
</span><span class="n">scene_elements</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">rayrender</span><span class="o">::</span><span class="n">sphere</span><span class="p">(</span><span class="w">
</span><span class="n">z</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">450</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">100</span><span class="p">,</span><span class="w"> </span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">200</span><span class="p">,</span><span class="w"> </span><span class="n">radius</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">36</span><span class="p">,</span><span class="w">
</span><span class="n">material</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">rayrender</span><span class="o">::</span><span class="n">light</span><span class="p">(</span><span class="w">
</span><span class="n">intensity</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">150</span><span class="p">,</span><span class="w">
</span><span class="n">spotlight_width</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">20</span><span class="p">,</span><span class="w">
</span><span class="n">color</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"#ff773d"</span><span class="w">
</span><span class="p">)</span><span class="w">
</span><span class="p">)</span><span class="w">
</span><span class="p">)</span></code></pre></figure>
<p>Here are some renders with different camera locations and light placements.</p>
<figure>
<a href="/assets/images/3dfinal.png"><img src="/assets/images/3dfinal.png" width="660" /></a>
<figcaption>nice</figcaption>
</figure>
<figure>
<a href="/assets/images/3dwide.png"><img src="/assets/images/3dwide.png" width="660" /></a>
<figcaption>whoa</figcaption>
</figure>
<figure>
<a href="/assets/images/izt3dsh.png"><img src="/assets/images/izt3dsh.png" width="660" /></a>
<figcaption>cool</figcaption>
</figure>
<figure>
<a href="/assets/images/xyplt.png"><img src="/assets/images/xyplt.png" width="660" /></a>
<figcaption>gold</figcaption>
</figure>
<p><br /></p>
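<p>For draping, the image has to cover exactly the map extent, with no margins, axes, or titles around the plot panel. Nothing about that is specific to ggplot2. Here is a minimal base R sketch of an edge-to-edge PNG, assuming placeholder data and a temporary file name that are not from this post:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"># Placeholder data: a random matrix standing in for a map layer
set.seed(1)
z <- matrix(runif(100), nrow = 10)

# Temporary output file (hypothetical name, for illustration only)
out_file <- tempfile(fileext = ".png")

png(out_file, width = 400, height = 400)
# Zero margins and "internal" axis style, so the panel reaches the image edges
par(mar = c(0, 0, 0, 0), xaxs = "i", yaxs = "i")
image(z, axes = FALSE, col = heat.colors(12))
dev.off()

file.exists(out_file)</code></pre></figure>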
<p>The results came out quite nice, and with this setup we could even make animations or add additional elements. This approach can also work with base R plots, as long as the plot panel goes all the way to the edges of the output raster image. I hope someone finds this helpful or entertaining. If you have any questions or feedback please let me know.</p>Luis D. Verde ArregoitiaUsing rayshader to render ggplot2 maps in 3D.