How To: Display suggested reading (or anything) on R start-up

Introduction

With the exception of endless self-promotion, I primarily use Twitter to learn about new developments and tools in data science and the R community.

Lately, I’ve been finding more and more great content from people, hitting the ‘favorite’ button to save tweets for future reference, and then never going back to read all the posts I want to read.

The Goal

In this super short blog post, I will write a function to display a random URL (primarily from personal blogs or GitHub repos) from my Twitter favorites and then display it when I begin a new R session.

The main goal is to figure out how to run a script on R start-up. The secondary goal is to nudge myself into reading more of the content I find on Twitter.

tl;dr

This was a simple project, but did have a few twists and require me to learn a few new tricks. The process:

  • Write a function, with minimal package dependencies, that collects a user’s likes from Twitter and chooses one.
  • Provide options to automatically open the link or do so on user confirmation.
  • Call this function at start-up in interactive sessions using the .Rprofile file.
  • To avoid excessive API calls and confirmations, either set up your own Twitter API token or only collect tweets at regular intervals.

Here’s the final script.

Tools

To avoid requiring too many packages to be installed, I mostly use base R (throwback!) plus {rtweet} by Michael Kearney, which collects tweets from your timeline or likes and returns them in a tidy data frame.

Collect & display a tweet URL

I began by writing a function to collect the recent tweets I have liked, retrieve the URLs from the ones with links, and pick a random one to display.

The first time you run a function from {rtweet}, your browser will open to Twitter, where you must authorize the app. This authorization lasts only for the current R session. Since our goal is to run this function each time we begin an R session, you can see how this might become a problem. But let’s ignore it for now.

#Collect favorited tweets for a user timeline
rtweet::get_favorites("ryantimpe") -> likes

#Process twitter data for likes
unlist(likes$urls_expanded_url) -> likes_urls

#Remove likes without URLs...
likes_urls <- likes_urls[!is.na(likes_urls)]

rtweet::get_favorites() returns the most recent 200 favorited tweets by default, but you can increase this if you want, depending on your Twitter habits and your motivations for liking tweets.
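For instance, get_favorites() takes an `n` argument controlling how many likes to pull (the exact cap depends on the API, and the user name here is just my own handle as used throughout this post):

```r
# Pull up to 1000 of the most recent likes instead of the default 200
likes <- rtweet::get_favorites("ryantimpe", n = 1000)
```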

When I ran this script just now, I ended up with 175 URLs from likes. Great! However, upon manually inspecting the URLs, I see a lot of links from domains that don’t really fit my goal of reviewing blog posts and GitHub repos.

Next I apply a few more filters to these links to try to ensure I only sample from a list of URLs that contain recommended reading. All of these are, of course, optional.

  • Removing ‘twitter.com’ takes quote tweets out of the list of URLs
  • I dropped ‘bit.ly’ and other link shorteners, mostly because I like so many #TidyTuesday tweets.
  • I don’t want to see my own content or the LEGO.com content I like, so I remove those two. (I recommend YOU keep those two though 😄)

#Remove likes with links that you don't want to see...
# (like tweets, my job, my own site, etc.)
# Might want to consider github.com if you only want posts and not repos
link_block_list = c("twitter.com", "instagram.com",
                    "bit.ly", "tinyurl.com",
                    "ryantimpe.com", "lego.com")

likes_urls <- likes_urls[!grepl(paste0(link_block_list, collapse = "|"), likes_urls)]

#... And remove root-only URLs (like https://ryantimpe.com/)
likes_urls <- likes_urls[grepl("//.+/.+/", likes_urls)]

#Pick one
this_link <- sample(likes_urls, 1)
cat("Check out this link you wanted to read!\n", this_link, "\n")
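A quick way to sanity-check the root-only regex is to test it against a couple of hypothetical URLs: it keeps anything with a path beyond the bare domain.

```r
test_urls <- c("https://ryantimpe.com/",           #root only
               "https://ryantimpe.com/post/2019/") #has a path
grepl("//.+/.+/", test_urls)
#> FALSE TRUE
```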

Run script on R start-up

In order for this script to run on R start-up, you need to add it to your .Rprofile file. I did this manually after locating the file in C:\Program Files\R\R-n.n.n\etc on my Windows machine. To edit the file, I first had to change the permissions on the folder.

For more information, including tips on doing this with the {usethis} package, see this link.
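If you’d rather not hunt down the file path and fiddle with permissions, {usethis} can open the user-level .Rprofile for you:

```r
#install.packages("usethis") if you don't already have it
usethis::edit_r_profile() #opens your .Rprofile for editing
```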

Within the .Rprofile file, define a .First() function and paste your Twitter script inside it. You’ll also want to ensure that the script only runs in interactive R sessions.

#Other script already inside .Rprofile...
#...

.First <- function(){
  if(interactive()){ 
    #SCRIPT YOU WANT TO RUN ON START-UP
  }
}

Avoid constant authorization calls

If you put the Twitter URL script inside .First() and restart R, your browser will open the Twitter authentication window again. If you’ve authorized recently, you’ll just see a white page with black text reading Authentication complete. Please close this page and return to R. That’s annoying! We definitely don’t want to see that every time we open R.

To avoid this, I can think of two options:

The smart way

Create your own Twitter App for API authentication and save the credentials to your R profile. This page explains it better than I ever could.
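With the version of {rtweet} current at the time of writing, the one-time setup looks roughly like this. The app name and keys below are placeholders you’d replace with values from your own Twitter developer dashboard:

```r
#One-time setup: store the token so rtweet finds it in future sessions
rtweet::create_token(
  app             = "my_startup_reader", #hypothetical app name
  consumer_key    = "YOUR_API_KEY",
  consumer_secret = "YOUR_API_SECRET"
)
```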

The impatient way

Instead of applying for access to the Twitter API, I chose to limit how often the rtweet::get_favorites() function actually runs, which in turn limits how often I must authenticate on R start-up. After it runs, I cache the ~200 tweets in my temporary folder.1 My logic here is that I probably won’t run out of tweets I want to read over the course of a week or two, so there’s no harm in not always having the most recent tweets in my sample.

I added an argument to my function, update_increment = 14, to only collect liked tweets every 14 days, so the annoying authentication page will only open once every 2 weeks. Not ideal, but much less annoying than every time I restart R.

#User input
user = "ryantimpe"
update_increment = 14 #days

#Location of cached file
file_name = paste0(dirname(tempdir()), "/rtweetlikes_", user, ".rds")
#Number of days since Favorites file was updated
if(!file.exists(file_name)){
  days_since_update = 99
} else {
  days_since_update = as.numeric(Sys.Date() - as.Date(file.info(file_name)$ctime))
}
#Decide whether to perform an API call or not
if(days_since_update > update_increment){
  rtweet::get_favorites(user) -> likes
  saveRDS(likes, file = file_name)
} else {
  likes <- readRDS(file_name)
}

The complete function

Putting it all together, we end up with a pretty short script that we can add to the .Rprofile file. See the full file here on GitHub or expand the section below.


.First <- function(){
  if(interactive()){
    random_url_at_startup <- function(user = "ryantimpe",
                                      open_link = "no",
                                      link_block_list = c("lego.com", "ryantimpe.com"),
                                      update_increment = 14){

      #Location of cached file
      file_name = paste0(dirname(tempdir()), "/rtweetlikes_", user, ".rds")
      #Number of days since Favorites file was updated
      if(!file.exists(file_name)){
        days_since_update = 99
      } else {
        days_since_update = as.numeric(Sys.Date() - as.Date(file.info(file_name)$ctime))
      }
      #Decide whether to perform an API call or not
      if(days_since_update > update_increment){
        rtweet::get_favorites(user) -> likes
        saveRDS(likes, file = file_name)
      } else {
        likes <- readRDS(file_name)
      }

      #Process twitter data for likes
      unlist(likes$urls_expanded_url) -> likes_urls

      #Remove likes without URLs...
      likes_urls <- likes_urls[!is.na(likes_urls)]

      #... And remove likes with links that you don't want to see...
      # (like tweets, my job, my own site, etc.)
      # Might want to consider github.com if you only want posts and not repos
      link_block_list = c(link_block_list, "twitter.com", "instagram.com",
                          "bit.ly", "tinyurl.com")
      likes_urls <- likes_urls[!grepl(paste0(link_block_list, collapse = "|"), likes_urls)]

      #... And remove root-only URLs (like https://ryantimpe.com/)
      likes_urls <- likes_urls[grepl("//.+/.+/", likes_urls)]

      #Pick one
      this_link <- sample(likes_urls, 1)
      cat("Check out this link you wanted to read!\n", this_link, "\n")

      #Should the link open in your browser?
      if(open_link == "ask"){
        user_open <- readline(prompt="Open the link? (y/n): ")
      } else {
        user_open <- open_link
      }
      if(tolower(substr(user_open, 1, 1)) == "y") {
        browseURL(this_link)
      }

    } #End function
    random_url_at_startup() #Call the function
  }
}


Check out the full script on GitHub.

Follow me on Twitter at ryantimpe!


  1. I don’t know how temporary files work on a Mac, but you can change this file to save in any folder you want.

Ryan Timpe
Data Science | Economics
