The Rio Package in Detail: Step Eleven in Learning R Programming for Free
Posted on January 3, 2019
Author: Linda Stewart, Performance Architects

I hope you are enjoying the “Learning R Programming for Free” series; here are links to the previous segments (Step One, Step Two, Step Three, Step Four, Step Five, Step Six, Step Seven, Step Eight, Step Nine, Step Ten) to provide some helpful background.

In the previous installment, we learned about setting up R on RedHat Linux 6 and a little bit about converting files with the “rio package.”  We installed R on a RedHat Linux server and added the rio package.  Installing the rio package also showed us how we can add packages manually if we cannot directly connect to an R mirror.

In this installment, we will discuss the rio package in more detail since it is such a powerful tool to have on hand. The rio package converts data from one file format to another, usually with a single function call.

First, to check that the rio installation on RedHat Linux went well, we created a small Excel file like this (your file can contain anything that interests you):

ID  FullName       Hometown        Interest
1   Mouse, Mickey  Tugboat         Minnie
2   Duck, Donald   Swim Pond       Bugs
3   Mouse, Minnie  Disney Studios  Mickey
4   Bunny, Bugs    Rabbit Hole     Carrots

 

Save it as an XLSX file and move it to the Linux server using a utility such as sftp. I saved my file as: myfile.xlsx

We then used Rscript (command-line R) and a call to rio to convert the XLSX file to CSV (comma-separated values):

Rscript -e "rio::convert('myfile.xlsx','myfile.csv')"

[riotest]$ ls -1
myfile.csv
myfile.xlsx

Let’s look at our CSV file:

[riotest]$ cat *.csv
ID,FullName,Hometown,Interest
1,"Mouse, Mickey",Tugboat,Minnie
2,"Duck, Donald",Swim Pond,Bugs
3,"Mouse, Minnie",Disney Studios,Mickey
4,"Bunny, Bugs",Rabbit Hole,Carrots
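The same conversion can also be tried from inside an R session. The sketch below is only an illustration: it assumes rio and its Excel dependencies are installed, builds the sample table in R (the object name "toons" and the temporary directory are made up for this example), writes it to XLSX in place of the hand-made Excel file, and then calls convert() just as the Rscript one-liner does.

```r
library(rio)

# Sample data mirroring the table above; the name `toons` is illustrative only.
toons <- data.frame(
  ID = 1:4,
  FullName = c("Mouse, Mickey", "Duck, Donald", "Mouse, Minnie", "Bunny, Bugs"),
  Hometown = c("Tugboat", "Swim Pond", "Disney Studios", "Rabbit Hole"),
  Interest = c("Minnie", "Bugs", "Mickey", "Carrots"),
  stringsAsFactors = FALSE
)

# Work in a throwaway directory so nothing in the current folder is touched.
workdir <- tempfile("riotest")
dir.create(workdir)
xlsx_path <- file.path(workdir, "myfile.xlsx")
csv_path  <- file.path(workdir, "myfile.csv")

export(toons, xlsx_path)      # stand-in for the hand-made Excel file
convert(xlsx_path, csv_path)  # the same call the Rscript one-liner makes

# Show the resulting CSV, one element per line of the file.
csv_lines <- readLines(csv_path)
print(csv_lines)
```

Because the names contain commas, the CSV writer has to quote those fields, which is why the quotation marks appear around FullName in the file.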

Let’s move on to a discussion of the usefulness of the rio package.  The most interesting functions in the rio package are “export,” “import,” and “import_list.”

Using export()

Exporting data is handled with one function, export():

library("rio")

export(mtcars, "mtcars.csv") # comma-separated values
export(mtcars, "mtcars.rds") # R serialized
export(mtcars, "mtcars.sav") # SPSS

A particularly useful feature of rio is the ability to import from and export to compressed (e.g., zip) directories, saving users the extra step of compressing a large exported file:

export(mtcars, "mtcars.tsv.zip")

As of rio v0.5.0, “export()” can also write multiple data frames to respective sheets of an Excel workbook or an HTML file:

export(list(mtcars = mtcars, iris = iris), file = "mtcars.xlsx")

Open “mtcars.xlsx” and we find multiple tabs!
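If Excel is not handy (on a Linux server, for instance), the tabs can be checked from R instead. This is a small sketch that assumes the readxl package is available, which is normally the case because rio relies on it for reading Excel files by default; the temporary file path is chosen only for the example.

```r
library(rio)

# Write the two data frames to one workbook in a temporary location.
xlsx_path <- file.path(tempdir(), "mtcars.xlsx")
export(list(mtcars = mtcars, iris = iris), file = xlsx_path)

# excel_sheets() returns the tab names without reading any data.
sheet_names <- readxl::excel_sheets(xlsx_path)
print(sheet_names)
```

The sheet names come from the names given to the list elements, so naming the list is what labels the tabs.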

Using import()

Importing data is handled with one function, import():

cars_df1 <- import("mtcars.csv")
cars_df2 <- import("mtcars.rds")
cars_df3 <- import("mtcars.sav")

Notice that with “import” and “export,” we never specified the file formats.  That is because rio infers the format from the file extension and picks the matching reader or writer for us.
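To see the extension-based behavior next to an explicit override, here is a minimal sketch. Both import() and export() accept a "format" argument for files whose extension does not describe their contents; the ".dat" file name and the temporary directory below are hypothetical choices for this example.

```r
library(rio)

tmp <- tempdir()

# Extension-driven: ".csv" tells rio to write and read comma-separated values.
csv_path <- file.path(tmp, "mtcars.csv")
export(mtcars, csv_path)
from_csv <- import(csv_path)

# Explicit override: the ".dat" name is hypothetical, so we state the format.
dat_path <- file.path(tmp, "mtcars.dat")
export(mtcars, dat_path, format = "csv")
from_dat <- import(dat_path, format = "csv")

# Both routes should yield the same table.
print(dim(from_csv))
print(all.equal(from_csv, from_dat))
```

Note that the CSV round trip does not carry the mtcars row names (the car models) along, since they are not a regular column.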

Using import_list()

The import_list() function imports a list of data frames from a multi-object file. For example, import a multi-sheet Excel file:

str(import_list("mtcars.xlsx"))

## List of 2
##  $ mtcars:'data.frame':  32 obs. of  11 variables:
##   ..$ mpg : num [1:32] 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
##   ..$ cyl : num [1:32] 6 6 4 6 8 6 8 4 4 6 ...
##   ..$ disp: num [1:32] 160 160 108 258 360 ...
##   ..$ hp  : num [1:32] 110 110 93 110 175 105 245 62 95 123 ...
##   ..$ drat: num [1:32] 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
##   ..$ wt  : num [1:32] 2.62 2.88 2.32 3.21 3.44 ...
##   ..$ qsec: num [1:32] 16.5 17 18.6 19.4 17 ...
##   ..$ vs  : num [1:32] 0 0 1 1 0 1 0 1 1 1 ...
##   ..$ am  : num [1:32] 1 1 1 0 0 0 0 0 0 0 ...
##   ..$ gear: num [1:32] 4 4 4 3 3 3 3 4 4 4 ...
##   ..$ carb: num [1:32] 4 4 1 1 2 1 4 2 2 4 ...
##  $ iris  :'data.frame':  150 obs. of  5 variables:
##   ..$ Sepal.Length: num [1:150] 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
##   ..$ Sepal.Width : num [1:150] 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
##   ..$ Petal.Length: num [1:150] 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
##   ..$ Petal.Width : num [1:150] 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...

Notice that we now have two data frames: “mtcars” and “iris.”
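Once the list is in hand, each sheet can be used like any other data frame. The sketch below is one way to work with it, assuming a workbook written to a temporary path (the file name "mtcars_iris.xlsx" is made up for this example); it also shows import() with the "which" argument, which reads a single sheet by name or number.

```r
library(rio)

# Build a two-sheet workbook to read back.
xlsx_path <- file.path(tempdir(), "mtcars_iris.xlsx")
export(list(mtcars = mtcars, iris = iris), file = xlsx_path)

# import_list() returns a named list, one data frame per sheet.
sheets <- import_list(xlsx_path)
print(names(sheets))

# Each element is an ordinary data frame, addressed by sheet name.
print(head(sheets$iris, 3))

# import() with `which` reads one sheet instead of the whole workbook.
iris_only <- import(xlsx_path, which = "iris")
print(dim(iris_only))
```

Reading a single sheet with "which" is handy when a workbook is large and only one tab is needed.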

That concludes our current discussion about rio.  Review the main page for rio for a complete list of the formats rio can process, including JSON.

In our next installment, we will use the rio package to get some data into RStudio and then we will work with more variations of ggplot, specifically bar charts.

Additional blog posts on more complex R concepts to follow; please contact communications@performancearchitects.com if you have any questions or need further help!

 

© Performance Architects, Inc. and Performance Architects Blog, 2006 - present. Unauthorized use and/or duplication of this material without express and written permission from this blog's author and/or owner is strictly prohibited. Excerpts and links may be used, provided that full and clear credit is given to Performance Architects, Inc. and Performance Architects Blog with appropriate and specific direction to the original content.
