
How to Save and Load Datasets in R: An Overview

[This article was first published on Rcrastinate, and kindly contributed to R-bloggers].



What I will show you

In this post, I want to show you a few ways how you can save your datasets in R. Maybe, this seems like a dumb question to you. But after giving quite a few R courses mainly - but not only - for R beginners, I came to realize that the answer to this question is not obvious and the different possibilities can be confusing. In this post, I want to give an overview over the different alternatives and also state my opinion which way is the best in which situation.

Why would you want to know that?

Well, there are quite a few tutorials out there on how to read data into R. RStudio even has a special button for this in the 'Environment' tab - it's labelled 'Import Dataset'. But there is no button and also fewer tutorials on saving data. That's strange, isn't it? If you import your data, you might do some (sometimes lengthy) manipulation, aggregation, selection and other stuff. If all that stuff takes several minutes (or even longer), you might not want to do it every time you are working with the data. So, you might want to save your dataset at a stage that's pre-analysis but post-processing (where 'processing' might include cleaning, manipulating, calculating new variables, merging, selecting, aggregating and lots of other stuff).

What are we going to do?

I will show you the following ways of saving or exporting your data from R:

  • Saving it as an R object with the functions save() and saveRDS()
  • Saving it as a CSV file with write.table() or fwrite()
  • Exporting it to an Excel file with WriteXLS()

For me, these options cover at least 90% of the stuff I have to do at work. So I hope that it'll work for you, too.

Preparation: Load some data

I will use a fairly (but not very) large dataset from the carData package. The dataset is called MplsStops and holds information about stops made by the Minneapolis Police Department in 2017. Of course, you can access this dataset by installing and loading the carData package and typing MplsStops. However, I want to simulate a more typical workflow here. Namely, loading a dataset from your disk (I will load it over the Web). The dataset is also available from GitHub:

data <- read.table("https://vincentarelbundock.github.io/Rdatasets/csv/carData/MplsStops.csv",
                   sep = ",", header = T,
                   row.names = 1)
scroll_box(kable(head(data), row.names = F),
           width = "100%", height = "300px")
idNum     date                problem    MDC citationIssued personSearch vehicleSearch preRace  race         gender  lat      long      policePrecinct neighborhood
17-000003 2017-01-01 00:00:42 suspicious MDC NA             NO           NO            Unknown  Unknown      Unknown 44.96662 -93.24646 1              Cedar Riverside
17-000007 2017-01-01 00:03:07 suspicious MDC NA             NO           NO            Unknown  Unknown      Male    44.98045 -93.27134 1              Downtown West
17-000073 2017-01-01 00:23:15 traffic    MDC NA             NO           NO            Unknown  White        Female  44.94835 -93.27538 5              Whittier
17-000092 2017-01-01 00:33:48 suspicious MDC NA             NO           NO            Unknown  East African Male    44.94836 -93.28135 5              Whittier
17-000098 2017-01-01 00:37:58 traffic    MDC NA             NO           NO            Unknown  White        Female  44.97908 -93.26208 1              Downtown West
17-000111 2017-01-01 00:46:48 traffic    MDC NA             NO           NO            Unknown  East African Male    44.98054 -93.26363 1              Downtown West

We now have a dataset with over 50,000 rows (you can scroll through the first six of them in the box above) and 14 variables in our global environment (the 'workspace'). But for the sake of simulating a real workflow, I will do some very light data manipulation. Here, I'm assigning a new column data$gender.not.known which is TRUE whenever data$gender is "Unknown" or NA.

data$gender.not.known <- is.na(data$gender) | data$gender == "Unknown"

As I wrote above: Saving the current state of your dataset in R makes sense when all the preparations take a lot of time. If they don't, you can just run your pre-processing code every time you are getting back to analyzing the dataset. In the scope of this post, let's suppose that the calculation above took veeeery long and you absolutely don't want to run it every time.

Option 1: Save as an R object

Whenever I'm the only one working on a project or everybody else is also using R, I like to save my datasets as R objects. Basically, it's just saving a variable/object (or several of them) in a file on your disk. There are two ways of doing this:

  1. Use the function save() to create an .Rdata file. In these files, you can store several variables.
  2. Use the function saveRDS() to create an .Rds file. You can only store one variable in it.

Option 1.1: save()

You can save your data simply by doing the following:

save(data, file = "data.Rdata")

By default, the parameter compress of the save() function is turned on. That means that the resulting file will use less space on your disk. However, if it is a really huge dataset, it could take longer to load it later because R first has to decompress the file again. So, if you want to save space, then leave it as it is. If you want to save time, add the parameter compress = F.
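If you are unsure which trade-off matters more for your data, a quick way to check is to write the file both ways and compare sizes. This is just a sketch; the file names are chosen here for illustration.

```r
# Save the same object with and without compression and compare file sizes.
save(data, file = "data-compressed.Rdata")                    # compress is TRUE by default
save(data, file = "data-uncompressed.Rdata", compress = F)    # larger file, faster to load

file.size("data-compressed.Rdata") < file.size("data-uncompressed.Rdata")
```

On most datasets the compressed file is noticeably smaller; how much smaller depends heavily on the data.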

If you want to load such an .Rdata file into your environment, just do

load(file = "data.Rdata")

Then, the object is available in your workspace with its old name. Here, the new variable will also have the name data. With save() you can also save several objects in one file. Let's duplicate data to simulate this.

data2 <- data
save(list = c("data", "data2"), file = "data.Rdata")

Now, if you do load("data.Rdata"), you will have two more objects in your workspace, namely data and data2.
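By the way, if you don't remember what an .Rdata file contains, load() can tell you: it invisibly returns the names of the objects it restored. A small sketch, continuing from the file we just saved:

```r
# load() invisibly returns a character vector with the names
# of all objects it put into the workspace.
loaded.names <- load("data.Rdata")
loaded.names
## [1] "data"  "data2"
```

This is handy when you get an .Rdata file from someone else and have no idea what variable names to expect.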

Option 1.2: saveRDS()

This is the second option of saving R objects. saveRDS() can only be used to save one object in one file. The "loading function" for saveRDS() is readRDS(). Let's try it out.

saveRDS(data, file = "data.Rds")
data.copy <- readRDS(file = "data.Rds")

Now, you have another R object in your workspace which is an exact copy of data. The compress parameter is also available for saveRDS().

Note that you cannot "mix" the saving and loading functions: save() goes together with load(), saveRDS() goes together with readRDS().

The difference between save() and saveRDS()

So, you might ask "why should I use saveRDS() instead of save()"? Actually, I like saveRDS() better - for one specific reason that you might not have noticed in the calls above. When we use load(), we do not assign the result of the loading process to a variable because the original names of the objects are used. But this also means that you have to "remember" the names of the previously used objects when using load().

When we use readRDS(), we have to assign the result of the reading process to a variable. This might mean more typing but it also has the advantage that you can choose a new name for the variable to integrate it into the rest of the new script more smoothly. Also, it is more similar to the behavior of all the other "reading functions" like read.table(): for these, you also have to assign the result to a variable. The only advantage of save() really is that you can save several objects into one file - but in the end it might be better to have one file for one object. This might be more clearly organized.

Option 2: Save as a CSV file

Whenever you are not sure who will work with the data later on and whether these people are all using R, you might want to export your dataset as a CSV file. It's also human readable. Also, if you provide a dataset on some website (e.g., in the Dataverse) for other researchers, it is kind to provide a CSV file because everyone can open it with their preferred statistical software package.

Option 2.1: write.table()

You can think of write.table() as the "opposite" of read.table(). Even the parameters are quite similar. Let's try it:

write.table(data, file = "data.csv",
            sep = "\t", row.names = F)

We just saved the data.frame stored in data as a CSV file with tabs as field separators. We also suppressed the row names. I don't know why, but by default, write.table() stores the row names in the file, which I find a little strange.

Oh, and you can also use write.table() to append the contents of your data.frame at the end of the file: just set the parameter append to TRUE. This is neat whenever you want to "fill" a file in multiple steps (e.g., in a for loop). Remember to suppress the column names if you're appending content to files because you don't want them to be repeated throughout the file - just set col.names = F.
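A minimal sketch of what such an append loop could look like. Splitting the dataset into three chunks is made up here purely for illustration; the point is the append = TRUE / col.names = F combination after the first write.

```r
# Split the data into three arbitrary chunks to simulate stepwise writing.
chunks <- split(data, rep(1:3, length.out = nrow(data)))

# First chunk: write the file with the header row.
write.table(chunks[[1]], file = "data-appended.csv",
            sep = "\t", row.names = F)

# Remaining chunks: append without repeating the column names.
for (i in 2:3) {
  write.table(chunks[[i]], file = "data-appended.csv",
              sep = "\t", row.names = F,
              append = TRUE, col.names = F)
}
```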

Option 2.2: fwrite()

Is your dataset really huge, like several gigabytes of data? Then try giving fwrite() from the data.table package a spin! It uses multiple CPU cores for writing data. Just like fread() from the same package, it is much, much faster for larger files. Another advantage: the row.names parameter is FALSE by default. The most widely used parameters have the same names as in write.table(). Neat!

library(data.table)
t0 <- Sys.time()
for (i in 1:10) {
  write.table(data,
              file = "writetable-data.csv",
              sep = "\t", row.names = F)
}
difftime(Sys.time(), t0)
## Time difference of 2.742357 secs
t1 <- Sys.time()
for (i in 1:10) {
  fwrite(data,
         file = "fwrite-data.csv",
         sep = "\t")
}
difftime(Sys.time(), t1)
## Time difference of 0.191942 secs

See that? Even with only 10 replications of writing a rather small dataset to disk, fwrite() has a huge timing advantage (it's more than 10 times faster!). For a very big dataset, this might really come in handy.

Option 3: Save as an Excel file

You might come into a situation where you want to export your dataset to an Excel file. Maybe some colleagues only work with Excel (because you still have not managed to convince them to switch to R) or you want to use Excel for annotating your dataset with a spreadsheet editor. In this case, you can use the WriteXLS() function from the WriteXLS package. I tried a few packages for writing Excel files and I find this one the most convenient to use.

Let's try it.

library(WriteXLS)
WriteXLS(data, ExcelFileName = "data.xlsx",
         SheetNames = "my data",
         AdjWidth = T,
         BoldHeaderRow = T)

This is what the resulting Excel file looks like on my machine.

You can save several data.frames in one Excel file by including the names of the objects at the first position. Here, you could replace data with c("data", "data2"). With the parameter SheetNames you can set the names of the data sheets (visible at the bottom of Excel, not included in the screenshot). If you want to write several data.frames into several sheets of the Excel file, you can put several names in a vector here that have to correspond with the names of the objects at the first position.

AdjWidth is a nice parameter because it tries to adjust the width of the columns in Excel in a way that every entry fits in the cells. BoldHeaderRow is self-explanatory, I guess. You can see the effect in the screenshot. Oh, and by the way, you can set the entries for NA values with the na parameter. It's "" by default.
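Putting the multi-sheet and na options together, a sketch could look like this. The sheet names and the "missing" label are arbitrary choices for the example, and data2 is the duplicate we created earlier.

```r
library(WriteXLS)

data2 <- data  # the duplicate from the save() example

# Two objects, two sheets; NA cells are written as the string "missing".
WriteXLS(c("data", "data2"), ExcelFileName = "data-two-sheets.xlsx",
         SheetNames = c("original", "copy"),
         AdjWidth = T, BoldHeaderRow = T,
         na = "missing")
```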

Summary

  • If you know that the dataset is going to be used in R and R only, use saveRDS(). save() is OK, too. But you cannot assign the result of load()ing your data back into R to a variable name of your choice. You can save uncompressed files by setting compress = F. Reading those files back in is much faster but they use more space on your disk.
  • If you want to distribute your dataset to a lot of people from whom you don't know which statistical software package they use, you can save CSV files. I recommend using fwrite() from the data.table package because it is much faster than write.table().
  • If you really really want (or need) an Excel file, I recommend using WriteXLS() from the WriteXLS package.

If you think that I should also cover other formats of saving a dataset on the disk, please let me know in the comments and I will try to cover them as well.


Source: https://www.r-bloggers.com/2019/05/how-to-save-and-load-datasets-in-r-an-overview/
