Data-driven approach to try and rank R packages

"06/13-15/2012, Vanderbilt University, useR!: Slicing and dicing big data with RHadoop-rmr - Yours truly to speak"

speaking - Antonio Piccolboni

No more installing R on each node. No more begging IT for permission to even try. First working version (branch 0-install), tested on an Ubuntu Natty EC2 Whirr cluster. Hopefully generalizable to other settings. Please kick the tires.

R among fastest growing languages as share of computer book market according to O’Reilly (fastest that my eyes can see)

Fixes a problem with streaming cmd line syntax when using the backend parameter option

Big Data Step-by-Step: Using R & Hadoop (with RHadoop’s rmr package), slides by @jeffreybreen

Big Data Workshop - Boston Predictive Analytics (Boston, MA) - Meetup - RHadoop demo with @jeffreybreen

- 2:15 to 4:00   VM, R-Studio, rmr - Jeffrey Breen

Another way into the cloud is directly through a virtual manager. Jeffrey will guide us through the setup, and then proceed to load R-Studio onto a node - essentially one now has access to a bigger computer to run jobs. Jeffrey will then use the ‘rmr’ Hadoop-based package with an airline performance dataset.

from http://workstream.piccolboni.info/big-data-workshop-boston-predictive-analytics via Workstream

A combined feed of all things RHadoop

Check out this website I found at friendfeed.com
from http://workstream.piccolboni.info/a-combined-feed-of-all-things-rhadoop via Workstream

Help define plans for rmr 1.3, the efficiency release

Based on user feedback, we intend to devote release 1.3 of rmr to efficiency issues. Please join the discussion on GitHub. See also this video.
from http://workstream.piccolboni.info/help-define-plans-for-rmr-13-the-efficiency-r via Workstream

Revolutions: RHadoop updated: improved performance and more control

February 27, 2012

Revolution Analytics’ open-source RHadoop project, which provides integration between R and Hadoop, has been updated with the release of version 1.2 of the “rmr” package. New in this version: support for binary I/O formats, which improves on the text-only interface by allowing use of faster and more space-efficient data formats like R’s native serialization format. This version also improves the performance of the reduce step (to get around the fact that list appends in R are not constant-time operations), and provides control to the Hadoop user to do things like set the number of reducers on a per-job basis.
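
To see why non-constant-time appends matter in a reduce step, here is a minimal sketch in plain Python. It is not rmr code and does not show how rmr 1.2 implements its fix. It assumes, for illustration only, that every append copies the whole container, and it counts element copies against a buffer allocated once. Both function names are hypothetical.

```python
def copies_with_copying_append(n):
    """Count element copies when every append rebuilds the container."""
    items = ()
    copies = 0
    for i in range(n):
        copies += len(items)   # every existing element is copied over
        items = items + (i,)   # tuple concatenation builds a new tuple
    return copies


def copies_with_preallocation(n):
    """Count element copies when the buffer is allocated once up front."""
    buffer = [None] * n
    copies = 0
    for i in range(n):
        buffer[i] = i          # in-place write, nothing already stored is copied
    return copies


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, copies_with_copying_append(n), copies_with_preallocation(n))
```

Under that assumption the copying version performs n(n-1)/2 element copies, so the cost of collecting the values for one key grows quadratically with the number of values.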

Find more details about these and other updates in rmr 1.2 (available now) at the link below.

RHadoop: Overview of rmr v1.2

Posted at 10:00 in big data, packages, R

from http://workstream.piccolboni.info/revolutions-rhadoop-updated-improved-performa via Workstream

RHadoop at >200 watchers, rmr at >300 downloads

Come join me for “RHadoop, R meets Hadoop”: Strata 2012 - 10:40am Wednesday, 02/29/2012

RHadoop, R meets Hadoop

Antonio Piccolboni (Revolution Analytics)
Hadoop & Big Data: Tech
Location: GA J

RHadoop is an open source project spearheaded by Revolution Analytics to grant data scientists access to Hadoop’s scalability from their favorite language, R. RHadoop comprises three packages:

  • rhdfs provides file-level manipulation for HDFS, the Hadoop file system
  • rhbase provides access to HBase, the Hadoop database
  • rmr lets you write mapreduce programs in R

rmr allows R developers to program in the mapreduce framework, and gives all developers an alternative way to implement mapreduce programs that strikes a delicate compromise between power and usability. It lets you write general mapreduce programs, offering the full power and ecosystem of an existing, established programming language. It doesn’t force you to replace the R interpreter with a special run-time: it is just a library. You can write logistic regression in half a page and even understand it. It feels and behaves almost like the usual R iteration and aggregation primitives. It consists of a handful of functions with a modest number of arguments and sensible defaults that combine in many useful ways. But there is no way to prove that an API works: one can only show examples of what it lets you do, and we will do that, covering a few from machine learning and statistics. Finally, we will discuss how to get involved.
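
The map, group-by-key, reduce flow that mapreduce programs follow can be sketched without R or Hadoop. The plain-Python example below is an illustration only: it does not use rmr, it runs in a single process, and `run_mapreduce`, `word_count_map` and `word_count_reduce` are hypothetical names chosen for this sketch.

```python
from collections import defaultdict


def run_mapreduce(records, map_fn, reduce_fn):
    """Apply map_fn to each record, group emitted pairs by key, then reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            groups[key].append(value)
    # One reduce call per distinct key, over all values emitted for it.
    return {key: reduce_fn(key, values) for key, values in groups.items()}


def word_count_map(line):
    """Emit a (word, 1) pair for every word in the line."""
    return [(word, 1) for word in line.split()]


def word_count_reduce(word, counts):
    """Sum the counts collected for one word."""
    return sum(counts)


if __name__ == "__main__":
    lines = ["r meets hadoop", "hadoop meets r", "r"]
    print(run_mapreduce(lines, word_count_map, word_count_reduce))
```

The map and reduce steps are ordinary functions passed to a single driver, which is the sense in which such an API can feel like everyday iteration and aggregation.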

from http://workstream.piccolboni.info/come-join-me-for-rhadoop-r-meets-hadoop-strat via Workstream