Data-driven approach to try and rank R packages
No more installing R on each node. No more begging IT for permission to even try. First working version (branch 0-install), tested on an Ubuntu Natty EC2 Whirr cluster. Hopefully generalizable to other settings. Please kick the tires.
Fixes a problem with streaming cmd line syntax when using the backend parameter option
Big Data Workshop - Boston Predictive Analytics (Boston, MA) - Meetup - RHadoop demo with @jeffreybreen
- 2:15 to 4:00 VM, R-Studio, rmr - Jeffrey Breen
Another way into the cloud is directly through a virtual manager. Jeffrey will guide us through the setup, and then proceed to load R-Studio onto a node - essentially one now has access to a bigger computer to run jobs. Jeffrey will then use the ‘rmr’ Hadoop-based package with an airline performance dataset.
from http://workstream.piccolboni.info/big-data-workshop-boston-predictive-analytics via Workstream
A combined feed of all things RHadoop
from http://workstream.piccolboni.info/a-combined-feed-of-all-things-rhadoop via Workstream
Help define plans for rmr 1.3, the efficiency release
from http://workstream.piccolboni.info/help-define-plans-for-rmr-13-the-efficiency-r via Workstream
Revolutions: RHadoop updated: improved performance and more control
February 27, 2012
Revolution Analytics’ open-source RHadoop project, which provides integration between R and Hadoop, has been updated with the release of version 1.2 of the “rmr” package. New in this version: support for binary I/O formats, which improves on the text-only interface by allowing use of faster and more space-efficient data formats like R’s native serialization format. This version also improves the performance of the reduce step (to get around the fact that list appends in R are not constant-time operations), and gives the Hadoop user control over things like setting the number of reducers on a per-job basis.
Find more details about these and other updates in rmr 1.2 (available now) at the link below.
RHadoop: Overview of rmr v1.2
Posted by David Smith at 10:00 in big data, packages, R
from http://workstream.piccolboni.info/revolutions-rhadoop-updated-improved-performa via Workstream
Come join me for “RHadoop, R meets Hadoop”: Strata 2012 - 10:40am Wednesday, 02/29/2012
RHadoop, R meets Hadoop
Antonio Piccolboni (Revolution Analytics)
RHadoop is an open source project spearheaded by Revolution Analytics to grant data scientists access to Hadoop’s scalability from their favorite language, R. RHadoop comprises three packages:
- rhdfs provides file-level manipulation for HDFS, the Hadoop file system
- rhbase provides access to HBase, the Hadoop database
- rmr lets you write mapreduce programs in R
rmr lets R developers program in the mapreduce framework, and offers all developers an alternative way to implement mapreduce programs that strikes a delicate compromise between power and usability. It lets you write general mapreduce programs, offering the full power and ecosystem of an existing, established programming language. It doesn’t force you to replace the R interpreter with a special run-time; it is just a library. You can write logistic regression in half a page and even understand it. It feels and behaves almost like the usual R iteration and aggregation primitives. It consists of a handful of functions with a modest number of arguments and sensible defaults that combine in many useful ways. But there is no way to prove that an API works: one can only show examples of what it allows you to do, and we will do that, covering a few from machine learning and statistics. Finally, we will discuss how to get involved.
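To make the "handful of functions" idea concrete, here is a minimal sketch of the mapreduce programming model in plain Python. It is not rmr's API and does not touch Hadoop: the names `mapreduce`, `count_words`, and `sum_counts` are hypothetical, and everything runs in memory. It only illustrates how a map function emitting key-value pairs and a reduce function aggregating per key combine into a complete job.

```python
from collections import defaultdict

def mapreduce(records, map_fn, reduce_fn):
    """Toy in-memory mapreduce (illustrative only, not rmr or Hadoop).

    map_fn(record) yields (key, value) pairs;
    reduce_fn(key, values) returns one aggregated result per key.
    """
    groups = defaultdict(list)
    # Map phase: emit key-value pairs and group the values by key.
    for record in records:
        for key, value in map_fn(record):
            groups[key].append(value)
    # Reduce phase: aggregate the grouped values for each key.
    return {key: reduce_fn(key, values) for key, values in groups.items()}

def count_words(line):
    # Hypothetical map function: one (word, 1) pair per word.
    for word in line.lower().split():
        yield word, 1

def sum_counts(word, counts):
    # Hypothetical reduce function: total occurrences of a word.
    return sum(counts)

lines = ["R meets Hadoop", "Hadoop meets R again"]
word_counts = mapreduce(lines, count_words, sum_counts)
print(word_counts)
```

The user supplies only the two small functions; grouping by key is handled by the framework, which is the division of labor the talk abstract describes.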
from http://workstream.piccolboni.info/come-join-me-for-rhadoop-r-meets-hadoop-strat via Workstream
