Monday, 21 July 2014

Day #1 Session #2 - Jen Stirrup - Data Analysts Toolkit using R and PowerBI



Note on my shorthand: [+] means actionable insight, [i] refers to something of interest, [!] indicates a warning, [?] poses a question I'll need to ask.


R is an open source statistical programming language. It's complementary to many other data tools but does not replace them entirely.

Variables are not declared with a type, so naming conventions are important.
Lists are known as Vectors
DataSets are known as DataFrames
A Dimension is known as a Factor
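
To make the terminology above concrete, here is a minimal R sketch. The names and values (sales, region, orders) are my own illustrations, not something shown in the session.

```r
# Illustrative values only; none of this comes from the session.
sales  <- c(120, 95, 143)                        # a numeric vector
region <- factor(c("North", "South", "North"))   # a factor (a categorical "dimension")

# A data frame combines equal-length columns into one table.
orders <- data.frame(region = region, sales = sales)

# Variables carry no declared type, so the same name can be rebound
# to a different kind of value. Hence the need for good naming.
x <- 42
x <- "forty-two"

# The distinct categories of the factor, and sales summed per category.
region_levels   <- levels(orders$region)
sales_by_region <- tapply(orders$sales, orders$region, sum)
```

Printing region_levels and sales_by_region at the console shows the categories and the per-region totals.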

It can import and export data. It can read CSV files, use ODBC connections and even access Hadoop.
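
As a rough sketch of the CSV side of this, base R can write a data frame out and read it back in. The data and the temporary file are my own assumptions for illustration. ODBC and Hadoop access normally need add-on packages and are not shown here.

```r
# Round-trip a small, made-up data frame through a temporary CSV file.
scores <- data.frame(name = c("a", "b"), score = c(1.5, 2.5))

csv_path <- tempfile(fileext = ".csv")           # throwaway file location
write.csv(scores, csv_path, row.names = FALSE)   # export

reloaded <- read.csv(csv_path)                   # import
```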

It's difficult to Google for R resources because of its name. Use RSeek.org, which is an R-specific search engine. Jen Stirrup, the presenter, has a blog which explains how to use the various data analytics tools with R.

[?] Have we got any R skills in the Analytics team or elsewhere in the business?

R is a command-line language but RStudio is a helpful tool.

Excel add-in tools PowerQuery, PowerView, PowerMap and PowerPivot are worth looking at to get Analytics using standard tools for testing and modelling.

PowerQuery is really good for data munging and cleaning. It can create models for re-use. Analytics needs to use this to build automated testing over their XL estate. It's good for Data Mining as well.

Rattle is a data mining package for R which can reuse models.

Overall - Interesting session. CIG Analytics should use PowerQuery to build models to do testing. Send them the video. If they have any R skills we should encourage them to use it. Well delivered, if a bit slow at times, but she was following Brent. 3 stars.
