Monthly Archives: November 2015

Embedded R Shiny apps

In the following week(s) I’ll explain how to create and deploy an interactive R application (referred to as a Shiny app) to your blog/website. Here is a simple example where the throw of a die is simulated each time the button is pressed. The number on which the die “lands” is recorded and added to the associated bin of the histogram. In theory, the histogram should be almost “flat” after a large number of throws, since each face is equally likely.
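As a rough idea of what happens behind the button, here is a minimal plain-R sketch of the same experiment. It is not the code of the Shiny app itself; the seed and the number of throws (`n_throws`) are values I picked purely for illustration.

```r
# Sketch only: simulate many throws of a fair six-sided die in plain R.
set.seed(42)                  # assumed seed, only to make the run repeatable
n_throws <- 6000              # hypothetical number of button presses

# Each throw picks one face from 1..6 with equal probability.
throws <- sample(1:6, n_throws, replace = TRUE)

# Count how often each face came up (factor() keeps all six bins).
counts <- table(factor(throws, levels = 1:6))
print(counts)

# With many throws the bars should be of roughly equal height.
barplot(counts, xlab = "Face", ylab = "Number of throws")
```

Try a small value such as `n_throws <- 20` to see how uneven the bars are before the counts have had time to even out.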

https://etiennekoen.shinyapps.io/rolldice

It seems that iframes are not supported on WordPress.com, so for now please click on the link above. After further investigation, it appears that WordPress.com sites block iframes for security reasons. Embedding can be enabled by hosting the site yourself (WordPress.org) and installing one of the many plugins that provide this functionality.

However, it is possible to embed YouTube videos using shortcode tags, even on sites hosted by WordPress.com.

Sampling

When training a machine learning model, the outcomes to be predicted, or the features/variables associated with those outcomes, are often non-uniformly distributed. In many scenarios the distributions turn out to be “bell-shaped”, or more formally, normally distributed:

> x<-seq(-4,4,length=200)
> y<-dnorm(x,mean=0, sd=1)
> plot(x,y, type="l", lwd=2)

[Figure “rnorm”: the standard normal density curve plotted by the code above]
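The curve above is the theoretical density. To see how actual samples relate to it, one option is to draw random values with rnorm() and overlay the same density on their histogram. This is only a sketch: the seed, the sample size and the number of breaks are arbitrary choices of mine.

```r
# Sketch only: compare random draws from N(0, 1) with the theoretical density.
set.seed(1)                                   # assumed seed for repeatability
samples <- rnorm(10000, mean = 0, sd = 1)     # 10,000 draws, standard normal

# freq = FALSE scales the histogram to a density so the curve is comparable.
hist(samples, breaks = 40, freq = FALSE, main = "", xlab = "x")

# Overlay the density computed by dnorm() on the same axes.
curve(dnorm(x, mean = 0, sd = 1), from = -4, to = 4, add = TRUE, lwd = 2)
```

The histogram should follow the bell shape closely, and the match improves as the number of draws grows.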

Let’s start with an elementary example:

Consider a class of 100 students who were each awarded a mark from A to E at the end of the school term. Let’s say the most common mark was a “C”. A few students did quite well and achieved an A aggregate, while others did not perform as well and achieved an E aggregate, and so on.

To perform such a simulation in R we sample from a list of marks with some probability associated with each aggregated mark:

> marks <- sample(LETTERS[1:5],100,prob=c(0.1,0.2,0.4,0.2,0.1),replace=T)
> marks
  [1] "C" "B" "D" "C" "B" "C" "D" "C" "E" "C" "C" "D" "C" "C" "A" "E" "C" "B" "B" "C" "A"
 [22] "C" "B" "B" "C" "C" "A" "B" "C" "D" "C" "B" "C" "C" "B" "B" "A" "C" "A" "E" "C" "C"
 [43] "A" "C" "B" "E" "D" "C" "A" "C" "C" "B" "C" "B" "D" "D" "D" "B" "C" "B" "B" "A" "B"
 [64] "D" "C" "D" "B" "B" "C" "E" "C" "B" "B" "B" "C" "E" "D" "C" "A" "A" "A" "B" "A" "E"
 [85] "D" "D" "B" "D" "B" "E" "E" "C" "C" "D" "B" "A" "C" "B" "D" "D"

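Note that I am using LETTERS[1:5] to sample from A (1) to E (5), with the corresponding probabilities prob=c(0.1,0.2,0.4,0.2,0.1), and with replacement (replace=T). Sampling with replacement is required here because 100 marks are drawn from only five possible values.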
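Because the marks are drawn at random, the observed proportions will only approximate the requested probabilities. A quick way to check this, sketched below, is to put the two side by side. The seed and the names `grades`, `target` and `observed` are my own additions for illustration, so the counts will not match the output printed above.

```r
# Sketch only: compare requested probabilities with observed proportions.
set.seed(2015)                                # assumed seed for repeatability
grades <- LETTERS[1:5]                        # "A" "B" "C" "D" "E"
target <- c(A = 0.1, B = 0.2, C = 0.4, D = 0.2, E = 0.1)

marks <- sample(grades, 100, prob = target, replace = TRUE)

# Share of students per mark; factor() keeps a mark even if it was never drawn.
observed <- as.numeric(table(factor(marks, levels = grades))) / length(marks)

# One row per source, one column per mark.
comparison <- round(rbind(target = target, observed = observed), 2)
print(comparison)
```

With only 100 students the observed row can differ from the target row by a few percentage points; increasing the sample size brings the two closer.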

A bar plot of the sampled marks looks as follows:

> barplot(table(marks),col=1:5,xlab="Mark",ylab="Number of students")

[Figure “studentsRaw”: bar plot of the number of students per mark]

As expected, C is the most common mark, since it was sampled with the highest probability (0.4).