When training a machine learning model, the outcomes to be predicted, or the features/variables associated with them, are often non-uniformly distributed. In many scenarios we find distributions that are “bell-shaped”, or more formally, normally distributed:

> x<-seq(-4,4,length=200)
> y<-dnorm(x,mean=0, sd=1)
> plot(x,y, type="l", lwd=2)


Let’s start with an elementary example:

Consider a class of 100 students who were each awarded a mark from A to E at the end of the school term. Let’s say “C” was the most common mark. A few students did quite well and achieved an A aggregate, while others did not perform as well and achieved an E aggregate, and so on.

To simulate this in R, we sample from a vector of marks, with a probability attached to each mark:

> marks <- sample(LETTERS[1:5],100,prob=c(0.1,0.2,0.4,0.2,0.1),replace=T)
> marks
  [1] "C" "B" "D" "C" "B" "C" "D" "C" "E" "C" "C" "D" "C" "C" "A" "E" "C" "B" "B" "C" "A"
 [22] "C" "B" "B" "C" "C" "A" "B" "C" "D" "C" "B" "C" "C" "B" "B" "A" "C" "A" "E" "C" "C"
 [43] "A" "C" "B" "E" "D" "C" "A" "C" "C" "B" "C" "B" "D" "D" "D" "B" "C" "B" "B" "A" "B"
 [64] "D" "C" "D" "B" "B" "C" "E" "C" "B" "B" "B" "C" "E" "D" "C" "A" "A" "A" "B" "A" "E"
 [85] "D" "D" "B" "D" "B" "E" "E" "C" "C" "D" "B" "A" "C" "B" "D" "D"

Note that I am using LETTERS[1:5] to sample from A(1) to E(5) with probabilities prob=c(0.1,0.2,0.4,0.2,0.1) accordingly and with replacement (replace=T).
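To see how closely a sample follows the probabilities passed to sample(), the observed proportions can be compared with the expected ones. The following is a minimal sketch rather than part of the original example: the seed value 42 and the names grades, probs, observed and comparison are assumptions introduced here for illustration.

```r
# Fix the random seed so the same sample is drawn on every run
# (42 is an arbitrary choice, not taken from the example above).
set.seed(42)

grades <- LETTERS[1:5]                 # "A" "B" "C" "D" "E"
probs  <- c(0.1, 0.2, 0.4, 0.2, 0.1)   # one probability per grade

# Draw 100 marks with replacement, weighted by probs
marks <- sample(grades, size = 100, replace = TRUE, prob = probs)

# Observed share of each grade; factor() keeps all five levels
# in the table even if a grade happens not to be drawn
observed <- prop.table(table(factor(marks, levels = grades)))

# Expected and observed proportions side by side
comparison <- data.frame(grade    = grades,
                         expected = probs,
                         observed = as.numeric(observed))
print(comparison)
```

With only 100 draws the observed proportions will usually differ somewhat from the expected ones; calling set.seed() first simply makes the result repeatable.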

A bar plot of the sampled marks looks as follows:

> barplot(table(marks),col=1:5,xlab="Mark",ylab="Number of students")


As expected, C is the most common mark in the sample.
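The same conclusion can be read off the frequency table directly instead of from the plot. This is a small sketch under the same assumptions as the simulation above; the seed value and the names counts, most_common and share are hypothetical and chosen only for illustration.

```r
# Re-create a sample of marks (the seed value 1 is an arbitrary choice)
set.seed(1)
marks <- sample(LETTERS[1:5], 100, prob = c(0.1, 0.2, 0.4, 0.2, 0.1), replace = TRUE)

# Frequency of each mark
counts <- table(marks)

# Name of the most frequent mark and its share of the class
most_common <- names(which.max(counts))
share <- max(counts) / length(marks)

cat("Most common mark:", most_common, "\n")
cat("Share of students:", share, "\n")
```

Note that which.max() returns the first maximum, so in the event of a tie only one of the tied marks is reported.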
