Sunday, July 21, 2013

Basic Statistics terminology



          Random Experiment: an experiment whose outcome cannot be predicted with certainty.

As an example, suppose you are getting dressed in the morning in a rush, and instead of choosing socks from your drawer you grab a pair at random without looking. Grabbing that pair is a random experiment.
 
          Sample space: the collection of all possible outcomes of the experiment (SS)

Your sample space (SS) is your drawer. It doesn’t contain all the socks in the world, but it does contain every possible outcome of the experiment.

          An Event: an outcome, or more generally a set of outcomes, from one experiment (E)

An event would be the socks you ended up picking.

          A Sample:  a collection of events from repeated trials/experiments (S).

If you randomly pick socks every day for a week, those seven pairs of socks would be a sample.
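
As a rough sketch of the three ideas so far, the snippet below treats a hypothetical drawer as the sample space, one blind grab as a random experiment, and a week of grabs as a sample. The drawer contents, the fixed seed, and the assumption that each pair goes back in the drawer before the next morning are all made up for illustration.

```python
import random

# Hypothetical drawer (sample space): each entry is one pair of socks.
# The colors and counts are invented for this sketch.
drawer = ["black", "black", "white", "gray", "blue"]

def pick_pair(rng):
    """One random experiment: grab a pair without looking."""
    return rng.choice(drawer)

rng = random.Random(0)  # fixed seed so reruns give the same draws

# A sample: the events from seven repeated experiments (one per day).
# Assumes each pair is washed and returned before the next morning.
week = [pick_pair(rng) for _ in range(7)]
print(week)
```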

          Random variable: a function X(E) assigning one real value to each element of the sample space (features?)

A random variable for the socks could describe a pair's style (athletic or business) or its color, with each category coded as a number so that the value is real.
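
One way to picture a random variable is as an ordinary function from outcomes to numbers. The sketch below is only an illustration: the color codes and the function names are hypothetical, and any consistent numeric coding would serve equally well.

```python
# Hypothetical numeric codes for sock colors; the choice is arbitrary.
COLOR_CODE = {"black": 0, "white": 1, "gray": 2}

def X(pair_color):
    """Random variable: map an outcome (a pair's color) to a real number."""
    return float(COLOR_CODE[pair_color])

def is_black(pair_color):
    """Indicator random variable: 1.0 for a black pair, 0.0 otherwise."""
    return 1.0 if pair_color == "black" else 0.0

print(X("white"), is_black("black"), is_black("gray"))
```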

         Probability:  p = P(E), a number assigned to an event E representing the fraction of experiments that result in that event ==> p(x) = P(x = X), the fraction of the time the random variable x takes the value X

In the sock example, you are planning on wearing black pants, so you want black socks and would like to know the probability of grabbing a black pair. Let’s say you have 20 pairs of socks and 11 of them are black; then p(x=black) would be 11/20 = 0.55.
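
To make the "fraction of experiments" reading concrete, here is a small sketch that computes the exact value from the counts in the example and then estimates it by repeating the experiment many times. The seed, the number of trials, and lumping the 9 non-black pairs together as "other" are assumptions of the sketch.

```python
import random
from fractions import Fraction

black_pairs, total_pairs = 11, 20  # counts from the example above
p_black = Fraction(black_pairs, total_pairs)
print(float(p_black))  # 0.55

# Estimate the same probability as a fraction of repeated experiments.
# The 9 non-black pairs are lumped together as "other" (an assumption).
drawer = ["black"] * black_pairs + ["other"] * (total_pairs - black_pairs)
rng = random.Random(42)  # arbitrary fixed seed
trials = 100_000
hits = sum(rng.choice(drawer) == "black" for _ in range(trials))
estimate = hits / trials
print(estimate)  # close to 0.55, but not exactly equal
```

The estimate drifts toward 0.55 as the number of trials grows, which is the sense in which a probability is a long-run fraction.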

         Probability Density Function: f(x) = p(x=X) on SS.


My sock example falls apart when I start describing pdfs, but these are normally represented as graphs showing the probability of each possible outcome. Think of grades in a class: 100 students take a test, and 10 get As, 20 get Bs, 45 get Cs, 15 get Ds and 10 get Fs. The pdf would have p(x=A) = 0.1, p(x=B) = 0.2, p(x=C) = 0.45, p(x=D) = 0.15, and p(x=F) = 0.1. (Strictly speaking, for discrete outcomes like letter grades this is called a probability mass function; "density" is the term used for continuous variables.)
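
The grade probabilities can be computed directly from the counts. This is a minimal sketch using the numbers from the example; the dictionary name `pmf` is my own choice (short for probability mass function, the usual name for a pdf over discrete outcomes).

```python
# Grade counts from the example: 100 students in total.
counts = {"A": 10, "B": 20, "C": 45, "D": 15, "F": 10}
total = sum(counts.values())

# Probability of each grade = its share of all students.
pmf = {grade: n / total for grade, n in counts.items()}

for grade, p in pmf.items():
    print(grade, p)
```

The values add up to 1 because every student received exactly one grade.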

 

         Probability Distribution Function: F(x) = p(x<=X) on SS.

A probability distribution function, also called a cumulative distribution function (cdf), shows the probability accumulated as the values increase. For the grades example, if F is the lowest grade and A the highest: p(x <= F) = 0.1 (notice it stays the same); p(x <= D) = 0.25 (or p(x=D) + p(x=F)); p(x <= C) = 0.7 (or p(x=C) + p(x=D) + p(x=F)); p(x <= B) = 0.9 (or p(x=B) + p(x=C) + p(x=D) + p(x=F)); and p(x <= A) = 1.0 (the cumulative value of all the outcomes).
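
The running totals above are just a cumulative sum taken from the lowest grade to the highest. A short sketch, using the same grade counts and assuming the ordering F < D < C < B < A:

```python
from itertools import accumulate

counts = {"A": 10, "B": 20, "C": 45, "D": 15, "F": 10}
order = ["F", "D", "C", "B", "A"]  # lowest grade to highest
total = sum(counts.values())

# Accumulate integer counts first, then divide, to avoid float round-off.
running = accumulate(counts[g] for g in order)
cdf = {g: c / total for g, c in zip(order, running)}

print(cdf)  # {'F': 0.1, 'D': 0.25, 'C': 0.7, 'B': 0.9, 'A': 1.0}
```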




         In summary, as shown in the bottom figure, a sample space (SS) is all possible outcomes, a sample (S) is made up of events (E) from repeated experiments, and the sample is used to model the SS with random variables (X) to form a pdf (f(x)).
