Random Variables

The first key concept in probability is called a random variable.
Random variables are a key concept, but since they’re a key concept of the
frequentist school, they are, alas, one of the things that bring out the
Bayesian wars. But the idea of the random variable, and its key position
in understanding probability and statistics, predates the divide between
frequentist and Bayesian. So please, folks, be a little bit patient,
and don’t bring the Bayesian flamewars into this post, OK? If you want to
rant about how stupid frequentist explanations are, please keep it in the comments here. I’m trying to
explain basic ideas, and you really can’t talk about probability and
statistics without talking about random variables.

A random variable is an abstract representation of a measurable outcome of
a repeated experiment. A very typical example is die rolling: if you’re
rolling three common six-sided dice, a reasonable random variable would be the
sum of their faces. A random variable doesn’t have to be a single number:
roughly speaking, it can be anything that corresponds to a representation of
the outcome of the experiment. For example, a different random variable
for the dice-rolling experiment could be a triple containing the values of each of the dice. (There’s actually a bit more to the requirements on
what makes a valid type of value for a random variable, but it gets pretty hairy, so I’m just sticking with the intuition here.)

The random variable isn’t the outcome of a single experiment: conceptually, it’s a representation of the outcome of an infinite series of identical trials. It’s the statistical definition of the concept of
outcome for the experiment.

A random variable is thus much more than just a number, or a set of numbers. It’s the carrier of the probabilistic properties of what you’re measuring. For example, in the 3-die rolling example above, when you roll
three dice, you can get a value ranging from 3 to 18, and the different
values can occur with different frequency. For example, there’s only
one way to roll a 3 (a one on each die: {(1,1,1)}), but there are
three ways to roll a 4 (two dice showing a one and one showing a two, where
the two could be on any of the three dice: {(1,1,2), (1,2,1), (2,1,1)}).
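
If you want to check those counts by brute force, here is a minimal Python sketch. The function name sum_counts is made up for this illustration; it simply enumerates every ordered roll of fair six-sided dice and tallies the sums.

```python
from collections import Counter
from itertools import product


def sum_counts(num_dice, sides=6):
    """Count how many ordered rolls of `num_dice` dice produce each sum."""
    faces = range(1, sides + 1)
    # product(...) yields every ordered tuple of faces, e.g. (1, 1, 2).
    return Counter(sum(roll) for roll in product(faces, repeat=num_dice))


three_dice = sum_counts(3)
print(three_dice[3], three_dice[4])  # ways to roll a 3 and a 4
print(sum(three_dice.values()))      # total number of ordered rolls
```

Treating (1,1,2) and (1,2,1) as different rolls is the same convention used in the text: each die is tracked separately, so every ordered triple is equally likely.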

That idea, that the values covered by the random variable
occur with different frequencies, is represented by something called a
probability distribution. (Strictly speaking, it could also be
a probability density function for a continuous random
variable, but I’m sticking with the simpler discrete version.) The
probability distribution basically describes the ratio of occurrences
of the different outcomes in an ideal, infinite set of trials. For example,
suppose we wanted to look at rolling two dice. (Three dice give too many possibilities to be easy to read.) Below is a table of
the possible outcomes of rolling two dice:

Sum  Count  Rolls
2    1      (1,1)
3    2      (2,1), (1,2)
4    3      (1,3), (2,2), (3,1)
5    4      (2,3), (3,2), (1,4), (4,1)
6    5      (1,5), (2,4), (3,3), (4,2), (5,1)
7    6      (1,6), (2,5), (3,4), (4,3), (5,2), (6,1)
8    5      (2,6), (3,5), (4,4), (5,3), (6,2)
9    4      (3,6), (4,5), (5,4), (6,3)
10   3      (4,6), (5,5), (6,4)
11   2      (5,6), (6,5)
12   1      (6,6)

The probability distribution is defined by the ratio
of the number I labelled as “Count” in the table above to the
total number of possible rolls, which is 36 here. Which is another way of saying that
if you took a random trial, and looked at the value of the random
variable, then the probability distribution tells you how
likely each possible outcome is to be the value of the variable. For
the table above, the distribution is:

Sum  Probability
2    1/36
3    2/36 = 1/18
4    3/36 = 1/12
5    4/36 = 1/9
6    5/36
7    6/36 = 1/6
8    5/36
9    4/36 = 1/9
10   3/36 = 1/12
11   2/36 = 1/18
12   1/36
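
The distribution for two dice can be rebuilt mechanically from the counts. This is only a sketch under the assumption of two fair six-sided dice; two_dice_distribution is a name invented for the example, and Python's Fraction type is used so the probabilities stay exact.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def two_dice_distribution():
    """Map each possible sum of two fair dice to its exact probability."""
    rolls = list(product(range(1, 7), repeat=2))  # all 36 ordered rolls
    counts = Counter(a + b for a, b in rolls)
    # probability = (number of rolls giving this sum) / (total rolls)
    return {s: Fraction(c, len(rolls)) for s, c in sorted(counts.items())}


dist = two_dice_distribution()
for total, prob in dist.items():
    print(total, prob)
```

Fraction reduces to lowest terms automatically, so the printed values line up with the reduced column of the table, and the probabilities add up to exactly 1.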

So if you randomly select trials of die rolling, that means that
you would expect that, on average, one out of every 36
rolls would have a sum of 2.

According to the simplest definition of probability (which is
now considered to be part of the frequentist school), that’s what
it means to have a probability of 1/36: given a series of trials,
on average, one out of every 36 trials would produce that result. (The
Bayesian version would be, roughly, if you look at a particular trial,
with no knowledge other than that it was a fair set of dice, you’d have
a certainty of 1/36 that the outcome would be a 2.)
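
The frequentist reading can be tried out with a quick simulation. This is a rough sketch, not a proof of anything: estimate_probability is a hypothetical helper, the dice are assumed fair, and the seed is there only so the run is repeatable.

```python
import random


def estimate_probability(target_sum, trials, seed=0):
    """Estimate P(sum of two fair dice == target_sum) by simulation."""
    rng = random.Random(seed)  # seeded only to make the run repeatable
    hits = 0
    for _ in range(trials):
        roll = rng.randint(1, 6) + rng.randint(1, 6)
        if roll == target_sum:
            hits += 1
    return hits / trials


estimate = estimate_probability(2, 200_000)
print(estimate, 1 / 36)  # observed frequency next to the ideal value
```

With a finite number of trials the observed frequency only approximates 1/36; the definition in the text is about what happens in the idealized long run.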

Given the probability distribution for a random variable, you can
analyze its properties. For a simple example, there’s an idea called
the expectation of a random variable. Given a random variable x, the expectation E(x) is essentially a probabilistic mean. If the probability distribution of x is given by p(a) for each a ∈ range(x), then

E(x) = Σ_{a ∈ range(x)} a × p(a).

So for our two-die rolling example, the expectation is
2*1/36 + 3*1/18 + 4*1/12 + 5*1/9 + 6*5/36 + 7*1/6 + 8*5/36 + 9*1/9 + 10*1/12 + 11*1/18 + 12*1/36 = 252/36 = 7.
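
The same sum can be checked in a few lines of Python. This is a sketch assuming two fair six-sided dice; the function name expectation and the dictionary layout are choices made for the example, not anything standard.

```python
from fractions import Fraction
from itertools import product


def expectation(dist):
    """Sum of value * probability over a {value: probability} mapping."""
    return sum(value * prob for value, prob in dist.items())


# Build the two-dice distribution: each ordered roll has probability 1/36.
rolls = list(product(range(1, 7), repeat=2))
dist = {}
for a, b in rolls:
    dist[a + b] = dist.get(a + b, 0) + Fraction(1, len(rolls))

print(expectation(dist))
```

Using exact fractions avoids floating-point rounding, so the result can be compared with 7 directly.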

Many of the interesting properties that we can study using probability
and statistics come from the probability distributions. We’ll see more
of that in later posts.

Thoughts on “Random Variables”

  1. Peter

    Typo: “… one out of every 36 rolls would have a sum of 1”. That `1′ should have been `2′ (or `12′?), I suppose 🙂

  2. 6EQUJ5

    I took a statistics course in college where we learned definitions and formulas and did exercises, but I never got a real sense of probability concepts until an English professor gave me the assignment of writing about the game of craps. From the rules I learned how to work out all the probabilities of any event or sequence of events. Simulating a crap game was one of my first computer programs. Dice have served me well in illustrating probability concepts. A deck of cards does not work as well, for some reason. (Because every card has rank and suit, and in some games the suits have ranks, while a die has a single value and a single dimension?)

  3. Benoit Essiambre

    I don’t think the concept of a random variable is exclusively frequentist. Some Bayesians are itched by the word random, which can seem to imply that the things or events themselves are random. Some avoid the term and simply say things like “variable (or parameter) (or even set of variables) with a probability distribution”. But I say use the term if it makes things clearer.

  4. John Armstrong

    Benoit, that’s a nice observation. Maybe we could avoid it altogether if we just recognized that probability is just a different language for measure theory and call it a “function” instead of a “variable”, random or otherwise. Then a “probability distribution” is a “measure” as it should be.

  5. John Armstrong

    As an addendum to my previous point…
    6EQUJ5: the problem with using dice is that we are wedded to the idea that each side has a “value”. What’s really going on with a die is a space of six events
    {A,B,C,D,E,F}
    and a measure assigning weight 1/6 to each of them. Then we have a function (sorry, “random variable”)
    v(A)=1, v(B)=2, v(C)=3, v(D)=4, v(E)=5, v(F)=6
    but we could have different functions (“random variables”) giving different values to the six different events. The important thing is that the value isn’t inherent to the event, but comes along later. The identification of value with event seems to screw people up more than any other misconception.

  6. Canuckistani

    …or more accurately, a concrete example of the problem pointed out by John Armstrong.
    It definitely itches me that the jargon “random variable” refers to something that is neither necessarily random in the colloquial sense, nor a variable.

  7. Rich

    I think the issue is that if you start talking about random variables as functions, before you know it you’re armpit-deep in sigma-algebras and Lebesgue-Stieltjes integrals and all the math-phobic undergrads (and maybe even a few grad students as well) have fled for greener pastures.
    For teaching purposes at an introductory level, it’s probably worth pointing out that a) you’re glossing over a lot of analysis when you move from the event to the probability, b) the assignment of numbers to events is in many ways totally arbitrary, and c) that they’re just going to have to trust you as far as the CLT, LLN, and other such limiting results are concerned, and leave it at that.

  8. nitpicker

    die = singular
    dice = plural
    “dices” = oops
    “die rolling . . . sum of 2.” and some others = better check usage.

  9. spudbeach

    Benoit Essiambre (#4):
    Thanks for pointing out that random variables can be a bayesian thing too. I’m a bayesian and I use “random” all the time, as it captures what I as a statistician want to talk about — our knowledge about a given situation. The number of white cars in the parking lot is a (non-degenerate) random number, until I go out and look.
    As to whether the notion of “random” captures “true randomness”, I would say that “true randomness” means “a physical situation in which it is absolutely impossible to predict the outcome”. Some people I’ve had this conversation with say that there is no such thing as true randomness, that with enough knowledge, we can theoretically predict everything. To that, I can point to quantum mechanics. Bose-Einstein condensates are proof that atoms are identical, with no hidden variables, but even then, radioactive decay of individual atoms is still random.
    Einstein was wrong: God really does play dice with the universe. (Or the universe plays dice with God — and I can’t tell who’s winning!)

  10. Carl^2

    I second what John Armstrong said–speaking for myself, after two undergraduate courses in Probability/Statistics, I still didn’t understand what a random variable was. (Although I did understand how to manipulate them!) Only when I took graduate-level Real Analysis did I learn that a “random variable” was just a normalized finite measure. This was a definite “aha!” moment for me–suddenly a lot of things in both analysis and probability made a lot more sense!
    Oh, and @Radiohead:
    There are a lot of ways of looking at the expected value formula–different people find different ones intuitive. My favorite is to think of expected value as the balance point of a see-saw, where the beam of the see-saw is a number line weighted by your random variable. Note that the fulcrum isn’t the point where you have equal weight on both sides (that would be the “median”), because stuff that’s further away has a bigger lever arm.

  11. Mark C. Chu-Carroll

    Radiohead:
    There’s a simple intuition behind the expectation: if you took a long series of values, the mean of that series would converge on the expectation.
    Look at the example. If you ran 36 trials, and those trials perfectly matched the probability distribution, then you’d have a series with one 2, 2 threes, 3 fours, etc. If you write out the computation of that series, and then you write out the expansion of the expectation formula, the two expressions will be the same.

  12. AJS

    @ Radiohead: Nothing about probability is intuitive, otherwise Camelot wouldn’t be in business.
    @ Mark: Is there any good reason to reduce your fractions to their lowest terms here? It seems to me that leaving “6/36” &c. not only reduces the instance of typos, but also makes it easier to compare the probabilities of events within the same space.
    (Yes, I have come out as a staunch decimalist before, but even I recognise there’s no way it’s going to work neatly with 36 for a denominator.)

  13. Joe Marshall

    “…that’s what it means to have a probability of 1/36: given a series of trials, on average, one out of every 36 trials would produce that result. (The Bayesian version would be, roughly, if you look at a particular trial, with no knowledge other than that it was a fair set of dice, you’d have a certainty of 1/36 that the outcome would be a 2.)”
    A Bayesian version wouldn’t mention trials. He’d say something more along the lines of “in 1/36 of the space of possible die configurations the sum of the dots is 2.”
    The difference is subtle, but it is one of the key points about Bayesianism: Bayesians don’t consider probability to be defined in terms of ‘experiments’.

  14. Peter

    Why are they called Random Variables?
    I mean, they aren’t random and they aren’t variables. The OUTCOME may be random (did you roll a 10?) but the random variable is the mapping of the events to the probabilities, so why are they called that?

  15. J

    spudbeach,
    For those less informed but still curious, what’s the difference between a degenerate and non-degenerate RV?

  16. Canuckistani

    J,
    A degenerate random variable is constant. A frequentist would say that it takes the same value in every repetition of the experiment. (If you toss a marble instead of a coin, on which side will it land? The outside — every time.) A Bayesian would say that it’s a variable known with perfect accuracy.

  17. spudbeach

    John Marshall: “Bayesians don’t consider probability to be defined in terms of `experiments’.”
    Well said. On the other hand, experiments sure go a long way towards giving knowledge! In that respect, any time a frequentist uses the term “probability”, I think a bayesian would agree.
    Peter: “Why are they called Random Variables?” I would say because the outcomes are random. It’s just so much easier to say than “mathematical symbol for a class of observations with uncertain outcome”.

  18. Frederick Ross

    It’s called a random variable because that’s what the French mathematicians called it, and English speaking mathematicians learned probability from the French. The name is problematic because it’s actually a measurable function on a measure space, but English speaking mathematicians learned real analysis, and all the names that go with it, from the Germans. For instance, the sigma in sigma-algebra stands for Summe, the German word for ‘union’ in set theory. This is what the mathematical physicist who taught me probability claimed, and considering he’d give definitions in both those languages instead of English when the vocabulary lined up more sensibly (CADLAG functions anyone?) he probably knew what he was talking about.

  19. Doug Spoonwood

    I feel that fuzziness vs. probability gets off onto a rather different topic. I find that interesting in itself… especially as to how some people somehow think of fuzziness as a form of probability, which I simply don’t get… but I don’t think fuzziness addresses the post here.
    I do know random variables as fundamental for most developments of probability theory. However, I doubt that in principle (as Mark suggests) random variables work as all that fundamental in reality. After all, even though I don’t proclaim to understand most of their book, Schweizer and Sklar write “In our development we follow Menger and work directly with probability distribution functions rather than random variables. This means that we take seriously the fact that the outcome of any series of measurements of the values of a nondeterministic quantity is a distribution function; and that a probability space, although an exceedingly useful mathematical construct, is in fact and in principle unobservable.” _Probabilistic Metric Spaces_ p. x of the introduction.

  20. Mark C. Chu-Carroll

    I don’t say that random variables are fundamental to reality; I say that they’re fundamental to the basics of probability theory. There’s a huge difference there.
    I don’t think that you can go out into the world, and find real continuous topological spaces. I don’t think that natural transformations of categorical constructs are in any way fundamental to reality. But both topological spaces and natural transformations do have real applications in understanding the real world. Similarly, I don’t think that there’s any such thing as a true random variable in nature; among other things, the idea of a random variable is tied to infinite reproducible series, and neither the infinite nor the reproducible really ever happen in the physical world. But as a mathematical abstraction that can be used to describe something that could be used in the real world, they’re useful.

  21. spudbeach

    Yes, random variables do occur in nature: quantum mechanics is full of them. In fact, it can’t be explained without them.
    The examples are legion, but here are two:
    First, the decay mode of a radioactive isotope. Consider Uranium 238: my handy reference (Kaye and Laby) gives it a 74% chance of decaying with an alpha particle with an energy of 4.20 MeV, a 23% chance of a 4.15 MeV one, and the remainder of spontaneous fission. Those nuclei are _identical_ and cannot be distinguished in any way. The energy of the alpha particle is thence a random variable in the truest sense, and given the number of U238 atoms out there, it gives as close to an infinite number of trials as anybody could like.
    Second, the position of a diffracted electron through a double slit apparatus. In aggregate, there are absolutely beautiful pictures of wave-like interference. But in terms of a single electron, the position striking the target is clearly random. (See the beautiful pictures at http://en.wikipedia.org/wiki/Double-slit_experiment )
    So, is there true randomness in nature? Are random variables / distributions crucial to understanding the world? Yes. Similar ideas are at work in the silicon you’re using to read this right now.

  22. Thony C.

    For instance, the sigma in sigma-algebra stands for Summe, the German word for ‘union’ in set theory. This is what the mathematical physicist who taught me probability claimed, and considering he’d give definitions in both those languages instead of English when the vocabulary lined up more sensibly (CADLAG functions anyone?) he probably knew what he was talking about.

    The sigma for sum comes from the Latin summa; the German for union in set theory is Vereinigung.

  23. Thony C.

    A couple of further comments on the history of mathematical symbols for sum: Mengenlehre (set theory) was first developed in the last quarter of the 19th century by Georg Cantor, whereas the introduction of Σ goes back to Euler in 1755.

    The sign Σ for summation is due to L. Euler (1755), who says “summam indicabimus signo Σ.”
    Original footnote: L. Euler, Institutiones calculi differentialis (St. Petersburg, 1755), Cap. I, § 26, p. 27.
    Florian Cajori, A History of Mathematical Notations, Dover reprint Two Volumes Bound As One, New York 1993, Vol. II § 438, p. 61.

    The long drawn out “s” ∫ for integral, which of course is also a symbol for summation, was introduced by Leibniz who first used it in a manuscript from 1675. (Cajori, Vol. II, §544, p. 187.)
    As already mentioned both the English word sum and the German summe are derived from the Latin summa which is itself a form of the Latin word summus which means highest as in the English word summit. The meaning of sum or total is thought to derive from the habit of counting piles of coins from the bottom to the top.

  24. Mark C. Chu-Carroll

    spudbeach:
    The examples that you give are of random events, but they’re not random variables. There’s an important but subtle difference.
    A random variable is an infinite sequence of identical independent events with a range of possible outputs. Two key words in that: “infinite” and “sequence”.
    The decay time of a particular atom of uranium is something which can be described by a probability distribution, which (in the frequentist school) is formally modeled by a random variable. The random variable is just an abstract model, which has properties that simply don’t exist in the real world. We don’t have all of the uranium atoms in the universe lined up so that we can measure their decay times in a sequence. We don’t have an infinite supply of them. And we don’t have a universe in which these things are fully controlled and independent. (For example, the decay of one atom can produce particles which strike other atoms, and change them.) The whole idea of the random variable is purely mathematical abstraction which can never really exist outside of the realm of abstraction.
    To give an example of what I mean… The set of computer programs is countably infinite. I know that. Mathematically, I can prove that. But there’s no such thing as a real infinite set of computer programs. It’s a mathematical ideal which is useful as a simplified way of describing a particular piece of reality. It’s useful, because the real world of programs is complicated. If I wanted to make statements about what programs exist and what they can do, but I wanted it to be real, I would need to add all sorts of intricate bounding measures to every statement and every proof – and that would be prohibitively complex. So I throw away the irrelevant complexities, create an abstraction that focuses on the things I want to study, and work with that.
    The random variable is something like that. It’s an abstraction of the properties of probabilistic phenomena that lets us study just the probabilistic properties.

  25. Boian

    Mark,
    I don’t usually comment, but I feel like I have to take you to task. You say that:
    “A random variable is an infinite sequence of identical independent events with a range of possible outputs. Two key words in that: “infinite” and “sequence”.”
    Where does this come from? It makes no sense to me. A random variable is simply a measurable function from the probability space to the real numbers or some other suitable space. Where’s this sequence of independent events that you talk about?
    Also, I’ve never heard of “random event”. All events are (measurable) subsets of the sample space. As subsets they don’t change. It’s only the assignment of probabilities that may change. What makes an event random?
    I think what you are trying to get at are results like the Law of Large Numbers or Central Limit Theorem, which involve summing sequences of iid random variables. Those, however, are results about them, and are not part of the definition of random variables.
    Or are you simply trying to talk about the difficulties in determining the distribution of a random variable? Again, that’s not really the definition. We don’t need to know a random variable’s distribution in order to talk about it.
    I feel that spudbeach was accurately describing a random variable. And, you know, in several semesters of probability and stat classes, I’ve never heard anyone make mention of “frequentist” or “bayesian”. I’m not sure how central that debate is to probability theory.
    Finally, someone above asked why a random variable is thus named, and not simply called a function. I think it’s because we often forget about the sample space. We know it’s there, but almost never refer to it. So, we don’t think of random variables so much as functions but as variables. For instance, call the outcome of a die roll x. I don’t know what it is; it takes on different values with different likelihoods. The reason why we often forget about the sample space is that it is often way too complicated and the probability measure on it not that helpful. (I have random walks and stochastic processes in mind!!) On the other hand, there have been a few situations where the only way I could make sense of things was to think of a random variable as a function.

  26. TruePath

    I kinda think this whole “I’m a Bayesian” or “I’m a frequentist” approach is pretty misguided. Presumably they are either adequate explanations of probability or they are not. Besides they don’t even really do the same thing. One offers a definition of probability the other tells you how you should approach probability.
    Anyway, this having been said, there is plenty of literature on the philosophy of probability and it makes it quite clear that you can’t define the probability of an event based on its frequency. The most obvious point is that we go around assigning probabilities to one-off events where talking about a limiting frequency doesn’t make sense. You might try and claim that it is some kind of idealized frequency if we imagine duplicating the whole world and trying again. But that isn’t quite right nor is it very illuminating.
    In short, I think an attempt to define the non-mathematical notion is so difficult as to be essentially hopeless. When doing the math we need to just take the formal notion as an uninterpreted concept and compute with it, keeping this fuzzy (a coin is 50/50) type model in the back of our heads as the intended interpretation.

  27. Doug Spoonwood

    Mark,
    Sorry, I wrote poorly. I would better put what I meant to say as “I do know random variables as fundamental for most developments of probability theory. However, I doubt that in principle (as Mark suggests) random variables work as all that fundamental FOR THE DEVELOPMENT OF PROBABILITY THEORY.” In other words, we can in principle do probability theory without random variables. I don’t know how this works, as I haven’t read all of Schweizer’s and Sklar’s _Probabilistic Metric Spaces_ or Menger. But, unless I’ve read them wrong, they claim exactly that: one can work with just probability distribution functions instead of random variables.
    Thony C.,
    [As already mentioned both the English word sum and the German summe are derived from the Latin summa which is itself a form of the Latin word summus which means highest as in the English word summit. The meaning of sum or total is thought to derive from the habit of counting piles of coins from the bottom to the top.]
    This etymology provokes me to think that people did think of… or we can think of regular summation as a sort of optimization problem. If one adds 4 and 3 as disjoint sets, one takes the greatest possible number of objects obtainable from those two sets as the sum.
