Fuzzy Logic vs Probability


In the comments on my last post, a few people asked me to explain the difference between fuzzy logic and probability theory. It’s a very good question.

The two are very closely related. As we’ll see when we start looking at fuzzy logic, the basic connectives in fuzzy logic are defined in almost the same way as the corresponding operations in probability theory.

The key difference is meaning.

There are two major schools of thought in probability theory, and they each assign a very different meaning to probability. I’m going to vastly oversimplify, but the two schools are the frequentists and the Bayesians.

First, there are the frequentists. To the frequentists, probability is defined by experiment. If you say that an event E has a probability of, say, 60%, what that means to the frequentists is that if you could repeat an experiment observing the occurrence or non-occurrence of E an infinite number of times, then 60% of the time, E would have occurred. That, in turn, is taken to mean that the event E has an intrinsic probability of 60%.
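A rough way to get a feel for the frequentist reading is to simulate it. The sketch below is only an illustration: the function name `relative_frequency`, the fixed seed, and the use of a pseudo-random generator as a stand-in for "repeating the experiment" are assumptions made for this example, not part of any formal definition. It shows the observed fraction of occurrences settling near 0.6 as the number of trials grows.

```python
import random


def relative_frequency(p, trials, seed=0):
    """Fraction of `trials` simulated experiments in which event E occurred.

    `p` plays the role of the intrinsic probability of E (an assumption of
    the simulation; in a real experiment it is what we are trying to learn).
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = sum(1 for _ in range(trials) if rng.random() < p)
    return hits / trials


# The more often the experiment is repeated, the closer the observed
# frequency tends to sit to the underlying 60%.
for n in (10, 1_000, 100_000):
    print(n, relative_frequency(0.6, n))
```

No finite run ever reaches the "infinite number of times" in the definition; the simulation only approximates it.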

The other school is the Bayesians. To a Bayesian, the idea of an event having an intrinsic probability is ridiculous. You’re interested in a specific occurrence of the event – and it will either occur, or it will not. So there’s a flu going around; either I’ll catch it, or I won’t. Ultimately, there’s no probability about it: it’s either yes or no – I’ll catch it or I won’t. Bayesians say that probability is an assessment of our state of knowledge. To say that I have a 60% chance of catching the flu is just a way of saying that given the current state of our knowledge, I can say with 60% certainty that I will catch it.

In either case, we’re ultimately talking about events, not facts. And those events will either occur, or not occur. There is nothing fuzzy about it. We can talk about the probability of my catching the flu, and depending on whether we pick a frequentist or Bayesian interpretation, that means something different – but in either case, the ultimate truth is not fuzzy.

In fuzzy logic, we’re trying to capture the essential property of vagueness. If I say that a person whose height is 2.5 meters is tall, that’s a true statement. If I say that another person whose height is only 2 meters is tall, that’s still true – but it’s not as true as it was for the person 2.5 meters tall. I’m not saying that in a repeatable experiment, the first person would be tall more often than the second. And I’m not saying that given the current state of my knowledge, it’s more likely that the first person is tall than the second. I’m saying that both people possess the property tall – but in different degrees.
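To make "different degrees" concrete, here is a minimal sketch of a membership function for the property tall. The linear ramp and the thresholds of 1.5 m and 2.5 m are assumptions picked purely for illustration; nothing about fuzzy logic forces that particular shape.

```python
def tall(height_m, lower=1.5, upper=2.5):
    """Degree, in [0, 1], to which a person of this height is 'tall'.

    Hypothetical linear ramp: 0 at or below `lower`, 1 at or above `upper`.
    Both thresholds are illustrative assumptions.
    """
    if height_m <= lower:
        return 0.0
    if height_m >= upper:
        return 1.0
    return (height_m - lower) / (upper - lower)


# Both people are tall, but to different degrees.
print(tall(2.5))  # 1.0
print(tall(2.0))  # 0.5
```

The numbers returned are degrees of truth, not chances: `tall(2.0)` does not say there is a 50% chance that the person is tall.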

Fuzzy logic is using pretty much the same tools as probability theory. But it’s using them to try to capture a very different idea. Fuzzy logic is all about degrees of truth – about fuzziness and partial or relative truths. Probability theory is interested in trying to make predictions about events from a state of partial knowledge. (In frequentist terms, it’s about saying that I know that if I repeated this 100 times, E would happen in 60; in Bayesian, it’s precisely a statement of partial knowledge: I’m 60% certain that E will happen.) But probability theory says nothing about how to reason about things that aren’t entirely true or false.

And, in the other direction: fuzzy logic isn’t particularly useful for talking about partial knowledge. If you allowed second-order logic, you could have fuzzy meta-predicates that described your certainty about crisp first-order predicates. But with first order logic (which is really where we want to focus our attention), fuzzy logic isn’t useful for the tasks where we use probability theory.

So probability theory doesn’t capture the essential property of meaning (partial truth) which is the goal of fuzzy logic – and fuzzy logic doesn’t capture the essential property of meaning (partial knowledge) which is the goal of probability theory.

30 thoughts on “Fuzzy Logic vs Probability”

guy February 2, 2011 at 6:59 pm

Very interesting post, thank you. I’d be very interested in any pointers you could provide to good introductory reading to follow up your overview.

Forgive me if this is an idiot question, but is there a branch of logic which deals with *concurrent* correctness? Just as you differentiate ‘partial knowledge’ and ‘partial truth’, aren’t there completely known, completely true, BUT differently-valued simultaneous statements?

I’m not experienced enough to come up with a strong example – the following feels a bit weak – but what I’m getting at is

a) sqrt(2) > 0
b) sqrt(2) < 0

Both meaningful, both true.

I.e., a logic in which a variable can be simultaneously multivalued – not in the sense that it's unknown, uncertain, inaccurate, fluctuating, continuous or indefinite but that it might have non-equal but completely defined, discrete values at the same time? It seems to me, both intuitively and mathematically, that non-exclusivity should be a very common case, but I've never really seen techniques for propagating multiple-valued variables through a calculation… Is this nonsense? 😉

1. MarkCC Post author February 2, 2011 at 8:10 pm
  
  In fact, what you’re describing is just standard first order predicate logic.
  
  Square root is, in terms of logic, a predicate, something like Square(x, root). So Square(2, 1.41421). In the statement Square(2, x), there are two possible values of x for which the statement is true.
  
  The default assumption, in predicate logic, is that there can be multiple values that satisfy the same predicate. The variable isn’t multivalued, but variables aren’t what you care about in logic: what you care about is true statements – and the statement can be instantiated for multiple values. So you always assume that multiple values are possible, unless there’s a specific statement or proof that they aren’t.
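As a small illustration of a predicate being satisfied by more than one value, the sketch below models Square(x, root) as a Python function and collects every candidate that makes Square(2, x) true. The function name `square`, the tolerance, and the hand-picked candidate list are assumptions for this example only.

```python
import math


def square(x, root, tol=1e-9):
    """Predicate Square(x, root): true when root * root equals x (within tol)."""
    return math.isclose(root * root, x, rel_tol=tol)


# A hypothetical, hand-picked domain of candidate values for the variable.
candidates = [-math.sqrt(2), -1.0, 0.0, 1.0, math.sqrt(2)]

# Every value that makes the statement Square(2, x) true.
witnesses = [r for r in candidates if square(2, r)]
print(witnesses)  # the negative and the positive square root of 2
```

The variable never holds two values at once; the statement is simply true for two different instantiations.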
  
  1. guy February 3, 2011 at 1:44 pm
    
    Aha, I see. Very much appreciate your time explaining this so clearly – I’m embarrassed to have missed such a foundational assumption but as you may imagine, this clarifies *several* points of confusion for me! Thank you.
    
Janne February 2, 2011 at 7:25 pm

I’m way out on my deep end here, but if you interpreted fuzzy logic statements as inference about degrees of belief, wouldn’t that correspond very closely to Bayesian inference?

Arlenna February 2, 2011 at 7:46 pm

This sounds like wave/particle duality and atomic/molecular orbitals: electrons can be in certain places, and are more or less likely to be in regions of those places depending on things like dipoles, polarity and the inductive effect, but “an electron” is in all of those places at once (as an oscillating wave) and found as a particle wherever you look because you’ve pinpointed it just by defining it.

Eric 'Siggy' Scott February 2, 2011 at 9:34 pm

Thanks for the follow up!

I’m with Janne — what makes the different systems mathematically distinct? I’ve been under the (perhaps erroneous) impression that there’s debate over whether the systems are usefully mathematically distinct, or if Fuzzy Logic is just a handy shorthand.

Maybe a more direct question would be the difference between probability theory and Zadeh’s *possibility* theory, an extension of Fuzzy Logic. Wikipedia happily delivers the details, but not the big picture :-/.

SeanH February 2, 2011 at 11:05 pm

In comments to the other post someone said that probability can be viewed as a generalization of propositional logic, but I have trouble actually seeing how you can reproduce two-valued logic in the formal framework of probability theory (which would be a requirement for it to be a generalization I suppose). Can anyone help?

1. A Mustill February 3, 2011 at 5:25 pm
  
  It’s not too difficult. Consider the syllogism
  A=>B
  A
  therefore B
  and let C = (A=>B) for conciseness.
  
  We want to show that p(B|A&C)=1.
  
  Using the rule from probability theory
  p(X&Y|Z) = p(X|Z)p(Y|X&Z)
  we have
  p(B|A&C) = p(A&B|C)/p(A|C)
  Now, p(A&B|C)=p(A|C) by the syllogism, and so
  p(B|A&C)=1
  so if A and A=>B are both true (p=1), then B is true.
  
  This argument is from Jaynes (2003) “Probability Theory: the logic of science”. Now I think about it, I wonder if it isn’t circular: are we assuming the answer in the line p(A&B|C)=p(A|C)?
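One way to look at the circularity worry is to check the step p(A&B|C) = p(A|C) by brute force. The sketch below enumerates the four truth assignments to (A, B), conditions on C = (A=>B), and computes the probabilities exactly. The uniform weights and the helper name `prob` are assumptions for illustration; the final result does not depend on the weights as long as p(A|C) is nonzero.

```python
from fractions import Fraction
from itertools import product

# All four "worlds": truth assignments to the pair (A, B).
worlds = list(product([False, True], repeat=2))
# Assumption: a uniform prior over worlds, chosen only for simplicity.
weight = {w: Fraction(1, 4) for w in worlds}


def prob(event, given):
    """Conditional probability p(event | given) by direct enumeration."""
    den = sum(weight[w] for w in worlds if given(w))
    num = sum(weight[w] for w in worlds if given(w) and event(w))
    return num / den


def C(w):
    """C is the implication A => B, i.e. (not A) or B."""
    return (not w[0]) or w[1]


p_ab_given_c = prob(lambda w: w[0] and w[1], C)
p_a_given_c = prob(lambda w: w[0], C)
p_b_given_ac = prob(lambda w: w[1], lambda w: w[0] and C(w))

print(p_ab_given_c, p_a_given_c, p_b_given_ac)  # 1/3 1/3 1
```

Once C holds, the only world in which A is true is the one where B is also true, so "A and B" and "A" pick out the same worlds. That is where the equality comes from, rather than from assuming the conclusion.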
  
2. João Neto February 4, 2011 at 6:12 am
  
  Check Richard Cox’s “The Algebra of Probable Inference”.
  
Dave Tweed February 3, 2011 at 1:53 am

I’d say the extent to which I agree or disagree with you depends on what you mean by “say”. AIUI, the key problem inspiring fuzzy logic is this: suppose you and I see someone who’s 6 foot tall and we later need to describe that person. You might say he’s “tall” and I might say he’s “medium height”. To do any reasoning about this, you need to convert what people say into some model about reality. You can frame everything in terms of probability, adding a distribution over what height someone who’s described as “tall” by “some general person” may actually be. There’s no logical problem incorporating this into any probabilistic calculations, but it’s almost impossible to keep track of which “probabilistic uncertainty” has arisen from the “fuzziness of labelling” and which from other sources. The goal of fuzzy logic is to come up with rules of various kinds (e.g., always correct rules, sometimes correct rules) that keep partial knowledge and other uncertainties clearly distinct from the labelling fuzziness. And that’s partly because a goal is “human understanding”, or in other words what we “say”. It’s not a difference in terms of the “input modelling” as much as what kind of “output” we want.

1. Eric 'Siggy' Scott February 3, 2011 at 11:18 pm
  
  An example: In an AI class I TA’d last semester, I had the students program a naive Bayes classifier for a 20 Questions game. Since to some people a turtle is “green” and to others it’s “brown,” the classifier learned from a database of trials the probability that the question “is it green” would be answered True if the item was a turtle — a highly relevant number when inferring the item from a set of questions answered!
  
  In this case, then, we’re using Bayesian probability to do Fuzzy Logic’s job, no?
  
stigant February 3, 2011 at 1:42 pm

I’m having trouble understanding the Bayesian point of view.
“To say that I have a 60% chance of catching the flu is just a way of saying that given the current state of our knowledge, I can say with 60% certainty that I will catch it.”

This seems almost circular to me. What does it mean to say, with 60% certainty, that I will catch the flu? I’m apparently a frequentist since the only way I see to give meaning to that statement without referring to probabilities is that if I could somehow create 100 universes with the same initial conditions as right now, in 60 of them I would catch the flu.

1. MarkCC Post author February 3, 2011 at 3:14 pm
  
  I think the trick to understanding Bayesian probability comes from Bayes’ theorem.
  
  Bayes’ theorem basically gives you a way of saying “With this amount of knowledge, I’ve got a certainty of X that event E will happen. Now, if I get an additional bit of knowledge K, here’s how I can update my assessment.”
  
  The frequentist approach is based on the idea that there’s really a fundamental probability associated with an event. The Bayesian approach is that you’re assessing your current state of knowledge, which is always subject to being updated.
  
  In the frequentist approach, you’d say that there is a fundamental probability of a random person catching the flu, based on the idea of a repeated experiment. But the probability is fundamentally a property of the event.
  
  In the Bayesian approach, the probability is fundamentally a property of our knowledge. So one possible input is that given a pool of people, observationally, 6 out of every 10 will catch the flu. So an initial probability would be 0.6. But then, you might discover that I had someone with the flu sneeze on me on the subway. So you’d update that probability in a specific mathematical way to come to a new prediction. The original probability estimate wasn’t wrong; it was incomplete. As we gain additional knowledge, we can fold it into our predictions – and those predictions will, eventually, converge on the ultimate “yes” or “no” when the event either does or does not occur. Our knowledge is almost always incomplete, and Bayesian probability gives us some tools for working with that.
  
  The difference is really philosophical – whether the probability is really an intrinsic property of the event, or just an assessment of how much we know. In reality, they’re both useful and valid ways of seeing things, and they’ve each got places where they’re useful.
  
  1. stigant February 3, 2011 at 3:31 pm
    
    Ok, but I’m still having trouble defining what we mean by the probability without resorting to a thought experiment involving multiple trials and expected outcomes. In your example, you say that 6 out of every 10 people will catch the flu. So you have a 60% chance of catching the flu. Now, we get more information (you were sneezed on). Bayes’ formula allows us to compute that P(got flu | sneezed on) = P(sneezed on | got flu) * P(got flu) / P(sneezed on). But the meanings of all of these probabilities still must rest on something like out of 100 people, 50 have been sneezed on etc, and ultimately, if I update my probability of you getting the flu after finding out you were sneezed on to be 80%, that still has to mean that out of 100 sneezees, 80 got the flu. Having the formula makes this update easier to accomplish, but I still don’t see how it differs, philosophically or otherwise, from the frequentist point of view.
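The update in this exchange can be sketched numerically. The prior of 0.6 comes from the thread; the two likelihoods (0.4 and 0.15) are made-up numbers, chosen only so that the posterior lands at the 80% figure mentioned above, and `bayes_update` is a hypothetical helper name.

```python
def bayes_update(prior, p_evidence_given_h, p_evidence_given_not_h):
    """Posterior P(H | evidence) computed with Bayes' theorem.

    P(evidence) is expanded with the law of total probability over
    H and not-H.
    """
    p_evidence = (prior * p_evidence_given_h
                  + (1 - prior) * p_evidence_given_not_h)
    return prior * p_evidence_given_h / p_evidence


posterior = bayes_update(
    prior=0.6,                    # P(got flu), from the thread
    p_evidence_given_h=0.4,       # P(sneezed on | got flu): assumed
    p_evidence_given_not_h=0.15,  # P(sneezed on | no flu): assumed
)
print(round(posterior, 3))  # about 0.8
```

The arithmetic is the same whichever interpretation is attached to the inputs; the frequentist and Bayesian readings differ in what the numbers are taken to mean, not in how they are combined.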
    
    1. phil February 3, 2011 at 8:55 pm
      
      Maybe a different example makes it clearer: If you do an experiment to measure, say, the mass of an object, then in the Bayesian view, you can assign a probability distribution to the mass, and make statements like “the probability that that mass is between x and y is z%” (reflecting your degree of belief about the mass). In the frequentist view, you can’t make that statement – the mass really is a certain true value, so there is no probability distribution over the mass.
      
      1. Eric 'Siggy' Scott February 3, 2011 at 11:12 pm
        
        But we can still say that, in an equivalence class of similar objects and similar measurements, the (frequentist) probability that we measure the mass to be between x and y is z%?
        
        In many scenarios such a class doesn’t exist in practice. Which is why we prune our big joint distributions into Bayesian networks with bite-sized chunks we can actually train, and use them to infer the probability of variable combinations that have never actually been seen.
        
        Is this a sufficient description of the difference? In cases where we must use Bayesian methods, it’s because the equivalence class doesn’t exist outside Plato’s ideal world?
        
Yiab February 3, 2011 at 2:25 pm

I’m wondering what the connection is (if any) between fuzzy logic and first-order continuous logic (i.e. logic where statements are assigned “truth values” in the real interval [0,1] – the connectives are weird).

lily February 3, 2011 at 5:38 pm

So can you have both?
e.g. the truth value of a certain event having a probability of 60% is .5?

Chris February 3, 2011 at 6:01 pm

Hey MarkCC, can’t you describe the difference by the construction of fuzziness? The fuzzy-logic systems I know use a fuzzy membership relation, meaning (roughly) the relation gives you not only x in X or not, but can also tell you to what extent this membership holds: 1/3 membership and 2/3 non-membership. Using this fuzzy set theoretic construction, creating a logic upon it yields “probability”-like memberships and statements.

The main difference then lies in the inherent meaning, telling you to what extent and not the probability of membership, whereas the statements in the logic usually are interpreted as probabilities.

Cyan February 4, 2011 at 1:04 am

@SeanH

Probability can be viewed as a generalization of propositional logic in the sense that it reduces to it when the only probabilities in play are 0 and 1. Then intersection of subsets of the sample space maps onto AND, union maps onto OR, and complementation maps onto NOT. In fact, we can derive probability theory by requiring (among other desiderata) that this correspondence must hold; see Cox’s theorem.
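The correspondence can be checked mechanically in the simplest possible setting. The sketch below assumes a one-point sample space, so that every event is either empty (probability 0) or the whole space (probability 1); the names `OMEGA`, `event` and `p` are hypothetical and chosen for this example.

```python
from itertools import product

# Assumption: a sample space with a single outcome, so the only events
# are the empty set (probability 0) and the whole space (probability 1).
OMEGA = frozenset({"w"})


def event(truth):
    """Represent a true proposition by OMEGA and a false one by the empty set."""
    return OMEGA if truth else frozenset()


def p(e):
    """Probability of an event under the uniform measure on OMEGA."""
    return len(e) / len(OMEGA)


# Intersection, union and complement reproduce the truth tables of
# AND, OR and NOT on every combination of inputs.
for a, b in product([False, True], repeat=2):
    A, B = event(a), event(b)
    assert p(A & B) == float(a and b)
    assert p(A | B) == float(a or b)
    assert p(OMEGA - A) == float(not a)
print("truth tables match")
```

This only shows the reduction in one direction; deriving the rules of probability from the requirement that they agree with logic is the content of Cox's theorem, which the sketch does not attempt.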

@stigant, Eric ‘Siggy’ Scott

Bayesian probability can also be derived from a Dutch book argument that provides an operationalization of probability free of reference classes. If you want to get really fancy, you can follow L. J. Savage’s approach of axiomatizing rational behavior and derive subjective expected utility.

@lily

Googling “fuzzy random variables” turns up results that seem relevant, and might also answer Janne and Eric ‘Siggy’ Scott’s questions about the difference between fuzziness and probability. I tried to read the first result several years ago, but I didn’t get far, so I can’t say more.

Doug Spoonwood February 6, 2011 at 8:43 am

We also have fuzzy probabilities. Like the probability that I will walk x steps tomorrow, which even though we know that the event will happen or not, we only know certain things which influence how many steps we walk up to a certain degree with a margin of uncertainty on each factor. So, what we actually know comes out as a fuzzy probability best expressed by a fuzzy number.

Darrell Plank February 8, 2011 at 2:51 pm

I think the Bayesian view is best illustrated by asking whether the n’th decimal digit of pi is 5, where n is larger than the highest digit of pi currently known. Most people would say that the probability is 1/10 but it’s hard to imagine a frequentist interpretation for this. Nobody imagines that we’re going to do several experiments and in 9/10 of them, that digit turned out to be something other than 5. You could talk about the probability of a general digit being 5 and say that if you pick digits at random, they end up being 5 about 1/10 of the time, but a frequentist interpretation for a specific digit just doesn’t make much sense. In that case you’re pretty much forced to say that our current state of knowledge allows us to be 90% certain that it’s not a 5. As soon as that digit is verified to be, say, 6, then the probability goes to 100% that it’s not 5. As our knowledge changes, so do the probabilities. A frequentist would, I think, turn his nose up at the question altogether as an improper one.

On the other hand, if you ask what the phrase “90% certain” means, I think I’d be forced to say that what it means, essentially, is that given several “similar” trials your prediction would be right 90% of the time. In that sense, the two views really aren’t all that different – it’s more a matter of emphasis than any mathematically rigorous distinction. In the flu case, being “60% certain” really means (at least in my mind) that given similar situations where you have similar knowledge about relevant facts such as the number of people around you who are catching the flu, you’d expect that in about 60% of those cases you yourself would catch the flu.

In passing, this is where I think Heisenberg uncertainty becomes very weird. I have a friend who claims to be very comfortable with it because large scale phenomena are probabilistically determined by low level Heisenberg uncertainty. I find it very unsettling because it posits a totally new type of probability – a “real” or “pure” probability that is independent of our knowledge of the situation rather than the type of probability that we’re all used to which is based on our lack of knowledge about current wind currents, temperatures, force applied, etc. which force us to assess the chances on a coin flip to be 1/2 when in actuality the result is predetermined. Dealing with this totally new type of probability and naively expecting our old notions of probability to apply to it seems unsettling to me. I’m no genius at low level physics, so maybe I misinterpret. I know it works, which is, I think, what Feynman said. Don’t try to understand, just apply the equations and observe that the results match up to reality. Maybe I’m exactly where I ought to be given that Feynman said that if you’re not uncomfortable with it, then you don’t understand it.

Sorry for the digression.

Xahid July 20, 2011 at 10:49 am

Let’s consider the word SMARTNESS. How can we measure smartness? It is a fuzzy concept. There is no clear boundary for measuring it. It is a linguistic matter. We cannot categorize it; if we do, we will lose actual information. We should use a membership function. Zadeh said it well: ‘Every probability measure should be based on fuzzy logic’.

maria January 23, 2012 at 10:07 am

“Probability and fuzzy logic are the same in concept. So we can say that probability and fuzzy logic are both the same.”
Please give your comments. Thanks.

1. Tony May 24, 2013 at 9:56 pm
  
  A ridiculous statement.
  
2. Aiman November 3, 2013 at 6:45 am
  
  Fuzzy logic is about the degree of truth. For example, we say that if a person is 6 ft tall then he is tall. Now if a person is 5’11”, we will say he is tall up to some degree, e.g. 0.9. Probability, by contrast, is about what percentage of people are 6’ tall.
  
Shawn May 17, 2013 at 9:26 am

In my opinion (and that of many others), Fuzzy Logic is nothing more than a way of thinking about the degree of truth of statements. Let’s take the example in this post: “degree of tallness”. To represent the degree of tallness of two persons, A and B, with 2.5 meters and 2.0 meters respectively, one may simply use the ratio of heights: S1 = “A is tall”, S2 = “B is tall”, S3 = “the degree of truth of S1 is 1.25 times higher than the degree of truth of S2”.

Tony May 23, 2013 at 11:03 pm

I have never confused the two because fuzzy sets provide a discrete and exact measurement of what is, and probability provides a means of quantifying what may be in the future. Also, “60%” is a notation for a percentage, NOT a probability, which would be “.6”.

Gary September 9, 2013 at 2:15 pm

“With this amount of knowledge, I’ve got a certainty of X that event E will happen. Now, if I get an additional bit of knowledge K, here’s how I can update my assessment.”

The difficulty I have with this statement is that Bayes theory appears to measure “knowledge” in terms of a probability. This probability is usually assigned using frequency – a good example of this was the posted question concerning the next digit of pi: the poster suggested that it could be one of 0 to 9, with a probability of 1/10. “1/10” is a frequency definition of probability.
