Statistics/Probability



Probability deals with unpredictability. We know which outcomes may occur, but not which one will. The set of possible outcomes plays a basic role: we call it the sample space and denote it by S. Elements of S are called outcomes. In rolling a die, the sample space is S = {1,2,3,4,5,6}. We speak not only of outcomes but also of events, which are sets of outcomes (subsets of the sample space). For example, in rolling a die we can ask whether the outcome was an even number, which means asking about the event "even" = E = {2,4,6}.

In simple situations with a finite number of outcomes, we assign to each outcome s (∈ S) its probability (of occurrence) p(s) (written with a small p), a number between 0 and 1. The resulting function p, called the probability function, has only one further property: the probabilities of all outcomes sum to 1. Events A also have a probability P(A) (written with a capital P), which is simply the sum of the probabilities of the outcomes in A. For a fair die, p(s) = 1/6 for each outcome s and P("even") = P(E) = 1/6+1/6+1/6 = 1/2.
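The fair-die example can be written out as a minimal Python sketch. The names `S`, `p`, and `prob` are chosen here for illustration and are not part of any library; exact fractions are used so that the results match the hand calculation.

```python
from fractions import Fraction

# Sample space for one fair six-sided die.
S = {1, 2, 3, 4, 5, 6}

# Probability function p(s): each outcome of a fair die gets 1/6.
p = {s: Fraction(1, len(S)) for s in S}

def prob(event):
    """P(A): the sum of p(s) over the outcomes s in the event A."""
    return sum(p[s] for s in event)

even = {2, 4, 6}

print(sum(p.values()))  # 1, the probabilities of all outcomes sum to 1
print(prob(even))       # 1/2
```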

When throwing two dice, what is the probability that their sum equals seven?
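One way to answer this question is to list all ordered outcomes of two fair dice and count those whose sum is seven. The sketch below assumes both dice are fair, so that all 36 ordered pairs are equally likely; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely ordered outcomes (first die, second die).
outcomes = list(product(range(1, 7), repeat=2))

# The event "the sum equals seven".
sum_is_seven = [(a, b) for a, b in outcomes if a + b == 7]

p_seven = Fraction(len(sum_is_seven), len(outcomes))

print(sum_is_seven)  # [(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)]
print(p_seven)       # 1/6
```

Six of the 36 outcomes give a sum of seven, so the probability is 6/36 = 1/6.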

The general concept of probability for non-finite sample spaces is a little more complex, although it rests on the same ideas.

Introduction


Why have probability in a statistics textbook?


Very little in mathematics is truly self-contained. Many branches of mathematics touch and interact with one another, and the fields of probability and statistics are no different. A basic understanding of probability is vital in grasping basic statistics, and probability is largely abstract without statistics to determine the "real world" probabilities.

This section is not a comprehensive treatment of probability. It touches on the basics needed for this class and introduces Bayesian analysis for students who are looking for something a little more interesting. This knowledge will be invaluable for understanding the mathematics of the various distributions that come later.

Set notation


A set is a collection of objects. We usually use capital letters to denote sets, e.g. A is the set of females in this room.

  • The members of a set A are called the elements of A, e.g. Patricia is an element of A (Patricia ∈ A); Patrick is not an element of A (Patrick ∉ A).
  • The universal set, U, is the set of all objects under consideration, e.g. U is the set of all people in this room.
  • The null set or empty set, ∅, has no elements, e.g. the set of males taller than 2.8 m in this room is an empty set.
  • The complement Aᶜ of a set A is the set of elements in U outside A, i.e. x ∈ Aᶜ iff x ∉ A.
  • Let A and B be two sets. A is a subset of B if each element of A is also an element of B, written A ⊂ B, e.g. the set of females wearing metal-framed glasses in this room ⊂ the set of females wearing glasses in this room ⊂ the set of females in this room.
  • The intersection A ∩ B of two sets A and B is the set of their common elements, i.e. x ∈ A ∩ B iff x ∈ A and x ∈ B.
  • The union A ∪ B of two sets A and B is the set of all elements from A or B, i.e. x ∈ A ∪ B iff x ∈ A or x ∈ B.
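These operations map directly onto Python's built-in `set` type. The sketch below uses a hypothetical room: apart from Patricia and Patrick, the names and the set `B` of people wearing glasses are invented purely for illustration.

```python
# Hypothetical room; the membership lists are assumptions for this example.
U = {"Patricia", "Maria", "Anna", "Patrick", "Tom"}  # universal set: everyone in the room
A = {"Patricia", "Maria", "Anna"}                    # females in the room
B = {"Patricia", "Tom"}                              # people wearing glasses

print("Patricia" in A)     # True: Patricia is an element of A
print("Patrick" not in A)  # True: Patrick is not an element of A

complement_A = U - A       # complement of A relative to U
print(sorted(complement_A))  # ['Patrick', 'Tom']

print(sorted(A & B))       # intersection: ['Patricia']
print(sorted(A | B))       # union: ['Anna', 'Maria', 'Patricia', 'Tom']
print((A & B) <= A)        # True: the intersection is a subset of A
print(len(set()))          # 0: the empty set has no elements
```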

Venn diagrams and notation


A Venn diagram is a visual model of events. Each event is drawn as a circle. Circles for events that have outcomes in common overlap, and the overlapping region is the intersection of the events.

 
A Venn diagram.


Probability Axioms


Calculating Probability


Negation


Negation is a way of saying "not A", that is, saying that the complement of A has occurred. Note: the complement of an event A can be written A' or Aᶜ.
For example: "What is the probability that a six-sided die will not land on a one?" (Five out of six, or p ≈ 0.833.)

P(A') = 1 - P(A)

Complement of an Event

Or, more colloquially, "the probability of 'not X' together with the probability of 'X' equals one, or 100%."
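The die example can be checked with a short sketch that computes the probability of "not a one" in two ways: directly, by counting the outcomes in the complement, and through the complement rule P(not A) = 1 - P(A). A fair die is assumed and the variable names are illustrative.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}  # sample space of a fair six-sided die
one = {1}               # the event "the die lands on a one"
not_one = S - one       # its complement

p_one = Fraction(len(one), len(S))
p_not_one = Fraction(len(not_one), len(S))

print(p_not_one)                   # 5/6
print(1 - p_one == p_not_one)      # True: the complement rule gives the same value
print(round(float(p_not_one), 3))  # 0.833
```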

Relative frequency is the number of successes divided by the total number of trials. For example, if a coin is flipped 50 times and 29 of the flips are heads, then the relative frequency of heads is 29/50 = 0.58.
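As a small sketch, the calculation can be wrapped in a helper function. The name `relative_frequency` is hypothetical and chosen only for this example.

```python
def relative_frequency(successes, trials):
    """Return the number of successes divided by the number of trials."""
    if trials <= 0:
        raise ValueError("trials must be a positive number")
    return successes / trials

# 29 heads observed in 50 coin flips.
print(relative_frequency(29, 50))  # 0.58
```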

 


The union of two events is what you want when you ask for event A OR event B.
This is different from "and": "and" is the intersection, whereas "or" is the union of the events (both events put together).

Consider the following example of events:

Event A = star, diamond
Event B = triangle, pentagon, star
(A ∩ B) = (A and B) = A intersect B is only the star.
(A ∪ B) = (A or B) = A union B is everything: the triangle, pentagon, star, and diamond.

Notice that event A and event B have the star in common. However, when you list the union of the events, you list the star only once.
Combining the two events gives (star + diamond) + (triangle + pentagon + star). The star is listed twice, so the extra star must be subtracted from the list.
It is always the intersection that is listed twice, so you have to subtract the duplicate intersection.
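The shape example can be reproduced with Python sets, which store each element only once. The sketch below checks that counting A and B separately and then subtracting the intersection gives the size of the union.

```python
A = {"star", "diamond"}
B = {"triangle", "pentagon", "star"}

intersection = A & B  # shapes in both events
union = A | B         # shapes in at least one event

print(sorted(intersection))  # ['star']
print(sorted(union))         # ['diamond', 'pentagon', 'star', 'triangle']

# Adding the sizes counts the star twice, so the intersection is subtracted once.
print(len(A) + len(B))                      # 5
print(len(A) + len(B) - len(intersection))  # 4
print(len(union))                           # 4
```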

Conjunction


Formula for the union of events: P(A ∪ B) = P(A) + P(B) - P(A ∩ B)

Example:
Let P(A) = 0.3, P(B) = 0.2, and P(A ∩ B) = 0.15. Find P(A ∪ B).
P(A ∪ B) = 0.3 + 0.2 - 0.15 = 0.35

Example:
Let P(A) = 0.3, P(B) = 0.2, and P(A ∩ B) = 0. Find P(A ∪ B).
Note: When the intersection of the events is the null set, the events are called disjoint or mutually exclusive, and the formula reduces to P(A) + P(B).
P(A ∪ B) = 0.3 + 0.2 - 0 = 0.5
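Both examples can be reproduced with a short sketch of the union formula. The function name `union_probability` is hypothetical; results are rounded only to hide floating-point noise.

```python
def union_probability(p_a, p_b, p_a_and_b):
    """P(A or B) = P(A) + P(B) - P(A and B)."""
    return p_a + p_b - p_a_and_b

# Overlapping events: the intersection is subtracted once.
print(round(union_probability(0.3, 0.2, 0.15), 2))  # 0.35

# Disjoint (mutually exclusive) events: nothing to subtract.
print(round(union_probability(0.3, 0.2, 0.0), 2))   # 0.5
```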

Disjunction


Law of total probability


The law of total probability is[1] a theorem that, in its discrete case, states that if $\{B_n : n = 1, 2, 3, \ldots\}$ is a finite or countably infinite partition of a sample space (in other words, a set of pairwise disjoint events whose union is the entire sample space) and each event $B_n$ is measurable, then for any event $A$ of the same probability space:

$$P(A) = \sum_n P(A \cap B_n)$$

or, alternatively,[1]

$$P(A) = \sum_n P(A \mid B_n)\, P(B_n),$$

where, for any $n$ for which $P(B_n) = 0$, these terms are simply omitted from the summation, because $P(A \mid B_n)$ is finite.
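As an illustration, the law can be checked on the two-dice question from the start of this page. The sketch below assumes two fair dice, takes A to be "the sum equals seven", and partitions the sample space by the value n of the first die (the events B_n). The helper `prob` and the other names are chosen for this example only.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely ordered outcomes (first die, second die).
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event given as a true/false test on one outcome."""
    favourable = [o for o in outcomes if event(o)]
    return Fraction(len(favourable), len(outcomes))

def A(o):
    """The event 'the sum equals seven'."""
    return o[0] + o[1] == 7

total = Fraction(0)
for n in range(1, 7):
    # B_n: the first die shows n. These six events partition the sample space.
    p_B = prob(lambda o, n=n: o[0] == n)
    p_A_and_B = prob(lambda o, n=n: A(o) and o[0] == n)
    p_A_given_B = p_A_and_B / p_B  # conditional probability P(A | B_n)
    total += p_A_given_B * p_B     # one term of the sum

print(total)    # 1/6
print(prob(A))  # 1/6, computed directly for comparison
```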

Conditional Probability


What is the probability of one event given that another event occurs? For example, what is the probability of a mouse finding the end of the maze, given that it finds the room before the end of the maze?

This is represented as:

P(A | B)

or "the probability of A given B."

When P(B) > 0, it is defined as P(A | B) = P(A ∩ B) / P(B).

If A and B are independent of one another, such as with coin tosses or child births, then:

P(A | B) = P(A)

Thus the answer to "what is the probability that the next child a family bears will be a boy, given that the last child is a boy?" is simply the probability that any child is a boy.
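A minimal sketch with two fair coin tosses shows both cases. It uses the standard definition P(A | B) = P(A and B) / P(B), which requires P(B) > 0. The helper names `prob` and `conditional` are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Two tosses of a fair coin: HH, HT, TH, TT, all equally likely.
outcomes = list(product("HT", repeat=2))

def prob(event):
    """Probability of an event given as a true/false test on one outcome."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def conditional(a, b):
    """P(A | B) = P(A and B) / P(B), defined only when P(B) > 0."""
    p_b = prob(b)
    if p_b == 0:
        raise ValueError("P(B) must be positive")
    return prob(lambda o: a(o) and b(o)) / p_b

def first_heads(o):
    return o[0] == "H"

def second_heads(o):
    return o[1] == "H"

def at_least_one_head(o):
    return "H" in o

# Independent events: knowing the first toss does not change the second.
print(conditional(second_heads, first_heads))  # 1/2
print(prob(second_heads))                      # 1/2

# Dependent events: the condition changes the probability.
print(conditional(second_heads, at_least_one_head))  # 2/3
```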

Conditions can also be stacked, giving the probability of A with several "givens":

P(A | B1, B2, B3)

or "the probability of A given that B1, B2, and B3 are true."


Conclusion: putting it all together
