# Stata/Descriptive Statistics

This section shows how to compute summary statistics in Stata. It has three subsections: the first covers commands that describe a whole dataset, the second covers commands that describe a single variable, and the third covers commands that describe a set of variables.

## Describe a dataset

• 'des' (describe) : gives the size of the file, the number of observations, the number of variables, and the name, label and storage type of each variable.
• 'des, s' (describe, short) : gives only the size of the file, the number of observations and the number of variables.
• 'des' also stores results in r(): whether the data have changed since the last save 'r(changed)', the number of variables 'r(k)' and the number of observations 'r(N)'.
```
. sysuse cancer, clear
(Patient Survival in Drug Trial)
. describe
. des, s
. ret list
```
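• codebook : reports, for each variable, its type, range, number of unique and missing values, and either summary statistics or a frequency table.
• inspect : gives a quick overview of a numeric variable: counts of negative, zero, positive, integer, non-integer and missing values, together with a small histogram.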
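
The commands above can be combined in a short do-file. This is a minimal sketch rather than a reference session: it is written without the `. ` prompt so it can be pasted into a do-file, and the choice of the variable `age` from the `cancer` example dataset is only an illustration.

```stata
* Load the example dataset shipped with Stata
sysuse cancer, clear

* Short description, then reuse the stored results
describe, short
display "observations: " r(N) "   variables: " r(k)

* One-line-per-variable codebook for a single variable
codebook age, compact

* Counts of negative, zero, positive and missing values, plus a small histogram
inspect age
```

Stored results such as `r(N)` are overwritten by the next command that stores results in r(), so copy them into a scalar or local macro if they are needed later.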

## Univariate statistics

### Continuous variables

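• su (summarize) : gives the number of observations, mean, standard deviation, minimum and maximum.
• su, d (summarize, detail) : adds percentiles, the variance, skewness, kurtosis and the smallest and largest values.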
• robmean : robust mean
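
As an illustration of `su` and `su, d`, the sketch below summarizes one continuous variable and then reads the stored results. The dataset (`cancer`) and the variable (`age`) are assumptions chosen for the example, not part of the original text.

```stata
sysuse cancer, clear

* Basic summary: N, mean, standard deviation, min, max
summarize age
display "mean age = " r(mean) "   sd = " r(sd)

* Detailed summary: percentiles, variance, skewness, kurtosis
summarize age, detail
display "median age = " r(p50) "   skewness = " r(skewness)
```

Comparing `r(mean)` with `r(p50)` is a quick way to spot a skewed distribution before moving on to robust measures.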

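### Discrete variables

• ta (tabulate) : gives the frequency, percentage and cumulative percentage of each value of a variable.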
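
For a discrete variable, a one-way `ta` can be sketched as follows. The variable `drug` from the `cancer` example dataset is an assumption made for illustration.

```stata
sysuse cancer, clear

* One-way frequency table
tabulate drug
display "observations tabulated: " r(N) "   distinct values: " r(r)

* Same table, with missing values shown as their own category
tabulate drug, missing
```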

## Multivariate statistics

### Continuous variables

• corr (correlate) displays the matrix of linear (Pearson) correlations between a set of variables.
• corr, cov displays the covariance matrix instead.

Here is an example. We first simulate a y and x such that there is a positive correlation between them. We plot the two variables and look at the correlation.

```
. clear
. set obs 1000
. gen x =  invnorm(uniform())
. gen u =  invnorm(uniform())
. gen y = x + u
. tw sc y x || lfit y x
. corr y x
(obs=1000)

             |        y        x
-------------+------------------
           y |   1.0000
           x |   0.7197   1.0000
```
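
The link between `corr` and `corr, cov` can be checked on more than two variables. The sketch below is an illustration under assumptions: it uses the `auto` example dataset and the variables `price`, `mpg` and `weight`, none of which appear in the original text, and it rebuilds one correlation from the covariance matrix stored in `r(C)`.

```stata
sysuse auto, clear

* Correlation matrix for a set of three variables
correlate price mpg weight
matrix R = r(C)
matrix list R

* Covariance matrix for the same variables
correlate price mpg weight, covariance
matrix V = r(C)

* Rebuild the price-mpg correlation: cov / sqrt(var1 * var2)
scalar rho_check = V[2,1] / sqrt(V[1,1] * V[2,2])
display "corr(price, mpg) = " R[2,1] "   rebuilt from covariances = " rho_check
```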
• wincorr returns the winsorized correlation: values in the tails are replaced by a limit value. This is useful when a few extreme values have a large influence on the correlation coefficient.
• spearman and spearman2 give Spearman's rank correlation between two variables. This statistic is less sensitive to outliers than Pearson's linear correlation, so it is often useful as a robustness check.
```
. spearman y x

Number of obs =    1000
Spearman's rho =       0.7090

Test of Ho: y and x are independent
Prob > |t| =       0.0000
```
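
To see why the rank correlation is less sensitive to outliers, the simulation can be repeated with one contaminated observation. This is a sketch under assumptions: the seed, the use of `rnormal()` and the size of the extreme value are arbitrary choices made for illustration.

```stata
clear
set seed 2024
set obs 1000

* Same data-generating process as before: y = x + noise
generate x = rnormal()
generate y = x + rnormal()

* Replace the first observation with an extreme point (illustrative values)
replace x = 1000 in 1
replace y = -1000 in 1

* Pearson correlation is driven by the single extreme point
correlate y x
scalar pearson = r(rho)

* Spearman correlation only uses ranks, so it barely moves
spearman y x
scalar rankcorr = r(rho)

display "Pearson = " pearson "   Spearman = " rankcorr
```

A large gap between the two coefficients, as here, suggests looking at a scatter plot for influential observations.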

### Discrete variables

• ta (tabulate) with two variables gives a two-way table of frequencies.
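
A two-way `ta` can be sketched as follows. The variables `died` and `drug` from the `cancer` example dataset, and the choice of a chi-squared test, are assumptions made for illustration.

```stata
sysuse cancer, clear

* Two-way frequency table with column percentages and a chi-squared test
tabulate died drug, column chi2
display "chi2 = " r(chi2) "   p-value = " r(p)
```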

### Continuous and discrete variables

• catgraph : plots the means of a continuous variable by category.
• table : tabulates summary statistics of a continuous variable by category.
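
Means of a continuous variable by category can also be obtained with commands that behave the same way across Stata versions. The sketch below swaps in `tabulate, summarize()` and `bysort` in place of `table`, and assumes the variables `age` and `drug` from the `cancer` example dataset.

```stata
sysuse cancer, clear

* Mean, standard deviation and frequency of age within each drug group
tabulate drug, summarize(age)

* The same group statistics, one summarize call per category
bysort drug: summarize age
```

The `table` command can produce a similar display, but its syntax changed in Stata 17, so check `help table` for the installed version.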