# Stata: Using generate to create new variables

06 Jul 2011

# Generating New Variables

The primary method for creating new variables in Stata is the `generate` command. To follow along, load the **auto** dataset.

```
clear
sysuse auto
describe
```

## New Variable from Existing Variables

Let’s create a new variable that is the sum of *weight* and *length* (ignore for the moment that summing weights and lengths doesn’t make a ton of sense). This is simple with `generate`. The syntax of `generate` is:

```
generate nameOfNewVariable=whateverTheNewVariableIsEqualTo
```

So to create a new variable called *weightlength* that is the sum of *weight* and *length* we type:

```
generate weightlength = weight+length
```

Now we have a new variable called *weightlength*.

Suppose now that we want to create a new variable that is the square of weight.

```
generate weight2 = weight^2
```
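Other expressions follow the same pattern. The following is a minimal sketch; the variable names *wtratio*, *logweight*, and *heavy* and the 3,000 lb cutoff are illustrative choices, not part of the dataset.

```stata
sysuse auto, clear
* ratio of two existing variables
generate wtratio = weight / length
* natural log, using the built-in ln() function
generate logweight = ln(weight)
* indicator: 1 if weight is above 3,000 lbs, 0 otherwise;
* the if qualifier leaves the indicator missing where weight is missing
generate heavy = (weight > 3000) if !missing(weight)
list make weight length wtratio logweight heavy in 1/5
```

The `if !missing(weight)` qualifier is a precaution: Stata treats missing values as larger than any number, so without it a missing *weight* would be flagged as heavy.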

## New Variable that is a Constant

Suppose we want to create a new variable that holds a constant value. This isn’t necessarily a good idea, and you can use macros to store constants, but a variable can be pretty convenient too. Let’s make a new variable *x* that is equal to 100.

```
generate x = 100
```
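For comparison, here is a rough sketch of the macro alternative mentioned above, alongside a scalar. The names `k`, `shift`, *weightplus*, and *weightplus2* are made up for illustration.

```stata
sysuse auto, clear
* a local macro stores the value and is referenced as `k'
local k = 100
generate weightplus = weight + `k'
* a scalar stores a number and is referenced by its name
scalar shift = 100
generate weightplus2 = weight + shift
list make weight weightplus weightplus2 in 1/5
```

Neither approach adds a column to the dataset, which is the main reason to prefer them when the constant is only needed inside a calculation.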

Let’s create a new variable that is equal to the mean of *weight*; we’ll call it *meanweight*. First run `summarize` to find the mean (3019.459), then type that value into `generate`.

```
summarize weight
```

```
generate meanweight = 3019.459
```

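You can also use the results that the `summarize` command stores instead of typing the mean by hand. Because *meanweight* already exists, and `generate` will not overwrite an existing variable, this version uses a new name, *meanweight2*.

```
summarize weight
generate meanweight2 = r(mean)
```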
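Stored results can go straight into larger expressions as well. As a small sketch (the name *weightdev* is only illustrative), this centers *weight* on its mean:

```stata
sysuse auto, clear
* quietly suppresses the output but still stores r(mean)
quietly summarize weight
* deviation of each car's weight from the overall mean
generate weightdev = weight - r(mean)
* the deviations should average to approximately zero
summarize weightdev
```

The `r()` results are replaced the next time an r-class command such as `summarize` runs, so it is safest to use them right away.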

You can use the `_N` system variable to create a new variable that is equal to the number of observations in the dataset.

```
generate obs = _N
```

If you combine this with `by`, you can create a new variable that is equal to the number of observations within each level of the `by` variable. The `by` prefix requires the data to be sorted by that variable; the **auto** dataset ships sorted by *foreign*. Because *obs* already exists, the new variable needs a different name, so here we call it *groupobs*:

```
by foreign: generate groupobs = _N
```

This will create a variable that is constant within the levels of *foreign*. That is, we get the number of foreign cars and the number of domestic cars. For observations on foreign cars, *groupobs* has a value of 22; for domestic cars, it has a value of 52. Give it a try and see how it works.
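One way to check the group counts is to cross-tabulate them against *foreign*. This sketch uses `bysort`, which sorts the data before applying `by`, so it also works when the data are not already sorted; the name *groupsize* is illustrative.

```stata
sysuse auto, clear
* bysort sorts by foreign first, then runs generate within each group
bysort foreign: generate groupsize = _N
* each level of foreign should line up with exactly one count
tabulate foreign groupsize
```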

## New Variable that is a Random Draw from a Distribution

We can create a new variable that is a random draw from a distribution. Let’s create a new variable whose values will be random draws from a normal distribution with a mean of 0 and a standard deviation of 1. The function that generates normal draws is `rnormal()`. Called without arguments, it uses a mean of 0 and a standard deviation of 1, and `generate` evaluates it once per observation, so you get as many draws as there are observations in the dataset.

```
generate random = rnormal()
```
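Two common variations are setting a seed, so the same draws come back on every run, and passing a mean and standard deviation to `rnormal()`. This is a sketch only; the seed value 12345 and the names *random10* and *u* are arbitrary choices.

```stata
sysuse auto, clear
* fix the seed so the draws are reproducible (the value is arbitrary)
set seed 12345
* rnormal(m, s): normal draws with mean m and standard deviation s
generate random10 = rnormal(10, 2)
* runiform(): uniform draws between 0 and 1
generate u = runiform()
summarize random10 u
```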

## Create a New Variable that Indexes the Observations

You can use the `_n` system variable to create a variable that indexes the observation number.

```
generate index = _n
```

This will create a new variable that runs from 1 to 74. You can combine this with `by` to create an index within the levels of another variable.

```
* index already exists, so the within-group index gets a new name
by foreign: generate groupindex = _n
```

This will create a new variable that runs from 1 to 52 for domestic cars and 1 to 22 for foreign cars.
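Within-group indexes are handy for flagging the first or last observation in each group. In this sketch the names *cheapest* and *priciest* are illustrative, and sorting by *price* within *foreign* is an assumption made for the example.

```stata
sysuse auto, clear
* sort by price within foreign, so _n == 1 is the cheapest car in each group
bysort foreign (price): generate cheapest = (_n == 1)
* within a group _N is the last position, so this flags the most expensive car
by foreign: generate priciest = (_n == _N)
list make foreign price if cheapest | priciest
```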

## Conclusion

I’ve just touched on a few of the ways you can create new variables. You can also use the `egen` command to create new variables. Try new ways to create variables and be sure to read the help files.