Recording the source of observations

When combining several datasets, whether with MERGE or SET, it can be useful to have an indicator variable in the output dataset that shows which input datasets contributed to each observation. The IN= dataset option makes this simple:

data out;
  merge data1 (in=a) data2 (in=b) data3 (in=c);
  by subject;
  source = 100*a + 10*b + c;
run;

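The IN= variables a, b and c are 1 when the corresponding dataset contributed to the current observation and 0 otherwise. They are temporary and are not written to the output dataset, so copying them into source is what preserves the information. Using the factors 100, 10 and 1 for a, b and c respectively means each digit of source is a 1/0 flag for one dataset: a source value of 101 means that only data1 and data3 contributed, 111 means all three did, and so on. Note that leading zeros are not stored, so an observation from data2 and data3 only has a source of 11, not 011. You could also use the factors 4, 2 and 1 to get a binary encoding (101 would then be stored as 5), but that makes the values in source harder to interpret at a glance.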
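The following is a self-contained sketch for trying this out. The contents of data1, data2 and data3 are made-up sample data, and source_bin is a hypothetical extra variable holding the 4/2/1 variant for comparison. As one option for readability, the Z3. format displays source with its leading zeros, and the BINARY3. format displays source_bin as the same three-digit flag pattern.

```sas
/* Hypothetical sample data: each dataset holds a few subject IDs */
data data1;
  input subject @@;
  datalines;
1 2 3
;
run;

data data2;
  input subject @@;
  datalines;
1 3
;
run;

data data3;
  input subject @@;
  datalines;
1 2 4
;
run;

data out;
  merge data1 (in=a) data2 (in=b) data3 (in=c);
  by subject;
  /* Decimal flags: one digit per input dataset */
  source = 100*a + 10*b + c;
  /* Binary variant for comparison (hypothetical extra variable) */
  source_bin = 4*a + 2*b + c;
  /* Formats only change how the values are displayed, not what is stored */
  format source z3. source_bin binary3.;
run;

proc print data=out;
run;
```

With this data, subject 1 is in all three datasets (source 111), subject 2 in data1 and data3 (101), subject 3 in data1 and data2 (110), and subject 4 only in data3, so its stored source is 1, displayed as 001 by the Z3. format.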