Bad Style

I spend a large part of my working life maintaining SAS code written by other people. Sometimes, on opening a particular program for debugging and editing, I can immediately tell that the original author was an experienced and knowledgeable programmer. More often, however, I see one of a number of common 'mistakes' that indicate that the original author might have been rather more junior. I put 'mistakes' in quotes because these are not things that are actually wrong, or that directly cause bugs. Instead, they are things that, at best, suggest an incomplete knowledge of how SAS actually works.

The list below is far from complete and I'll add to it from time to time. If you're still learning SAS, or if you write code that will be reviewed by a more senior colleague, you might want to avoid these.

Writing the LENGTH statement for a character variable as if it's a FORMAT statement

A character variable's LENGTH statement consists of the LENGTH keyword, the variable's name, a dollar sign, and the desired length. These are four separate and distinct items. However, I often see this:

data demog;
  length race $30.;
  format race $30.;
  ...
run;

The LENGTH statement has been written to match the syntax of the format name in the FORMAT statement (this is often done even when there is no FORMAT statement present). Note that length race $30. is syntactically equivalent to length race $ 30, which is more correct. SAS allows you to omit the space between the dollar sign and the number, and also allows you to put an extraneous full stop after a number, but that doesn't mean that you should. A statement like:

fahr = (cent * 9. / 5.) + 32.;

is also syntactically valid, but very few programmers would consider it good style. Writing length race $ 30 shows that you understand the difference between a variable's length and a format name.
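For comparison, here is the demography example written in the preferred style. It's a minimal sketch: the value assigned to RACE is made up purely so the step has something to store, and the PROC CONTENTS step is only there so you can confirm that RACE ends up as a character variable of length 30.

```sas
/* Preferred style: keyword, variable name, dollar sign, length - four separate items */
data demog;
  length race $ 30;
  format race $30.;   /* a format name, unlike a length, legitimately ends with a dot */
  race = 'White';     /* hypothetical value, for illustration only */
run;

/* Check the stored attributes: RACE should be character, length 30 */
proc contents data=demog;
run;
```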

Adding unnecessary variables to BY statements

A common task in clinical programming is to order a set of observations into, say, date and time order for each subject, and then keep only the last observation for each subject, perhaps to determine their last on-treatment reading. This is often coded as:

proc sort data=weight;
  by subjid visdate vistime;
run;

data last_wt;
  set weight;
  by subjid visdate vistime;
  if last.subjid;
run;

The BY statement has simply been copied from the PROC SORT into the data step. However, the only 'first.' or 'last.' variable being used in the data step is SUBJID, so there's no need to specify VISDATE and VISTIME on the second BY statement. This is a more correct version:

proc sort data=weight;
  by subjid visdate vistime;
run;

data last_wt;
  set weight;
  by subjid;
  if last.subjid;
run;

which shows that the programmer has a much better understanding of how PROC SORT, dataset sort orders and FIRST./LAST. processing work.
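To see that the shorter BY statement picks out the same observations, here is a small self-contained sketch. The WEIGHT data below is hypothetical, invented for illustration, and assumes VISDATE and VISTIME are stored as SAS date and time values.

```sas
/* Hypothetical weight readings, for illustration only */
data weight;
  input subjid visdate :date9. vistime :time5. wt;
  format visdate date9. vistime time5.;
  datalines;
101 15JAN2020 09:00 71.2
101 15JAN2020 14:30 71.0
101 02FEB2020 10:00 70.4
102 20JAN2020 08:45 88.9
102 20JAN2020 11:15 88.5
;
run;

/* The sort establishes the date/time order within each subject */
proc sort data=weight;
  by subjid visdate vistime;
run;

/* Only LAST.SUBJID is used, so SUBJID is all the BY statement needs */
data last_wt;
  set weight;
  by subjid;
  if last.subjid;
run;

proc print data=last_wt;
run;
```

LAST_WT should end up with one row per subject: the 02FEB2020 visit for subject 101 and the 11:15 reading for subject 102. Putting VISDATE and VISTIME back on the second BY statement makes no difference to that output, because the order was already fixed by PROC SORT.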

Over-using the trim() function

The trim() function removes the trailing spaces from a character variable's value. It's one of SAS's quirks that it has no concept of 'end-of-string' - a character variable with a length of 20 is always 20 characters long, with trailing spaces added if necessary. The trailing spaces have to be removed in some situations, but not all. It shows good understanding of SAS if you only use trim() where it's actually necessary. For example, if you want to add some extra text to the end of a character variable, then you need to trim the trailing spaces before concatenating the new value:

data out;
  length result $ 40;
  set in;
  result = put(res, best.-L);
  if not valid then result = trim(result) || ' (invalid)';
run;

However, if you're adding some extra text to the beginning of a character variable, there's no need to trim the trailing spaces:

data out;
  length result $ 40;
  set in;
  result = put(res, best.-L);
  if type='T' then result = 'Total: ' || result;
run;

Note also that, in both cases, the value of result is already left-justified (the -L alignment modifier in the PUT function takes care of that), so there's no need for any left() functions in there either.
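The difference between the two cases is easy to demonstrate. In this sketch the literal 12.5 stands in for a hypothetical value of RES, the dataset name TRIM_DEMO is made up, and the LENGTH() function (which ignores trailing blanks) is used to show how much of each result survives.

```sas
/* Minimal sketch: 12.5 stands in for a hypothetical value of RES */
data trim_demo;
  length result wrong right total $ 40;
  result = put(12.5, best.-L);          /* '12.5' followed by 36 blanks */

  wrong = result || ' (invalid)';       /* 50 characters, truncated to 40: suffix lost */
  right = trim(result) || ' (invalid)'; /* '12.5 (invalid)' */
  total = 'Total: ' || result;          /* 47 characters, truncated to 40: only blanks lost */

  len_wrong = length(wrong);            /* length excluding trailing blanks */
  len_right = length(right);
  len_total = length(total);
  put len_wrong= len_right= len_total=;
run;
```

The log should show len_wrong=4 len_right=14 len_total=11. Without trim() the suffix is appended after the 36 trailing blanks and then cut off when the 50-character result is stored in a 40-character variable. The prefix case only loses blanks, although that relies on the prefix plus the value fitting into 40 characters.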

Adding unnecessary dots to macro variable names

When resolving the value of a macro variable using the normal '&' syntax, SAS allows you to use a dot (period) to mark the end of the macro variable's name. You are allowed to use this in all cases, but it isn't needed unless the character following the macro variable's name could be interpreted as part of the name. Many programmers seem to add a dot whenever the variable name is followed by anything but a space. This comes up most often in path names:

%let project=MADDOG2020;
libname data "/projects/&project./rawdata";

The dot is not needed above because a slash can never be part of a macro variable name - the SAS parser knows that the name of the variable is 'project' without any need for a dot. This version:

%let project=MADDOG2020;
libname data "/projects/&project/rawdata";

looks neater and demonstrates a better understanding of what the dot notation does. The dot is only needed when the character following the end of the variable name could be part of the name, i.e. a letter (A-Z), a digit (0-9) or the underscore. A typical example of where the dot is needed is:

%let subpat=Patient;
title1 "Demographics - All &subpat.s";

We want the title to read "All Patients", but without the dot SAS would be trying to resolve a macro variable called 'subpats', which doesn't exist.
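A quick way to check all of this is with %PUT in open code. The LIB macro variable below is a hypothetical addition: it illustrates that the dot is consumed by the macro processor, which is why a literal dot straight after a macro variable (as in a two-level dataset name) needs two of them.

```sas
%let project=MADDOG2020;
%let subpat=Patient;

%put /projects/&project/rawdata;   /* /projects/MADDOG2020/rawdata: no dot needed */
%put All &subpat.s;                 /* All Patients: the dot ends the name */
%put All &subpats;                  /* SAS looks for SUBPATS and writes a warning to the log */

/* The dot is consumed, so a literal dot after a macro variable needs two */
%let lib=work;
%put &lib..demog;                   /* work.demog */
%put &lib.demog;                    /* workdemog */
```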
