Mastering WHERE Statement in SAS

Where Statement in SAS

Subhro provides valuable and informative content on SAS, offering a comprehensive understanding of SAS concepts. We have been creating SAS tutorials since 2019, and 9to5sas has become one of the leading free SAS resources available on the internet.

Table of Contents Hide
  1. Basic Data Subsetting using Where statement in SAS
  2. SAS where statement with multiple conditions
  3. Using Operators with WHERE statement in SAS
  4. IS MISSING and IS NULL
  5. BETWEEN AND
  6. NOT Operator
  7. SAME AND
  8. IN and NOT IN Operator in SAS
  9. CONTAINS (?) Operator in SAS
  10. NOT CONTAINS (^?)
  11. LIKE Operator in SAS
    1. A Simple use of the Like operator
    2. Selecting Values that Begin with a Character String

    Data manipulation and management are vast fields in which SAS stands out as an incredibly versatile and potent tool. Where Statements in SAS are the key to its functionality, allowing users to refine their data operations with surgical precision.

    Do you fully utilize its potential, however? Our post “Advanced Techniques: Mastering the Where Statement in SAS” will ensure just that. Learn how to make the most of this essential SAS statement, discover expert tricks, and elevate your expertise.

    In this article, we will help you understand the nuances of this potent statement, helping you unlock the keys to enhancing data management.

    Basic Data Subsetting using Where statement in SAS

    You can use the WHERE statement in SAS to filter data when reading from a SAS dataset.

    Example:

    where statement

    You may use a WHERE or a subsetting IF statement in this example. There are advantages to using a WHERE statement in SAS instead of a subsetting IF statement.

    You have a larger alternative of operators that can be used with a WHERE statement, and if the input data set is indexed, the WHERE statement in SAS is likely more efficient.

    You may also use a WHERE statement in a SAS procedure to subset the data being processed.

    Note: IF statements are not allowed inside SAS procedures.

    SAS where statement with multiple conditions

    In SAS, the WHERE statement can accommodate multiple conditions by using logical operators such as AND , OR , and NOT .

    Let’s create a dataset named ‘Employee_data’ in SAS with ‘Name’ and ‘Salary’ as the variables.

    Let’s consider an example where we want to extract data for employees earning more than $50,000 but less than or equal to $80,000 from the ‘Employee_data’ dataset.

     50000 AND Salary 

    Where statement with AND condition

    In this case, the WHERE statement is combined with the AND operator. Only those records from ‘Employee_data’ where ‘Salary’ is greater than $50,000 and less than or equal to $80,000 will be selected and saved to the ‘selected_data’ dataset.

    Similarly, the OR operator can be used to select records based on any of the conditions being met. For example, if we want to select employees with salaries either less than $30,000 or more than $70,000, the WHERE statement would be as follows:

     70000; RUN;

    Where statement with OR condition

    The NOT operator can be used to exclude records that meet certain conditions. If we want to select all employees who are not high earners (with salaries greater than $70,000), we can use the WHERE statement as follows:

     70000); RUN;

    Where statement with NOT condition

    Using Operators with WHERE statement in SAS

    Below is the list of operators that you can use with the WHERE statement in SAS.

    IS MISSING and IS NULL

    The IS MISSING and IS NULL operators can handle character or numeric variables in WHERE, ON, and HAVING expressions.

    It results in true if the expression results are missing and false if it is not missing.

    They also work with the NOT operator.

    BETWEEN AND

    The BETWEEN-AND operator in SAS is a useful operator for filtering data within a specified range. It can be applied to both numeric and date variables, making it highly versatile for various data analysis tasks. Typically, the BETWEEN-AND operator is used within a WHERE statement to filter data that falls between two values.

    Consider a simple example where we have a dataset called ‘sashelp.class’. Suppose we want to filter out names with weight between 60 and 90. The SAS code with BETWEEN-AND would look like this:

    Between And

    NOT Operator

    The NOT operator can be used with the WHERE statement in many other ways:

    • where not (score in (34, 44, 84))
    • where not (Score between 50 and 90)
    • where NOT(Section EQ “A”)

    SAME AND

    Multiple WHERE statements cannot be used in a DATA step as IF statements. If SAS encounters a second WHERE statement, it will replace the first.

    The WHERE AND also known as WHERE ALSO will let you use multiple where statements in SAS to add more restrictions.

     12 ; run;

    WHERE SAME AND

    IN and NOT IN Operator in SAS

    The SAS IN operator is a comparison operator that searches for character and numeric values equal to one of the values from a given list of values.

    The SAS where in list of values must be in parentheses, with each character value in quotation marks and separated by either a comma or blank.

    For example, suppose you want to filter for SUVs, sedans or Wagon cars. You could specify the values as:

    Note the use of the strip and Upcase function in the type variable. This ensures that any leading or trailing spaces are removed, and the values will be converted to Uppercase before comparison.

    In addition to this, you can use the logical operator NOT to exclude values that are on the list.

    You can also use a shorthand notation to specify a range of numbers to search. The range is specified by using the syntax M: N as a value in the list to search, where M is the lower bound and N is the upper bound.

    You can replace the below statement in the form of WHERE IN shorthand notation.

    CONTAINS (?) Operator in SAS

    The CONTAINS operator returns a true value if the character string on the left side of the operator contains the string on the right side.

    Contains Operator

    NOT CONTAINS (^?)

    ^= (or NOT CONTAINS) returns a true value if the character string on the left side of the operator does not contain the string on the right side.

    Not Contains

    The “contains” operator (?) and the “not contains” operator (^?) match a substring that appears anywhere in the target character variable.

    LIKE Operator in SAS

    LIKE operator is frequently used for pattern matching, that is, evaluating whether a variable value equals, begins with a specified character, or sounds like a specified value or pattern.

    The LIKE expression uses two wildcard operators. When using the LIKE operator, the underscore(_) wildcard takes the place of a single character, whereas the % sign might be substituted for a string of any length (including a null string).

    A Simple use of the Like operator

    like operator

    Double quotes (“) are used here to include the apostrophe in “Women’s Dress”. Single quotes can be used when the text string does not contain an apostrophe or other single quotes.

    Selecting Values that Begin with a Character String

    The percentage sign (%) wildcard searches for character values that begin with, end with or even contain certain character strings. Below are some of the examples using wildcards.

    The above expression returns all names that begin with “a” and are followed by any characters of any lengths

    like operator

    _ indicates any character at start followed by “a” and ending with any char of any length.

    where wild cards

    At the beginning of a search, two underscores indicate that any two characters followed by “n” should be filtered.

    Like Operator in SAS

    The above expression would return character values with two characters at the beginning followed by n and any one character at the end.

    Like Operator in SAS

    Sounds Like Operator (=*)

    The SOUNDS-LIKE operator is based on the SOUNDEX algorithm for identifying words that sound alike. The SOUNDEX algorithm is English-biased, so it’s not useful for languages other than English.

    Sounds Like Operator

    You can download the dataset employees.sas7bdat used in the program.

    Using a WHERE statement and a WHERE=option in a DATA Step.

    Using the WHERE statement in SAS and a WHERE= dataset option in a DATA step is not a good practice. SAS ignores the WHERE statement when both are used in the same dataset.

    Additionally, a WARNING appears in the SAS log indicating that the WHERE statement cannot be applied.

    What is the difference between if and WHERE statements in SAS?

    In SAS, both the IF and WHERE statements are used to filter data based on certain conditions. However, they operate at different points in the data processing stage and thus have distinct functionalities and uses.

    1. Execution Order: The WHERE statement operates at the time of data input before the Data Step begins. It reads only those observations from the dataset that meet the specified condition. In contrast, the IF statement operates after the Data Step has started, meaning it reads all observations into the program data vector before applying the condition. This makes the WHERE statement more efficient when working with large datasets as it reduces I/O processing time.
    2. Subsetting: While both statements can be used for subsetting datasets, the WHERE statement is generally used in both Data Steps and PROC Steps. The IF statement, on the other hand, can only be used in Data Steps.
    3. Conditions: The WHERE statement can only handle conditions involving existing variables in the dataset. The IF statement is more flexible and can work with conditions involving newly created variables within the same Data Step.
    4. Syntax: WHERE uses SQL-like syntax and supports operators like BETWEEN-AND , CONTAINS , IS MISSING , etc. IF uses a different syntax and does not support these operators, but it can accommodate SAS functions within the condition.

    For example, if you have a dataset named ‘Employee_data’ with ‘Name’ and ‘Salary’ as the variables, and you want to select employees who earn more than $50,000, you could use a WHERE statement or an IF statement as follows:

    Using a WHERE Statement:

     50000; RUN;

    Using an IF Statement:

     50000; RUN;

    Frequently Asked Questions

    What is the WHERE statement in SAS?

    The u003ccodeu003eWHEREu003c/codeu003e statement in SAS is a data step statement used to select specific observations from a dataset based on certain conditions.

    Can I use multiple conditions in a SAS WHERE statement?

    Yes, the u003ccodeu003eWHEREu003c/codeu003e statement in SAS can handle multiple conditions using logical operators like u003ccodeu003eANDu003c/codeu003e, u003ccodeu003eORu003c/codeu003e, and u003ccodeu003eNOTu003c/codeu003e.

    How can I select a range of values using a WHERE statement in SAS?

    The u003ccodeu003eBETWEEN-ANDu003c/codeu003e operator can be used in a u003ccodeu003eWHEREu003c/codeu003e statement in SAS to select a range of values. For example, u003ccodeu003eWHERE Salary BETWEEN 40000 AND 60000;u003c/codeu003eTo clarify, the query will retrieve all records where the ‘Salary’ field is within the range of $40,000 to $60,000, inclusive.

    What does the ^= operator do in a WHERE statement?

    The u003ccodeu003e^=u003c/codeu003e operator is the u0022not containsu0022 operator in SAS. It is used to select observations where a certain character string is not present in a variable. For example, u003ccodeu003eWHERE Position ^= ‘Manager’;u003c/codeu003e would select all records where the ‘Position’ does not contain the word ‘Manager’.

    Can I use the WHERE statement with date values in SAS?

    Yes, you can use the u003ccodeu003eWHEREu003c/codeu003e statement with date values in SAS. For example, u003ccodeu003eWHERE HireDate BETWEEN ’01JAN2015’d AND ’31DEC2020’d;u003c/codeu003e would select all records where the ‘HireDate’ falls within the specified date range.

    Can you use a WHERE statement in proc freq?

    Yes, you can use a u003ccodeu003eWHEREu003c/codeu003e statement in u003ccodeu003ePROC FREQu003c/codeu003e in SAS to limit the analysis to a subset of data that meets the specified condition(s). The u003ccodeu003eWHEREu003c/codeu003e statement is used before the u003ccodeu003eTABLESu003c/codeu003e statement in the u003ccodeu003ePROC FREQu003c/codeu003e call.

    Conclusion

    In conclusion, mastering the Where Statement In SAS is pivotal for anyone seeking to harness the full potential of SAS programming for efficient data management. The ability to accurately apply and interpret WHERE statements, whether used independently or in combination with logical operators, is an indispensable skill in today’s data-driven world. This article has provided detailed insights into the application of the WHERE statement with multiple conditions, enabling you to implement complex logical operations with ease and precision.

    Furthermore, we explored the difference between IF and WHERE statements, highlighting the unique strengths of the latter. The utility of the WHERE statement in various PROC procedures like PROC FREQ was also discussed, showing the breadth of its application. We trust that these advanced techniques have enriched your understanding and will empower your journey in SAS programming.

    Subhro

    Subhro provides valuable and informative content on SAS, offering a comprehensive understanding of SAS concepts. We have been creating SAS tutorials since 2019, and 9to5sas has become one of the leading free SAS resources available on the internet.

    Leave a ReplyCancel reply

    This site uses Akismet to reduce spam. Learn how your comment data is processed.