These functions are replacements for dplyr::na_if(). They are useful when you want to convert unwanted values to NA. Unlike dplyr::na_if(), they let you specify multiple values to be replaced with NA in a single call.

  • na_if_in() replaces values that match its arguments with NA.

  • na_if_not() replaces values that do not match its arguments with NA.

na_if_in(x, ...)

na_if_not(x, ...)

Arguments

x

Vector to modify

...

Values to replace with NA, specified as either:

  • An object, vector of objects, or list of objects.

  • A function (including a purrr-style lambda function) that returns a logical vector of the same length as x. See section "Formulas" for more details.

Value

A modified version of x with selected values replaced with NA.
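
As a rough illustration of the value-matching behaviour described above, the following base-R sketch mimics both functions for plain values only. The names sketch_na_if_in() and sketch_na_if_not() are hypothetical stand-ins, not the package's implementation, and the sketch does not handle functions or formulas.

```r
# Hypothetical stand-ins for illustration only; not the package's code.
# Every value passed through ... is flattened into a single vector of targets.
sketch_na_if_in <- function(x, ...) {
  values <- unlist(list(...))
  x[x %in% values] <- NA      # matching elements become NA
  x
}

sketch_na_if_not <- function(x, ...) {
  values <- unlist(list(...))
  x[!(x %in% values)] <- NA   # non-matching elements become NA
  x
}

x <- c(4, 99, 1, 3, 5, 2)
res_in  <- sketch_na_if_in(x, 99)
res_not <- sketch_na_if_not(x, 1:5)
```

Under these assumptions both calls leave 99 as the only value replaced, which matches the first example in the Examples section.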

Formulas

These functions accept one-sided formulas that evaluate to a logical vector of the same length as x. Within the formula, the input x is referred to as ".". For example, ~ . < 0 selects the negative values of x. See examples.
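
To make the "." convention concrete, here is a minimal base-R sketch of how a one-sided formula could be turned into a predicate. The helper formula_to_predicate() is hypothetical and only illustrates the idea. In the tidyverse such formulas are commonly converted with rlang::as_function(); whether these functions do so internally is an assumption, not something this page states.

```r
# Hypothetical helper: convert a one-sided formula such as ~ . < 0
# into a function of x, with "." bound to x when the formula is evaluated.
formula_to_predicate <- function(f) {
  stopifnot(inherits(f, "formula"), length(f) == 2L)  # one-sided only
  rhs <- f[[2L]]            # the expression to the right of ~
  env <- environment(f)     # where other names in the formula are looked up
  function(x) eval(rhs, list(. = x), env)
}

is_negative <- formula_to_predicate(~ . < 0)

x <- c(-2, 0, 3)
hits <- is_negative(x)      # logical vector, same length as x
x[hits] <- NA
```

The predicate returns one logical value per element of x, which is the shape the "..." argument requires of any function or formula.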

See also

dplyr::na_if() to replace a single value with NA.

dplyr::coalesce() to replace missing values with a specified value.

tidyr::replace_na() to replace NA with a value.

dplyr::recode() and dplyr::case_when() to more generally replace values.

Examples

x <- sample(c(1:5, 99))
# We can replace 99...
# ... explicitly
na_if_in(x, 99)
#> [1]  4 NA  1  3  5  2
# ... by specifying values to keep
na_if_not(x, 1:5)
#> [1]  4 NA  1  3  5  2
# ... or by using a formula
na_if_in(x, ~ . > 5)
#> [1]  4 NA  1  3  5  2

messy_string <- c("abc", "", "def", "NA", "ghi", 42, "jkl", "NULL", "mno")
# We can replace unwanted values...
# ... one at a time
clean_string <- na_if_in(messy_string, "")
clean_string <- na_if_in(clean_string, "NA")
clean_string <- na_if_in(clean_string, 42)
clean_string <- na_if_in(clean_string, "NULL")
clean_string
#> [1] "abc" NA    "def" NA    "ghi" NA    "jkl" NA    "mno"
# ... or all at once
na_if_in(messy_string, "", "NA", "NULL", 1:100)
#> [1] "abc" NA    "def" NA    "ghi" NA    "jkl" NA    "mno"
na_if_in(messy_string, c("", "NA", "NULL", 1:100))
#> [1] "abc" NA    "def" NA    "ghi" NA    "jkl" NA    "mno"
na_if_in(messy_string, list("", "NA", "NULL", 1:100))
#> [1] "abc" NA    "def" NA    "ghi" NA    "jkl" NA    "mno"
# ... or using a clever formula
grepl("[a-z]{3,}", messy_string)
#> [1]  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE
na_if_not(messy_string, ~ grepl("[a-z]{3,}", .))
#> [1] "abc" NA    "def" NA    "ghi" NA    "jkl" NA    "mno"

# na_if_in() is particularly useful inside dplyr::mutate()
library(dplyr)
#> 
#> Attaching package: ‘dplyr’
#> The following objects are masked from ‘package:stats’:
#> 
#>     filter, lag
#> The following objects are masked from ‘package:base’:
#> 
#>     intersect, setdiff, setequal, union
faux_census %>%
  mutate(
    state = na_if_in(state, "Canada"),
    age   = na_if_in(age, ~ . < 18, ~ . > 120)
  )
#> # A tibble: 20 × 6
#>    state gender                         age race                  income relig…¹
#>    <chr> <chr>                        <dbl> <chr>                  <dbl> <chr>  
#>  1 CA    female                          80 Native American       2.8 e4 Christ…
#>  2 NY    Woman                           89 Latino                1.49e5 Spirit…
#>  3 CA    Female                          48 White                 4.79e5 Cathol…
#>  4 TX    Male                            63 latinx                8.5 e4 christ…
#>  5 PA    Male                            47 asian                 4.19e4 Baptist
#>  6 TX    Gender is a social construct    57 Race is a social con… 1.00e7 Religi…
#>  7 NA    Male                            49 white                 1.49e5 method…
#>  8 TX    Female                          50 White                 9.88e4 Luther…
#>  9 NY    f                               NA white                 9.07e4 Agnost…
#> 10 WA    F                               33 White                 4.50e4 Jewish 
#> 11 TX    Male                            30 White                 1.27e5 none   
#> 12 OH    Non-binary                      42 Caucasian             2.16e4 Roman …
#> 13 NC    Female                          22 African American      7.42e4 atheist
#> 14 LA    Male                            NA White                 6.1 e4 Christ…
#> 15 LA    Female                          28 Black                 2   e4 Not re…
#> 16 CA    male                            34 Asian American        7.74e4 Christ…
#> 17 TN    M                               64 white                 1.00e7 Nothing
#> 18 FL    Female                          68 white                 4.71e4 None   
#> 19 OH    Male                            39 black                 2.38e4 baptist
#> 20 NH    male                            73 Hispanic              3.32e4 Christ…
#> # … with abbreviated variable name ¹​religion

# na_if_in() treats a vector argument as a set of values to match,
# whereas dplyr::na_if() compares x and y element by element:
na_if_in(1:5, 5:1)
#> [1] NA NA NA NA NA
dplyr::na_if(1:5, 5:1)
#> [1]  1  2 NA  4  5
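
The difference in the last example comes down to set membership versus elementwise comparison. The base-R sketch below reproduces both results with replace(), as an illustration only and not as either package's actual code.

```r
x <- 1:5
y <- 5:1

# Set membership: every element of x appears somewhere in y,
# so every element is replaced.
by_membership <- replace(x, x %in% y, NA)

# Elementwise comparison: only position 3 has x == y (3 == 3),
# so only that element is replaced.
by_position <- replace(x, x == y, NA)
```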