R: Determining which columns contain currency data ($)


Problem description

I have a very large dataset in which some columns are formatted as currency, some are numeric, and some are character. When the data is read in, all currency columns are identified as factors, and I need to convert them to numeric. The dataset is too wide to identify the columns manually. I am trying to find a programmatic way to determine whether a column contains currency data (e.g. values starting with '$') and then pass that list of columns on to be cleaned.

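name <- c('john', 'carl', 'hank')
salary <- c('$23,456.33', '$45,677.43', '$76,234.88')
emp_data <- data.frame(name, salary)

# Strip everything except letters, digits, and dots, then convert to numeric
clean <- function(ttt) {
  as.numeric(gsub('[^a-zA-Z0-9.]', '', ttt))
}
sapply(emp_data, clean)  # applied to every column, so name becomes NA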

The issue in this example is that sapply works on all columns, so the name column is replaced with NA. I need a way to programmatically identify just the columns that the clean function needs to be applied to (in this example, salary).

Recommended answer

Using the dplyr and stringr packages, you can use mutate_if to identify columns that contain any string starting with a $ and then convert those columns to numeric.

library(dplyr)
library(stringr)

emp_data %>%
  mutate_if(~any(str_detect(., '^\\$'), na.rm = TRUE),
            ~as.numeric(str_replace_all(., '[$,]', '')))
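For cases where adding packages is not an option, the same idea can be sketched in base R. This is only a rough sketch, not part of the original answer: the helper names is_currency and clean_currency are hypothetical, and stringsAsFactors = TRUE is set only to reproduce the factor columns described in the question.

```r
# Sample data mirroring the question; stringsAsFactors = TRUE is an
# assumption used to reproduce the factor columns described there.
emp_data <- data.frame(
  name   = c("john", "carl", "hank"),
  salary = c("$23,456.33", "$45,677.43", "$76,234.88"),
  stringsAsFactors = TRUE
)

# Hypothetical helper: TRUE when a column is text-like (character or factor)
# and at least one value starts with a dollar sign.
is_currency <- function(x) {
  (is.character(x) || is.factor(x)) &&
    any(grepl("^\\$", as.character(x)), na.rm = TRUE)
}

# Hypothetical helper: drop "$" and thousands separators, then convert.
clean_currency <- function(x) {
  as.numeric(gsub("[$,]", "", as.character(x)))
}

# Collect the names of the currency columns, then clean only those columns.
currency_cols <- names(emp_data)[vapply(emp_data, is_currency, logical(1))]
emp_data[currency_cols] <- lapply(emp_data[currency_cols], clean_currency)

str(emp_data)
```

Here currency_cols holds the list of detected columns, so it can also be inspected or passed to a different cleaning function. The sketch assumes amounts are written like $1,234.56; other formats, such as negative amounts in parentheses or a leading minus sign before the $, would need a different pattern.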

