Problem description
Given that base as.integer() coerces the empty string to NA without warning, as in:
str( as.integer(c('1234','5678','')) ) # int [1:3] 1234 5678 NA -- no warning
I'm struggling to understand why bit64::as.integer64() coerces the empty string to zero without warning:
library('bit64')
str( as.integer64(c('1234','5678','')) ) # integer64 [1:3] 1234 5678 0 -- no warning
The comparison gets even stranger:
str( as.integer(c('1234','5678','', 'Help me Stack Overflow')) )
# int [1:4] 1234 5678 NA NA -- coercion warning
with:
str( as.integer64(c('1234','5678','', 'Help me Stack Overflow')) )
# integer64 [1:4] 1234 5678 0 NA -- no warning
My workaround for this fails miserably:
asInt64 <- function(s){
  require(bit64)
  ifelse(grepl('^\\d+$',s), as.integer64(s), NA_integer64_)
}
str(asInt64(c('1234','5678','', 'Help me Stack Overflow')) )
# num [1:4] 6.10e-321 2.81e-320 0.00 0.00
# huh?
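A possible reading of that output, offered as a sketch rather than a confirmed diagnosis: integer64 values live in an ordinary R double vector (the C source quoted further down writes through REAL(ret_)), and the "num" in the str() output suggests that ifelse() returned a plain double with the integer64 class gone. The raw 64-bit integer patterns would then be printed as if they were doubles. The helper below, int64_bits_as_double, is a hypothetical name used only for illustration; it assumes an 8-byte IEEE 754 double.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not part of bit64 or R: reinterpret the 8 bytes of a
 * 64-bit integer as a double, without any numeric conversion.
 * Assumes sizeof(double) == sizeof(int64_t) and IEEE 754 doubles. */
double int64_bits_as_double(int64_t value)
{
    double out;
    memcpy(&out, &value, sizeof out);  /* copy the bit pattern unchanged */
    return out;
}
```

Under those assumptions, the bit pattern of 1234 reads as a denormal double of about 6.10e-321 and 5678 as about 2.81e-320, matching the first two values above. The smallest 64-bit integer (which bit64 appears to use for NA_integer64_) has only the sign bit set, so it reads as negative zero, which would explain the two 0.00 entries.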
So, I ask:
- Why does this happen?
- What is the best workaround?
Recommended answer
Why it happens
As @lukeA's comment points out, the source for as.integer64.character is:
SEXP as_integer64_character(SEXP x_, SEXP ret_){
  long long i, n = LENGTH(ret_);
  long long * ret = (long long *) REAL(ret_);
  const char * str;
  char * endpointer;
  for(i=0; i<n; i++){
    str = CHAR(STRING_ELT(x_, i)); endpointer = (char *)str; // thanks to Murray Stokely 28.1.2012
    ret[i] = strtoll(str, &endpointer, 10);
    if (*endpointer)
      ret[i] = NA_INTEGER64;
  }
  return ret_;
}
and strtoll returns zero when it cannot perform a conversion, as with "" or "ABCD" (POSIX allows, but does not require, errno to be set to EINVAL in that case). One reference strtoll example handles this like:
/* If the result is 0, test for an error */
if (result == 0)
{
  /* If a conversion error occurred, display a message and exit */
  if (errno == EINVAL)
  {
    printf("Conversion error occurred: %d\n", errno);
    exit(0);
  }
  /* If the value provided was out of range, display a warning message */
  if (errno == ERANGE)
    printf("The value provided was out of range\n");
}
So what I am trying to figure out now is why *endpointer is evaluating to FALSE. (Stay tuned...)
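One likely explanation, sketched in standalone C rather than taken from bit64: when strtoll performs no conversion, it stores the start of the input in the end pointer. For "" that position is the terminating NUL, so *endpointer is 0 and the NA branch is skipped; for "ABCD" it points at 'A', so the NA branch runs. Both function names below are hypothetical and exist only to illustrate the difference between the quoted check and a stricter one.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Mirrors only the test used in the quoted source: does the end pointer
 * land on a non-NUL character after strtoll has run? */
bool endpointer_check_flags(const char *str)
{
    char *end = (char *)str;
    (void)strtoll(str, &end, 10);
    return *end != '\0';
}

/* Hypothetical stricter parser: succeeds only if at least one character was
 * consumed, nothing follows the number, and the value fits in a long long. */
bool parse_int64_strict(const char *str, long long *out)
{
    char *end;
    errno = 0;
    long long value = strtoll(str, &end, 10);

    if (end == str)        /* no conversion at all: "" and "ABCD" */
        return false;
    if (*end != '\0')      /* trailing characters, e.g. "12abc" */
        return false;
    if (errno == ERANGE)   /* value out of range for long long */
        return false;

    *out = value;
    return true;
}
```

Here endpointer_check_flags("") is false while endpointer_check_flags("ABCD") is true, which matches the 0 and NA seen in the question. Comparing the end pointer with the start of the string, as parse_int64_strict does, is one way to treat the empty string as invalid too.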
Here's the workaround to mimic the behavior of base as.integer:
library(bit64)
charToInt64 <- function(s){
  stopifnot( is.character(s) )
  x <- as.integer64(s)
  # as.integer64("") unexpectedly returns zero without warning.
  # Overwrite this result to return NA without warning, similar to base as.integer("")
  x[s==""] <- NA_integer64_
  # as.integer64("ABC") returns NA but, unlike base as.integer("ABC"), gives no warning.
  # Set NA explicitly and raise the same coercion warning as base as.integer("ABC")
  bad_strings <- grepl('\\D',s) # thanks to @lukeA for the hint
  if( any(bad_strings) ){
    warning('NAs introduced by coercion')
    x[bad_strings] <- NA_integer64_
  }
  x
}
To see how it works:
test_string <- c('1234','5678','', 'Help me Stack Overflow')
charToInt64(test_string) # returns int64 [1] 1234 5678 <NA> <NA> with warning
charToInt64(head(test_string,-1)) # returns int64 [1] 1234 5678 <NA> without warning