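Notes on Learning R: A Scientific Calculator; Inspecting Variables and Your Workspace; Vectors, Matrices, and Arrays; Lists and Data Frames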
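1. Chapter 2: A Scientific Calculator
To check whether two floating-point numbers are equal, use all.equal() rather than ==. The == operator tests exact equality, so it is only reliable for values that are represented exactly, such as integers; all.equal() compares within a small tolerance. Because all.equal() returns a description of the difference rather than FALSE when the values differ, wrap it in isTRUE() when you need a logical result.
> all.equal(sqrt(2)^2, 2)
[1] TRUE
> all.equal(sqrt(2)^2, 3)
[1] "Mean relative difference: 0.5"
> isTRUE(all.equal(sqrt(2)^2, 2))
[1] TRUE
> isTRUE(all.equal(sqrt(2)^2, 3))
[1] FALSE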
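As a minimal sketch of why == is risky here (base R only; the variable names are illustrative and not from the book), the snippet below compares sqrt(2)^2 with 2 both exactly and within a tolerance. The rounding error is tiny but nonzero, so the exact comparison fails.

```r
# Squaring the square root of 2 does not give exactly 2 in floating point.
a <- sqrt(2)^2

exact  <- a == 2                    # exact comparison
approx <- isTRUE(all.equal(a, 2))   # comparison within a small tolerance
gap    <- abs(a - 2)                # size of the rounding error

print(exact)    # FALSE
print(approx)   # TRUE
print(gap)      # tiny, but not zero
```

In conditions such as if (...), isTRUE(all.equal(a, b)) is the safer form, since all.equal() alone may return a character string.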
2. Chapter 3: Inspecting Variables and Your Workspace
Variable classes: logical; three numeric classes (numeric, complex, integer); character for storing text; factor for storing categorical data; and the rarer raw for storing binary data.
A factor stores categorical data:
> gender = factor(c("male", "female", "male", "female"))
> gender
[1] male   female male   female
Levels: female male
> levels(gender)
[1] "female" "male"
> nlevels(gender)
[1] 2
Under the hood, factor values are stored as integers, not characters. Calling as.integer() makes this clear:
> as.integer(gender)
[1] 2 1 2 1
Storing integers instead of character text makes factors memory-efficient, at least when the same strings are repeated many times:
> gender_char = sample(c("female", "male"), 1000, replace = TRUE)
> gender_char
......
> gender_fac = as.factor(gender_char)
> # convert the data to a factor
> object.size(gender_char)  # object.size() returns the memory size of an object
8160 bytes
> object.size(gender_fac)
4560 bytes
Converting a factor to strings:
> as.character(gender)
[1] "male"   "female" "male"   "female"
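Changing the type of an object (casting):
> x = "123.456"  # use the as.* functions to convert x
> as.numeric(x)  # equivalent to as(x, "numeric")
[1] 123.456
> is.numeric(x)  # x itself is unchanged; it is still a character string
[1] FALSE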
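The transcript above can be puzzling at first, because is.numeric(x) is still FALSE after the conversion. A small sketch, assuming only base R (the names y and z are illustrative): the as.* functions return a new value and leave the original alone, so the result has to be assigned to be kept.

```r
x <- "123.456"

# The conversion returns a new value; x itself stays a character string.
y <- as.numeric(x)

print(class(x))        # "character"
print(class(y))        # "numeric"
print(is.numeric(y))   # TRUE

# Text that cannot be parsed as a number becomes NA (R also emits a
# warning, suppressed here to keep the output clean).
z <- suppressWarnings(as.numeric("abc"))
print(is.na(z))        # TRUE
```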
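The call options(digits = n) sets a global option that controls how many significant digits are shown when numbers are printed (significant digits, not decimal places).
> options(digits = 10)
> (x = runif(5))
[1] 0.040052175522 0.544388080016 0.506369658280
[4] 0.144690239336 0.005838404642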
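To illustrate the difference between significant digits and decimal places, here is a small sketch using base R only. It uses format() with a digits argument instead of options(), so the global setting is left untouched; the variable names are illustrative.

```r
x <- 123.456789

# digits counts significant digits, not decimal places.
three_sig <- format(x, digits = 3)
five_sig  <- format(x, digits = 5)

# round() works in decimal places; signif() works in significant digits.
two_dp  <- round(x, 2)
two_sig <- signif(x, 2)

print(three_sig)   # "123"
print(five_sig)    # "123.46"
print(two_dp)      # 123.46
print(two_sig)     # 120
```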
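The call runif(30) generates 30 random numbers uniformly distributed between 0 and 1. The summary function gives summary information tailored to the data type, for example for a numeric variable:
> num = runif(30)
> summary(num)
       Min.     1st Qu.      Median        Mean
0.001235794 0.199856233 0.475356185 0.475318138
    3rd Qu.        Max.
0.703412558 0.984893506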
letters and LETTERS are two built-in constants:
> letters
 [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l"
[13] "m" "n" "o" "p" "q" "r" "s" "t" "u" "v" "w" "x"
[25] "y" "z"
> LETTERS
 [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L"
[13] "M" "N" "O" "P" "Q" "R" "S" "T" "U" "V" "W" "X"
[25] "Y" "Z"
sample is the sampling function. Its form is sample(x, size = , replace = ). The third argument defaults to FALSE, which means sampling without replacement.
Drawing 30 random samples from a to e, with replacement:
> fac = factor(sample(letters[1:5], size = 30, replace = TRUE))
> summary(fac)
 a  b  c  d  e
 4  7  2  5 12
The same for a logical vector:
> bool = sample(c(TRUE, FALSE, NA), 30, replace = TRUE)
> summary(bool)
   Mode   FALSE    TRUE    NA's
logical      10       8      12
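The effect of the replace argument can be sketched as follows (base R only; the seed and the variable names are arbitrary choices for this illustration). Without replacement, sample() returns a permutation and cannot draw more items than the pool contains, which is why the 30-item draws above need replace = TRUE.

```r
set.seed(1)  # arbitrary seed, only to make the run repeatable

pool <- letters[1:5]

# Default (replace = FALSE): every element appears exactly once.
perm <- sample(pool)
print(sort(perm))

# With replacement, the sample may be larger than the pool.
with_rep <- sample(pool, size = 30, replace = TRUE)
print(length(with_rep))

# Without replacement, asking for 30 items from a pool of 5 is an error.
res <- tryCatch(sample(pool, size = 30), error = function(e) "error")
print(res)
```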
Create the data frame dfr; only its first few rows are shown here:
> dfr = data.frame(num, fac, bool)
> head(dfr)  # shows the first 6 rows by default
            num fac bool
1 0.34019507235   b   NA
2 0.77415443189   e TRUE
3 0.02201034524   d TRUE
4 0.11190012516   e   NA
5 0.18030911358   a   NA
6 0.98489350639   d TRUE
> summary(dfr)
      num               fac      bool
 Min.   :0.001235794    a: 4   Mode :logical
 1st Qu.:0.199856233    b: 7   FALSE:10
 Median :0.475356185    c: 2   TRUE :8
 Mean   :0.475318138    d: 5   NA's :12
 3rd Qu.:0.703412558    e:12
 Max.   :0.984893506
The str function shows the structure of an object. For vectors it is not very interesting (they are too simple), but str is very useful for data frames and nested lists:
> str(num)
 num [1:30] 0.34 0.774 0.022 0.112 0.18 ...
> str(dfr)
'data.frame':   30 obs. of  3 variables:
 $ num : num  0.34 0.774 0.022 0.112 0.18 ...
 $ fac : Factor w/ 5 levels "a","b","c","d",..: 2 5 4 5 1 4 1 4 1 5 ...
 $ bool: logi  NA TRUE TRUE NA NA TRUE ...
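Every class has its own print method, which controls how it is displayed in the console. Sometimes this printing obscures the internal structure or leaves out useful information. The unclass function bypasses this and shows how the variable is built. For example, calling unclass on a factor reveals that it is just an integer vector with an attribute called levels. (Judging by its output, the fac in the next two examples is a different, four-level factor of pet names, not the a-to-e factor created earlier.)
> unclass(fac)
[1] 2 1 4 3
attr(,"levels")
[1] "cat"      "dog"      "goldfish" "hamster"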
The attributes function shows a list of all the attributes of an object:
> attributes(fac)
$levels
[1] "cat"      "dog"      "goldfish" "hamster"

$class
[1] "factor"
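The two transcripts above suggest that a factor is nothing more than integer codes plus a levels attribute. The following sketch makes that concrete. It assumes a hypothetical four-element factor named pets; the notes do not show how their own fac was created, so this definition is only an illustration that happens to produce the same codes.

```r
# Hypothetical factor for illustration; levels are sorted alphabetically.
pets <- factor(c("dog", "cat", "hamster", "goldfish"))

codes <- unclass(pets)   # integer vector carrying a "levels" attribute
print(is.integer(codes))

# Each code is an index into the levels, so indexing recovers the labels.
labels_back <- levels(pets)[as.integer(pets)]
print(labels_back)

# Rebuilding a factor from the labels and levels gives the same values.
rebuilt <- factor(labels_back, levels = levels(pets))
print(all(rebuilt == pets))
```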
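The View function (note the capital V) displays a data frame as a spreadsheet. edit and fix are similar, but they let you change data values by hand.
View(dfr)             # read-only
new_dfr = edit(dfr)   # changes are saved in new_dfr
fix(dfr)              # changes are saved in dfr
View(head(dfr, 50))   # view the first 50 rows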
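3. Chapter 4: Vectors, Matrices, and Arrays
Arrays hold multidimensional rectangular data. Matrices are the special case of two-dimensional arrays.
There are many ways to create a sequence; the advantage of seq is that you can set the step size.
> (xulie = seq(1, 15, 2))
[1]  1  3  5  7  9 11 13 15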
The length() function returns the length of a sequence:
> length(xulie)
[1] 8
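The notes mention that there are many ways to create sequences. As a rough sketch of some base R alternatives (variable names are illustrative), the following compares seq with an explicit step, seq with a target length, and the helpers seq_len and seq_along:

```r
s1 <- seq(1, 15, by = 2)               # fixed step size
s2 <- seq(0, 1, length.out = 5)        # fixed number of points
s3 <- seq_len(5)                       # 1 up to n
s4 <- seq_along(c("a", "b", "c"))      # one index per element
s5 <- 10:6                             # the colon operator can count down

print(s1)   # 1 3 5 7 9 11 13 15
print(s2)   # 0.00 0.25 0.50 0.75 1.00
print(s3)   # 1 2 3 4 5
print(s4)   # 1 2 3
print(s5)   # 10 9 8 7 6
```

seq_len(n) is often preferred over 1:n in loops, because seq_len(0) is empty whereas 1:0 counts down from 1 to 0.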
Naming the elements of a vector:
> c(apple = 1, banana = 2, "kiwi fruit" = 3, 4)
     apple     banana kiwi fruit
         1          2          3          4
> x = 1:4
> names(x) = c("apple", "banana", "kiwi fruit", "")
> x
     apple     banana kiwi fruit
         1          2          3          4
Creating an array:
> three_d_array = array(  # three-dimensional array
+   1:24,
+   dim = c(4, 3, 2),
+   dimnames = list(
+     c("one", "two", "three", "four"),
+     c("ein", "zwei", "drei"),
+     c("un", "deux")
+   )
+ )
> three_d_array
, , un

      ein zwei drei
one     1    5    9
two     2    6   10
three   3    7   11
four    4    8   12

, , deux

      ein zwei drei
one    13   17   21
two    14   18   22
three  15   19   23
four   16   20   24

Creating a matrix (byrow = TRUE fills the values row by row):
> (a_matrix = matrix(  # create a matrix
+   1:12,
+   nrow = 4, byrow = TRUE,
+   dimnames = list(
+     c("one", "two", "three", "four"),
+     c("ein", "zwei", "drei")
+   )
+ ))
      ein zwei drei
one     1    2    3
two     4    5    6
three   7    8    9
four   10   11   12
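The array above is filled column by column, while the matrix uses byrow to fill row by row. A short sketch of the difference, using base R only (the names by_col and by_row are illustrative):

```r
# Default: values fill the matrix column by column.
by_col <- matrix(1:12, nrow = 4)

# byrow = TRUE: values fill the matrix row by row.
by_row <- matrix(1:12, nrow = 4, byrow = TRUE)

print(by_col[1, ])   # 1 5 9
print(by_row[1, ])   # 1 2 3

# Filling by row is the same as filling a 3 x 4 matrix by column
# and then transposing it.
same <- all(by_row == t(matrix(1:12, nrow = 3)))
print(same)          # TRUE
```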
Some useful functions for indexing vectors, repeating values, and querying dimensions:
> x = (1:5)^2
> x
[1]  1  4  9 16 25
> x[c(1, 3, 5)]
[1]  1  9 25
> x[c(-2, -4)]
[1]  1  9 25
> x[c(TRUE, FALSE, TRUE, FALSE, TRUE)]
[1]  1  9 25
> names(x) = c("one", "four", "nine", "sixteen", "twenty five")
> x
        one        four        nine     sixteen twenty five
          1           4           9          16          25
> which(x > 10)
    sixteen twenty five
          4           5
> which.min(x)
one
  1
> which.max(x)
twenty five
          5
> rep(1:5, 3)
 [1] 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5
> rep(1:5, each = 3)
 [1] 1 1 1 2 2 2 3 3 3 4 4 4 5 5 5
> rep(1:5, times = 1:5)
 [1] 1 2 2 3 3 3 4 4 4 4 5 5 5 5 5
> rep(1:5, length.out = 7)
[1] 1 2 3 4 5 1 2
> rep.int(1:5, 3)
 [1] 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5
> rep_len(1:5, 13)
 [1] 1 2 3 4 5 1 2 3 4 5 1 2 3
> dim(three_d_array)
[1] 4 3 2
> dim(a_matrix)
[1] 4 3
> nrow(a_matrix)
[1] 4
> ncol(a_matrix)
[1] 3
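One detail the dimension functions above do not show is how they behave on plain vectors. A brief sketch using base R only (arr and v are illustrative names): length() of an array is the product of its dimensions, dim(), nrow() and ncol() return NULL for a vector, and the capitalized NROW() and NCOL() treat a vector as a single column.

```r
arr <- array(1:24, dim = c(4, 3, 2))
v <- 1:5

print(length(arr))      # 24, the product of the dimensions
print(prod(dim(arr)))   # 24

print(dim(v))           # NULL: a plain vector has no dim attribute
print(nrow(v))          # NULL
print(NROW(v))          # 5: the vector is treated as one column
print(NCOL(v))          # 1
```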