Question
I am working in R for the first time and am having difficulty renaming the columns of a data frame (Grade.Data). I have a dataset imported from a CSV file that has column names like this:
Student.ID
Grade
Interactive.Exercises.1..Health
Interactive.Exercises.2..Fitness
Quizzes.1..Week.1.Quiz
Quizzes.2..Week.2.Quiz
Case.Studies.1..Case.Study1
Case.Studies.2..Case.Study2
I would like to change the variable names so that they are simpler, i.e. from Interactive.Exercises.1..Health to Interactive.Exercises.1, or from Quizzes.1..Week.1.Quiz to Quizzes.1.
So far, I have tried this:
grep(".*[0-9]", names(Grade.Data))
But this is what I get back:
[1] 3 4 5 6 7 8 9 11 12 13 14 15 16 17 19 20 21 22 23 24 25
Can anyone help me figure out what is going on, and write a better regex? Thank you so much.
Recommended answer
It seems you want to truncate the column names after the first chunk of digits.
You may use the following sub solution:
names(Grade.Data) <- sub("^(.*?\\d+).*$", "\\1", names(Grade.Data))
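As a quick check, here is a minimal sketch that applies the same sub call to the column names listed in the question. The one-row data frame built here is a hypothetical stand-in for the real imported CSV, which is not shown in the question.

```r
# Column names copied from the question; the data frame itself is a
# hypothetical placeholder for the imported CSV data.
cols <- c("Student.ID", "Grade",
          "Interactive.Exercises.1..Health",
          "Interactive.Exercises.2..Fitness",
          "Quizzes.1..Week.1.Quiz",
          "Quizzes.2..Week.2.Quiz",
          "Case.Studies.1..Case.Study1",
          "Case.Studies.2..Case.Study2")
Grade.Data <- as.data.frame(matrix(0, nrow = 1, ncol = length(cols)))
names(Grade.Data) <- cols

# Keep everything up to and including the first run of digits.
# Names without digits do not match and are left unchanged.
names(Grade.Data) <- sub("^(.*?\\d+).*$", "\\1", names(Grade.Data))

print(names(Grade.Data))
```

Student.ID and Grade contain no digits, so sub leaves them as they are; the remaining names are cut right after their first number.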
Details
- ^ - start of string
- (.*?\\d+) - Group 1 (later referred to with \1 from the replacement pattern) matching any 0+ chars, as few as possible (.*?), and then 1 or more digits (\d+)
- .* - any 0+ chars, as many as possible
- $ - end of string
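As for what was going on with the original attempt: by default grep reports the positions of the elements that match the pattern and does not modify anything, which is why a vector of integers came back. The sketch below contrasts grep with sub on a short, hypothetical subset of the column names from the question.

```r
# Hypothetical subset of the column names from the question.
cols <- c("Student.ID", "Grade",
          "Quizzes.1..Week.1.Quiz",
          "Case.Studies.2..Case.Study2")

# grep returns the indices of the elements that contain a digit.
idx <- grep(".*[0-9]", cols)

# With value = TRUE it returns the matching elements themselves, unmodified.
matched <- grep(".*[0-9]", cols, value = TRUE)

# sub rewrites each element, so it is the function to use for renaming.
renamed <- sub("^(.*?\\d+).*$", "\\1", cols)

print(idx)
print(matched)
print(renamed)
```

So grep is useful for finding which columns carry a number, while sub (or gsub) is what actually produces the shortened names.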