Searching a data frame for unique values and using them to create a table

Problem description

Since I started using R not long ago, I've found this site very useful in helping me build my scripts. I have yet again come across a challenge for which I can't seem to find an answer anywhere. Here is my problem: in my data I have a column which contains a different URL in each row.
In each of those URLs there is a particular piece of information I want to extract. Currently I do it in Excel because I've been told it's impossible to do in R and that no function exists to do it.

The URLs look like this example format and are found in the "Source" column:

    http://www.googleclick.com?utm_source=ADX&ID56789
    http://www.googleclick.com?utm_source=ADW&ID56009
    http://www.googleclick.com?utm_source=ADWords&ID53389

The part of the URL that matters to me is the "utm_source=ADX" bit. My data looks something like this:

    User / Source
    1 / http://www.googleclick.com?utm_source=ADX&ID56789
    2 / http://www.googleclick.com?utm_source=ADW&ID56009
    3 / http://www.googleclick.com?utm_source=ADWords&ID53389

What I need to do is capture the utm_source from the URL and put the information into a different column, as in the example below:

    User / Source / utm_source
    1 / googleclick / ADX&ID56789
    2 / googleclick / ADW&ID56009
    3 / googleclick / ADWords&ID53389

So in essence I need R to search the entire data frame for "utm_source=" and, once it has found it, to use "utm_source" as a column name and copy everything that comes after "=" into that column for each individual row.

I know that "grep" is a function that locates a specific piece of information in the data frame, for example data <- total[grepl("utm_source", total$Source), ]. This gives me all the rows that contain "utm_source", but what I need is the information that comes after "utm_source=". My data can have as many as 500,000 rows. At the moment I use the Excel function "Text to Columns" for this: I split the URLs into little bits and keep the columns that I need, but this can be a very messy and lengthy process.

Is there a way to modify the grepl function to meet the criteria I need?
Solution

Nothing is impossible.

    x <- read.csv(text="User, Source
    1, http://www.googleclick.com?utm_source=ADX&ID56789
    2, http://www.googleclick.com?utm_source=ADW&ID56009
    3, http://www.googleclick.com?utm_source=ADWords&ID53389", header=TRUE, stringsAsFactors=FALSE)

First, use strsplit:

    strsplit(x$Source, split="\\?utm_source=")
    [[1]]
    [1] " http://www.googleclick.com" "ADX&ID56789"

    [[2]]
    [1] " http://www.googleclick.com" "ADW&ID56009"

    [[3]]
    [1] " http://www.googleclick.com" "ADWords&ID53389"

Then find a red-hot poker and stick it in the eye of your so-called advisor.

EDIT: As suggested by Paul Hiemstra, you can also use a regular expression directly:

    gsub(".*\\?utm_source=", "", x$Source)
    [1] "ADX&ID56789" "ADW&ID56009" "ADWords&ID53389"
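The question asked for the extracted values as columns of the data frame rather than as a printed vector. The following is a minimal sketch of that last step, not part of the original answer. It rebuilds the sample data with data.frame, and it assumes that the site name wanted in Source is the host with "http://www." and ".com" stripped, and that rows without a utm_source parameter should get NA. Both assumptions go beyond what the post states.

```r
# Sample data mirroring the question (column names User and Source come from the post).
x <- data.frame(
  User = 1:3,
  Source = c(
    "http://www.googleclick.com?utm_source=ADX&ID56789",
    "http://www.googleclick.com?utm_source=ADW&ID56009",
    "http://www.googleclick.com?utm_source=ADWords&ID53389"
  ),
  stringsAsFactors = FALSE
)

# Pattern for the parameter name, preceded by "?" or "&".
utm_pattern <- "[?&]utm_source="

# Flag rows that contain the parameter, so other rows get NA
# instead of silently keeping the whole URL (assumed behaviour).
has_utm <- grepl(utm_pattern, x$Source)

# Everything after "utm_source=" goes into the new column.
x$utm_source <- ifelse(
  has_utm,
  sub(paste0(".*", utm_pattern), "", x$Source),
  NA_character_
)

# Assumption: the site name is the first host label after an optional "www.".
x$Source <- sub("^[[:space:]]*https?://(www\\.)?([^./?]+).*$", "\\2", x$Source)

print(x)
```

grepl, sub and ifelse are all vectorised, so the same lines apply unchanged to a column with several hundred thousand rows, with no explicit loop.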
10-26 19:16