Counting unique/distinct words into a new column

Problem description

I have a dataset in which one column contains a set of countries in each row. Some countries are repeated within a row, and I would like to count the number of unique countries in each row of the dataset below:

> class(address_countries2$address_countries)
[1] "character"

> head(address_countries2)
                    address_countries
1                         China China
2                   China China China
3                         China China
4                         China China
5 China China China China China China
6                China China Uk China

The desired output would be a new column like this:

                    address_countries n_countries
1                         China China           1
2                   China China China           1
3                         China China           1
4                         China China           1
5 China China China China China China           1
6                China China Uk China           2

This code gives me the number of words in each row:

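library(dplyr)    # provides %>%, select(), mutate()
library(stringr)  # provides str_count(), boundary()

address_countries2 <- address_countries2 %>% 
  select(address_countries) %>% 
  mutate(n_countries = str_count(address_countries, boundary("word")))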


> head(address_countries2)
                    address_countries n_countries
1                         China China           2
2                   China China China           3
3                         China China           2
4                         China China           2
5 China China China China China China           6
6                China China Uk China           4

I have tried adding unique(), as well as n_distinct() and distinct(), together with str_count(), but I get this error:

Error in mutate_impl(.data, dots) : 
  Column `n_countries` must be length 34760 (the number of rows) or one, not 39

Any suggestions?

Recommended answer

Try this:

Your data.frame:

address_countries2<-data.frame(address_countries=c("Chian","China China","China UK"))

Counting the countries, starting by splitting each row into words:

list_country<-strsplit(as.character(address_countries2$address_countries)," ")
list_country
[[1]]
[1] "Chian"

[[2]]
[1] "China" "China"

[[3]]
[1] "China" "UK"  

Adding the "n_countries" column:

address_countries2$n_countries<-unlist(lapply(lapply(list_country, unique),length))

Output:

address_countries2
  address_countries n_countries
1             Chian           1
2       China China           1
3          China UK           2
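
The same idea can probably be folded back into the dplyr pipeline from the question. The sketch below is one possible way, not the answerer's code. It assumes dplyr and stringr are installed and uses a small toy data frame in place of the real 34760-row dataset. The error in the question most likely arises because unique() applied to the whole column returns a single vector for all rows, whereas mutate() needs one value per row, so here the split and the count are done row by row.

```r
library(dplyr)
library(stringr)

# Toy data mirroring the question (an assumption, not the asker's real dataset)
address_countries2 <- data.frame(
  address_countries = c("China China", "China China China", "China China Uk China"),
  stringsAsFactors = FALSE
)

address_countries2 <- address_countries2 %>%
  mutate(
    # str_split() returns a list with one character vector of words per row;
    # vapply() then counts the distinct words within each of those vectors,
    # giving exactly one integer per row as mutate() requires
    n_countries = vapply(
      str_split(address_countries, boundary("word")),
      function(words) length(unique(words)),
      integer(1)
    )
  )

address_countries2$n_countries
# [1] 1 1 2
```

Note that unique() is case-sensitive, so "Uk" and "UK" would be counted as two different countries; converting the column with tolower() before splitting is one way around that if it matters for the data.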

