Problem description
I am trying to extract the hashtags in a tweet. All of the tweets are in one column in a csv file. Although there are resources on parsing strings and putting the extracted hashtags into a list, I haven't come across a solution for parsing tweets that are already stored in a list or dictionary. Here is my code:
    import csv
    import re

    with open('hash.csv', 'rb') as f:
        reader = csv.reader(f, delimiter=',')
        for line in reader:
            tweet = line[1:2] #This is the column that contains the tweets
            for x in tweet:
                match = re.findall(r"#(\w+)", x)
                if match: print x
I predictably get 'TypeError: expected string or buffer', because 'tweet' in this case is not a string; it is a list.
Here is where my research has taken me thus far:
http://www.tutorialspoint.com/python/python_reg_expressions.htm
So I'm iterating through the match list and I'm still getting the whole tweet and not the hashtagged item. I was able to strip the hashtag away but I want to strip everything but the hashtag.
    with open('hash.csv', 'rb') as f:
        reader = csv.reader(f, delimiter=',')
        for line in reader:
            tweet = line[1:2]
            print tweet
            for x in tweet:
                match = re.split(r"#(\w+)", x)
                hashtags = [i for i in tweet if match]
Recommended answer
Actually, your problem is probably just a syntax problem. You are calling tweet = line[1:2]. In Python, this says 'take a slice from index 1 up to (but not including) index 2', which is logically what you want. Unfortunately, a slice always returns a list, so you end up with [tweet] instead of tweet!
Try changing that line to tweet = line[1] and see if that fixes your problem.
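To make the difference concrete, here is a minimal sketch. The row contents are made up for illustration (they are not from your file), and the snippet is written for Python 3, whereas the code in the question is Python 2. It shows that slicing yields a one-element list while indexing yields the string that re.findall expects:

```python
import re

# Hypothetical row, shaped like what csv.reader yields; column 1 holds the tweet.
line = ["42", "Loving the #sunshine and the #beach today"]

sliced = line[1:2]   # a slice always returns a list, here with one element
indexed = line[1]    # indexing returns the element itself, a str

print(type(sliced).__name__)   # list
print(type(indexed).__name__)  # str

# re.findall needs a string, so pass the indexed value.
hashtags = re.findall(r"#(\w+)", indexed)
print(hashtags)  # ['sunshine', 'beach']
```

Because the pattern has a capture group, re.findall returns only the text after each '#', which is the "everything but the rest of the tweet" result you were after.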
On a separate note, this is probably just a typo on your part, but you might want to check your indentation. I think it should look like this:
    for line in reader:
        tweet = line[1:2] #This is the column that contains the tweets
        for x in tweet:
            match = re.findall(r"#(\w+)", x)
            if match: print x
unless I'm misunderstanding your logic.
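Putting both points together, one possible shape for the whole loop is sketched below. It assumes Python 3 and that the tweet sits in column 1. The helper name extract_hashtags and the sample rows are hypothetical, and an in-memory io.StringIO stands in for hash.csv so the snippet runs on its own:

```python
import csv
import io
import re

HASHTAG_RE = re.compile(r"#(\w+)")

def extract_hashtags(csv_file, column=1):
    """Return one list of hashtags per row, read from the given column."""
    results = []
    for row in csv.reader(csv_file, delimiter=","):
        if len(row) <= column:
            continue  # skip blank or short rows that lack the tweet column
        # row[column] is a str, so findall can search it directly
        results.append(HASHTAG_RE.findall(row[column]))
    return results

# Made-up data standing in for the contents of hash.csv.
sample = io.StringIO(
    "1,Loving the #sunshine and the #beach today\n"
    "2,No tags here\n"
    "3,#python makes #csv parsing easy\n"
)

print(extract_hashtags(sample))
# [['sunshine', 'beach'], [], ['python', 'csv']]
```

With a real file in Python 3 you would pass an open file object instead, for example with open('hash.csv', newline='') as f: extract_hashtags(f).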