python - Slicing a DataFrame cell after reading a CSV

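I'm reading data from Twitter analytics using CSV files and DataFrames.

I want to extract the URLs from certain cells.

The output of this process looks like this: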

tweet number tweet id               tweet link              tweet text
1            1.0086341313026E+018   "tweet link goes here"  tweet text goes here https://example.com"
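To try the slicing without the original analytics export, a small DataFrame can be built from CSV text held in memory. This is a minimal sketch: only the column names come from the question, while the rows, ids, and links are invented for illustration.

```python
import io

import pandas as pd

# Invented rows; only the header matches the columns shown in the question.
csv_text = (
    'tweet number,tweet id,tweet link,tweet text\n'
    '1,1008634131302600000,https://twitter.com/user/status/1,tweet text goes here https://example.com\n'
    '2,1008634131302600001,https://twitter.com/user/status/2,a longer tweet https://example.org/a/longer/path\n'
)

# Tweet ids are identifiers, not quantities, so read them as text to keep them exactly as written.
df = pd.read_csv(io.StringIO(csv_text), dtype={'tweet id': str})
print(df['tweet text'].tolist())
```

Because the column name contains a space, it has to be accessed as df['tweet text'] rather than as an attribute such as df.tweet_text.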

How can I slice this "tweet text" to get its URL? I can't slice it with [-1:-12], because there are many tweets with different numbers of characters.

Best answer

I believe you want:

print (df['tweet text'].str[-12:-1])
0    example.com
Name: tweet text, dtype: object
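The fixed slice works for the sample row only because its URL has a known length and the text ends in a stray quote: str[-12:-1] takes the characters at positions -12 through -2 of every string. The sketch below uses two invented tweets, the first copying the sample row, to show that the same slice returns part of a path once the URL is longer.

```python
import pandas as pd

# Invented tweets: the first mirrors the sample row (including its trailing quote),
# the second ends in a longer URL.
df = pd.DataFrame({'tweet text': [
    'tweet text goes here https://example.com"',
    'another tweet https://example.org/a/longer/path"',
]})

# Same fixed positions for every row, regardless of where the URL starts.
sliced = df['tweet text'].str[-12:-1]
print(sliced.tolist())
```

For the second row this prints longer/path instead of a URL, which is why a pattern-based approach is more robust.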

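A more general solution is to use a regex with str.findall to get a list of all links in each cell and then, if needed, select the first one by indexing with str[0]: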

pat = r'(?:http|ftp|https)://(?:[\w_-]+(?:(?:\.[\w_-]+)+))(?:[\w.,@?^=%&:/~+#-]*[\w@?^=%&/~+#-])?'

print (df['tweet text'].str.findall(pat).str[0])
0    https://example.com
Name: tweet text, dtype: object
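The regex approach can be checked on a few invented tweets of different lengths, one containing two links and one containing none. This sketch reuses the pattern from the answer. The str.extract line is an alternative that returns the first match directly; the pattern is wrapped in a single capturing group because str.extract only returns captured groups.

```python
import pandas as pd

# Same URL pattern as in the answer.
pat = r'(?:http|ftp|https)://(?:[\w_-]+(?:(?:\.[\w_-]+)+))(?:[\w.,@?^=%&:/~+#-]*[\w@?^=%&/~+#-])?'

# Invented tweets of different lengths; the last one has no link at all.
df = pd.DataFrame({'tweet text': [
    'tweet text goes here https://example.com"',
    'two links https://example.org/a/longer/path and http://example.net/x?y=1',
    'no link in this tweet',
]})

# One list of matches per row (an empty list when there is no link).
all_links = df['tweet text'].str.findall(pat)

# First link per row; indexing an empty list with .str[0] yields NaN.
first_link = all_links.str[0]

# Single-step alternative: one capturing group around the whole pattern.
extracted = df['tweet text'].str.extract('(' + pat + ')', expand=False)

print(all_links.tolist())
print(first_link.tolist())
```

Rows without a link come back as NaN in both first_link and extracted, so they can be filtered with dropna() if only tweets containing URLs are of interest.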

For "python - Slicing a DataFrame cell after reading a CSV", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/50667036/