Problem description
I'm attempting to convert a DataFrame into a Series using code which, simplified, looks like this:
import pandas as pd

dates = ['2016-1-{}'.format(i) for i in range(1, 21)]
values = [i for i in range(20)]
data = {'Date': dates, 'Value': values}
df = pd.DataFrame(data)
df['Date'] = pd.to_datetime(df['Date'])
ts = pd.Series(df['Value'], index=df['Date'])
print(ts)
However, the printed output looks like this:
Date
2016-01-01 NaN
2016-01-02 NaN
2016-01-03 NaN
2016-01-04 NaN
2016-01-05 NaN
2016-01-06 NaN
2016-01-07 NaN
2016-01-08 NaN
2016-01-09 NaN
2016-01-10 NaN
2016-01-11 NaN
2016-01-12 NaN
2016-01-13 NaN
2016-01-14 NaN
2016-01-15 NaN
2016-01-16 NaN
2016-01-17 NaN
2016-01-18 NaN
2016-01-19 NaN
2016-01-20 NaN
Name: Value, dtype: float64
Where does the NaN come from? Is a view on a DataFrame object not a valid input for the Series class?
I have found the to_series function for pd.Index objects; is there something similar for DataFrames?
Recommended answer
I think you can use values, which converts the column Value to a NumPy array:
ts = pd.Series(df['Value'].values, index=df['Date'])
import pandas as pd

dates = ['2016-1-{}'.format(i) for i in range(1, 21)]
values = [i for i in range(20)]
data = {'Date': dates, 'Value': values}
df = pd.DataFrame(data)
df['Date'] = pd.to_datetime(df['Date'])
# .values drops the integer index and returns only the underlying array
print(df['Value'].values)
[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19]
ts = pd.Series(df['Value'].values, index=df['Date'])
print(ts)
Date
2016-01-01 0
2016-01-02 1
2016-01-03 2
2016-01-04 3
2016-01-05 4
2016-01-06 5
2016-01-07 6
2016-01-08 7
2016-01-09 8
2016-01-10 9
2016-01-11 10
2016-01-12 11
2016-01-13 12
2016-01-14 13
2016-01-15 14
2016-01-16 15
2016-01-17 16
2016-01-18 17
2016-01-19 18
2016-01-20 19
dtype: int64
Or you can build the Series directly from the original lists:
ts1 = pd.Series(data=values, index=pd.to_datetime(dates))
print(ts1)
2016-01-01 0
2016-01-02 1
2016-01-03 2
2016-01-04 3
2016-01-05 4
2016-01-06 5
2016-01-07 6
2016-01-08 7
2016-01-09 8
2016-01-10 9
2016-01-11 10
2016-01-12 11
2016-01-13 12
2016-01-14 13
2016-01-15 14
2016-01-16 15
2016-01-17 16
2016-01-18 17
2016-01-19 18
2016-01-20 19
dtype: int64
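As for the question about a to_series-like shortcut for DataFrames: one option that is not part of the original answer, offered here only as a sketch, is to move the Date column into the index with set_index and then select the Value column. The name ts2 is just an illustrative variable.

```python
import pandas as pd

dates = ['2016-1-{}'.format(i) for i in range(1, 21)]
values = list(range(20))
df = pd.DataFrame({'Date': pd.to_datetime(dates), 'Value': values})

# Make 'Date' the index first, then pick the column.
# Selecting a single column from a DataFrame returns a Series
# that carries the DataFrame's index, here the dates.
ts2 = df.set_index('Date')['Value']
print(ts2.head())
```

Because the column keeps its own labels throughout, no realignment happens and the values stay intact.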
Thanks to @ajcr for a better explanation of why you get NaN:
When you pass a Series or a DataFrame column to pd.Series, it is reindexed using the index you specify. Since your DataFrame column has an integer index (not a date index), none of the date labels match any existing label, and you get missing values.
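The reindexing described above can be reproduced on a small frame. The following is a minimal sketch with made-up values (10, 20, 30) and illustrative variable names; it compares passing the column itself, calling reindex explicitly, and passing the bare array.

```python
import pandas as pd

# A small frame with the default integer index (0, 1, 2).
df = pd.DataFrame({
    'Date': pd.to_datetime(['2016-01-01', '2016-01-02', '2016-01-03']),
    'Value': [10, 20, 30],
})

# Passing a Series together with an index: pandas aligns on labels.
# None of the date labels exist in df['Value'].index, so every entry is NaN.
ts_nan = pd.Series(df['Value'], index=df['Date'])

# An explicit reindex against the dates produces the same all-NaN result.
ts_reindexed = df['Value'].reindex(df['Date'])

# Passing a plain array: there are no labels to align on,
# so the values are assigned to the new index by position.
ts_ok = pd.Series(df['Value'].values, index=df['Date'])

print(ts_nan.isna().all())
print(ts_reindexed.isna().all())
print(ts_ok.tolist())
```

The NaN values also explain why the failing Series has dtype float64 rather than int64: the integer values are never used, and NaN is a floating-point value.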