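I have a data file that contains data in a specific format, along with some extra lines that should be ignored during processing. I need to process the data and compute a value from it.

Sample data: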

Average monthly temperatures in Dubuque, Iowa,
January 1964 through december 1975, n=144

24.7    25.7    30.6    47.5    62.9    68.5    73.7    67.9    61.1    48.5    39.6    20.0
16.1    19.1    24.2    45.4    61.3    66.5    72.1    68.4    60.2    50.9    37.4    31.1
10.4    21.6    37.4    44.7    53.2    68.0    73.7    68.2    60.7    50.2    37.2    24.6
21.5    14.7    35.0    48.3    54.0    68.2    69.6    65.7    60.8    49.1    33.2    26.0
19.1    20.6    40.2    50.0    55.3    67.7    70.7    70.3    60.6    50.7    35.8    20.7
14.0    24.1    29.4    46.6    58.6    62.2    72.1    71.7    61.9    47.6    34.2    20.4
8.4     19.0    31.4    48.7    61.6    68.1    72.2    70.6    62.5    52.7    36.7    23.8
11.2    20.0    29.6    47.7    55.8    73.2    68.0    67.1    64.9    57.1    37.6    27.7
13.4    17.2    30.8    43.7    62.3    66.4    70.2    71.6    62.1    46.0    32.7    17.3
22.5    25.7    42.3    45.2    55.5    68.9    72.3    72.3    62.5    55.6    38.0    20.4
17.6    20.5    34.2    49.2    54.8    63.8    74.0    67.1    57.7    50.8    36.8    25.5
20.4    19.6    24.6    41.3    61.8    68.5    72.0    71.1    57.3    52.5    40.6    26.2


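Source of the sample file: http://robjhyndman.com/tsdldata/data/cryer2.dat

Note: here, rows represent years and columns represent months.

I am trying to write a function that returns the average temperature for any given month from a given URL.

This is what I tried: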

def avg_temp_march(f):

    march_temps = []

    # read each line of the file and store the values
    # as floats in a list
    for line in f:
        line = str(line, 'ascii') # now line is a string
        temps = line.split()
        # check that it is not empty.
        if temps != []:
            march_temps.append(float(temps[2]))

    # calculate the average and return it
    return sum(march_temps) / len(march_temps)

avg_temp_march("data5.txt")


But I get an error on the line line = str(line, 'ascii'):

TypeError: decoding str is not supported

Best answer

I don't think there is any need to convert a string to a string.
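To make the cause of the error concrete, here is a minimal sketch (the sample values and variable names are only illustrative). In Python 3, str(obj, 'ascii') decodes a bytes object; if obj is already a str, it raises the TypeError shown in the question. Note also that iterating over a file name such as "data5.txt" walks over its characters, not over the lines of the file.

```python
# Decoding only makes sense for bytes; a str is already decoded.
raw = b"24.7    25.7    30.6"
text = str(raw, "ascii")  # works: bytes -> str

try:
    str(text, "ascii")  # text is already a str, so this fails
except TypeError as exc:
    error = str(exc)  # "decoding str is not supported"

# Iterating over a file *name* yields single characters, not file lines.
chars = list("data5.txt")[:3]  # ['d', 'a', 't']
```

So in the question's code, each "line" is one character of the file name, and it is already a str.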

I tried to fix your code with a few modifications:

def avg_temp_march(f):
    # f is a string read from the file

    march_temps = []

    for line in f.split("\n"):
        if line == "":
            continue
        temps = line.split()  # split on any run of whitespace

        # check that the line has enough columns
        month_index = 2
        if len(temps) > month_index:
            try:
                march_temps.append(float(temps[month_index]))
            except ValueError as e:
                # header lines contain words rather than numbers
                print(temps)
                print("Skipping line:", e)
    # calculate the average and return it
    return sum(march_temps) / len(march_temps)


Output (Python 3; the last line is the returned average, whose final digits may differ slightly because of floating-point rounding):

['Average', 'monthly', 'temperatures', 'in', 'Dubuque,', 'Iowa,']
Skipping line: could not convert string to float: 'temperatures'
['January', '1964', 'through', 'december', '1975,', 'n=144']
Skipping line: could not convert string to float: 'through'
32.475
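The call in the question passes a file name ("data5.txt") rather than the file's contents. As a hedged sketch of one way to handle that case, the function below opens a local path in text mode, so every line is already a str and no decoding step is needed. The function name avg_temp_march_from_path is hypothetical, and the file is assumed to be plain ASCII.

```python
def avg_temp_march_from_path(path):
    """Average the third column (March) over all numeric rows of a local file."""
    month_index = 2  # 0 = January, so 2 = March
    march_temps = []
    # Text mode yields str lines; the ASCII encoding is an assumption.
    with open(path, encoding="ascii") as f:
        for line in f:
            fields = line.split()
            try:
                row = [float(x) for x in fields]
            except ValueError:
                continue  # header line containing words, skip it
            if len(row) > month_index:  # also skips blank lines
                march_temps.append(row[month_index])
    return sum(march_temps) / len(march_temps)
```

Converting the whole row before using it means a header line is rejected even when one of its fields happens to look like a number (for example the year 1964).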


Based on your original question (before the latest edit), I think you can solve your problem this way:

# from urllib2 import urlopen  # Python 2
from urllib.request import urlopen  # Python 3

def avg_temp_march(url):
    # read() returns bytes in Python 3, so decode to str before splitting
    text = urlopen(url).read().decode("ascii")
    lines = text.split("\n")[3:]  # ignore the first 3 lines (header and blank line)
    rows = [line.split() for line in lines if line.strip()]  # ignore the empty lines
    rows = [[float(x) for x in row] for row in rows]  # convert all numbers to float
    month_index = 2  # 2 for March
    monthly_sum = sum(row[month_index] for row in rows)
    monthly_avg = monthly_sum / len(rows)
    return monthly_avg

print(avg_temp_march("http://robjhyndman.com/tsdldata/data/cryer2.dat"))
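Since the question asks for any month rather than only March, here is a rough sketch of how the same idea could be generalized. The names MONTHS, parse_rows, avg_temp, and avg_temp_from_url are hypothetical, and the sketch assumes that every data row has exactly 12 numeric columns and that the file is ASCII. Parsing is kept separate from downloading so it can be tried on a string without network access.

```python
from urllib.request import urlopen

MONTHS = ["january", "february", "march", "april", "may", "june", "july",
          "august", "september", "october", "november", "december"]

def parse_rows(text):
    """Return the rows that consist of exactly 12 numbers; skip everything else."""
    rows = []
    for line in text.splitlines():
        try:
            row = [float(x) for x in line.split()]
        except ValueError:
            continue  # header line containing words
        if len(row) == 12:  # also drops blank lines
            rows.append(row)
    return rows

def avg_temp(text, month):
    """Average temperature for a month name such as 'March' (case-insensitive)."""
    idx = MONTHS.index(month.lower())
    rows = parse_rows(text)
    return sum(row[idx] for row in rows) / len(rows)

def avg_temp_from_url(url, month):
    """Download the data file and average the given month (ASCII assumed)."""
    with urlopen(url) as resp:
        text = resp.read().decode("ascii")
    return avg_temp(text, month)
```

For example, avg_temp_from_url("http://robjhyndman.com/tsdldata/data/cryer2.dat", "march") would average the third column, provided the URL is still reachable.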
