Problem description
I am using ConvNets to build a weather forecasting model. My input data is 10K samples of a 96x144 matrix (representing a geographic region) holding the value of a variable Z (geopotential height) at each grid point, at a fixed height. If I include 3 different heights (Z differs a lot from one height to another), the input shape is (num_samples,96,144,3). The samples are hourly: one sample = one hour. I have nearly 2 years of data, and the input data (Z) represents the state of the atmosphere in that hour.
That can be thought of as an image with 3 channels, but instead of pixel values in the 0-255 range I have values of Z in a much larger range (the last height channel ranges from 7500 to 9500 and the first one from 500 to 1500, approximately).
I want to predict precipitation (will it rain or not? Just that: binary, yes or no).
In that grid, which covers a region of my country, I only have output data at specific (x,y) points (just 122 weather stations with rain data in the entire region). So there are only 122 (x,y) points where I have a value of 1 (it rained that hour) or 0 (it didn't).
So my output is a (num_samples,122) matrix that contains, at each station index, 1 if it rained in that sample (that hour) or 0 if it didn't.
I used a mix between the VGG16 model and this one: https://github.com/prl900/precip-encoder-decoders/blob/master/encoder_vgg16.py which is a model for this specific application that I found in a paper.
I would like to know if I am building the model the right way. I just changed the input layer to match my shape and the last FC layer to match my classes (122, because for a given input sample I want a 1x122 vector with a 0 or 1 depending on whether it rained at that station; is this right?). And because the outputs are not mutually exclusive (I can have many 1s if it rained at more than one station), I used the 'sigmoid' activation in the last layer.
I don't know which metric to use in compile, and acc, mae and categorical acc just stay the same over all epochs (they increase a little in the second epoch, but after that acc and val_acc stay the same for every epoch).
Also, the output matrix has null values (hours in which a station has no data). I am just filling those NaNs with a -1 value (like an 'I don't know' label). Could this be the reason why nothing works?
Thanks for the help and sorry for the over-explanation.
from keras import metrics
from keras.layers import BatchNormalization, Conv2D, Dense, Flatten, MaxPooling2D
from keras.models import Sequential
from keras.optimizers import SGD

def get_vgg16():
    model = Sequential()
    # Conv Block 1
    model.add(BatchNormalization(axis=3, input_shape=(96,144,3)))
    model.add(Conv2D(64, (3, 3), activation='relu', padding='same'))
    model.add(BatchNormalization(axis=3))
    model.add(Conv2D(64, (3, 3), activation='relu', padding='same'))
    model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
    # Conv Block 2
    model.add(BatchNormalization(axis=3))
    model.add(Conv2D(128, (3, 3), activation='relu', padding='same'))
    model.add(BatchNormalization(axis=3))
    model.add(Conv2D(128, (3, 3), activation='relu', padding='same'))
    model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
    # Conv Block 3
    model.add(BatchNormalization(axis=3))
    model.add(Conv2D(256, (3, 3), activation='relu', padding='same'))
    model.add(BatchNormalization(axis=3))
    model.add(Conv2D(256, (3, 3), activation='relu', padding='same'))
    model.add(BatchNormalization(axis=3))
    model.add(Conv2D(256, (3, 3), activation='relu', padding='same'))
    model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
    # Conv Block 4
    model.add(BatchNormalization(axis=3))
    model.add(Conv2D(512, (3, 3), activation='relu', padding='same'))
    model.add(BatchNormalization(axis=3))
    model.add(Conv2D(512, (3, 3), activation='relu', padding='same'))
    model.add(BatchNormalization(axis=3))
    model.add(Conv2D(512, (3, 3), activation='relu', padding='same'))
    model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
    # Conv Block 5
    model.add(BatchNormalization(axis=3))
    model.add(Conv2D(512, (3, 3), activation='relu', padding='same'))
    model.add(BatchNormalization(axis=3))
    model.add(Conv2D(512, (3, 3), activation='relu', padding='same'))
    model.add(BatchNormalization(axis=3))
    model.add(Conv2D(512, (3, 3), activation='relu', padding='same'))
    model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
    # FC layers: one sigmoid output per weather station (122 stations)
    model.add(Flatten())
    model.add(Dense(4096, activation='relu'))
    model.add(Dense(4096, activation='relu'))
    model.add(Dense(122, activation='sigmoid'))
    #adam = Adam(lr=0.001)
    # Note: lr/decay are the argument names used by older Keras releases
    sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
    model.compile(loss='binary_crossentropy', optimizer=sgd, metrics=[metrics.categorical_accuracy,metrics.binary_accuracy, 'acc'])
    model.summary()  # summary() prints the table itself
    return model
There are various things to consider in order to improve the model:
Your choice of loss
You could do various things here. Using an L2 loss (squared distance minimization) is an option, where your targets are no rain (0) or rain (1) for each station. Another (more appropriate) option would be to consider each output as the probability of it raining at that station. Then, you would apply a binary cross entropy loss to each one of the output values.
The binary cross entropy is just the regular cross entropy applied to a two-class classification problem. Please note that when there are only two possible outcomes, P(no rain) = 1 - P(rain). As such, a single output neuron per station is enough; you don't need to add any extra neurons.
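To make the per-station loss concrete, here is a minimal NumPy sketch of the element-wise binary cross entropy. The function name `binary_cross_entropy`, the `eps` clipping constant and the three-station example values are assumptions made up for illustration; they are not taken from the question's data.

```python
import numpy as np

def binary_cross_entropy(y_pred, y_true, eps=1e-7):
    """Element-wise binary cross entropy.

    y_pred: predicted rain probabilities in (0, 1), e.g. sigmoid outputs.
    y_true: targets, 1.0 for rain and 0.0 for no rain.
    """
    # Clip to avoid log(0) when a prediction saturates at 0 or 1.
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    # P(no rain) = 1 - P(rain), so one probability per station is enough.
    return -(y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred))

# Hypothetical example: one hour (sample), three stations.
y_true = np.array([[1.0, 0.0, 1.0]])
y_pred = np.array([[0.9, 0.2, 0.5]])

per_station = binary_cross_entropy(y_pred, y_true)  # shape (1, 3)
mean_loss = per_station.mean()                      # scalar used for training
```

The loss is computed independently for each station, which matches the sigmoid output layer: each of the 122 outputs is its own yes/no problem.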
Mask the loss
Do not set the missing targets to -1. This does not make sense and only introduces noise into the training. Imagine you are using an L2 loss. If your network predicts rain (1) at a station whose target is missing (-1), the error would be (1 - (-1))^2 = 4, a very high prediction error for a value you don't even know. Instead, you want the network to ignore these cases.
You can do that by masking the losses. Let's say you have a prediction matrix Y of shape (num_samples, 122) and an equally shaped target matrix T. You could define a binary mask M of the same shape, with ones at the positions whose value you know and zeros at the missing value locations. Then, your loss would be L = M * loss(Y, T), where the product is taken element by element. For missing values, the loss would always be 0, with no gradient: nothing would be learnt from them.
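The masking idea can be sketched in NumPy as follows. This is only an illustration under a few assumptions: the function name `masked_bce` and the tiny 2x3 example are made up, the per-element loss is taken to be binary cross entropy, and the masked sum is divided by the number of known targets (a design choice that is not part of the formula above). Note also that missing targets are replaced by a placeholder before the multiplication, because NaN * 0 is still NaN.

```python
import numpy as np

def masked_bce(y_pred, targets, mask, eps=1e-7):
    """Binary cross entropy averaged over the known targets only."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    # Swap missing targets for a harmless placeholder (0); the mask zeroes
    # out their contribution afterwards, so the placeholder value is irrelevant.
    t = np.where(mask > 0, targets, 0.0)
    per_element = -(t * np.log(y_pred) + (1.0 - t) * np.log(1.0 - y_pred))
    masked = mask * per_element          # L = M * loss(Y, T), element-wise
    # Assumption: normalise by the count of known values, not by Y.size.
    return masked.sum() / max(mask.sum(), 1.0)

# Hypothetical example: 2 hours, 3 stations, NaN marks a missing observation.
T = np.array([[1.0, np.nan, 0.0],
              [np.nan, 1.0, 1.0]])
M = (~np.isnan(T)).astype(np.float64)    # 1 where the target is known
Y = np.array([[0.8, 0.3, 0.1],
              [0.6, 0.7, 0.9]])

loss = masked_bce(Y, T, M)

# Changing predictions at masked positions leaves the loss untouched.
Y_other = Y.copy()
Y_other[0, 1] = 0.99
Y_other[1, 0] = 0.01
loss_other = masked_bce(Y_other, T, M)
```

In a Keras model the same arithmetic would have to be written with tensor operations inside a custom loss function, with the mask either passed alongside the targets or derived from a sentinel value.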
Normalize the inputs
It is always good practice to normalize/standardize the inputs. This prevents features with larger magnitudes from dominating the others, which speeds up the training. In cases where the inputs have very large magnitudes, it also helps stabilise the training, preventing gradient explosions.
In your case, you have three channels, and each one follows a different distribution (each has a different minimum and maximum value). When computing the min and max (or the mean and standard deviation), you need to consider the data of all samples separately for each channel (height), and then use those two values to normalize/standardize the corresponding channel on all samples. That is, given a tensor of size (N,96,144,3), normalize/standardize each sub-tensor of size (N,96,144,1) separately. You will need to apply the same transform to the test data, so save the scaling values for later.
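A minimal NumPy sketch of the per-channel standardization could look like this. The helper names `fit_channel_stats` and `standardize` are hypothetical, and the data is synthetic: the first and last channel ranges follow the question, while the middle range (3000 to 5000) and the small sample counts are made up for the example.

```python
import numpy as np

def fit_channel_stats(x):
    """Per-channel mean and std over all samples and grid points.

    x has shape (N, H, W, C); the returned arrays have shape (1, 1, 1, C)
    so they broadcast against any batch with the same number of channels.
    """
    mean = x.mean(axis=(0, 1, 2), keepdims=True)
    std = x.std(axis=(0, 1, 2), keepdims=True)
    return mean, std

def standardize(x, mean, std, eps=1e-8):
    """Apply previously computed statistics; eps guards against std == 0."""
    return (x - mean) / (std + eps)

# Synthetic stand-in for the geopotential height data (values are made up).
rng = np.random.default_rng(0)
low = np.array([500.0, 3000.0, 7500.0])
high = np.array([1500.0, 5000.0, 9500.0])
x_train = rng.uniform(low, high, size=(8, 96, 144, 3))
x_test = rng.uniform(low, high, size=(2, 96, 144, 3))

# Fit on the training data only, then reuse the same values for the test data.
mean, std = fit_channel_stats(x_train)
x_train_std = standardize(x_train, mean, std)
x_test_std = standardize(x_test, mean, std)
```

The statistics come from the training set alone, so `mean` and `std` are the values to save (for example with `np.save`) and reload at prediction time.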