历时两天,博主终于将informer-tensorflow调试成功了,下一步就可以进行数据集更换和改进了,下面介绍自己的调试过程:

项目介绍

Transformer模型

Transformer是由谷歌团队近期提出的一种新型网络模型,抛弃了传统的CNN和RNN,整个网络结构完全是由Attention机制组成。更准确地讲,Transformer由且仅由self-Attenion和Feed Forward Neural Network组成。一个基于Transformer的可训练的神经网络可以通过堆叠Transformer的形式进行搭建,作者的实验是通过搭建编码器和解码器各6层,总共12层的Encoder-Decoder,并在机器翻译中取得了BLEU值得新高。

大放异彩

作者采用Attention机制的原因是考虑到RNN(或者LSTM,GRU等)的计算限制为是顺序的,也就是说RNN相关算法只能从左向右依次计算或者从右向左依次计算,这种机制带来了两个问题:
时间片 t 的计算依赖 t-1 时刻的计算结果,这样限制了模型的并行能力;
顺序计算的过程中信息会丢失,尽管LSTM等门机制的结构一定程度上缓解了长期依赖的问题,但是对于特别长期的依赖现象,LSTM依旧无能为力。
具体关于transformer请阅读博主这篇博客:
Transformer学习记录

不足:

近年来的研究表明,相对RNN类型的模型来看,Transformer在长距离依赖的表达方面表现出了较高的潜力。然而,Transformer存在几个严重的问题,使其不能直接适用于LSTF问题,例如、高内存使用量和编码器-解码器体系结构固有的局限性。其中Transformer模型主要存在下面三个问题:
self-attention机制的二次计算复杂度问题:self-attention机制的点积操作使每层的时间复杂度和内存使用量为L 。
高内存使用量问题:对长序列输入进行堆叠时,J个encoder-decoder层的堆栈使总内存使用量为 ,这限制了模型在接收长序列输入时的可伸缩性。
预测长期输出的效率问题:Transformer的动态解码过程,使得输出变为一个接一个的输出,后面的输出依赖前面一个时间步的预测结果,这样会导致推理非常慢。

Informer应运而生

面对Transformer表现出的缺陷,各种改进方法层出不穷,而Informer也由此应运而生,其设计目标为:改进Transformer模型,使其计算、内存和体系结构更高效,同时保持更高的预测能力。
该模型具有三个显著特征:

Informer是在2021年的AAAI 最佳论文奖项中一篇来自北京航空航天大学的论文中提出的,其能够解决长序列预测问题(LSTF),其为对Transformer的改进
最开始的版本是pytorch开发的,里面的内容很详细,很多方法是自定义的,由于博主技术有限,对这些自定义函数的解读让我感到头昏脑胀,且之前对pytorch没有太多应用,因此便使用了tensorflow版本。
Informer学习记录之Informer-Tensorflow版本-LMLPHP

环境要求

该项目是基于Tensorflow为后端来开发的,由于里面涉及自定义网络,所以对版本有要求,据说tensorflow1.6之前的keras是不支持自定义网络层的。
博主的项目环境为:
CUDA10.0 CuDNN7.4
tensorflow-gpu 1.14.0
keras 2.2.5
以上这些要求是有联系的,因为博主之前使用的是cuda8,其最高只能支持到tensorflow-gpu 1.0.0,因此博主便又安装了CUDA10.0,相应的,也就需要匹配的CuDNN等。
其余需要scikit-learn0.24.2,numpy1.16.0,pandas1.1.5

项目目录结构

Informer学习记录之Informer-Tensorflow版本-LMLPHP

代码解析

主函数读取参数

# 选择模型(去掉required参数,选择informer模型)
parser.add_argument('--model', type=str, default='informer',help='model of experiment, options: [informer, informerstack, informerlight(TBD)]')

# 数据选择(去掉required参数)
parser.add_argument('--data', type=str, default='WTH', help='data')
# 数据上级目录
parser.add_argument('--root_path', type=str, default='./data/', help='root path of the data file')
# 数据名称
parser.add_argument('--data_path', type=str, default='WTH.csv', help='data file')
# 预测类型(多变量预测、单变量预测、多元预测单变量)
parser.add_argument('--features', type=str, default='M', help='forecasting task, options:[M, S, MS]; M:multivariate predict multivariate, S:univariate predict univariate, MS:multivariate predict univariate')
# 数据中要预测的标签列
parser.add_argument('--target', type=str, default='OT', help='target feature in S or MS task')
# 数据重采样(h:小时)
parser.add_argument('--freq', type=str, default='h', help='freq for time features encoding, options:[s:secondly, t:minutely, h:hourly, d:daily, b:business days, w:weekly, m:monthly], you can also use more detailed freq like 15min or 3h')
# 模型保存位置
parser.add_argument('--checkpoints', type=str, default='./checkpoints/', help='location of model checkpoints')

# 输入序列长度
parser.add_argument('--seq_len', type=int, default=96, help='input sequence length of Informer encoder')
# 先验序列长度
parser.add_argument('--label_len', type=int, default=48, help='start token length of Informer decoder')
# 预测序列长度
parser.add_argument('--pred_len', type=int, default=24, help='prediction sequence length')
# Informer decoder input: concat[start token series(label_len), zero padding series(pred_len)]

# 编码器default参数为特征列数
parser.add_argument('--enc_in', type=int, default=7, help='encoder input size')
# 解码器default参数与编码器相同
parser.add_argument('--dec_in', type=int, default=7, help='decoder input size')
parser.add_argument('--c_out', type=int, default=7, help='output size')

# 模型宽度
parser.add_argument('--d_model', type=int, default=512, help='dimension of model')
# 多头注意力机制头数
parser.add_argument('--n_heads', type=int, default=8, help='num of heads')
# 模型中encoder层数
parser.add_argument('--e_layers', type=int, default=2, help='num of encoder layers')
# 模型中decoder层数
parser.add_argument('--d_layers', type=int, default=1, help='num of decoder layers')
# 网络架构循环次数
parser.add_argument('--s_layers', type=str, default='3,2,1', help='num of stack encoder layers')
# 全连接层神经元个数
parser.add_argument('--d_ff', type=int, default=2048, help='dimension of fcn')
# 采样因子数
parser.add_argument('--factor', type=int, default=5, help='probsparse attn factor')
# 1D卷积核
parser.add_argument('--padding', type=int, default=0, help='padding type')
# 是否需要序列长度衰减
parser.add_argument('--distil', action='store_false', help='whether to use distilling in encoder, using this argument means not using distilling', default=True)
# 神经网络正则化操作
parser.add_argument('--dropout', type=float, default=0.05, help='dropout')
# attention计算方式
parser.add_argument('--attn', type=str, default='prob', help='attention used in encoder, options:[prob, full]')
# 时间特征编码方式
parser.add_argument('--embed', type=str, default='timeF', help='time features encoding, options:[timeF, fixed, learned]')
# 激活函数
parser.add_argument('--activation', type=str, default='gelu',help='activation')
# 是否输出attention
parser.add_argument('--output_attention', action='store_true', help='whether to output attention in ecoder')
# 是否需要预测
parser.add_argument('--do_predict', action='store_true', help='whether to predict unseen future data')
parser.add_argument('--mix', action='store_false', help='use mix attention in generative decoder', default=True)
# 数据读取
parser.add_argument('--cols', type=str, nargs='+', help='certain cols from the data files as the input features')
# 多核训练(windows下选择0,否则容易报错)
parser.add_argument('--num_workers', type=int, default=0, help='data loader num workers')
# 训练轮数
parser.add_argument('--itr', type=int, default=2, help='experiments times')
# 训练迭代次数
parser.add_argument('--train_epochs', type=int, default=6, help='train epochs')
# mini-batch大小
parser.add_argument('--batch_size', type=int, default=32, help='batch size of train input data')
# 早停策略
parser.add_argument('--patience', type=int, default=3, help='early stopping patience')
# 学习率
parser.add_argument('--learning_rate', type=float, default=0.0001, help='optimizer learning rate')
parser.add_argument('--des', type=str, default='test',help='exp description')
# loss计算方式
parser.add_argument('--loss', type=str, default='mse',help='loss function')
# 学习率衰减参数
parser.add_argument('--lradj', type=str, default='type1',help='adjust learning rate')

# 是否使用GPU加速训练
parser.add_argument('--use_gpu', type=bool, default=True, help='use gpu')
parser.add_argument('--gpu', type=int, default=0, help='gpu')

# 取参数值
args = parser.parse_args()
# 获取GPU
args.use_gpu = True if torch.cuda.is_available() and args.use_gpu else False

调用模型开启实验

args = parser.parse_args()

data_parser = {
    'ETTh1': {'data': 'ETTh1.csv', 'T': 'OT', 'M': [7, 7, 7], 'S': [1, 1, 1]},
    'ETTh2': {'data': 'ETTh2.csv', 'T': 'OT', 'M': [7, 7, 7], 'S': [1, 1, 1]},
    'ETTm1': {'data': 'ETTm1.csv', 'T': 'OT', 'M': [7, 7, 7], 'S': [1, 1, 1]},
}
if args.data in data_parser.keys():
    data_info = data_parser[args.data]
    args.data_path = data_info['data']
    args.target = data_info['T']
    args.enc_in, args.dec_in, args.c_out = data_info[args.features]
Exp = ExpInformer
for ii in range(args.itr):
    setting = '{}_{}_ft{}_sl{}_ll{}_pl{}_dm{}_nh{}_el{}_dl{}_df{}_at{}_eb{}_{}_{}'.format(args.model, args.data,                                                                                          args.features,                                                                                         args.seq_len, args.label_len,                                                                                        args.pred_len,                                                                                         args.d_model, args.n_heads,                                                                                         args.e_layers, args.d_layers,                                                                                         args.d_ff, args.attn,                                                                                          args.embed, args.des, ii)
    exp = Exp(args)
    print('>>>>>>>start training : {}>>>>>>>>>>>>>>>>>>>>>>>>>>'.format(setting))
    exp.train(setting)
    print('>>>>>>>testing : {}<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<'.format(setting))
    exp.test(setting)

Exp模块开启训练与测试

Train方法

def train(self, setting):
    train_data = self._get_data(flag='train')
    valid_data = self._get_data(flag='val')
    train_steps = len(train_data)
    valid_steps = len(valid_data)
    early_stopping = tf.keras.callbacks.EarlyStopping(patience=self.args.patience, restore_best_weights=True)
    checkpoint_path = f'./checkpoints/{setting}'
    if not os.path.exists(checkpoint_path):
        os.makedirs(checkpoint_path)
    model_optim = self._select_optimizer()
    loss = tf.keras.losses.mse
    self.model.compile(optimizer=model_optim, loss=loss, metrics=['mse', 'mae'])
    self.model.fit(train_data.get_dataset(),
                   steps_per_epoch=train_steps,
                   validation_data=valid_data.get_dataset(),
                   validation_steps=valid_steps,
                   callbacks=[early_stopping],
                   epochs=self.args.train_epochs)
    self.model.save_weights(checkpoint_path+'/model.h5')
    return self.model

在Exp类调用时,进行初始化构造网络模型和数据集:

 def _build_model(self):
        model = Informer(
            self.args.enc_in,
            self.args.dec_in,
            self.args.c_out,
            self.args.seq_len,
            self.args.label_len,
            self.args.pred_len,
            self.args.batch_size,
            self.args.factor,
            self.args.d_model,
            self.args.n_heads,
            self.args.e_layers,
            self.args.d_layers,
            self.args.d_ff,
            self.args.dropout,
            self.args.attn,
            self.args.embed,
            self.args.data[:-1],
            self.args.activation
        )
        return model

获取数据集

def _get_data(self, flag):
    args = self.args
    if flag == 'test':
        shuffle_flag = False
        drop_last = True
        batch_size = args.batch_size
    else:
        shuffle_flag = True
        drop_last = True
        batch_size = args.batch_size
    dataset = DatasetETT(
        root_path=args.root_path,
        data_path=args.data_path,
        flag=flag,
        size=[args.seq_len, args.label_len, args.pred_len],
        features=args.features,
        shuffle=shuffle_flag,
        drop_last=drop_last,
        batch_size=batch_size,
        is_minute=self.args.data == 'ETTm1'
    )
    print(flag, len(dataset))
    return dataset

Test方法

  def test(self, setting):
        test_data = self._get_data(flag='test')
        loss, mse, mae = self.model.evaluate_generator(test_data.get_dataset(), len(test_data))
        print('mse:{}, mae:{}'.format(mse, mae))

数据预处理 data_loader.py

  def __init__(self, root_path, flag='train', size=None,
                 features='S', data_path='ETTh1.csv',
                 target='OT', scale=True, batch_size=32, shuffle=True, drop_last=True, is_minute=False):

        # size [seq_len, label_len pred_len]
        # info
        if size == None:
            self.seq_len = 24 * 4 * 4
            self.label_len = 24 * 4
            self.pred_len = 24 * 4
        else:
            self.seq_len = size[0]
            self.label_len = size[1]
            self.pred_len = size[2]
        # init
        assert flag in ['train', 'test', 'val']
        type_map = {'train': 0, 'val': 1, 'test': 2}
        self.set_type = type_map[flag]

        self.features = features
        self.target = target
        self.scale = scale
        self.batch_size = batch_size
        self.shuffle = shuffle
        self.drop_last = drop_last
        self.is_minute = is_minute

        self.root_path = root_path
        self.data_path = data_path

        self.__read_data__()

    def __read_data__(self):
        scaler = StandardScaler()
        df_raw = pd.read_csv(os.path.join(self.root_path,
                                          self.data_path))

        minute_multiplier = 1
        if self.is_minute:
            minute_multiplier = 4

        border1s = [0, 12 * 30 * 24 * minute_multiplier - self.seq_len,
                    12 * 30 * 24 + 4 * 30 * 24 * minute_multiplier - self.seq_len]
        border2s = [12 * 30 * 24 * minute_multiplier,
                    12 * 30 * 24 + 4 * 30 * 24 * minute_multiplier,
                    12 * 30 * 24 + 8 * 30 * 24 * minute_multiplier]
        border1 = border1s[self.set_type]
        border2 = border2s[self.set_type]

        if self.features == 'M':
            cols_data = df_raw.columns[1:]
            df_data = df_raw[cols_data]
        elif self.features == 'S':
            df_data = df_raw[[self.target]]

        if self.scale:
            data = scaler.fit_transform(df_data.values)
        else:
            data = df_data.values
        df_stamp = df_raw[['date']][border1:border2]
        df_stamp['date'] = pd.to_datetime(df_stamp.date)
        df_stamp['month'] = df_stamp.date.apply(lambda row: row.month, 1)
        df_stamp['day'] = df_stamp.date.apply(lambda row: row.day, 1)
        df_stamp['weekday'] = df_stamp.date.apply(lambda row: row.weekday(), 1)
        df_stamp['hour'] = df_stamp.date.apply(lambda row: row.hour, 1)
        if self.is_minute:
            df_stamp['minute'] = df_stamp.date.apply(lambda row: row.minute, 1)
            df_stamp['minute'] = df_stamp.minute.map(lambda x: x // 15)
        data_stamp = df_stamp.drop(['date'], 1).values
        self.data_x = data[border1:border2]
        # self.data_y = data[border1:border2]
        self.data_stamp = data_stamp

网络模型构造

数据嵌入层

class DataEmbedding(nn.Module):
    def __init__(self, c_in, d_model, embed_type='fixed', freq='h', dropout=0.1):
        super(DataEmbedding, self).__init__()

        self.value_embedding = TokenEmbedding(c_in=c_in, d_model=d_model)
        self.position_embedding = PositionalEmbedding(d_model=d_model)
        self.temporal_embedding = TemporalEmbedding(d_model=d_model, embed_type=embed_type, freq=freq) if embed_type!='timeF' else TimeFeatureEmbedding(d_model=d_model, embed_type=embed_type, freq=freq)

        self.dropout = nn.Dropout(p=dropout)

    def forward(self, x, x_mark):
        x = self.value_embedding(x) + self.position_embedding(x) + self.temporal_embedding(x_mark)
        
        return self.dropout(x)

Encoder层


class Encoder(Layer):
    def __init__(self, attn_layers, conv_layers=None, norm_layer=None):
        super(Encoder, self).__init__()
        self.attn_layers = attn_layers
        self.conv_layers = conv_layers if conv_layers is not None else None
        self.norm = norm_layer

    def call(self, x, attn_mask=None):
        # x [B, L, D]
        if self.conv_layers is not None:
            for attn_layer, conv_layer in zip(self.attn_layers, self.conv_layers):
                x = attn_layer(x, attn_mask=attn_mask)
                x = conv_layer(x)
            x = self.attn_layers[-1](x)
        else:
            for attn_layer in self.attn_layers:
                x = attn_layer(x, attn_mask=attn_mask)

        if self.norm is not None:
            x = self.norm(x)

        return x

Decoder层

注意力机制层

class AttentionLayer(Layer):
    def __init__(self, attention, d_model, n_heads, d_keys=None,
                 d_values=None):
        super(AttentionLayer, self).__init__()

        d_keys = d_keys or (d_model//n_heads)
        d_values = d_values or (d_model//n_heads)
        self.d_model = d_model

        self.inner_attention = attention
        self.query_projection = tf.keras.layers.Dense(d_keys * n_heads)
        self.key_projection = tf.keras.layers.Dense(d_keys * n_heads)
        self.value_projection = tf.keras.layers.Dense(d_values * n_heads)
        self.out_projection = tf.keras.layers.Dense(d_model)
        self.n_heads = n_heads

    def build(self, input_shape):
        print(input_shape)
        B, L, _ = input_shape[0]
        _, S, _ = input_shape[1]
        H = self.n_heads

        self.queries = self.add_weight(shape=(B, L, H, self.d_model),
                                 initializer='random_normal',
                                 trainable=True)

        self.keys = self.add_weight(shape=(B, S, H, self.d_model),
                                       initializer='random_normal',
                                       trainable=True)

        self.values = self.add_weight(shape=(B, S, H, self.d_model),
                                       initializer='random_normal',
                                       trainable=True)

    def call(self, inputs, attn_mask=None):
        queries, keys, values = inputs
        B, L, _ = queries.shape
        _, S, _ = keys.shape
        H = self.n_heads

        self.queries = tf.reshape(self.query_projection(queries), (B, L, H, -1))
        self.keys = tf.reshape(self.key_projection(keys), (B, S, H, -1))
        self.values = tf.reshape(self.value_projection(values), (B, S, H, -1))

        out = tf.reshape(self.inner_attention([self.queries, self.keys, self.values], attn_mask=attn_mask), (B, L, -1))

        return self.out_projection(out)

构造模型(拼接)

class Informer(tf.keras.Model):
    def __init__(self, enc_in, dec_in, c_out, seq_len, label_len, out_len, batch_size,
                 factor=5, d_model=512, n_heads=8, e_layers=3, d_layers=2, d_ff=512,
                 dropout=0.0, attn='prob', embed='fixed', data='ETTh', activation='gelu'):
        super(Informer, self).__init__()
        self.pred_len = out_len
        self.attn = attn
        self.seq_len = seq_len
        self.label_len = label_len
        self.batch_size = batch_size

        # Encoding
        self.enc_embedding = DataEmbedding(enc_in, d_model, embed, data, dropout)
        self.dec_embedding = DataEmbedding(dec_in, d_model, embed, data, dropout)
        # Attention
        Attn = ProbAttention if attn == 'prob' else FullAttention
        # Encoder
        self.encoder = Encoder(
            [
                EncoderLayer(
                    AttentionLayer(Attn(False, factor, attention_dropout=dropout),
                                   d_model, n_heads),
                    d_model,
                    d_ff,
                    dropout=dropout,
                    activation=activation
                ) for l in range(e_layers)
            ],
            [
                ConvLayer(
                    d_model
                ) for l in range(e_layers - 1)
            ],
            norm_layer=tf.keras.layers.LayerNormalization()
        )
        # Decoder
        self.decoder = Decoder(
            [
                DecoderLayer(
                    AttentionLayer(FullAttention(True, factor, attention_dropout=dropout),
                                   d_model, n_heads),
                    AttentionLayer(FullAttention(False, factor, attention_dropout=dropout),
                                   d_model, n_heads),
                    d_model,
                    d_ff,
                    dropout=dropout,
                    activation=activation,
                )
                for l in range(d_layers)
            ],
            norm_layer=tf.keras.layers.LayerNormalization()
        )
      
        self.projection = tf.keras.layers.Dense(c_out)
    def call(self, inputs, enc_self_mask=None, dec_self_mask=None, dec_enc_mask=None):
        x_enc, x_dec, x_mark_enc, x_mark_dec = inputs
        x_enc.set_shape((self.batch_size, self.seq_len, x_enc.shape[2]))
        x_mark_enc.set_shape((self.batch_size, self.seq_len, x_mark_enc.shape[2]))
        x_dec.set_shape((self.batch_size, self.label_len+self.pred_len, x_dec.shape[2]))
        x_mark_dec.set_shape((self.batch_size, self.label_len+self.pred_len, x_mark_dec.shape[2]))
        enc_out = self.enc_embedding(x_enc, x_mark_enc)
        enc_out = self.encoder(enc_out, attn_mask=enc_self_mask)
        dec_out = self.dec_embedding(x_dec, x_mark_dec)
        dec_out = self.decoder(dec_out, enc_out, x_mask=dec_self_mask, cross_mask=dec_enc_mask)
        dec_out = self.projection(dec_out)
        return dec_out[:, -self.pred_len:, :]  # [B, L, D]

运行结果

调试过程可谓是相当艰辛,应该是我的环境问题导致的,主要是一些数据类型问题导致的,最后看看我们的运行结果:
Informer学习记录之Informer-Tensorflow版本-LMLPHP

11-06 08:32