Reading data from memory in Vowpal Wabbit?

Problem description

Is there a way to send data to train a model in Vowpal Wabbit without writing it to disk?

Here's what I'm trying to do. I have a relatively large dataset in CSV (around 2 GB), which fits in memory with no problem. I load it in R into a data frame, and I have a function that converts the data in that data frame into VW format.

Now, in order to train a model, I have to write the converted data to a file first, and then feed that file to VW. Writing to disk takes far too long, especially since I want to try various models with different feature transformations, so I have to write the data to disk multiple times.

So, assuming I can create a character vector in R in which each element is a row of data in VW format, how could I feed that into VW without writing it to disk?

I considered using daemon mode and writing the character vector to a localhost connection, but I couldn't get VW to train in daemon mode, and I'm not sure this is even possible.

I'm willing to use C++ (through the Rcpp package) if necessary to make this work.

Thank you very much in advance.

UPDATE: Thank you everyone for your help.
In case anyone's interested, I just piped the output to VW as suggested in the answer, like so:

    # Two sample rows of data
    datarows <- c("1 |name 1:1 2:4 4:1", "-1 |name 1:1 4:1")

    # Open connection to VW
    con <- pipe("vw -f my_model.vw")

    # Write to connection and close
    writeLines(datarows, con)
    close(con)

Solution

Vowpal Wabbit supports reading data from standard input (cat train.dat | vw), so you can open a pipe directly from R.

Daemon mode supports training. If you need incremental/continuous learning, you can use a trick with a dummy example whose tag starts with the string "save". Optionally, you can specify the model filename as well:

    1 save_filename|

Yet another option is to use VW as a library; see an example.

Note that VW supports various kinds of feature engineering using feature namespaces.
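The piping approach and the "save" tag can be wrapped in two small helpers. This is only a sketch: the names send_to_vw and vw_save_example are hypothetical and not part of VW or of any R package, and the default command simply reuses the vw -f my_model.vw call from the update above. The second helper only builds the dummy example string in the form the answer shows; how and where it is sent to VW is left to the caller.

```r
# Hypothetical helper (not part of VW): stream VW-formatted rows to the
# standard input of a shell command, so no training file is written to disk.
send_to_vw <- function(datarows, cmd = "vw -f my_model.vw") {
  con <- pipe(cmd, open = "w")
  # Closing the pipe flushes the data and waits for the command to finish.
  on.exit(close(con), add = TRUE)
  writeLines(datarows, con)
  invisible(length(datarows))
}

# Hypothetical helper: build the dummy example from the answer, i.e. label 1,
# a tag of the form save_<filename> placed directly before the bar, and no
# features.
vw_save_example <- function(model_file) {
  paste0("1 save_", model_file, "|")
}

# Two sample rows of data, as in the update above
datarows <- c("1 |name 1:1 2:4 4:1", "-1 |name 1:1 4:1")

# Only call VW if the executable is actually on the PATH.
if (nzchar(Sys.which("vw"))) {
  send_to_vw(datarows, "vw -f my_model.vw")
}
```

Because the command is just an argument, each feature transformation can be piped to its own VW invocation (for example with a different -f model file) without ever materialising the converted data on disk.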
08-18 23:22