Extracting data from JavaScript with R

Problem description

Thanks for taking an interest in this.

I was given the [tedious] task of finding the country of origin of some medicines, as they are registered with the Colombian food and drug administration. The agency uses a website with JavaScript (.jsp extension), and I would like to know whether it is possible to automate the process.

This is the step-by-step lookup:

1. Go to the agency's website: Agency's consult site
2. Select "Medicamentos" in the drop-down list on the left.
3. Under "expediente" (rightmost box at the top), write the number we're looking for (two of the 900+ I have to check are 2203 and 3519). The radio-button selection does not matter.
4. Hit the search button ("buscar").
5. Click the link presented in the table below.
6. Ideally, get the table line that starts with FABRICANTE (manufacturer), but being able to save the document would be enough (I plan to get/clean/analyze the data using R later on).
7. Hit the clean button ("nueva consulta").
8. Start all over from steps 3 to 7.

I don't have the slightest idea whether this can be accomplished, and if so, how; so I'd appreciate any guidance that lets me start in any direction (other than the one I have at hand now: looking them up by hand!). I'm familiar with R and some VB, but if it's possible in any other language, I'll give it a try.

What I've tried:

I tried to find any information related to extracting data from JavaScript, but most of what I've found is about using JavaScript to pass data from different sorts of databases into HTML/XML, or about extracting the data from only one response (that's not the part I want to automate, as once I'm at the response, it would be easy to just look at the value [country of origin].
The "consult" part is the hardest!). I've felt so off-track that I don't even know how to search adequately. Guidance, ideas and starting points are much appreciated.

I've opened the agency's site with the inspector (Firefox), but stopped just after finding that the variable "expediente" is the one that gets the value for "expediente" (not very useful!). I don't know whether it is possible to iterate on the page to change the value of that variable, or how.

Thanks!

Solution

I have used phantomjs with the RSelenium package. Details on how to set up phantomjs can be found at http://cran.r-project.org/web/packages/RSelenium/vignettes/RSelenium-saucelabs.html#id2a (phantomjs can be driven directly, without the need for a Selenium Server; the original answer links to further details). It should be a lot quicker for the task you outline due to its headless nature.

The first part of your question can be achieved as follows:

    appURL <- "http://web.sivicos.gov.co:8080/consultas/consultas/consreg_encabcum.jsp"
    library(RSelenium)
    pJS <- phantom()
    remDr <- remoteDriver(browserName = "phantom")
    remDr$open()
    remDr$navigate(appURL)
    # Get the third list item of the select box (MEDICAMENTOS)
    webElem <- remDr$findElement("css", "select[name='grupo'] option:nth-child(3)")
    webElem$clickElement() # select this element
    # Send text to the input with name="expediente"
    webElem <- remDr$findElement("css", "input[name='expediente']")
    webElem$sendKeysToElement(list(2203))
    # Click the Buscar button
    remDr$findElement("id", "INPUT2")$clickElement()

Now the form has been filled in and the Buscar button clicked. The data is in an iframe with name="datos". Iframes need to be switched to:

    # switch to the datos iframe
    remDr$switchToFrame(remDr$findElement("css", "iframe[name='datos']"))
    remDr$findElement("css", "a")$clickElement() # click the link given in the iframe
    # get the resulting data
    appData <- remDr$getPageSource()[[1]]
    # close phantomjs
    pJS$stop()

The data for the iframe is now contained in appData.
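To repeat the lookup for many numbers (steps 3 to 7 of the question), the calls above could be wrapped in a function and applied to each expediente. The following is only a minimal sketch under several assumptions: lookup_expediente, lookup_all and the pause argument are hypothetical names not found in the original answer, reloading the form with navigate is assumed to be an acceptable substitute for the "nueva consulta" button, and remDr is assumed to be an already opened remoteDriver.

```r
app_url <- "http://web.sivicos.gov.co:8080/consultas/consultas/consreg_encabcum.jsp"

# Hypothetical wrapper around the steps shown above; returns the page source
# of the result document for one expediente number.
lookup_expediente <- function(remDr, expediente, url = app_url) {
  # Leave any iframe entered by a previous lookup (RSelenium documents a NULL
  # Id as switching back to the page's default content).
  remDr$switchToFrame(NULL)
  # Assumption: reloading the form stands in for the "nueva consulta" button.
  remDr$navigate(url)
  # Select MEDICAMENTOS (third option of the "grupo" select box).
  remDr$findElement("css", "select[name='grupo'] option:nth-child(3)")$clickElement()
  # Type the number into the "expediente" input, sent as text.
  input <- remDr$findElement("css", "input[name='expediente']")
  input$sendKeysToElement(list(as.character(expediente)))
  # Click the Buscar button.
  remDr$findElement("id", "INPUT2")$clickElement()
  # The results live in the "datos" iframe; follow the link shown there.
  remDr$switchToFrame(remDr$findElement("css", "iframe[name='datos']"))
  remDr$findElement("css", "a")$clickElement()
  remDr$getPageSource()[[1]]
}

# Hypothetical driver loop: one page source per number, NA where a lookup fails.
lookup_all <- function(remDr, expedientes, pause = 1) {
  pages <- lapply(expedientes, function(e) {
    Sys.sleep(pause)  # wait between requests to go easy on the server
    tryCatch(lookup_expediente(remDr, e), error = function(err) NA_character_)
  })
  names(pages) <- as.character(expedientes)
  pages
}

# Usage sketch (not run here; needs phantomjs and RSelenium installed):
# library(RSelenium)
# pJS <- phantom()
# remDr <- remoteDriver(browserName = "phantom")
# remDr$open()
# pages <- lookup_all(remDr, c(2203, 3519))
# pJS$stop()
```

Passing remDr as an argument keeps the browser session open across all lookups, so phantomjs only needs to be started and stopped once. Failed lookups are kept as NA so that one missing record does not abort a run over 900+ numbers.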
As an example, we look at the third table in appData using the simple extraction function readHTMLTable (from the XML package):

    library(XML)
    readHTMLTable(appData, which = 3)
                          V1     V2      V3              V4       V5                      V6
    1 Presentacion Comercial   <NA>    <NA>            <NA>     <NA>                    <NA>
    2             Expediente Consec Termino Unidad / Medida Cantidad             Descripcion
    3              000002203     01    0176              ml    60,00  FRASCO AMBAR POR 60 ML
    4              000002203     02    0176              ml   120,00 FRASCO AMBAR POR 120 ML
    5              000002203     03    0176              ml    90,00  FRASCO AMBAR POR 90 ML
              V7     V8            V9
    1       <NA>   <NA>          <NA>
    2 Fecha insc Estado Fecha Inactiv
    3 2007/01/30 Activo
    4 2007/01/30 Activo
    5 2012/03/15 Activo
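The question also asks for the table line that starts with FABRICANTE. A minimal sketch of that step follows, assuming the row appears in one of the page's HTML tables with FABRICANTE at the start of its first cell. That layout is not confirmed by the answer, and rows_starting_with and manufacturer_rows are hypothetical helper names.

```r
# Hypothetical helper: keep the rows of one table whose first cell starts
# with `label` (leading/trailing whitespace is ignored).
rows_starting_with <- function(tbl, label = "FABRICANTE") {
  if (is.null(tbl) || ncol(tbl) == 0) return(NULL)
  first_cell <- trimws(as.character(tbl[[1]]))
  keep <- !is.na(first_cell) & startsWith(first_cell, label)
  tbl[keep, , drop = FALSE]
}

# Hypothetical helper: parse every table in the page source and collect the
# matching rows. The XML package is only needed when this function is called.
manufacturer_rows <- function(page_source, label = "FABRICANTE") {
  doc <- XML::htmlParse(page_source, asText = TRUE)
  tables <- XML::readHTMLTable(doc, stringsAsFactors = FALSE)
  hits <- lapply(tables, rows_starting_with, label = label)
  # Drop tables that were empty or had no matching row.
  Filter(function(x) !is.null(x) && nrow(x) > 0, hits)
}

# Usage sketch: manufacturer_rows(appData)
```

Scanning all tables avoids hard-coding a table index such as which = 3, since the position of the manufacturer row is not known from the answer.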