How do I encode the URL
http://www.chemspider.com/inchi.asmx/InChIToSMILES?inchi=InchI=1S/C21H30O9/c1-11(5-6-21(28)12(2)8-13(23)9-20(21,3)4)7-15(24)30-19-18(27)17(26)16(25)14(10-22)29-19/h5-8,14,16-19,22,25-28H,9-10H2,1-4H3/b6-5+,11-7-/t14-,16-,17+,18-,19+,21-/m1/s1&token=e4a6d6fb-ae07-4cf6-bae8-c0e6115bc681
into this
http://www.chemspider.com/inchi.asmx/InChIToSMILES?inchi=InChI%3D1S%2FC21H30O9%2Fc1-11(5-6-21(28)12(2)8-13(23)9-20(21%2C3)4)7-15(24)30-19-18(27)17(26)16(25)14(10-22)29-19%2Fh5-8%2C14%2C16-19%2C22%2C25-28H%2C9-10H2%2C1-4H3%2Fb6-5%2B%2C11-7-%2Ft14-%2C16-%2C17%2B%2C18-%2C19%2B%2C21-%2Fm1%2Fs1
in R?
I tried URLencode, but it does not work.
Thanks
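Best answer
It seems that you want to drop every GET parameter of the URL except the first one, and then percent-encode the value of that parameter.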
url <- "..."
library(stringi)
(addr <- stri_replace_all_regex(url, "\\?.*", ""))
## [1] "http://www.chemspider.com/inchi.asmx/InChIToSMILES"
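# Capture the first GET parameter: column 2 is its name, column 3 its value
args <- stri_match_first_regex(url, "[?&](.*?)=([^&]+)")
# Convert every character outside the safe set to &#xNN;, then rewrite that as %NN
(data <- stri_replace_all_regex(
  stri_trans_general(args[,3], "[^a-zA-Z0-9\\-()]Any-Hex/XML"),
  "&#x([0-9a-fA-F]{2});", "%$1"))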
## [1] "InchI%3D1S%2FC21H30O9%2Fc1-11(5-6-21(28)12(2)8-13(23)9-20(21%2C3)4)7-15(24)30-19-18(27)17(26)16(25)14(10-22)29-19%2Fh5-8%2C14%2C16-19%2C22%2C25-28H%2C9-10H2%2C1-4H3%2Fb6-5%2B%2C11-7-%2Ft14-%2C16-%2C17%2B%2C18-%2C19%2B%2C21-%2Fm1%2Fs1"
(addr <- stri_c(addr, "?", args[,2], "=", data))
## [1] "http://www.chemspider.com/inchi.asmx/InChIToSMILES?inchi=InchI%3D1S%2FC21H30O9%2Fc1-11(5-6-21(28)12(2)8-13(23)9-20(21%2C3)4)7-15(24)30-19-18(27)17(26)16(25)14(10-22)29-19%2Fh5-8%2C14%2C16-19%2C22%2C25-28H%2C9-10H2%2C1-4H3%2Fb6-5%2B%2C11-7-%2Ft14-%2C16-%2C17%2B%2C18-%2C19%2B%2C21-%2Fm1%2Fs1"
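Here I used ICU's transliterator (via stri_trans_general). Every character except A..Z, a..z, 0..9, (, ), and - is converted to its hexadecimal representation of the form &#xNN; (it seems that URLencode does not encode the comma, even with reserved=TRUE). Each &#xNN; is then rewritten as %NN with stri_replace_all_regex. The replacement pattern matches exactly two hex digits, so this works as shown for ASCII input such as the InChI string above.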
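For comparison, the same idea can be sketched in base R without stringi. This is only an illustration under stated assumptions, not the method used above: the helper names split_first_param and encode_value are hypothetical, the example URL is made up, and the safe set (letters, digits, parentheses, and the hyphen) simply mirrors the filter passed to stri_trans_general.

```r
# Hypothetical helper: split a URL into its address and its first GET parameter.
split_first_param <- function(url) {
  m <- regmatches(url, regexec("^([^?]*)\\?([^=&]+)=([^&]*)", url))[[1]]
  list(addr = m[2], name = m[3], value = m[4])
}

# Hypothetical helper: percent-encode every character outside the safe set.
# Each UTF-8 byte of an unsafe character becomes %NN (upper-case hex).
encode_value <- function(x, safe = "[A-Za-z0-9()-]") {
  chars <- strsplit(x, "", fixed = TRUE)[[1]]
  encoded <- vapply(chars, function(ch) {
    if (grepl(safe, ch, perl = TRUE)) {
      ch
    } else {
      paste0("%", toupper(as.character(charToRaw(ch))), collapse = "")
    }
  }, character(1), USE.NAMES = FALSE)
  paste(encoded, collapse = "")
}

# Made-up URL with the same shape as the one in the question.
url <- "http://example.org/convert?inchi=InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3&token=abc"
parts <- split_first_param(url)
(encoded_url <- paste0(parts$addr, "?", parts$name, "=", encode_value(parts$value)))
## [1] "http://example.org/convert?inchi=InChI%3D1S%2FC2H6O%2Fc1-2-3%2Fh3H%2C2H2%2C1H3"
```

Because the encoder works on raw bytes, a non-ASCII character yields one %NN group per byte of its UTF-8 encoding, whereas the transliterator approach above is shown for ASCII input only.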