On Linux, I have something like the following:

    namespace pt = boost::property_tree;

    pt::ptree tree;
    pt::xml_parser::read_xml(argv[1], tree);
    // huge parsing; rootNode2 is a node taken from tree
    for (auto const& itemNode : rootNode2.second) {
        const pt::ptree& dealAttributes = itemNode.second.get_child("<xmlattr>", empty_ptree());
        for (auto const& dealAttr : dealAttributes)
        {
            const std::string& attrName = dealAttr.first;
            const std::string& attrVal = dealAttr.second.data();
            // compare attrVal with a Russian constant here
        }
    }

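  • The original XML is in Windows-1251.
  • attrVal contains Russian text.
  • I need to compare it with a constant in the program.

  • How should I define this constant? On Linux the default encoding for Russian does not seem to be Windows-1251, so my comparison fails.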

    Here is a sample XML:

    <?xml version="1.0" encoding="cp1251"?>
    <root>
        <item accounting_currency_code="RUB" board_name="ФБ Т+2"
            broker_commission="71.85" broker_ref="3/00/2"
            conclusion_date="2013-09-11T00:00:00" conclusion_time="2013-09-11T11:04:35"
            deal_no="480" execution_date="2013-09-12T00:00:00" price="144.700000"
            price_currency_code="RUB" request_no="1976"
            security_grn_code="1-02-008-A" security_name="ГАЗПРОМ ао"
            sell_qnty="5000.00000000" volume_currency="718500.00"
            volume_rur="718500.00"/>
    </root>
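One way to define the constant, assuming the tree is read without any character conversion (so attrVal keeps the raw cp1251 bytes from the file), is to spell the text out byte by byte with hex escapes. That makes the constant independent of the encoding the .cpp file is saved in. This is only a sketch: the names kGazpromCp1251, kGazpromUtf8 and matches_security_name are hypothetical, and the bytes correspond to security_name="ГАЗПРОМ ао" from the sample above.

```cpp
#include <cassert>
#include <string>

// Hypothetical constants for security_name="ГАЗПРОМ ао".
// Windows-1251, 1 byte per letter:
// Г=C3 А=C0 З=C7 П=CF Р=D0 О=CE М=CC, space, а=E0 о=EE
const std::string kGazpromCp1251 = "\xC3\xC0\xC7\xCF\xD0\xCE\xCC \xE0\xEE";

// The same text in UTF-8, 2 bytes per Cyrillic letter. This is typically what a
// plain "ГАЗПРОМ ао" literal turns into when the source file is saved as UTF-8.
const std::string kGazpromUtf8 =
    "\xD0\x93\xD0\x90\xD0\x97\xD0\x9F\xD0\xA0\xD0\x9E\xD0\x9C \xD0\xB0\xD0\xBE";

// Assumption: attrVal holds the unconverted cp1251 bytes read from the XML file.
bool matches_security_name(const std::string& attrVal) {
    return attrVal == kGazpromCp1251;
}
```

The UTF-8 constant is there only to show why a naive comparison fails: the same ten characters are 10 bytes in cp1251 but 19 bytes in UTF-8, so they can never compare equal.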
    

    Best answer

    OK, here are the steps I took to get it "working" with a "random sample":

  • Create sample input:

    Call it input.xml and make sure to save it in cp1251:

    <?xml version="1.0" encoding="windows-1251"?>
    <root>
      <deal id="1" silly="е">
            hello
        </deal>
    </root>
    

    The character in silly is U+0435 (name: CYRILLIC SMALL LETTER IE).
  • Configure the system to support the cp1251 locale

    For me, I had to:
  • edit /var/lib/locales/supported.d/local to add ru_RU.CP1251 CP1251, then
  • run sudo dpkg-reconfigure locales
  • Done!

    Use read_xml / write_xml with the specific locale imbued

    Use Boost.Locale to generate a locale instance that knows the character-set conversion facets:

    boost::locale::generator gen;
    auto loc = gen.generate("ru_RU.CP1251");
    
  • Profit!
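The locale-configuration step matters because the named locale has to exist on the system. A quick way to check that from C++, sketched here with only the standard library (the helper name locale_available is hypothetical), relies on the fact that constructing std::locale from an unknown name throws std::runtime_error:

```cpp
#include <cassert>
#include <locale>
#include <stdexcept>

// Returns true if the C++ runtime can construct a locale with this name.
// std::locale's named constructor throws std::runtime_error for unknown names.
bool locale_available(const char* name) {
    try {
        std::locale loc(name);
        (void)loc;  // only construction matters here
        return true;
    } catch (const std::runtime_error&) {
        return false;
    }
}
```

For example, locale_available("ru_RU.CP1251") would be expected to return false before the locale is added and dpkg-reconfigure has been run, and true afterwards. This only checks the C library's locale database; Boost.Locale's generator can use other backends (such as ICU), so treat it as a diagnostic rather than a guarantee.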



  • Full demo


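    #include <boost/property_tree/ptree.hpp>
    #include <boost/property_tree/xml_parser.hpp>
    #include <boost/locale.hpp>
    #include <boost/locale/generator.hpp>
    #include <cassert>   // assert
    #include <cstdint>   // uint8_t
    #include <iomanip>   // std::setw, std::setfill
    #include <iostream>
    #include <fstream>
    #include <string>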
    
    using boost::property_tree::ptree;
    
    static ptree const& empty_ptree() {
        static ptree _instance;
        return _instance;
    }
    
    int main(int argc, char** argv) {
        assert(argc>1);
        boost::locale::generator gen;
        auto loc = gen.generate("ru_RU.CP1251");
    
        ptree pt;
    
        read_xml(argv[1], pt, 0, loc);
    
        ptree::value_type& rootNode2 = *pt.begin();
    
        // huge parsing
        for (auto const& itemNode : rootNode2.second) {
            const ptree& dealAttributes = itemNode.second.get_child("<xmlattr>", empty_ptree());
            for (auto const& dealAttr : dealAttributes)
            {
                const std::string& attrName = dealAttr.first;
                const std::string& attrVal  = dealAttr.second.data();
    
                std::cout << "Attribute '" << attrName << "' hath value "; // '" << attrVal << "'\n";
    
                int pos = 1;
                for (uint8_t ch : attrVal) // prevent sign-extension
                {
                    if (pos++ == 8) {
                        std::cout << '\n';
                        pos = 1;
                    }
                    std::cout << std::hex << std::setw(2) << std::setfill('0') << std::showbase << static_cast<int>(ch) << " ";
                }
                std::cout << "\n";
            }
        }
    
        auto settings = boost::property_tree::xml_writer_make_settings<std::string>(' ', 4, "windows-1251");
        //boost::property_tree::xml_parser::write_xml_element(ofs, "root", pt, 0, settings);
        write_xml("debug.xml", pt, loc, settings);
    }
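The byte-dumping loop in the demo can be pulled out into a small helper for checking what a string actually contains. This is a sketch using only the standard library; hex_bytes is a hypothetical name, and unlike the demo it returns a single string (no line wrapping) instead of printing:

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Render every byte of s as "0xNN", separated by single spaces.
// Bytes are read as unsigned char to avoid sign extension of values >= 0x80.
std::string hex_bytes(const std::string& s) {
    std::string out;
    char buf[8];
    for (unsigned char ch : s) {
        std::snprintf(buf, sizeof buf, "0x%02x", static_cast<unsigned>(ch));
        if (!out.empty()) out += ' ';
        out += buf;
    }
    return out;
}
```

Applied to the raw value of the silly attribute from the sample input, this would be expected to give 0xe5, the same byte the demo prints.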
    

    It runs on Coliru, including locale support (kudos, Coliru!). Here is the program output, followed by a hex dump of debug.xml for verification:
    Attribute 'id' hath value 0x31
    Attribute 'silly' hath value 0xe5
    0000000: 3c3f 786d 6c20 7665 7273 696f 6e3d 2231  <?xml version="1
    0000010: 2e30 2220 656e 636f 6469 6e67 3d22 7769  .0" encoding="wi
    0000020: 6e64 6f77 732d 3132 3531 223f 3e0a 3c72  ndows-1251"?>.<r
    0000030: 6f6f 743e 0a20 2020 200d 2623 3130 3b20  oot>.    .&#10;
    0000040: 200d 2623 3130 3b0a 2020 2020 3c64 6561   .&#10;.    <dea
    0000050: 6c20 6964 3d22 3122 2073 696c 6c79 3d22  l id="1" silly="
    0000060: e522 3e0d 2623 3130 3b20 2020 2020 2020  .">.&#10;
    0000070: 2068 656c 6c6f 0d26 2331 303b 2020 2020   hello.&#10;
    0000080: 3c2f 6465 616c 3e0a 3c2f 726f 6f74 3e0a  </deal>.</root>.
    

    As you can see, 0x22 0xe5 0x22 is the correct hex representation of "е" (including the quotes) in cp1251.
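To build cp1251 constants without looking up each byte by hand, the Russian letters of Windows-1251 can be encoded with a tiny function, because А to я (U+0410 to U+044F) map contiguously onto 0xC0 to 0xFF, and Ё/ё (U+0401/U+0451) map to 0xA8/0xB8. The sketch below (hypothetical to_cp1251) covers only ASCII and those letters and substitutes '?' for anything else; a complete converter such as Boost.Locale's boost::locale::conv::from_utf would handle the whole code page.

```cpp
#include <cassert>
#include <string>

// Encode a UTF-32 string to Windows-1251, covering ASCII and the basic
// Russian alphabet only; any other code point becomes '?'.
std::string to_cp1251(const std::u32string& text) {
    std::string out;
    out.reserve(text.size());
    for (char32_t cp : text) {
        if (cp < 0x80) {
            out += static_cast<char>(cp);                  // ASCII is unchanged
        } else if (cp >= 0x0410 && cp <= 0x044F) {
            out += static_cast<char>(cp - 0x0410 + 0xC0);  // А..я -> 0xC0..0xFF
        } else if (cp == 0x0401) {
            out += static_cast<char>(0xA8);                // Ё
        } else if (cp == 0x0451) {
            out += static_cast<char>(0xB8);                // ё
        } else {
            out += '?';                                    // not handled here
        }
    }
    return out;
}
```

For instance, to_cp1251(U"\u0435") yields the single byte 0xE5 seen in the dump above. Because the input uses \u escapes, the resulting bytes do not depend on how the .cpp file itself is saved.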
