Python - Parsing IPv4 addresses from a string (even when censored)

Problem description

Objective: Write Python 2.7 code to extract IPv4 addresses from a string.

Example string content:

The following are IP addresses: 192.168.1.1, 8.8.8.8, 101.099.098.000. These can also appear as 192.168.1[.]1 or 192.168.1(.)1 or 192.168.1[dot]1 or 192.168.1(dot)1 or 192 .168 .1 .1 or 192. 168. 1. 1. and these censorship methods could apply to any of the dots (Ex: 192[.]168[.]1[.]1).

As you can see from the above, I am struggling to find a way to parse a txt file that may contain IPs written in several "censored" forms (to prevent hyperlinking).

I think a regex is the way to go. Perhaps something along the lines of: any group of four integers, 0-255 or 000-255, separated by anything in a 'separators list', which would consist of periods, brackets, parentheses, or any of the other examples above. That way, the 'separators list' could be updated as needed.
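The 'separators list' idea can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the answer: `SEPARATORS`, `OCTET`, `build_pattern`, and `extract_ips` are hypothetical names, and the contents of the list are guessed from the examples above. It is written to run on both Python 2.7 and Python 3.

```python
import re

# Hypothetical, editable list of separator spellings (an assumption based on
# the examples in the question, not part of the original answer).
SEPARATORS = [".", "[.]", "(.)", "[dot]", "(dot)", "dot"]

# One octet: 0-255, leading zeros allowed (e.g. 099 or 000).
OCTET = r"(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"


def build_pattern(separators):
    # Escape every spelling so "." and brackets match literally, and try
    # longer spellings first so a short spelling cannot shadow a longer one
    # that starts the same way.
    longest_first = sorted(separators, key=len, reverse=True)
    alternatives = "|".join(re.escape(sep) for sep in longest_first)
    # Allow optional spaces on either side of a separator.
    separator = r" *(?:" + alternatives + r") *"
    # Four capturing octet groups joined by three separators.
    return re.compile(separator.join([OCTET] * 4))


def extract_ips(text, separators=None):
    pattern = build_pattern(SEPARATORS if separators is None else separators)
    # Rebuild each address from its four captured octets.
    return [".".join(match.groups()) for match in pattern.finditer(text)]
```

For example, `extract_ips("192.168.1[.]1 or 10(dot)0(dot)0(dot)5")` returns `['192.168.1.1', '10.0.0.5']`. Unlike a character-class approach, this sketch only accepts the exact spellings in the list (plus surrounding spaces), so a new censoring style is handled by appending one entry. It does not validate the text around a match, so a value such as 192.168.0.256 is still truncated to 192.168.0.25.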

I'm not sure whether this is the proper way to go, or even possible, so any help is greatly appreciated.

Update: Thanks to recursive's answer below, I now have the following code working for the above example. It will...

  • Find the IPs
  • Place them into a list
  • Clean them of the spaces/brackets/etc.
  • And replace the uncleaned list entries with the cleaned entries.

Caveat: The code below does not account for incorrect/invalid IPs such as 192.168.0.256 or 192.168.1.2.3. Currently, it drops the trailing 6 and 3 from those examples (yielding 192.168.0.25 and 192.168.1.2). If the first octet is invalid (ex: 256.10.10.10), it drops the leading 2 (resulting in 56.10.10.10).

import re

def extractIPs(fileContent):
    # One octet (0-255, leading zeros allowed) followed by three more octets,
    # each preceded by "." or "dot" that may be wrapped in a space, "(" or "[".
    pattern = r"((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)([ (\[]?(\.|dot)[ )\]]?(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)){3})"
    # findall returns one tuple per match; element 0 is the whole address.
    ips = [each[0] for each in re.findall(pattern, fileContent)]
    # Strip the censoring characters and turn "dot" back into ".".
    return [re.sub(r"[ ()\[\]]", "", item).replace("dot", ".") for item in ips]

# Replace the placeholder with the path of the text file to scan.
with open('***INSERT FILE PATH HERE***') as myFile:
    fileContent = myFile.read()

IPs = extractIPs(fileContent)
print("Original file content:\n{0}".format(fileContent))
print("--------------------------------")
print("Parsed results:\n{0}".format(IPs))

Recommended answer

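The code below will...

  • Find IPs in a string even when censored (ex: 192.168.1[dot]20 or 10.10.10 .21)
  • Place them into a list
  • Clean them of the censoring (spaces/brackets/parentheses)
  • And replace the uncleaned list entries with the cleaned entries.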

Caveat: The code below does not account for incorrect/invalid IPs such as 192.168.0.256 or 192.168.1.2.3. Currently, it drops the trailing digit (6 and 3 from those examples). If the first octet is invalid (ex: 256.10.10.10), it drops the leading digit (resulting in 56.10.10.10).


import re

def extractIPs(fileContent):
    # One octet (0-255, leading zeros allowed) followed by three more octets,
    # each preceded by "." or "dot" that may be wrapped in a space, "(" or "[".
    pattern = r"((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)([ (\[]?(\.|dot)[ )\]]?(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)){3})"
    # findall returns one tuple per match; element 0 is the whole address.
    ips = [each[0] for each in re.findall(pattern, fileContent)]
    # Strip the censoring characters and turn "dot" back into ".".
    return [re.sub(r"[ ()\[\]]", "", item).replace("dot", ".") for item in ips]

# Replace the placeholder with the path of the text file to scan.
with open('***INSERT FILE PATH HERE***') as myFile:
    fileContent = myFile.read()

IPs = extractIPs(fileContent)
print("Original file content:\n{0}".format(fileContent))
print("--------------------------------")
print("Parsed results:\n{0}".format(IPs))
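
One possible way to address the caveat is to match a whole run of dot-separated numbers first and validate it afterwards, instead of letting the regex stop in the middle of a number. The sketch below is an assumption-laden illustration rather than part of the answer: `SEP`, `RUN`, and `extract_valid_ips` are hypothetical names, and the separator rules are copied from the pattern above.

```python
import re

# Same separator rules as the answer's pattern: "." or "dot", optionally
# wrapped in one space, parenthesis, or square bracket on each side.
SEP = r"[ (\[]?(?:\.|dot)[ )\]]?"

# A maximal run of digit groups joined by separators (any number of groups).
RUN = re.compile(r"[0-9]+(?:" + SEP + r"[0-9]+)*")


def extract_valid_ips(text):
    ips = []
    for match in RUN.finditer(text):
        # Split the run back into its digit groups.
        parts = re.split(SEP, match.group(0))
        # Keep it only if there are exactly four groups, each 0-255
        # with at most three digits (leading zeros are preserved).
        if len(parts) == 4 and all(len(p) <= 3 and int(p) <= 255 for p in parts):
            ips.append(".".join(parts))
    return ips
```

With this approach 192.168.0.256, 192.168.1.2.3, and 256.10.10.10 are rejected outright rather than truncated. The trade-off is that an address ending a sentence and followed directly by another number (for example "8.8.8.8. 101.1.1.1") is read as one long run and dropped, because ". " is itself one of the accepted separator forms.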
