本文介绍了Python:建议改进一个chunk-by-chunk代码读取几百万点的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我在Python中写了一个代码来读取 code>)。在生成器中添加多边形内部的每个点( if pnpoly(px,py,verts))。

  • 我们对输出文件

  • 使用另一个和 p在inside_points 中,这是使用生成器)

  • 写入输出文件( file_out.write / code>)



  • 因为此方法仅将多边形内的点添加到列表,



    如果上面显示的方法不起作用,您应该 只使用chunks。
    当使用块时,应该正确处理异常。例如:

     来自liblas import LASException 

    chunkSize = 100000
    for i in xrange 0,len(f),chunkSize):
    try:
    chunk = f [i:i + chunkSize]
    除了LASException:
    rem = len(f)-i
    chunk = f [i:i + rem]

    我不明白你想在这里完成什么。 视频打印是什么意思?



    关于(3):由于您不再使用原始的块,您可以重新使用该名称。意识到在python变量只是一个。



    关于(4):您未使用 else 它完全。


    I wrote a code to read *.las file in Python. *las file are special ascii file where each line is x,y,z value of points.

    My function read N. number of points and check if they are inside a polygon with points_inside_poly.

    I have the following questions:

    1. When I arrive at the end of the file I get this message: LASException: LASError in "LASReader_GetPointAt": point subscript out of range because the number of points is under the chunk dimension. I cannot figure how to resolve this problem.
    2. a = [file_out.write(c[m]) for m in xrange(len(c))] I use a = in order to avoid video print. Is it correct?
    3. In c = [chunk[l] for l in index] I create a new list c because I am not sure that replacing a new chunk is the smart solution (ex: chunk = [chunk[l] for l in index]).
    4. In a statement if else...else I use pass. Is this the right choice?

    Really thank for help. It's important to improve listen suggestions from expertise!!!!

    import shapefile
    import numpy
    import numpy as np
    from numpy import nonzero
    from liblas import file as lasfile
    from shapely.geometry import Polygon
    from matplotlib.nxutils import points_inside_poly
    
    
    # open shapefile (polygon)
    sf = shapefile.Reader(poly)
    shapes = sf.shapes()
    # extract vertices
    verts = np.array(shapes[0].points,float)
    
    # open las file
    f = lasfile.File(inFile,None,'r') # open LAS
    # read "header"
    h = f.header
    
    # create a file where store the points
    file_out = lasfile.File(outFile,mode='w',header= h)
    
    
    chunkSize = 100000
    for i in xrange(0,len(f), chunkSize):
        chunk = f[i:i+chunkSize]
    
        x,y = [],[]
    
        # extraxt x and y value for each points
        for p in xrange(len(chunk)):
            x.append(chunk[p].x)
            y.append(chunk[p].y)
    
        # zip all points
        points = np.array(zip(x,y))
        # create an index where are present the points inside the polygon
        index = nonzero(points_inside_poly(points, verts))[0]
    
        # if index is not empty do this otherwise "pass"
        if len(index) != 0:
            c = [chunk[l] for l in index] #Is It correct to create a new list or i can replace chunck?
            # save points
            a = [file_out.write(c[m]) for m in xrange(len(c))] #use a = in order to avoid video print. Is it correct?
        else:
            pass #Is It correct to use pass?
    
    f.close()
    file_out.close()
    

    code proposed by @Roland Smith and changed by Gianni

    f = lasfile.File(inFile,None,'r') # open LAS
    h = f.header
    # change the software id to libLAS
    h.software_id = "Gianni"
    file_out = lasfile.File(outFile,mode='w',header= h)
    f.close()
    sf = shapefile.Reader(poly) #open shpfile
    shapes = sf.shapes()
    for i in xrange(len(shapes)):
        verts = np.array(shapes[i].points,float)
        inside_points = [p for p in lasfile.File(inFile,None,'r') if pnpoly(p.x, p.y, verts)]
        for p in inside_points:
            file_out.write(p)
    f.close()
    file_out.close()
    

    i used these solution:1) reading f = lasfile.File(inFile,None,'r') and after the read head because i need in the *.las output file2) close the file3) i used inside_points = [p for p in lasfile.File(inFile,None,'r') if pnpoly(p.x, p.y, verts)] instead of

    with lasfile.File(inFile, None, 'r') as f:
    ...     inside_points = [p for p in f if pnpoly(p.x, p.y, verts)]
    ...
    

    because i always get this error message

    Traceback (most recent call last):File "", line 1, inAttributeError: _exit_

    解决方案

    Regarding (1):

    First, why are you using chunks? Just use the lasfile as an iterator (as shown in the tutorial), and process the points one at a time. The following should get write all the points inside the polygon to the output file, by using the pnpoly function in a list comprehension instead of points_inside_poly.

    from liblas import file as lasfile
    import numpy as np
    from matplotlib.nxutils import pnpoly
    
    with lasfile.File(inFile, None, 'r') as f:
        inside_points = (p for p in f if pnpoly(p.x, p.y, verts))
        with lasfile.File(outFile,mode='w',header= h) as file_out:
            for p in inside_points:
                file_out.write(p)
    

    The five lines directly above should replace the whole big for-loop. Let's go over them one-by-one:

    • with lasfile.File(inFile...: Using this construction means that the file will be closed automatically when the with block finishes.
    • Now comes the good part, the generator expression that does all the work (the part between ()). It iterates over the input file (for p in f). Every point that is inside the polygon (if pnpoly(p.x, p.y, verts)) is added to the generator.
    • We use another with block for the output file
    • and all the points (for p in inside_points, this is were the generator is used)
    • are written to the output file (file_out.write(p))

    Because this method only adds the points that are inside the polygon to the list, you don't waste memory on points that you don't need!

    You should only use chunks if the method shown above doesn't work.When using chunks you should handle the exception properly. E.g:

    from liblas import LASException
    
    chunkSize = 100000
    for i in xrange(0,len(f), chunkSize):
        try:
            chunk = f[i:i+chunkSize]
        except LASException:
            rem = len(f)-i
            chunk = f[i:i+rem]
    

    Regarding (2): Sorry, but I fail to understand what you are trying to accomplish here. What do you mean by "video print"?

    Regarding (3): since you are not using the original chunk anymore, you can re-use the name. Realize that in python a "variable" is just a nametag.

    Regarding (4): you aren't using the else, so leave it out completely.

    这篇关于Python:建议改进一个chunk-by-chunk代码读取几百万点的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!

    08-20 12:03