取消HTTP请求时关闭所有goroutines

取消HTTP请求时关闭所有goroutines

本文介绍了取消HTTP请求时关闭所有goroutines的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述


我正在制作网络爬虫.我将网址传递给搜寻器函数,并对其进行解析以获取锚标记中的所有链接,然后为每个网址使用单独的goroutine为所有这些网址调用相同的搜寻器函数.
但是,如果在我收到响应之前发送请求并取消它,则该特定请求的所有常规程序仍在运行.
现在我想要的是,当我取消请求时,由于该请求而被调用的所有goroutine都会停止.
请指导.
以下是我的 crawler 函数的代码.


I am making a web crawler. I'm passing the url through a crawler function and parsing it to get all the links in the anchor tag, then I am invoking same crawler function for all those urls using seperate goroutine for every url.
But if if send a request and cancel it before I get the response, all the groutines for that particular request are still running.
Now what I want is that when I cancel the request all the goroutines that got invoked due to that request stops.
Please guide.
Following is my code for the crawler function.

func crawler(c echo.Context, urlRec string, feed chan string, urlList *[]string, wg *sync.WaitGroup) {
    defer wg.Done()
    URL, _ := url.Parse(urlRec)
    response, err := http.Get(urlRec)
    if err != nil {
        log.Print(err)
        return
    }

    body := response.Body
    defer body.Close()

    tokenizer := html.NewTokenizer(body)
    flag := true
    for flag {
        tokenType := tokenizer.Next()
        switch {
        case tokenType == html.ErrorToken:
            flag = false
            break
        case tokenType == html.StartTagToken:
            token := tokenizer.Token()

            // Check if the token is an <a> tag
            isAnchor := token.Data == "a"
            if !isAnchor {
                continue
            }

            ok, urlHref := getReference(token)
            if !ok {
                continue
            }

            // Make sure the url begines in http**
            hasProto := strings.Index(urlHref, "http") == 0
            if hasProto {
                if !urlInURLList(urlHref, urlList) {
                    if strings.Contains(urlHref, URL.Host) {
                        *urlList = append(*urlList, urlHref)
                        // fmt.Println(urlHref)
                        // c.String(http.StatusOK, urlHref+"\n")Documents
                        if !checkExt(filepath.Ext(urlHref)) {
                            wg.Add(1)
                            go crawler(c, urlHref, feed, urlList, wg)
                        }
                    }
                }
            }
        }
    }
}

以下是我的POST请求处理程序

And following is my POST request handler

func scrapePOST(c echo.Context) error {
    var urlList []string
    urlSession := urlFound{}
    var wg sync.WaitGroup
    urlParam := c.FormValue("url")
    feed := make(chan string, 1000)
    wg.Add(1)
    go crawler(c, urlParam, feed, &urlList, &wg)
    wg.Wait()
    var count = 0
    for _, url := range urlList {
        if filepath.Ext(url) == ".jpg" || filepath.Ext(url) == ".jpeg" || filepath.Ext(url) == ".png" {
            urlSession.Images = append(urlSession.Images, url)
        } else if filepath.Ext(url) == ".doc" || filepath.Ext(url) == ".docx" || filepath.Ext(url) == ".pdf" || filepath.Ext(url) == ".ppt" {
            urlSession.Documents = append(urlSession.Documents, url)
        } else {
            urlSession.Links = append(urlSession.Links, url)
        }
        count = count + 1
    }
    urlSession.Count = count
    // jsonResp, _ := json.Marshal(urlSession)
    // fmt.Print(urlSession)
    return c.JSON(http.StatusOK, urlSession)
}

推荐答案

回显上下文公开了HTTP请求,该请求的上下文已经与服务器请求绑定在一起.只需获取该上下文,然后检查其是否取消,和/或将其传递给采用上下文的方法即可.

The echo context exposes the HTTP request, which has a context tied to the server request already. Just get that context, and check it for cancellation, and/or pass it along to methods that take a context.

ctx := c.Request().Context()
select {
case <-ctx.Done():
    return ctx.Err()
default:
    // Continue handling the request
}

// and pass along to the db or whatever else:
rows, err := db.QueryContext(ctx, ...)

如果客户端中止连接,则请求范围的上下文将自动取消.

If the client aborts the connection, the Request-scoped context will automatically be cancelled.

如果您想添加自己的取消条件(超时或其他条件),也可以这样做:

If you want to add your own cancellation conditions, (timeouts, or whatever) you can do that, too:

req := c.Request()
ctx, cancel := context.WithCancel(req.Context())
req.WithContext(ctx)
defer cancel()
// do stuff, which may conditionally call cancel() to cancel the context early

这篇关于取消HTTP请求时关闭所有goroutines的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!

07-31 13:42