I have a large text file of URLs which I have to download via wget. I have written a small Python script which loops over the URLs and downloads each one using wget (os.system("wget "+URL)). The problem is that wget just hangs if the remote server stops replying after the connection is made. How do I set a time limit in that case? I want wget to be terminated after some time if the remote server is not replying after connecting.



This seems to be less a question about Python and more a question about how to use wget. In GNU wget, which you are likely using, the default number of tries is 20. You can set the number of tries with -t; wget -t1 makes a single attempt and then gives up (note that -t0 means retry forever, so avoid it here). Alternatively, you could use the -S flag to print the server response and have Python react appropriately. But the most helpful option for you is -T, or --timeout: use -T10 to have wget time out after ten seconds and move on. The default read timeout is 900 seconds, which is why a server that goes silent after connecting looks like a hang.
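If you do want to keep the Python loop, here is a minimal sketch of how the -t and -T options could be combined with a hard limit on the Python side. It swaps os.system for subprocess.run, whose timeout argument kills the child process if it runs too long. The function names, the default values and the file name urls.txt are assumptions made for illustration, not anything from the question.

```python
import subprocess


def build_wget_command(url, tries=2, timeout=10):
    """Build the wget argument list: -t limits tries, -T sets the timeout in seconds."""
    return ["wget", "-t", str(tries), "-T", str(timeout), url]


def download(url, tries=2, timeout=10, hard_limit=60, runner=subprocess.run):
    """Run wget for one URL. Return True if wget exits with status 0.

    hard_limit is a safety net: if wget is still running after that many
    seconds, subprocess.run kills it and raises TimeoutExpired.
    """
    cmd = build_wget_command(url, tries, timeout)
    try:
        result = runner(cmd, timeout=hard_limit)
    except subprocess.TimeoutExpired:
        return False  # wget exceeded the hard limit and was killed
    except FileNotFoundError:
        return False  # wget is not installed or not on PATH
    return result.returncode == 0


if __name__ == "__main__":
    # "urls.txt" is a hypothetical file with one URL per line.
    with open("urls.txt") as handle:
        for line in handle:
            url = line.strip()
            if url and not download(url):
                print("failed:", url)
```

Passing the command as a list rather than a concatenated string also avoids shell quoting problems with URLs that contain characters such as & or ?.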

If all you are doing is iterating through a list of URLs and downloading each one, I would just use wget; there is no need for Python here. In fact, you can do it in one line:

awk '{print "wget -t2 -T5 --append-output=wget.log \"" $0 "\""}' listOfUrls | bash

This runs through the list of URLs and calls wget once per URL. wget makes at most two attempts to download each file and gives up on an attempt if the server does not respond within 5 seconds. It also appends its output to wget.log, which you can grep at the end to look for 404 errors.
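If you would rather check the log from Python than with grep, a rough sketch could look like the following. The function name is hypothetical, and it does a plain substring match, so a "404" that happens to appear inside a URL or a byte count will also match; treat the result as a list of candidates to inspect.

```python
def lines_mentioning(log_text, needle="404"):
    """Return every line of log_text that contains needle (plain substring match)."""
    return [line for line in log_text.splitlines() if needle in line]


if __name__ == "__main__":
    # wget.log is the file named by --append-output in the one-liner.
    with open("wget.log") as handle:
        for line in lines_mentioning(handle.read()):
            print(line)
```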


08-04 23:32