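I'm completely new to BeautifulSoup and to Python development, and I want to write a script that automates a few things for my personal website.
This is what I have so far: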
#!/usr/bin/env python
""" Test menu for Website
"""
import urllib2
from bs4 import BeautifulSoup
print (47 * '-')
print (" C H O I C E L I S T")
print (47 * '-')
print ("1. Page One")
print ("2. Page Two")
print ("3. Page Three")
print ("4. Page Four")
print (47 * '-')
print (47 * '-')
#############################
## Robust error handling ##
## only accept int ##
#############################
## Wait for valid input in while...not ###
is_valid=0
while not is_valid :
    try :
        choice = int ( raw_input('Enter your choice [1-4] : ') )
        is_valid = 1 ## set it to 1 to validate input and to terminate the while..not loop
    except ValueError, e :
        print ("'%s' is not a valid choice." % e.args[0].split(": ")[1])
### Take action as per selected choice list option ###
if choice == 1:
    print ("www.mywebsite.com/page_one.html")
elif choice == 2:
    print ("www.mywebsite.com/page_two.html")
elif choice == 3:
    print ("www.mywebsite.com/page_three.html")
elif choice == 4:
    print ("www.mywebsite.com/page_four.html")
else:
    print ("Invalid choice. try again...")
print (47 * '-')
print (47 * '-')
username = raw_input("Please, type your username\n")
html_content = urllib2.urlopen("http://" + [choice] + "/" + username)
soup = BeautifulSoup(html_content, "lxml")
#####################
## STRINGS REPLACE ##
#####################
start_msg = "Hey, you have "
end_msg = "comments !"
end_str = "read !"
####################
## COMMENTS COUNT ##
####################
count_comments = soup.find("span", "sidebar-comments")
count_comments
count_comments_final = count_comments.find_next("meta")
################
## COUNT READ ##
################
count_read = soup.find("span", "sidebar-read")
count_read
count_read_final = count_read.find_next("meta")
##################
## PRINT RESULT ##
##################
print start_msg + count_comments_final['content'].split(':')[1] + end_msg
print start_msg + count_read_final['content'].split(':')[1] + end_str
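With this script, I want to:
1- Choose one of my web pages (choice list, 1 to 4)
2- Enter my username
3- Parse the chosen page and get the total number of comments and the total number of reads.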
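My problem is this line:
html_content = urllib2.urlopen("http://" + [choice] + username)
I can't work out how to pass in the parameters needed to build a valid URL. Could you help me find the correct syntax?
My final URL should be: http://www.mywebsite.com/page_one.html/username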
Best answer
That's an odd-looking URL, but the only thing you need to do is store the page URL in a variable and reuse it.
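For context on why the original line fails: `[choice]` is a one-element list holding the integer, not the page URL, and Python refuses to concatenate a string with a list. Here is a minimal sketch that reproduces it; `try_concat` is a made-up helper for this illustration, and the exact wording of the error message depends on your Python version.

```python
def try_concat(choice, username):
    """Reproduce the failing expression and report what Python raises."""
    try:
        # [choice] is a list such as [1], so str + list fails here.
        return "http://" + [choice] + "/" + username
    except TypeError as error:
        return "TypeError: %s" % error


print(try_concat(1, "username"))
```

The URL string that was only printed in the `if`/`elif` chain never gets stored anywhere, which is why there is nothing usable to concatenate.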
Also, I would use a dictionary to map the integer choice to the actual URL:
mapping = {
    1: 'www.mywebsite.com/page_one.html',
    2: 'www.mywebsite.com/page_two.html',
    3: 'www.mywebsite.com/page_three.html',
    4: 'www.mywebsite.com/page_four.html'
}
try:
    page = mapping[choice]
except KeyError:
    print ("Invalid choice. try again...")
    # TODO: try again? :)
username = raw_input("Please, type your username\n")
url = "http://{page}/{username}".format(page=page, username=username)
html_content = urllib2.urlopen(url)
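One possible way to fill in the "try again" TODO is to keep asking until the answer is a key of the mapping. This is only a sketch under a few assumptions: `PAGES`, `ask_choice` and `build_url` are names made up for this illustration, the keys are assumed to run from 1 to the number of pages, and `read_line` stands in for `raw_input` so the loop can be exercised with canned answers instead of a keyboard.

```python
# Hypothetical names for illustration; the page URLs are the ones from the question.
PAGES = {
    1: 'www.mywebsite.com/page_one.html',
    2: 'www.mywebsite.com/page_two.html',
    3: 'www.mywebsite.com/page_three.html',
    4: 'www.mywebsite.com/page_four.html',
}


def ask_choice(read_line, pages=PAGES):
    """Keep asking until the answer is an int that is a key of `pages`."""
    while True:
        # Assumes the keys are 1..len(pages), as in the menu above.
        answer = read_line('Enter your choice [1-%d] : ' % len(pages))
        try:
            choice = int(answer)
        except ValueError:
            print("'%s' is not a valid choice." % answer)
            continue
        if choice in pages:
            return choice
        print("Invalid choice. try again...")


def build_url(choice, username, pages=PAGES):
    """Join the selected page and the username into the final URL."""
    return "http://{page}/{username}".format(page=pages[choice], username=username)


# Canned answers stand in for keyboard input: two rejected ones, then a valid one.
answers = iter(["abc", "9", "1"])
choice = ask_choice(lambda prompt: next(answers))
print(build_url(choice, "username"))
# last line printed: http://www.mywebsite.com/page_one.html/username
```

In the real script you would call `ask_choice(raw_input)` on Python 2 (or `ask_choice(input)` on Python 3) and pass the result of `build_url` to `urllib2.urlopen`.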