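First I scrape the site and collect all the links.
I need help with the MySQL part: for each link, check whether it already exists in the table. If it exists, do not insert it; if it does not exist, insert it.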

created_at = time.strftime("%Y/%d/%m/ %H:%M:%S")
afdelings = 'it-support'

url = 'www.careerjet.dk/sog/jobs?s=L%C3%A6rling&l=Danmark'
r  = requests.get("http://" +url)
data = r.text
soup = BeautifulSoup(data, "html.parser")
side1 = "http://www.careerjet.dk/"
cur = connect.cursor()

for link in soup.select('.title > a'):
  linkfrom = side1 + (link.get('href'))
  f = string.split(linkfrom, '\n')
  for line in f:
    if ("""SELECT count(*) from jobtest WHERE link = %s""", (line)) == 0:
      cur.execute("""INSERT INTO jobtest (afdeling, dato, link) VALUES (%s, %s, %s)""", (afdelings, created_at, line))

with connect:
  connect.commit()

connect.close()


Any help is greatly appreciated.

Best answer

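You need to execute the SELECT first. In the question's code the if statement compares a tuple (the query string plus its parameter) with 0. That comparison is never true, so the query is never sent to the database and nothing is inserted.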
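To see why the original condition never fires, the comparison can be tried on its own without a database. This is only a small sketch; the link value is a made-up placeholder, not one taken from the site.

```python
# Hypothetical link value, used only for illustration.
line = "http://www.careerjet.dk/job/example"

# This is the expression from the question's if statement, minus the "== 0".
# It builds a tuple of (query string, parameter); no query is executed.
condition = ("""SELECT count(*) from jobtest WHERE link = %s""", (line))

print(type(condition).__name__)  # tuple
print(condition == 0)            # False, so the INSERT branch is never reached

# Note that (line) is just the string itself; a one-element tuple needs a comma.
print(type((line)).__name__)     # str
print(type((line,)).__name__)    # tuple
```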

Something like this, with the SELECT executed before the check:

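import time

import requests
from bs4 import BeautifulSoup

created_at = time.strftime("%Y/%d/%m/ %H:%M:%S")
afdelings = 'it-support'

url = 'www.careerjet.dk/sog/jobs?s=L%C3%A6rling&l=Danmark'
r = requests.get("http://" + url)
data = r.text
soup = BeautifulSoup(data, "html.parser")
side1 = "http://www.careerjet.dk/"
# connect is assumed to be an already-open MySQL connection (not shown in the question)
cur = connect.cursor()

for link in soup.select('.title > a'):
  linkfrom = side1 + (link.get('href'))
  f = linkfrom.split('\n')
  for line in f:

    #-------ADDED CODE
    # Actually run the SELECT, then read the single count(*) value from the result row.
    # The parameter must be a one-element tuple, hence the trailing comma.
    cur.execute("""SELECT count(*) from jobtest WHERE link = %s""", (line,))
    data_tmp = cur.fetchone()[0]
    #-------END ADDED CODE

    # Insert only when no row with this link exists yet
    if data_tmp == 0:
      cur.execute("""INSERT INTO jobtest (afdeling, dato, link) VALUES (%s, %s, %s)""", (afdelings, created_at, line))

# Commit all inserts once, then close the connection
connect.commit()
connect.close()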
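The check-then-insert pattern above can be tried without a MySQL server. The following is a minimal sketch that swaps in Python's built-in sqlite3 module as a stand-in, so it uses ? placeholders where the MySQL drivers use %s. The helper name insert_if_missing, the TEXT column types, the date string, and the example links are all assumptions made for illustration; only the table and column names come from the question.

```python
import sqlite3


def insert_if_missing(conn, afdeling, dato, link):
    """Insert a row only if no row with this link exists. Return True if inserted."""
    cur = conn.cursor()
    # Execute the SELECT first, then read the count from the single result row.
    cur.execute("SELECT count(*) FROM jobtest WHERE link = ?", (link,))
    (count,) = cur.fetchone()
    if count == 0:
        cur.execute(
            "INSERT INTO jobtest (afdeling, dato, link) VALUES (?, ?, ?)",
            (afdeling, dato, link),
        )
        return True
    return False


# In-memory database with an assumed schema (column types are a guess).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobtest (afdeling TEXT, dato TEXT, link TEXT)")

# Placeholder links; the first one appears twice on purpose.
links = [
    "http://www.careerjet.dk/job/a",
    "http://www.careerjet.dk/job/b",
    "http://www.careerjet.dk/job/a",
]
results = [
    insert_if_missing(conn, "it-support", "placeholder-date", link)
    for link in links
]
conn.commit()

total = conn.execute("SELECT count(*) FROM jobtest").fetchone()[0]
print(results)  # [True, True, False]
print(total)    # 2
conn.close()
```

The duplicate link is skipped on its second appearance, so the table ends up with two rows. With a MySQL driver the same function body should carry over once the ? placeholders are changed to %s.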

09-10 00:31