For timestamps in this format (forum post header, 12-hour clock with am/pm): 發表於: 星期一 五月 28, 2012 6:59 am

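import re

INPUT = "發表於: 星期一 五月 28, 2012 6:59 am 文章主題: 對《大話新聞》改組的誠心思考/蔬菜麵"
# Digit runs in order of appearance: day, year, hour, minute.
b = re.findall(r'\d+', INPUT)
# Space-separated fields: label, weekday, month, "day,", year, "h:mm", am/pm, ...
a = INPUT.split(' ')
# Map the Chinese month name to a zero-padded month number.
monthdict = {"一月": "01", "二月": "02", "三月": "03", "四月": "04", "五月": "05", "六月": "06",
             "七月": "07", "八月": "08", "九月": "09", "十月": "10", "十一月": "11", "十二月": "12"}
year = a[4]
month = monthdict[a[2]]
day = b[0]
# Convert the 12-hour clock to 24-hour: 12 am becomes 0, 12 pm stays 12.
hour = int(b[2]) % 12
if a[6] == 'pm':
    hour += 12
minute = b[3]
# %02d zero-pads the hour, so 6 am is printed as 06.
OUTPUT = "%s-%s-%s %02d:%s:00" % (year, month, day, hour, minute)
print(OUTPUT)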

For timestamps in this standard format: http://www.cdnews.com.tw 2015-11-02 17:33:55

import re

INPUT = "http://www.cdnews.com.tw 2015-11-02 17:33:55"
# Match the whole timestamp instead of every digit run,
# so digits inside the URL cannot shift the fields.
match = re.search(r'(\d{4})-(\d{2})-(\d{2}) (\d{2}):(\d{2}):(\d{2})', INPUT)
year, month, day, hour, minute, second = match.groups()
OUTPUT = "%s-%s-%s %s:%s:%s" % (year, month, day, hour, minute, second)
print(OUTPUT)
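
As an alternative sketch for the same standard format, the standard library's datetime.strptime can both parse and validate the timestamp (for example, it rejects month 13). The helper name parse_standard_timestamp is hypothetical and not part of the original script; Python 3 is assumed.

```python
import re
from datetime import datetime


def parse_standard_timestamp(text):
    """Return a datetime for the first 'YYYY-MM-DD HH:MM:SS' in text, or None."""
    match = re.search(r'\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}', text)
    if match is None:
        return None
    # strptime raises ValueError if the digits do not form a real date/time.
    return datetime.strptime(match.group(0), "%Y-%m-%d %H:%M:%S")


parsed = parse_standard_timestamp("http://www.cdnews.com.tw 2015-11-02 17:33:55")
print(parsed.strftime("%Y-%m-%d %H:%M:%S"))  # 2015-11-02 17:33:55
```

Returning a datetime object rather than a string leaves the output format up to the caller via strftime.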

For timestamps in this format, where a pm time must be converted to the 24-hour clock: 發表於: 星期三 十二月 14, 2016 6:45 pm

import re

INPUT = "發表於: 星期三 十二月 14, 2016 6:45 pm"
# Digit runs in order of appearance: day, year, hour, minute.
b = re.findall(r'\d+', INPUT)
# Space-separated fields: label, weekday, month, "day,", year, "h:mm", am/pm.
a = INPUT.split(' ')
# Map the Chinese month name to a zero-padded month number.
monthdict = {"一月": "01", "二月": "02", "三月": "03", "四月": "04", "五月": "05", "六月": "06",
             "七月": "07", "八月": "08", "九月": "09", "十月": "10", "十一月": "11", "十二月": "12"}
year = a[4]
month = monthdict[a[2]]
day = b[0]
# Convert the 12-hour clock to 24-hour: 12 am becomes 0, 12 pm stays 12.
hour = int(b[2]) % 12
if a[6] == 'pm':
    hour += 12
minute = b[3]
# %02d zero-pads single-digit hours, so 6 am is printed as 06.
OUTPUT = "%s-%s-%s %02d:%s:00" % (year, month, day, hour, minute)
print(OUTPUT)
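
The two forum-format scripts differ only in their input, so the logic could be folded into one reusable function. The following is a minimal sketch under assumptions: Python 3, and the names MONTHS, FORUM_PATTERN and parse_forum_timestamp are hypothetical, not from the original code. It matches the date with a single regular expression instead of relying on the position of each space-separated field.

```python
import re
from datetime import datetime

# Chinese month names mapped to month numbers.
MONTHS = {"一月": 1, "二月": 2, "三月": 3, "四月": 4, "五月": 5, "六月": 6,
          "七月": 7, "八月": 8, "九月": 9, "十月": 10, "十一月": 11, "十二月": 12}

# Groups: month name, day, year, hour, minute, am/pm.
FORUM_PATTERN = re.compile(
    r'(\S+月)\s+(\d{1,2}),\s+(\d{4})\s+(\d{1,2}):(\d{2})\s+(am|pm)')


def parse_forum_timestamp(text):
    """Return a datetime for the forum-style date in text, or None if absent."""
    match = FORUM_PATTERN.search(text)
    if match is None:
        return None
    month_name, day, year, hour, minute, meridiem = match.groups()
    month = MONTHS.get(month_name)
    if month is None:
        return None
    # 12-hour to 24-hour: 12 am becomes 0, 12 pm stays 12.
    hour = int(hour) % 12
    if meridiem == "pm":
        hour += 12
    return datetime(int(year), month, int(day), hour, int(minute))


result = parse_forum_timestamp("發表於: 星期三 十二月 14, 2016 6:45 pm")
print(result.strftime("%Y-%m-%d %H:%M:%S"))  # 2016-12-14 18:45:00
```

Because the pattern anchors on the month name and the am/pm marker, trailing text such as a post title after the time does not affect the result.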