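URL submission is a webmaster tool provided by Baidu that gives site owners an interface for manually submitting specific URLs for indexing. The interface is protected by a captcha, which makes it awkward to automate, so I wrote the following program to recognize the captcha automatically:
Main idea
Fetch the captcha image several times and submit each copy to http://lab.ocrking.com/ for recognition. Then count, for each character position, how often each letter or digit was recognized; the character with the highest count at each position is taken as the captcha.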
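The counting step can be illustrated in isolation. The following is only a minimal sketch, assuming the OCR readings have already been collected as strings; the function name `vote_captcha` and the sample readings are made up for illustration and are not part of the original program.

```python
from collections import Counter


def vote_captcha(readings, length=4):
    """Return the most common character at each position across readings."""
    # Ignore readings whose length does not match the expected captcha length.
    valid = [r for r in readings if len(r) == length]
    if not valid:
        raise ValueError("no readings of the expected length")
    result = []
    for pos in range(length):
        # Count how often each character was recognized at this position.
        counts = Counter(r[pos] for r in valid)
        # most_common(1) returns a one-element list of (char, count).
        result.append(counts.most_common(1)[0][0])
    return "".join(result)


if __name__ == "__main__":
    # Hypothetical readings; the last one is dropped because it is too short.
    print(vote_captcha(["a3kp", "a8kp", "a3kq", "x3k"]))
```

With more readings, a single misrecognized character is more likely to be outvoted, at the cost of more requests to the OCR service.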
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import json
import re
import time

import requests


if __name__ == "__main__":
    i = 1
    # Session for Baidu; the Referer and User-Agent mimic a normal browser visit.
    s = requests.session()
    s.headers.update({'Referer': 'http://zhanzhang.baidu.com/sitesubmit/index', 'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.154 Safari/537.36'})
    r = s.get('http://zhanzhang.baidu.com/sitesubmit/index')
    # Separate session for the OCR service.
    s2 = requests.session()
    # Request a captcha; the JSON response contains the image URL.
    r = s.post('http://zhanzhang.baidu.com/captcha', data={'async': 'false', 'n': time.time()})
    url = json.loads(r.content)['url']
    temp = []
    while 1:
        try:
            # Download the captcha image.
            r = s.get(url)
            img_data = r.content
            # Load the OCR page to obtain the tokens required for uploading.
            r = s2.get('http://lab.ocrking.com/')
            try:
                content = ' '.join(r.text.split())
                sid = re.findall(r'"sid" : "(.+?)"', content)[0]
                hash_1 = re.findall(r'"hash" : "(.+?)"', content)[0]
                timestamp = re.findall(r'"timestamp" : "(.+?)"', content)[0]
            except Exception:
                print('error on get ocrking info!')
                continue
            # Upload the image, then ask the service to recognize it.
            files = {'Filedata': ('icode.jpeg', img_data)}
            data = {'Filename': 'icode.jpeg', 'sid': sid, 'hash': hash_1, 'timestamp': timestamp}
            r = s2.post('http://lab.ocrking.com/upload.html', files=files, data=data)
            r = s2.post('http://lab.ocrking.com/ocrking.html', data={'upfile': r.content, 'type': 'captcha', 'charset': '7'})
            # (pattern truncated in the original post)
            icode = re.findall(r'
            # Keep only results with the expected four characters.
            if len(icode) != 4:
                continue
            temp.append(icode)
            i = i + 1
            if i == 3:
                break
        except Exception as e:
            print(e)

    # Per-position tally: a['0'] maps each character seen at position 0 to its count, and so on.
    a = {'0': {}, '1': {}, '2': {}, '3': {}}
    for aa in temp:
        i = 0
        while i <= 3:
            a[str(i)][aa[i]] = a[str(i)].get(aa[i], 0) + 1
            i = i + 1
    # Pick the most frequent character at each position.
    icode = ['', '', '', '']
    for index in a:
        temp_times = 0
        for index_1 in a[index]:
            if a[index][index_1] >= temp_times:
                temp_times = a[index][index_1]
                icode[int(index)] = index_1

    icode = ''.join(icode)

    # Save the last downloaded image under the recognized code (the temp directory must already exist).
    img_name = 'temp\\' + icode + '.png'
    # Binary mode, because img_data holds raw image bytes.
    file_object = open(img_name, 'wb')
    file_object.write(img_data)
    file_object.close()

    # r = s.post('http://zhanzhang.baidu.com/sitesubmit/sitepost', data={'url': 'http://lab.ocrking.com/', 'captcha': icode})

    # print(r.content)
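The token extraction in the script (collapse whitespace, then match `"sid"`, `"hash"` and `"timestamp"` with regular expressions) can be tried offline. This is a rough sketch under assumptions: the helper name `extract_tokens` and the `SAMPLE` page fragment are invented for illustration, and the real page served by lab.ocrking.com may look different.

```python
import re


def extract_tokens(html):
    """Pull the sid, hash and timestamp values out of a page's source."""
    # Collapse every run of whitespace into a single space so that the
    # patterns still match when a key and its value sit on different lines.
    flat = " ".join(html.split())
    tokens = {}
    for key in ("sid", "hash", "timestamp"):
        match = re.search(r'"%s" : "(.+?)"' % key, flat)
        if match is None:
            raise ValueError("token not found: " + key)
        tokens[key] = match.group(1)
    return tokens


# Hypothetical page fragment, made up for illustration only.
SAMPLE = '''
var opts = {
    "sid"       : "abc123",
    "hash" : "deadbeef",
    "timestamp" :
        "1400000000"
};
'''

if __name__ == "__main__":
    print(extract_tokens(SAMPLE))
```

Raising an error that names the missing token makes a change in the page layout easier to diagnose than a bare `except` around all three lookups.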