Python Example Walkthrough

Published by charles on 2017/09/22 17:50:50
[Abstract] A walkthrough of a Python scraping example.

1. Overview

1.1 Introduction

This program uses Python + requests + BeautifulSoup to scrape data from the Shodan search engine: it filters for malware IPs and retrieves concrete IP information by searching on country, service (port), organization, and product. The general approach is to use requests to construct the form data for a login POST to the backend, then use BeautifulSoup and regular expressions to extract the information we need from the returned page source.
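As a minimal sketch of those two building blocks, the login form data and one search URL can be assembled as a plain dictionary and string. The field names (`username`, `password`, `grant_type`, `continue`, `login_submit`) come from the program code in section 2.5; the credential values are placeholders:

```python
# Sketch of the form data POSTed to Shodan's login endpoint, and of one
# search URL for the "malware" category filtered by country.
# Field names follow the program code in section 2.5; credentials are placeholders.

def build_login_data(username, password, continue_url):
    """Form data sent with each POST to https://account.shodan.io/login."""
    return {
        'username': username,
        'password': password,
        'grant_type': 'password',
        'continue': continue_url,   # page to land on after logging in
        'login_submit': 'Log in',
    }

def build_search_url(country, page):
    """Search URL for category:"malware" country:"XX", already URL-encoded."""
    return ('https://www.shodan.io/search?query=category%3A%22malware%22'
            '+country%3A%22' + country + '%22&page=' + str(page))

data = build_login_data('user', 'secret', build_search_url('CN', 1))
print(data['continue'])
```

The program below builds one such dictionary per search URL and POSTs each of them to the login endpoint.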

 

2. Background

2.1 About Shodan

Shodan is a search engine, but unlike Google it does not search for websites; it probes the Internet's back channels directly. Shodan has been called a "dark Google", ceaselessly looking for the servers, webcams, printers, routers, and everything else connected to the Internet. Every month, Shodan collects information around the clock from roughly 500 million devices.

The information Shodan gathers is astonishing. Traffic lights, security cameras, home-automation devices, and heating systems connected to the Internet can all be found with ease. Shodan users have discovered the control system of a water park, a gas station, and even a hotel's wine cooler, and researchers have used Shodan to locate the command-and-control systems of nuclear power plants and a particle accelerator.

What makes Shodan truly remarkable is that it can find almost anything connected to the Internet; what makes it truly frightening is that almost none of these devices have any security measures installed, so they can be accessed at will.


2.2 About Requests

requests is an HTTP client library for Python, similar to urllib and urllib2. So why use requests instead of urllib2? The official documentation puts it roughly like this: Python's standard library urllib2 provides most of the HTTP functionality you need, but the API is thoroughly broken, and a simple task takes a mountain of code. In the spirit of laziness, I prefer the third-party requests library.

A quick tour of requests:

# import the module
import requests

# send a GET request
r = requests.get('http://www.zhidaow.com')

# page source of the response
r.text

# disable redirects
r = requests.get('http://www.baidu.com/link?url=QeTRFOS7TuUQRppa0wlTJJr6FfIYI1DJprJukx4Qy0XnsDO_s9baoO8u1wvjxgqN', allow_redirects=False)

# HTTP status code, used to check whether the request succeeded
r.status_code

# response headers
r.headers

# send POST, PUT, DELETE, HEAD, OPTIONS requests
#r = requests.post("http://httpbin.org/post")
#r = requests.put("http://httpbin.org/put")
#r = requests.delete("http://httpbin.org/delete")
#r = requests.head("http://httpbin.org/get")
#r = requests.options("http://httpbin.org/get")

More usage examples are available at http://www.zhidaow.com/post/python-requests-install-and-brief-introduction
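As a small offline illustration of how requests assembles a request before anything is sent over the network, a Request object can be prepared and inspected locally; the URL and parameter below are only examples:

```python
import requests

# Build a GET request with a query parameter without sending it.
# prepare() resolves the final URL (with the encoded query string) locally.
req = requests.Request('GET', 'http://httpbin.org/get', params={'q': 'shodan'})
prep = req.prepare()

print(prep.method)  # the HTTP verb
print(prep.url)     # URL with the encoded query string appended
```

This is handy for checking exactly what URL and body a call like the ones above will produce before letting it hit a server.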

2.3 About BeautifulSoup

BeautifulSoup is a Python library for extracting data from HTML and XML files. Working with your parser of choice, it provides idiomatic ways of navigating, searching, and modifying the document, and it can save you hours or even days of work.

A quick tour of BeautifulSoup:

# first, define an HTML document as a string
html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p><b>The Dormouse's story</b></p>
<p>Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" id="link1">Elsie</a>,
<a href="http://example.com/lacie" id="link2">Lacie</a> and
<a href="http://example.com/tillie" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p>...</p>
"""

# import the BeautifulSoup module and parse the document
from bs4 import BeautifulSoup
soup = BeautifulSoup(html_doc, 'html.parser')

# a few simple ways to navigate the parsed document
soup.title  # the <title> tag
# <title>The Dormouse's story</title>

soup.title.name  # the tag's name
# u'title'

soup.title.string  # the text inside the <title> tag
# u"The Dormouse's story"

soup.title.parent.name  # the name of the <title> tag's parent
# u'head'

soup.p  # the first <p> tag
# <p><b>The Dormouse's story</b></p>

soup.a['id']  # an attribute value of the first <a> tag
# u'link1'

soup.a  # the first <a> tag
# <a href="http://example.com/elsie" id="link1">Elsie</a>

soup.find_all('a')  # every <a> tag
# [<a href="http://example.com/elsie" id="link1">Elsie</a>,
#  <a href="http://example.com/lacie" id="link2">Lacie</a>,
#  <a href="http://example.com/tillie" id="link3">Tillie</a>]

soup.find(id="link3")  # the tag whose id is "link3"
# <a href="http://example.com/tillie" id="link3">Tillie</a>

Find the links of all the <a> tags in the document:

for link in soup.find_all('a'):

    print(link.get('href'))

    # http://example.com/elsie

    # http://example.com/lacie

    # http://example.com/tillie

Extract all of the text content from the document:

print(soup.get_text())

# The Dormouse's story

#

# The Dormouse's story

#

# Once upon a time there were three little sisters; and their names were

# Elsie,

# Lacie and

# Tillie;

# and they lived at the bottom of a well.

#

# ...

Those are the basics of BeautifulSoup. I like to combine BeautifulSoup with regular expressions, which saves a lot of time and effort.
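A minimal offline sketch of that combination (the HTML snippet below is invented for illustration): find_all can take a compiled pattern to match attribute values, and a regex can then pull a field out of each matched tag's text — the same pattern the program in section 2.5 uses on Shodan result pages:

```python
import re
from bs4 import BeautifulSoup

# A tiny invented page: two host links and one unrelated link.
html = '''
<a href="/host/1.2.3.4">1.2.3.4 (80)</a>
<a href="/host/5.6.7.8">5.6.7.8 (443)</a>
<a href="/about">About</a>
'''

soup = BeautifulSoup(html, 'html.parser')

# BeautifulSoup narrows the tags: only <a> whose href matches /host/.
hosts = soup.find_all('a', href=re.compile(r'/host/'))

# A regex then extracts the port number from each matched tag's text.
port_pat = re.compile(r'\((\d+)\)')
ports = [port_pat.search(a.get_text()).group(1) for a in hosts]
print(ports)
```

BeautifulSoup handles the tree structure, and the regex handles the free-form text inside it, so neither tool is stretched beyond what it is good at.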

2.4 Environment setup

1. Set up the Python environment
1) Install the Python package.
Download:
https://www.python.org/downloads/release/python-2711/

2) Add the install path to the environment variables.
Variable name: Path
Variable value: C:\Python27 (i.e., the Python installation directory)

3) Verify that Python is configured.
Run python -V in cmd; output like the following means the configuration is complete:

C:\Users\Administrator>python -V
Python 2.7.11

 

2. Install pip

Open C:\Python27\Scripts and check that pip was included with the Python install. Add C:\Python27\Scripts to the system environment variables, then open a CMD window and run the pip command to confirm it works.

3. Install requests and bs4

1) Install requests via pip:

pip install requests

2) Install bs4:

pip install bs4
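A quick way to confirm both installs succeeded (assuming the pip commands above ran without error) is to import the packages and parse a trivial document:

```python
# Importing both packages and parsing a one-tag document is enough
# to confirm the installation worked.
import requests
from bs4 import BeautifulSoup

soup = BeautifulSoup('<p>ok</p>', 'html.parser')
print(soup.p.get_text())
```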

2.5 Program code

# -*- coding: utf-8 -*-

import requests

from bs4 import BeautifulSoup

import re

import time

import sys

reload(sys)

sys.setdefaultencoding('utf-8')

url_infos=[]

ip_list=[]

searchfor_list=[]

addon_list=[]

information=[]

ports=[2280,1604,1177,16464,443]

organizations=['George+Mason+University','Enzu','Psychz+Networks','Turk+Telekom','HOPUS+SAS']

products=['Gh0st+RAT+trojan','DarkComet+trojan','njRAT+trojan','ZeroAccess+trojan','XtremeRAT+trojan']

page_nums=range(1,6)

countries=['AE','AF','AL','AM','AO','AR','AT','AU','AZ','BD','BE','BF','BG','BH','BI','BJ','BL','BN','BO','BR',

'BW','BY','CA','CF','CG','CH','CL','CM','CN','CO','CR','CS','CU','CY','DE','DK','DO','DZ','EC','EE','EG','ES',

'ET','FI','FJ','FR','GA','GB','GD','GE','GH','GN','GR','GT','HK','HN','HU','ID','IE','IL','IN','IQ','IR','IS',

'IT','JM','KG','KH','KP','KR','KT','KW','KZ','LA','LB','LC','LI','LK','LR','LT','LU','LV','LY','MA','MC','MD',

'MG','ML','MM','MN','MO','MT','MU','MW','MX','MY','MZ','NA','NE','NG','NI','NL','NO','NP','NZ','OM','PA','PE',

'PG','PH','PK','PL','PT','PY','QA','RO','RU','SA','SC','SD','SE','SG','SI','SK','SM','SN','SO','SY','SZ','TD',

'TG','TH','TJ','TM','TN','TR','TW','TZ','UA','UG','US','UY','UZ','VC','VE','VN','YE','YU','ZA','ZM','ZR','ZW']

class Get_IP():

    def __init__(self,url,headers):

        self.headers=headers

        self.url=url

    def get_info(self):

        for data in datas:

            req=requests.post(self.url,params=data,headers=self.headers)

            html=req.text

            pattern=re.compile(r'<a.*?>(.*?)</a>')

            pattern0=re.compile(r'value=\'(.*?)\'/')

            soup=BeautifulSoup(html,'html.parser')

            Add_on=soup.find_all(text=re.compile(r'Added\ on'))

            for i in soup.select('input[id="search_input"]'):

                text_b=re.search(pattern0,str(i)).group(1)

            for i in Add_on:

                addon_list.append(i)

            for i in soup.find_all(href=re.compile(r'/host/')):

                if 'Details' not in i.get_text():

                    ip_list.append(i.get_text())

                    searchfor_list.append(text_b)

        return ip_list,searchfor_list,addon_list

class Get_Ip_Info():

    def __init__(self,url,headers):

        self.url=url

        self.headers=headers

    def get_ip_info(self):

        for data in url_infos:

            req_ip=requests.post(self.url,params=data,headers=self.headers)

            html_ip=req_ip.text

            soup=BeautifulSoup(html_ip,'html.parser')

            tag_text=[]

            tag_content=[]

           

            for i in soup.find_all('th'):

                tag_text.append(i.get_text())

            for i in soup.find_all('td'):

                tag_content.append(i.get_text())

            for i in soup.select('meta[name="twitter:description"]'):

                pattern=re.compile(r'content="Ports open:(.*?)"')

                ports=re.search(pattern,str(i)).group(1)

            info=dict(zip(tag_content,tag_text))

            info['Ports']=ports

            info['Ip']=data['continue'].replace('https://www.shodan.io/host/','')  # str.strip() removes characters, not a prefix

            information.append(info)

if __name__=="__main__":

    headers={'User-Agent':"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:25.0) ""Gecko/20100101 Firefox/25.0"}

    url='https://account.shodan.io/login'

    file_path=r'C:\Users\Administrator\Downloads\shodan%s.txt'%time.strftime('%Y-%m-%d',time.localtime(time.time()))

    file_w=open(file_path,'wb')

    all_url=[]

    datas=[]

    for country in countries:

        for page_num in page_nums:

            home_url='https://www.shodan.io/search?query=category%3A%22malware%22+country%3A%22'+'%s'%country+'%22&'+'page=%d'%page_num

            all_url.append(home_url)

    for port in ports:

        for page_num in page_nums:

            home_url='https://www.shodan.io/search?query=category%3A%22malware%22+port%3A%22'+'%d'%port+'%22&'+'page=%d'%page_num

            all_url.append(home_url)

    for org in organizations:

        for page_num in page_nums:

            home_url='https://www.shodan.io/search?query=category%3A%22malware%22+org%3A%22'+'%s'%org+'%22&'+'page=%d'%page_num

            all_url.append(home_url)

    for product in products:

        for page_num in page_nums:

            home_url='https://www.shodan.io/search?query=category%3A%22malware%22+product%3A%22'+'%s'%product+'%22&'+'page=%d'%page_num

            all_url.append(home_url)

    for continue_url in all_url:     

        info={

        'username':'xxxxxxxxxx',# Shodan username

        'password':'xxxxxxxxxx',# Shodan password

        'grant_type':'password',

        'continue':continue_url,

        'login_submit':'Log in'

        }

        datas.append(info)

    app=Get_IP(url,headers)

    app.get_info()

    for ip in ip_list:

        url_ip='https://www.shodan.io/host/'+'%s'%ip

        url_info={

        'username':'xxxxxxxxxx',# Shodan username

        'password':'xxxxxxxxxx',# Shodan password

        'grant_type':'password',

        'continue':url_ip,

        'login_submit':'Log in'

        }

        url_infos.append(url_info)

    app=Get_Ip_Info(url,headers)

    app.get_ip_info()

    total_info=zip(searchfor_list,information,addon_list)

    for i in range(len(total_info)):

        search_for=total_info[i][0]

        add_on=total_info[i][2]

        ip_info=total_info[i][1]['Ip']

        try:

            city_info=str(total_info[i][1]['City'])

        except KeyError:

            city_info='NULL'

        try:

            ports_info=str(total_info[i][1]['Ports'])

        except KeyError:

            ports_info='NULL'

        try:

            country_info=str(total_info[i][1]['Country'])

        except KeyError:

            country_info='NULL'

        try:

            hostnames_info=str(total_info[i][1]['Hostnames'])

        except KeyError:

            hostnames_info='NULL'

        word=search_for+' ||'+country_info+' '+city_info+' ||'+hostnames_info+' ||'+ip_info+' ||'+ports_info+' ||'+add_on

        file_w.write(word+'\r\n')

    file_w.close()

2.6 What the program does

The program collects the information Shodan finds for the given search conditions and produces a log file each day. The initial search condition is category:"malware", a search for malicious software; the search is then refined by country, service (port), organization, and product to collect every IP address involved. Finally, for each IP the program visits its details page and retrieves the detailed fields for that IP (Ports, Ip, Hostnames, Country, City).
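The final bookkeeping step can be illustrated offline with invented sample records: the three parallel lists are zipped together and each record is flattened into one '||'-separated log line, with missing fields defaulting to 'NULL' (dict.get with a default plays the role of the try/except KeyError blocks in the program):

```python
# Invented sample records standing in for the scraped lists.
searchfor_list = ['category:"malware" country:"US"']
information = [{'Ip': '1.2.3.4', 'Country': 'United States', 'Ports': ' 80, 443'}]
addon_list = ['Added on 2017-09-22']

lines = []
for search_for, info, add_on in zip(searchfor_list, information, addon_list):
    # Missing detail fields (here City and Hostnames) fall back to 'NULL'.
    country = str(info.get('Country', 'NULL'))
    city = str(info.get('City', 'NULL'))
    hostnames = str(info.get('Hostnames', 'NULL'))
    ports = str(info.get('Ports', 'NULL'))
    lines.append(search_for + ' ||' + country + ' ' + city + ' ||' +
                 hostnames + ' ||' + info['Ip'] + ' ||' + ports + ' ||' + add_on)
print(lines[0])
```

Each such line is then written to the daily log file.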


[Copyright notice] This article is original content by a Huawei Cloud community user. When reposting, you must credit the source (Huawei Cloud community), the article link, and the author.