A Roundup of Essential Machine Learning Resources

Posted by 技术火炬手 on 2018/08/20 10:04:59

I. Python Coding Standards
A Concise Python Coding Standard
https://blog.csdn.net/gzlaiyonghao/article/details/2834883
Python Language Rules
http://zh-google-styleguide.readthedocs.io/en/latest/google-python-styleguide/python_language_rules/
Python Style Rules
http://zh-google-styleguide.readthedocs.io/en/latest/google-python-styleguide/python_style_rules/


II. Large Datasets for Machine Learning
Commonly used dataset search sites


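|UCI Machine Learning Repository
http://archive.ics.uci.edu/ml/index.php
The best-known dataset repository; the data in many papers comes from here.
|AWS Public Datasets
https://aws.amazon.com/cn/datasets/
Datasets hosted by Amazon Web Services, covering astronomy, biology, chemistry, weather, economics, and other fields.
|YAHOO Webscope datasets
https://webscope.sandbox.yahoo.com/
Datasets provided by Yahoo, covering images, language, ranking, classification, and other areas.
|Kaggle datasets
https://www.kaggle.com/datasets
The dataset library of the Kaggle competition platform, where you can find many interesting datasets from industry, such as data from Uber, the Netflix Prize, and McDonald's.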


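Computer vision
|ImageNet
http://www.image-net.org/
The best-known image dataset. You can search it for images of any category your project needs and use them for object recognition, localization, classification, and scene parsing. It contains 14,197,122 images of varying sizes, about 140 GB in total.
|MNIST
http://yann.lecun.com/exdb/mnist/
Practically every newly proposed machine learning algorithm gets run on this dataset. MNIST is a database of handwritten digits with 60,000 training samples and 10,000 test samples; it is a subset of the NIST database.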
|The CIFAR-10 dataset
https://www.cs.toronto.edu/~kriz/cifar.html
32x32 color images.
|Google Open Images
https://github.com/ejlb/google-open-image-download
Google Open Images is a large annotated image dataset released by Google, with annotations covering 7,800 categories across 9 million images.
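
As a rough illustration of the MNIST entry above: the files on the MNIST page are distributed in the IDX binary format, whose header stores the array dimensions as big-endian 32-bit integers. The sketch below parses such a header. The function name `read_idx_header` is hypothetical, and the header bytes are built by hand here to mimic the 60,000-sample training image file (28x28 pixels per digit) rather than read from a real download.

```python
import struct


def read_idx_header(data: bytes):
    """Parse an IDX header and return (dtype_code, dims).

    Hypothetical helper for illustration only; it does not read the pixel data.
    """
    # The first 4 bytes are: two zero bytes, a data type code, and the
    # number of dimensions.
    zero1, zero2, dtype_code, ndim = struct.unpack(">BBBB", data[:4])
    if zero1 != 0 or zero2 != 0:
        raise ValueError("not an IDX file: the first two bytes must be zero")
    # Each dimension size follows as a big-endian unsigned 32-bit integer.
    dims = struct.unpack(">" + "I" * ndim, data[4:4 + 4 * ndim])
    return dtype_code, dims


# Hand-built header that mimics the training image file:
# type code 0x08 (unsigned byte), 3 dimensions, 60000 images of 28x28 pixels.
header = struct.pack(">BBBBIII", 0, 0, 0x08, 3, 60000, 28, 28)
dtype_code, dims = read_idx_header(header)
print(dtype_code, dims)
```

For a real download, the file is gzip-compressed, so one option is to open it with `gzip.open(path, "rb")` and pass the first 16 bytes to the function.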


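Natural language processing
|Text classification datasets
https://drive.google.com/drive/folders/0Bz8a_Dbh9Qhbfll6bVpmNUtUcFdjYmF2SEpmZUZUcVNiMUw1TWN6RDV3a0JHT3kxLVhVR2M
A large collection assembled from the DBPedia, Amazon, Yelp, Yahoo!, Sogou, and AG text classification datasets. Sample counts range from 120K to 3.6M, and the tasks range from 2 to 14 classes.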
|WikiText
https://einstein.ai/research/the-wikitext-long-term-dependency-language-modeling-dataset
A large language modeling corpus drawn from Wikipedia articles.
|Billion Words
http://www.statmt.org/lm-benchmark/
Commonly used to train distributed word representations such as word2vec or GloVe.
|Stanford Sentiment Treebank
http://nlp.stanford.edu/sentiment/code.html
A dataset for sentiment analysis.


Speech recognition
|2000 HUB5 English
https://catalog.ldc.upenn.edu/LDC2002T43
English speech data.
|CHIME
http://spandh.dcs.shef.ac.uk/chime_challenge/data.html
A speech recognition dataset that contains noise.
|TED-LIUM
http://www-lium.univ-lemans.fr/en/content/ted-lium-corpus
A speech dataset of TED talks, with the corresponding full transcripts.


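Other
|UCR Time Series
http://www.cs.ucr.edu/~eamonn/time_series_data/
The "ImageNet" of the time series community; a must-run when publishing papers.
|Million Song Dataset
https://labrosa.ee.columbia.edu/millionsong/
Useful for programmers working on music recommendation or classification.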
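|Netflix recommender system data
http://dataju.cn/Dataju/web/datasetInstanceDetail/32
A movie rating dataset covering 480,000 randomly selected Netflix customers and more than 100 million ratings of 17,000 movies, collected from 1998.10 to 2005.11. Ratings use a 5-point scale, with each movie rated from 1 to 5, and customer information has been anonymized.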
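|Udacity self-driving car dataset
https://github.com/udacity/self-driving-car/
The self-driving car dataset from Udacity's open self-driving course, which aims to build an open-source self-driving car project. It consists of several compressed binary files, about 100 GB in total.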
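
To give a feel for how sparse the Netflix rating data listed above is, here is a small worked calculation. It is only a sketch: the user and movie counts are the rounded figures from the entry, the rating count of roughly 100 million is an assumption based on the commonly reported size of the Netflix Prize data, and `rating_density` is a hypothetical helper name.

```python
def rating_density(n_users: int, n_items: int, n_ratings: int) -> float:
    """Fraction of the user x item rating matrix that is actually filled in."""
    if n_users <= 0 or n_items <= 0:
        raise ValueError("n_users and n_items must be positive")
    return n_ratings / (n_users * n_items)


n_users = 480_000        # rounded figure from the dataset description
n_movies = 17_000        # rounded figure from the dataset description
n_ratings = 100_000_000  # assumption: roughly 100 million ratings

density = rating_density(n_users, n_movies, n_ratings)
print(f"{n_users * n_movies:,} possible user-movie pairs, density = {density:.2%}")
```

Under these assumptions only about 1% of the possible user-movie pairs carry a rating, which is why recommender code for this kind of data usually stores ratings in a sparse structure rather than a dense matrix.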


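III. Recent Machine Learning Interview Experiences at Major IT Companies
Tencent
1.[Data mining interview notes] Tencent + Baidu + Huawei (SP offers from all three), reposted
http://blog.csdn.net/zhaoyu106/article/details/52853377
2.Tencent - Machine Learning (Shenzhen)
http://www.job592.com/pay/ms224054.html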

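Baidu
1.[Data mining interview notes] Tencent + Baidu + Huawei (SP offers from all three), reposted
http://blog.csdn.net/zhaoyu106/article/details/52853377
2.2016 Baidu "machine learning / data mining" interview notes: first, second, and third rounds
http://blog.csdn.net/zzukun/article/details/52687842
3.Baidu - Machine Learning (Beijing)
http://www.job592.com/pay/ms276014.html
http://www.job592.com/pay/ms276174.html
4.Baidu machine learning / data mining: notes on being eliminated in the first round
http://blog.csdn.net/mpbchina/article/details/8018005
5.Interview notes: Baidu machine learning and natural language processing (NLP)
http://blog.sina.com.cn/s/blog_8af1069601013hl1.html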

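Alibaba
1.Interview summary for algorithm & machine learning positions at Chinese internet companies (Ali Star)
http://www.100mian.com/mianshi/sousuosuanfa/49035.html
2.Four interviews as an Alibaba intern candidate (machine learning)
http://www.jianshu.com/p/0a1148bb0c70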

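Huawei
1.[Data mining interview notes] Tencent + Baidu + Huawei (SP offers from all three), reposted
http://blog.csdn.net/zhaoyu106/article/details/52853377

Meituan
1.Meituan machine learning position interview notes
http://blog.csdn.net/zr459927180/article/details/51966345

NetEase
NetEase - Machine Learning (Guangzhou)
http://www.job592.com/pay/ms276733.html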

GOOGLE
1.Google Data Scientist Interview Questions
https://www.glassdoor.com/Interview/Google-Data-Scientist-Interview-Questions-EI_IE9079.0,6_KO7,21.htm
2.Google Software Engineer Machine Learning Interview Questions
https://www.glassdoor.com/Interview/Google-Software-Engineer-Machine-Learning-Interview-Questions-EI_IE9079.0,6_KO7,41.htm

FACEBOOK
1.Machine Learning Software Engineer Interview
https://www.glassdoor.com/Interview/Facebook-Machine-Learning-Software-Engineer-Interview-Questions-EI_IE40772.0,8_KO9,43.htm
2.Facebook Machine Learning Interview Questions
https://www.glassdoor.com/Interview/Facebook-Machine-Learning-Interview-Questions-EI_IE40772.0,8_KO9,25.htm
3.Facebook Research Scientist Interview Questions
https://www.glassdoor.com/Interview/Facebook-Research-Scientist-Interview-Questions-EI_IE40772.0,8_KO9,27.htm

Amazon
1.Amazon Machine Learning Scientist Interview Questions
https://www.glassdoor.com/Interview/Amazon-Machine-Learning-Scientist-Interview-Questions-EI_IE6036.0,6_KO7,33.htm
2.Amazon Applied Scientist Interview Questions
https://www.glassdoor.com/Interview/Amazon-Applied-Scientist-Interview-Questions-EI_IE6036.0,6_KO7,24.htm
3.Amazon Machine Learning Interview Questions
https://www.glassdoor.com/Interview/Amazon-Machine-Learning-Interview-Questions-EI_IE6036.0,6_KO7,23.htm
4.How should I prepare for an interview with the Amazon machine learning group?
https://www.quora.com/How-should-I-prepare-for-an-interview-with-the-Amazon-machine-learning-group
5.Amazon Interview Experience | 220 (On-Campus)
http://www.geeksforgeeks.org/amazon-interview-experience-220-on-campus/

LINKEDIN
1.LinkedIn Software Engineer/Machine Learning Interview Questions
https://www.glassdoor.com/Interview/LinkedIn-Interview-Questions-E34865.htm?filter.jobTitleExact=Software+Engineer%2FMachine+Learning
2.LinkedIn Data Scientist Interview Questions
https://www.glassdoor.com/Interview/LinkedIn-Data-Scientist-Interview-Questions-EI_IE34865.0,8_KO9,23.htm
3.LinkedIn Machine Learning Engineer Interview Questions
https://www.glassdoor.com/Interview/LinkedIn-Machine-Learning-Engineer-Interview-Questions-EI_IE34865.0,8_KO9,34.htm

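Other companies
1.Collected notes from machine learning job interviews
http://blog.csdn.net/Andrewseu/article/details/53940726
2.Machine Zone interview notes: OA + onsite
http://www.themianjing.com/2015/06/machine-zone-%E9%9D%A2%E7%BB%8F-oaonsite/
3.Yinjiang (银江) - Machine Learning (Hangzhou)
http://www.job592.com/pay/ms269483.html
4.Exclusive: machine learning interview experiences from Silicon Valley!
https://zhuanlan.zhihu.com/p/25066733
5.Interview summary for algorithm & machine learning positions at Chinese internet companies (Ali Star)
http://www.pig66.com/weixintoutiao/dianzandang/2016-02-22/636092.html
6.Machine learning interview experience at top-tier internet companies
http://blog.csdn.net/shuaishuai3409/article/details/52886453
7.Sogou machine learning algorithm engineer interview experience
http://www.yjbys.com/mianshi/692123.html
8.Interviewing at 25+ companies in 4 months: a senior student's guide to landing a Data Scientist job! (Includes interview notes from 15 major companies such as Facebook and Google)
http://posts.careerengine.us/p/57695dc6fba2e4e631c8b043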

Related questions

Interview/
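4.Machine learning interview notes
http://www.cnblogs.com/xiangzhi/p/4842757.html

6.Summary of machine learning interview knowledge points (continuously updated)
http://www.cnblogs.com/zuochongyan/p/5407053.html
7.[Practical] Summary of common machine learning algorithm interview questions
http://www.cstor.cn/textdetail_10481.html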
8.What are some common Machine Learning interview questions?
https://www.quora.com/What-are-some-common-Machine-Learning-interview-questions
9.Machine Learning Interview Questions
https://www.glassdoor.com/Interview/machine-learning-interview-questions-SRCH_KO0,16.htm
10.21 Must-Know Machine Learning Interview Questions and Answers
https://elitedatascience.com/machine-learning-interview-questions-answers
11.Top 50 Machine Learning Interview Questions & Answers
http://career.guru99.com/top-50-interview-questions-on-machine-learning/
12.ML Job Interview Questions
https://www.reddit.com/r/MachineLearning/comments/1wmayh/ml_job_interview_questions/
13.Top-down learning path: Machine Learning for Software Engineers
https://github.com/ZuzooVn/machine-learning-for-software-engineers
14.What are some good interview questions for statistical algorithm developer candidates?
https://stats.stackexchange.com/questions/210639/what-are-some-good-interview-questions-for-statistical-algorithm-developer-candi
15.Interview Questions for Data Scientist / Research Engineer Positions
https://www.linkedin.com/pulse/interview-questions-data-scientist-research-engineer-ahmed-el-deeb


This article is reposted from asdud's blog on 51CTO. To repost it, please contact the original author.
