39 Machine Learning Libraries for Spark
Published: 2019-06-09


Apache Spark itself

1. 

AMPLab

Spark originally came out of the Berkeley AMPLab, and even today AMPLab projects that are not part of Apache Spark itself enjoy a status a bit above your everyday GitHub project.

Spark's own MLlib forms the bottom layer of the three-layer ML Base, with MLI being the middle layer and ML Optimizer being the most abstract layer.

2. 

3. ML Optimizer (aka )

Ghostware was described in 2014 but never released. Of the 39 machine learning libraries, this is the only one that is vaporware, and is included only due to its AMPLab and ML Base status.

Other than ML Base

4. 

A recent project from June 2015, this set of stochastic learning algorithms claims 25x to 75x faster performance than Spark MLlib on stochastic gradient descent (SGD). Plus it's an AMPLab project that begins with the letters "sp", so it's worth watching.

5. 

Brought machine learning pipelines to Spark, but pipelines have matured in recent versions of Spark. Also promises some computer vision capability, but there are  I previously blogged about.

6. 

A server to manage a large collection of machine learning models.

7. 

Faster machine learning on Spark by optimizing communication patterns and shuffles, as described in the paper
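
Several entries above are benchmarked on or built around stochastic gradient descent (SGD). As a reminder of what SGD does, here is a minimal single-machine sketch in plain Python. It is not taken from any of the libraries listed; the function name, learning rate, and toy data are illustrative assumptions.

```python
import random

def sgd_linear_regression(data, lr=0.05, epochs=200, seed=0):
    """Fit y ~ w*x + b by stochastic gradient descent on squared error.

    Hypothetical helper for illustration only; not an API of Spark
    or of any library in this list.
    """
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    samples = list(data)
    for _ in range(epochs):
        rng.shuffle(samples)          # visit the examples in a fresh random order
        for x, y in samples:
            err = (w * x + b) - y     # prediction error on this single example
            w -= lr * err * x         # gradient of 0.5 * err**2 with respect to w
            b -= lr * err             # gradient of 0.5 * err**2 with respect to b
    return w, b

# Noise-free points on the line y = 2x + 1
points = [(i / 10.0, 2.0 * (i / 10.0) + 1.0) for i in range(-10, 11)]
w, b = sgd_linear_regression(points)
print(round(w, 3), round(b, 3))
```

Each update uses one example at a time, which is what makes the method "stochastic". Distributed variants differ mainly in how these per-example updates are batched and communicated between workers.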

Frameworks

GPU-based

8. 

 

I previously blogged 

9. 

Brand new and frankly why I started this list for this blog post. Provides an interface to .

Non-GPU-based

10. 

A parameter server for model-parallel, rather than data-parallel, training (Spark's MLlib is data-parallel).

11. 

From Airbnb, used in their automated pricing.

12. 

Logistic regression, LDA, Factorization machines, Neural Network, Restricted Boltzmann Machines

13. 

Similar to Spark DataFrames, but agnostic to engine (i.e. will run on engines other than Spark in the future). Includes cross-validation and interfaces to external machine learning libraries.

Interfaces to other Machine Learning systems

14. 

Wraps Stanford .

15. 

Interface to Python's 

16. 

Interface to 

17. 

Wraps , machine learning in Hive

18. 

Export PMML, an industry standard XML format for transporting machine learning models.

Add-ons that enhance MLlib's existing algorithms

19. 

Adds dropout capability to Spark MLlib, based on the paper .

20. 

Adds arbitrary distance functions to K-Means

21. 

Visualize the Streaming Machine Learning algorithms built into Spark MLlib
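
To illustrate what "arbitrary distance functions" for K-Means (item 20) could look like, here is a small plain-Python sketch in which the distance is passed in as a parameter. It is not the API of the library above; all names are hypothetical. One assumption to flag: the center update still uses the arithmetic mean, which is only guaranteed to reduce the objective for Euclidean-style distances and is a heuristic for others.

```python
import random

def manhattan(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))

def euclidean(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

def kmeans(points, k, distance, iters=20, seed=0):
    """Lloyd-style K-Means with a pluggable distance function (illustrative)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)   # pick k distinct input points as initial centers
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # assign each point to the nearest center under the chosen distance
            idx = min(range(k), key=lambda i: distance(p, centers[i]))
            clusters[idx].append(p)
        for i, members in enumerate(clusters):
            if members:               # keep the old center if a cluster is empty
                centers[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return centers, clusters

blob_a = [(0, 0), (0, 1), (1, 0), (1, 1)]
blob_b = [(10, 10), (10, 11), (11, 10), (11, 11)]
centers_l1, clusters_l1 = kmeans(blob_a + blob_b, 2, manhattan)
centers_l2, clusters_l2 = kmeans(blob_a + blob_b, 2, euclidean)
print(sorted(centers_l1), sorted(centers_l2))
```

Swapping `manhattan` for `euclidean` changes only the assignment step, which is the part a pluggable distance affects.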

Algorithms

Supervised learning

22. 

Factorization Machines

23. 

Recursive Neural Networks (RNNs)

24. 

SVM based on the performant Spark communication framework CoCoA listed above.

25. 

Based on 

26. 

Matrix Factorization Recommendation System
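
For readers unfamiliar with Factorization Machines (item 22), the sketch below shows the second-order prediction rule: a bias, a linear term, and pairwise interactions whose weights are dot products of per-feature latent vectors. It is a plain-Python illustration, not code from the library above, and the names `fm_predict`, `w0`, `w`, and `V` are assumptions. The fast version uses the standard identity that lets the pairwise term be computed in time linear in the number of features.

```python
def fm_predict(x, w0, w, V):
    """Second-order FM prediction in O(n * k) time (illustrative)."""
    n, k = len(x), len(V[0])
    linear = sum(w[i] * x[i] for i in range(n))
    pairwise = 0.0
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(n))            # sum of v_if * x_i
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))  # sum of squares
        pairwise += 0.5 * (s * s - s_sq)
    return w0 + linear + pairwise

def fm_predict_naive(x, w0, w, V):
    """Same model, looping over every feature pair (O(n^2 * k)) as a check."""
    n = len(x)
    out = w0 + sum(w[i] * x[i] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(a * b for a, b in zip(V[i], V[j]))
            out += dot * x[i] * x[j]
    return out

x = [1.0, 0.0, 2.0]
w0, w = 0.5, [0.1, 0.2, 0.3]
V = [[1.0, 2.0], [0.0, 1.0], [3.0, 1.0]]   # one 2-dimensional latent vector per feature
fast = fm_predict(x, w0, w, V)
slow = fm_predict_naive(x, w0, w, V)
print(fast, slow)
```

Both functions compute the same value; the linear-time form is what makes the model practical on sparse, high-dimensional inputs.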

Unsupervised learning

27. 

40x faster clustering than Spark MLlib K-Means

28. 

K-Means that produces more uniformly-sized clusters, based on 

29. 

Build graphs using k-nearest-neighbors and locality sensitive hashing (LSH)

30. 

Online Latent Dirichlet Allocation (LDA), Gibbs Sampling LDA, Online Hierarchical Dirichlet Process (HDP)

Algorithm building blocks

31. 

Adaboost and MP-Boost

32. 

Port to Spark of . If your machine learning cost function happens to be convex, then TFOCS can solve it.

33. 

Linear algebra operators to work with Spark MLlib's linalg package

Feature extractors

34. 

Information-theoretic basis for feature selection, based on 

35. 

Given labeled data, "discretize" one of the continuous numeric dimensions such that each bin is relatively homogeneous in terms of data classes. This is a foundational idea behind the way the CART and ID3 algorithms generate decision trees. Based on .

36. 

Distributed  for dimensionality reduction.

37. 

Sparse feature vectors
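
The discretization idea in item 35 can be made concrete with a small sketch: choose the cut point on a numeric feature that leaves the two resulting bins as class-pure as possible, measured by weighted entropy. This is a generic single-split illustration in plain Python, not the algorithm of the library above (whose reference is not given here); a full discretizer would apply the split recursively with a stopping criterion. The function names are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of the class labels in one bin."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def best_split(values, labels):
    """Return (cut, score): the cut minimizing weighted class entropy of the two bins."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_cut, best_score = None, float("inf")
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue                  # cannot cut between identical values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        score = (len(left) * entropy(left) + len(right) * entropy(right)) / n
        if score < best_score:
            best_cut, best_score = (lo + hi) / 2, score
    return best_cut, best_score

values = [1.0, 2.0, 3.0, 8.0, 9.0, 10.0]
labels = ["a", "a", "a", "b", "b", "b"]
cut, score = best_split(values, labels)
print(cut, score)
```

A lower score means purer bins; decision-tree learners apply the same kind of criterion when choosing where to split a numeric feature.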

Domain-specific

38. 

K-Means, Regression, and Statistics

39. 

Source: http://datascienceassn.org/content/39-machine-learning-libraries-spark-categorized

Reposted from: https://www.cnblogs.com/bigdatafly/articles/5203509.html
