- Data Analytics / Data Pipeline
- PANCAKE stack
- PrestoDB : https://prestodb.io/ : Distributed SQL query engine for big data (Hive, HBase, Cassandra, any SQL)
- Arrow : Apache Arrow ( https://arrow.apache.org/ ) : High-performance in-memory columnar format
- NiFi : https://blogs.apache.org/nifi/ : (a hefty 700MB) Excellent tool for graphing, data routing, transformation, and visualization
- Cassandra : http://cassandra.apache.org/ : Clustered NoSQL DB, written in Java
- Airflow : https://github.com/apache/incubator-airflow : Orchestrator and workflow engine; configures Hadoop jobs and directs the scheduler
- Kafka : https://kafka.apache.org/ : High-speed messaging
- ElasticSearch
- Other Tools
- Apache Spark
- TensorFlow
- Algebird
- CoreNLP
- Kibana
- Apache Zeppelin : Multi-purpose notebook for data ingestion, data discovery, data analytics, and data visualization and collaboration
- Pig scripts and MapReduce scripts
- http://pipeline.io/ : Optimizing TensorFlow for production
- http://jupyter.org/ : Awesome online tool for a shared notepad and code
- PMML : Predictive Model Markup Language ; JPMML : Java PMML ( https://www.ibm.com/developerworks/library/ba-ind-PMML1/ )
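
To make the PMML entry a little more concrete, here is a minimal sketch of what a PMML file carries and how a scorer might read it. The PMML document below is a hand-written toy example (a two-input linear regression), and the function name `score_linear_pmml` is hypothetical. It handles only plain numeric predictors and ignores the rest of the spec, so treat it as an illustration rather than a substitute for a real evaluator such as JPMML.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written PMML document for illustration only:
# a linear regression y = 1.0 + 2.0*x1 + 0.5*x2.
PMML_DOC = """
<PMML version="4.4" xmlns="http://www.dmg.org/PMML-4_4">
  <Header description="toy linear model"/>
  <DataDictionary numberOfFields="3">
    <DataField name="x1" optype="continuous" dataType="double"/>
    <DataField name="x2" optype="continuous" dataType="double"/>
    <DataField name="y" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel modelName="toy" functionName="regression">
    <MiningSchema>
      <MiningField name="x1"/>
      <MiningField name="x2"/>
      <MiningField name="y" usageType="target"/>
    </MiningSchema>
    <RegressionTable intercept="1.0">
      <NumericPredictor name="x1" coefficient="2.0"/>
      <NumericPredictor name="x2" coefficient="0.5"/>
    </RegressionTable>
  </RegressionModel>
</PMML>
"""

# Namespace of the PMML version used in the toy document above.
NS = {"pmml": "http://www.dmg.org/PMML-4_4"}


def score_linear_pmml(pmml_text, inputs):
    """Score one record against the first RegressionTable in the document.

    Assumption: only NumericPredictor elements with the default exponent
    of 1 are present; categorical predictors and transformations are ignored.
    """
    root = ET.fromstring(pmml_text.strip())
    table = root.find(".//pmml:RegressionTable", NS)
    if table is None:
        raise ValueError("no RegressionTable found")

    # Start from the intercept, then add coefficient * input for each predictor.
    result = float(table.get("intercept", "0"))
    for predictor in table.findall("pmml:NumericPredictor", NS):
        name = predictor.get("name")
        coefficient = float(predictor.get("coefficient"))
        result += coefficient * float(inputs[name])
    return result


if __name__ == "__main__":
    print(score_linear_pmml(PMML_DOC, {"x1": 3.0, "x2": 4.0}))
```

The point of the format is that the model travels as plain XML, so it can be trained in one tool and scored in another without sharing code.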
I am working toward bringing this stack up in a simple Kubernetes cluster.