Summary of Denver Data Science Day, September 16th, 2017
http://denverdatascienceday.com/
This was an extremely informative conference on data science. All the speakers were excellent, and the sessions were thought-provoking.
1. Alex Sandovsky, Senior Director of Data Science at Oracle, gave the first keynote, on how computers are deciding what you will buy at the grocery store and how companies market in the age of data science.
Some highlights of the talk:
- What people buy
- Who they are
- Where they go
- What they do
Oracle has 115 million buyer households with identity-matched data, including retail, TV viewership, consumer goods and auto.
BlueKai - cookie-based data marketplace
AddThis - JavaScript-powered tools
Where does the data come from?
- postal address
- cookies
- email addresses
- ip addresses
- phone DeviceID
- web browser
- gaming consoles
- logins
A strong ID graph, which links these identifiers to the same person or household, is used for marketing
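To make the ID graph idea concrete, here is a minimal sketch of linking identifiers with a union-find structure. It is not Oracle's implementation; the class name, the identifier strings and the linking rule (two identifiers seen together belong to the same person) are all assumptions for illustration.

```python
from collections import defaultdict


class IdentityGraph:
    """Union-find over identifiers; linked identifiers share one root."""

    def __init__(self):
        self.parent = {}

    def find(self, node):
        # Register unseen identifiers as their own root.
        self.parent.setdefault(node, node)
        # Walk up to the root, shortening the path along the way.
        while self.parent[node] != node:
            self.parent[node] = self.parent[self.parent[node]]
            node = self.parent[node]
        return node

    def link(self, a, b):
        # Record that two identifiers were observed together.
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a

    def clusters(self):
        # Group every known identifier under its root.
        groups = defaultdict(set)
        for node in list(self.parent):
            groups[self.find(node)].add(node)
        return list(groups.values())


# Hypothetical observations: a cookie seen with an email, the email with a phone.
graph = IdentityGraph()
graph.link("cookie:abc123", "email:pat@example.com")
graph.link("email:pat@example.com", "device:phone-42")
graph.link("cookie:zzz999", "ip:203.0.113.7")

same_person = graph.find("cookie:abc123") == graph.find("device:phone-42")
print(same_person)
print(len(graph.clusters()))
```

The cookie and the phone are never observed together, yet they end up in the same cluster because both are linked to the same email address.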
pandas - Python data analysis library
Oracle Data Cloud technologies: Terraform (cloud provisioning), Hadoop Distributed File System (HDFS), Luigi (a Python workflow and task management tool), Apache Hive and Apache Spark
Advertising Campaign - Bidding and Optimization
2. Barton Rhodes, Senior Data Scientist at Pandata, gave a tour of using the Google Cloud Platform to amplify your team's data engineering game.
Key Highlights:
- BigQuery
- Dataflow
- Dataproc
- Datalab
- Dataprep
- Machine Learning
- Container Engine
- Vision API
- Natural Language API
- Video Intelligence API
Google Cloud SDK
- gcloud
- gsutil
- bq
- kubectl
- supports Python, Java, Node.js
BigQuery - massively parallel query engine
Dataflow - managed service for running Apache Beam pipelines
Research Paper: The Dataflow Model: A Practical Approach to Balancing Correctness, Latency, and Cost in Massive-Scale, Unbounded, Out-of-Order Data Processing
http://www.vldb.org/pvldb/vol8/p1792-Akidau.pdf
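One central idea in the Dataflow paper is grouping records by the time they happened (event time) rather than the time they arrived. The sketch below illustrates only that idea with fixed windows in plain Python. It does not use the Apache Beam API, it omits watermarks and triggers, and the window size and event data are assumptions.

```python
from collections import defaultdict

WINDOW_SECONDS = 60  # hypothetical fixed window size


def window_start(event_time):
    # Every event belongs to the window that contains its event time.
    return event_time - (event_time % WINDOW_SECONDS)


def count_per_window(events):
    """events: iterable of (event_time_seconds, key) in arrival order."""
    counts = defaultdict(int)
    for event_time, key in events:
        counts[(window_start(event_time), key)] += 1
    return dict(counts)


# Out-of-order arrival: the click at t=59 arrives after the click at t=61.
arrivals = [(5, "click"), (61, "click"), (59, "click"), (130, "view")]
result = count_per_window(arrivals)
print(result)
```

The late click at t=59 is still counted in the window starting at 0, because assignment depends on event time and not on arrival order. A real streaming system also has to decide when a window's result is final, which is what the paper's watermarks and triggers address.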
Cloud TPUs -> integrated circuits highly optimized for reduced-precision operations and integrated with TensorFlow; they accelerate AI inference and training
3. Victor Amin, Data Scientist at SendGrid, gave a hands-on session on machine learning and how to identify abuse
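As a rough illustration of what "reduced precision" buys, the sketch below quantizes two float vectors to 8-bit integers, does the multiply-accumulate in integers, and rescales once at the end. This is a generic symmetric quantization scheme chosen for illustration, not a description of how TPUs work internally; the function names and sample values are assumptions.

```python
def quantize(values, bits=8):
    """Map floats to signed integers that share one scale factor."""
    max_int = 2 ** (bits - 1) - 1  # 127 for 8 bits
    largest = max(abs(v) for v in values)
    scale = largest / max_int if largest else 1.0
    return [round(v / scale) for v in values], scale


def quantized_dot(a, b):
    qa, scale_a = quantize(a)
    qb, scale_b = quantize(b)
    # Integer multiply-accumulate, then a single floating-point rescale.
    integer_sum = sum(x * y for x, y in zip(qa, qb))
    return integer_sum * scale_a * scale_b


weights = [0.12, -0.53, 0.91, 0.07]
inputs = [1.5, 0.25, -0.75, 2.0]
exact = sum(w * x for w, x in zip(weights, inputs))
approx = quantized_dot(weights, inputs)
print(exact, approx)
```

The two results agree to within a small error, which is the trade-off reduced-precision hardware exploits: much cheaper arithmetic for a loss of accuracy that many neural network workloads tolerate.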
Key Highlights:
Live Azure Tutorial: https://notebooks.azure.com/vamin/libraries/M3AAWG40
Vocabulary
- Model
- Training Set
- Sample
- Attribute or feature
- Target
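The vocabulary above maps onto code fairly directly. Here is a minimal sketch using a nearest-centroid classifier; it is not the method from the tutorial notebook, and the features (emails sent per hour, bounce rate), the labels and the numbers are invented for illustration.

```python
# Training set: each sample is (features, target).
# Features are hypothetical: (emails sent per hour, bounce rate).
# Target is the label to predict: "abuse" or "ok".
training_set = [
    ((900.0, 0.40), "abuse"),
    ((1200.0, 0.55), "abuse"),
    ((30.0, 0.02), "ok"),
    ((50.0, 0.05), "ok"),
]


def train(samples):
    """Fit the model: one mean feature vector (centroid) per target."""
    sums, counts = {}, {}
    for features, target in samples:
        total = sums.setdefault(target, [0.0] * len(features))
        for i, value in enumerate(features):
            total[i] += value
        counts[target] = counts.get(target, 0) + 1
    return {t: [s / counts[t] for s in sums[t]] for t in sums}


def predict(model, features):
    """Return the target whose centroid is closest to the sample."""
    def squared_distance(centroid):
        return sum((f - c) ** 2 for f, c in zip(features, centroid))
    return min(model, key=lambda t: squared_distance(model[t]))


model = train(training_set)
print(predict(model, (1000.0, 0.50)))
print(predict(model, (40.0, 0.03)))
```

In practice the features would be scaled first, since here the send rate dominates the distance simply because its numbers are larger.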
4. The second keynote of the day was by Shawn Rogers, Senior Director of Analytic Strategy at TIBCO Software, who talked about the Algorithmic Economy
Key Highlights:
The industry has moved beyond data; algorithms are the future
Bigbelly reinvents the trash can
- Sensor driven
- wireless communication and messaging
- predictive routing
- smart trashcan
- how often to clean
- when will it be full
- bigbelly.com
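The "when will it be full" question can be sketched as a simple trend extrapolation. The example below fits a straight line to fill-level readings and projects when it crosses 100%. This is only an assumption about how such a prediction could work, not Bigbelly's actual method, and the sensor readings are made up.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x


def hours_until_full(hours, fill_percent, full_at=100.0):
    """Hours after the last reading until the fitted line reaches full_at."""
    slope, intercept = fit_line(hours, fill_percent)
    if slope <= 0:
        return None  # the bin is not filling, so no pickup is predicted
    return (full_at - intercept) / slope - hours[-1]


# Hypothetical sensor readings: hours since last pickup, fill level in percent.
hours = [0, 6, 12, 18]
fill = [10.0, 25.0, 40.0, 55.0]
remaining = hours_until_full(hours, fill)
print(remaining)
```

A routing system could then sort bins by the predicted time remaining and schedule pickups only for those close to full.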
French Fryer supplier
Telenor, cell phone provider - using data science to serve the unbanked
GE's cloud platform - Predix
Apervita Health care algorithm marketplace
Algorithmia Library ~ 800 algorithms ready to use
Data Mapper - selling a platform to analyze drone data
Quantiacs - FinTech - a marketplace to connect the FinTech quant to the industry
TIBCO Statistica workspace
Problems with unbridled use of algorithms
- The case of Target using algorithms to identify expectant mothers and send them coupons for baby products
- The $23.5 million book: "The Making of a Fly: The Genetics of Animal Design", whose price was driven up by automated repricing algorithms
- The $440 million software test - Knight Capital, 2012
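The book-pricing case in the list above is widely attributed to two sellers' repricing bots reacting to each other. The sketch below shows how such a feedback loop runs away. The multipliers, the starting price and the function name are illustrative assumptions, not the sellers' actual rules.

```python
def simulate_repricing(start_price, undercut=0.998, markup=1.27, rounds=20):
    """Two automated sellers that each set their price from the other's."""
    price_a = price_b = start_price
    history = []
    for _ in range(rounds):
        price_a = price_b * undercut  # seller A prices just below seller B
        price_b = price_a * markup    # seller B prices well above seller A
        history.append((round(price_a, 2), round(price_b, 2)))
    return history


history = simulate_repricing(35.0)
for price_a, price_b in history[::5]:
    print(price_a, price_b)
```

Because the product of the two multipliers is greater than 1, both prices grow exponentially with every round. Neither rule is unreasonable on its own; the problem appears only when the two run unsupervised against each other, which is why a price cap or human review is a common safeguard.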
5. Bill Vander Lugt, Data Scientist at Galvanize, gave a talk on deep learning and the future of natural language processing, using Google's new translation engine as the example
Key Highlights:
Chomsky on NLP: "Colorless green ideas sleep furiously"
Grammar based NLP vs Neural Net
- Deep Learning
- Backpropagation
- LSTM
- Word2Vec
Google's Seq2seq
Models
- Neural Net
- Recurrent Neural Net
- Encoder/Decoder
- LSTM/GRU
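To show how the recurrent, LSTM and encoder pieces fit together, here is a toy sketch of a scalar LSTM cell used as an encoder: it reads a sequence one item at a time and leaves a fixed-size summary in its final state, which a decoder would then consume. The weights are arbitrary and untrained, and real models use vectors and matrices instead of single numbers; everything here is an assumption for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, params):
    """One step of a scalar LSTM cell; params maps gate -> (w_x, w_h, bias)."""
    def gate(name, activation):
        w_x, w_h, bias = params[name]
        return activation(w_x * x + w_h * h_prev + bias)

    forget = gate("forget", sigmoid)          # how much old cell state to keep
    write = gate("input", sigmoid)            # how much of the candidate to store
    candidate = gate("candidate", math.tanh)  # proposed new cell content
    output = gate("output", sigmoid)          # how much cell state to expose

    c = forget * c_prev + write * candidate
    h = output * math.tanh(c)
    return h, c


def encode(sequence, params):
    """Run the cell over the sequence; the final (h, c) is the encoding."""
    h, c = 0.0, 0.0
    for x in sequence:
        h, c = lstm_step(x, h, c, params)
    return h, c


# Hypothetical, untrained weights shared by all four gates.
params = {name: (0.5, 0.5, 0.0)
          for name in ("forget", "input", "candidate", "output")}

forward = encode([1.0, 2.0, 3.0], params)
backward = encode([3.0, 2.0, 1.0], params)
print(forward, backward)
```

The same three inputs in a different order produce a different encoding, which is the property that lets recurrent models capture word order, something a bag-of-words approach cannot do.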
Google SyntaxNet: grammar based
Finally, the folks at Galvanize mentioned Kaggle datasets: https://www.kaggle.com/datasets
Here was the actual schedule of the conference:
