Key takeaways — O’Reilly AI London Conference, Oct 9–11, 2018
I had an opportunity to attend the O’Reilly AI London Conference, Oct 9–11, 2018. Given our short attention spans these days, let me try a more clickbait-style approach for the takeaways :)
1. AI gurus are the new rock stars, and there has never been a better time to be in this field. There continues to be tremendous interest in Enterprise AI. This was the first O’Reilly AI Conference in Europe, and it was ‘Sold Out’. Including this one, O’Reilly now organizes 5 AI Conferences around the world each year, and all of them get ‘Sold Out’. The situation is still better than at, say, NIPS, which sold out within a couple of weeks. (“The meeting has sold out. The waitlist is now full” — NIPS 2018 website.)
2. All the usual suspects were at the conference: IBM, Amazon, Google, Microsoft, Intel – mostly promoting their Cloud offerings. It is interesting to see the ‘Head of ..’ and ‘VP of …’ speakers come one after the other touting their respective platforms, without commenting on how their offering differs from what the previous speaker presented.
In my opinion, there is hardly any differentiation between the ML/DL offerings of the different Cloud platforms today. All of them basically provide a managed service around algorithms that are anyway available (for free) in the research/open-source domain. Whatever proprietary 5% differentiation exists can probably lead to 5% higher accuracy for a very specific use-case. The point, however, is that if you had such a strategic use-case for which that 5% mattered, you would probably develop it in-house rather than going to the Cloud. Note that I am not saying that Cloud ML offerings have no value. On the contrary, they are great for fast experimentation and showing early results; however, it is not worth going into lengthy debates/RFIs/RFPs to assess whether Google Cloud is better than Watson.
3. Do NOT go into every Google or Facebook talk thinking it will improve your IQ and give you groundbreaking insights into how AI is practiced at these companies. Having followed such talks at AI conferences for some time now, I think I have finally uncovered their underlying operational pattern.
On the one hand, these companies are becoming more open than ever, at least in the field of AI. A few years ago, one could hardly have imagined Facebook sharing the algorithmic details of its ML-based services and the underlying infrastructure, or Amazon/Uber sharing details of their forecasting algorithms. However, this is exactly the type of information that you would find on their respective blogs today — a few recommended ones to follow:
The presentations, however, are a different story. It would seem that once these blog posts are published, a common set of slides is issued internally, which is then presented by different employees at different venues. For instance, the following talk at the conference:
is basically based on their paper/blog post published last year.
Plus, there are of course some talks (e.g. the ones below) that have absolutely nothing new or interesting to offer and would most likely not even have earned a slot if the presenters were not from Google.
4. On a more technical note, Deep Learning (DL) methods for NLP and Computer Vision (Image Classification, Object Detection, etc.) seem to have reached a saturation point – there is nothing fundamentally new happening in these fields. The focus is instead on enterprise tooling, in the form of more Cloud services offering these APIs and mature Open Source frameworks, leading to their wider adoption in enterprise settings.
5. As such, the focus of AI research, and consequently of this conference, has shifted to the following topics:
- DL for Forecasting
- AIOps: DevOps to operationalize AI products/models
- Explainable AI
- GANs for synthetic data generation
- Reinforcement Learning
We will explore the above topics in more detail in the rest of this report.
6. DL for Forecasting: For years, Forecasting research has focused on statistical methods, e.g. ARIMA, which are quite mature and widely used. Recently, given the time-step nature of Recurrent Neural Networks (RNNs) and their variant, Long Short-Term Memory (LSTM) networks, DL researchers have started challenging the foothold of statistical methods in Forecasting. While the jury is still out on this topic, and the best algorithm for your problem will always depend on your data characteristics, there is growing consensus that a hybrid (statistical + ML/DL) model works best.
“Interestingly, one winning entry to the M4 Forecasting Competition was a hybrid model that included both hand-coded smoothing formulas inspired by the well-known Holt-Winters method and a stack of dilated long short-term memory units (LSTMs).”
The above observation was made in a very interesting talk by Uber:
Forecasting at Uber: Machine learning approaches — Andrea Pasqua (Uber)
which is actually a summary of their blog post:
Forecasting at Uber: An Introduction
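To make the “hybrid” idea concrete, here is a minimal sketch of the statistical half in plain Python: Holt’s linear-trend exponential smoothing (a simplification of Holt-Winters without seasonality). The toy data and the alpha/beta values are arbitrary choices for illustration; this is not the M4 competition code, which additionally stacks dilated LSTMs on top of the smoothing formulas.

```python
def holt_forecast(series, alpha=0.5, beta=0.3, horizon=3):
    """Holt's linear-trend exponential smoothing (additive).

    Classic recursions:
      level_t = alpha*y_t + (1-alpha)*(level_{t-1} + trend_{t-1})
      trend_t = beta*(level_t - level_{t-1}) + (1-beta)*trend_{t-1}
    """
    level, trend = series[0], series[1] - series[0]
    for y in series[1:]:
        last_level = level
        level = alpha * y + (1 - alpha) * (level + trend)
        trend = beta * (level - last_level) + (1 - beta) * trend
    # h-step-ahead forecasts extrapolate the final level and trend
    return [level + (h + 1) * trend for h in range(horizon)]

# On a clean linear trend the forecasts simply continue it (≈ 20, 22, 24)
print(holt_forecast([10, 12, 14, 16, 18], horizon=3))
```

In a hybrid model, an LSTM would then be trained on what the smoother cannot capture, e.g. the residuals or a learned correction to these formulas.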
Other interesting talks on this topic included the presentation below by SAS, where they explored different architectures for combining statistical and ML/DL models.
Business forecasting using hybrid approach: A new forecasting method using deep learning and time series — Pasi Helenius (SAS), Larry Orimoloye (SAS)
The below presentation provided a great overview of recent research papers in this field:
Deep prediction: A year in review for deep learning for time series — Aileen Nielsen (Skillman Consulting)
Among the cited papers was the below one by Amazon, which provides some interesting insights into their probabilistic forecasting algorithms:
DeepAR: Probabilistic Forecasting with Autoregressive Recurrent Networks — Valentin Flunkert, David Salinas, Jan Gasthaus (Amazon)
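A key idea in DeepAR is that the model outputs a distribution, not a point estimate: forecasts are produced by sampling many future trajectories and reporting per-step quantiles. The sampling-and-quantile step can be sketched in plain Python; here a Gaussian random walk stands in for the trained autoregressive RNN, so the function names and parameters are illustrative assumptions, not Amazon’s API.

```python
import random

def sample_paths(last_value, horizon, n_samples=1000, sigma=1.0, rng=None):
    """Stand-in for a trained model: each sampled path is a Gaussian random walk."""
    rng = rng or random.Random(0)
    paths = []
    for _ in range(n_samples):
        path, y = [], last_value
        for _ in range(horizon):
            y += rng.gauss(0, sigma)
            path.append(y)
        paths.append(path)
    return paths

def quantile_forecast(paths, q):
    """Per-step empirical quantile across the sampled trajectories."""
    horizon = len(paths[0])
    out = []
    for t in range(horizon):
        values = sorted(p[t] for p in paths)
        out.append(values[int(q * (len(values) - 1))])
    return out

paths = sample_paths(100.0, horizon=5)
p10 = quantile_forecast(paths, 0.1)
p50 = quantile_forecast(paths, 0.5)
p90 = quantile_forecast(paths, 0.9)
```

The P10/P50/P90 bands widen with the horizon, which is exactly the uncertainty information a point forecast cannot give you.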
For an introduction to this topic, there was a very good tutorial presented at the conference by a team of Microsoft data scientists, giving an overview of RNN architectures for Time Series Forecasting. It also provides a performance comparison of the architectures. All the notebooks are available on their GitHub page and can be used as a starting point to implement the different architectures.
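As background, virtually all of these RNN architectures start by framing forecasting as supervised learning: slice the series into fixed-length input windows, with the value that follows each window as the target. A minimal sketch of that preprocessing step (a hypothetical helper, not taken from the Microsoft notebooks):

```python
def make_windows(series, window):
    """Turn a series into (input_window, next_value) training pairs."""
    pairs = []
    for i in range(len(series) - window):
        pairs.append((series[i:i + window], series[i + window]))
    return pairs

# A length-6 series with window=3 yields 3 training pairs
print(make_windows([1, 2, 3, 4, 5, 6], window=3))
# → [([1, 2, 3], 4), ([2, 3, 4], 5), ([3, 4, 5], 6)]
```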
7. AIOps: DevOps to operationalize AI products/models
As Enterprise AI usage matures, there is increasing focus on AIOps/MLOps to set up the right DevOps practices to efficiently support AI use-cases, from experimentation to production-scale deployments.
There were some interesting knowledge sharing presentations on this topic by Zalando and LinkedIn.
Architecting AI applications - Mikio Braun (Zalando SE)
TonY: Native support of TensorFlow on Hadoop - Jonathan Hung (LinkedIn), Keqiu Hu (LinkedIn), Anthony Hsu (LinkedIn)
The major news in this area was of course the recent announcement of PyTorch 1.0, and how it compares (and will compete) with TensorFlow.
The significance of PyTorch 1.0 comes from FB’s earlier positioning that they were using different frameworks for research/experimentation and production – PyTorch for Research and Caffe2 for Production – relying and investing in the ONNX toolchain to convert models between the two frameworks.
PyTorch 1.0 promises to unify the two worlds, allowing the use of one framework for both experimentation and production. Coming from FB, there is and will be significant interest in PyTorch 1.0. At this stage, however, in my opinion, these are still early days, and PyTorch lacks the maturity and widespread adoption of TensorFlow (in terms of the availability of 3rd-party open-source libraries building on it). TensorFlow has also been investing in improving its debugging and visualization capabilities; there is no need to abandon all your TensorFlow notebooks and start migrating them to PyTorch 1.0 just yet.
8. Explainable AI
This refers to the requirement that when an ML/DL model takes a decision, it should be possible to explain the factors underlying that decision. In the interest of taking unbiased and fair decisions, and not least because of the “Right to Explanation” clause in GDPR, there is considerable interest in the topic, and there were quite a few related presentations at the conference.
How to build privacy and security into deep learning models — Yishay Carmiel (IntelligentWire)
Building safe artificial intelligence with OpenMined — Andrew Trask (OpenMined)
Protecting your secrets — Katharine Jarmul (KIProtect)
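As a minimal illustration of model-agnostic explainability (not tied to any specific talk above), permutation importance measures how much a model’s error grows when one feature’s values are shuffled: a feature the model truly relies on produces a large error increase, while an ignored feature produces none. The toy model and data below are made up for the example.

```python
import random

def permutation_importance(model, X, y, feature, n_repeats=30, rng=None):
    """Average increase in MSE when `feature`'s column is shuffled."""
    rng = rng or random.Random(0)

    def mse(rows):
        return sum((model(r) - t) ** 2 for r, t in zip(rows, y)) / len(y)

    base = mse(X)
    col = [row[feature] for row in X]
    drops = []
    for _ in range(n_repeats):
        shuffled = col[:]
        rng.shuffle(shuffled)
        X_perm = [row[:feature] + [v] + row[feature + 1:]
                  for row, v in zip(X, shuffled)]
        drops.append(mse(X_perm) - base)
    return sum(drops) / n_repeats

# Toy "model" that only looks at feature 0; feature 1 should score ~zero
model = lambda row: 2.0 * row[0]
X = [[float(i), float(i % 3)] for i in range(20)]
y = [2.0 * r[0] for r in X]
```

Running `permutation_importance(model, X, y, 0)` versus `..., 1)` makes the asymmetry explicit: the used feature gets a positive score, the unused one scores zero.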
For me, the standout announcement/presentation on this topic was the one below by IBM:
It features a business-oriented dashboard to help explain AI-powered recommendations or decisions, and tools to mitigate bias early in the data collection and management phases. For instance, the dashboard below shows a “Policy Age” bias creeping into a Claims Approval process. For more technical details, please refer to Ruchir Puri’s blog post on the topic.
As I have long said, IBM Watson may not be the smartest, but it is one of the most enterprise-ready platforms out there. And with the release of this service, they have once again validated this view. It is not very clear which algorithms/models are supported by the “explainability” layer, and other cloud platforms will most likely catch up soon; but the point is that they were the first to bring this enterprise-friendly feature to market, and it has value even if it only works for the most basic ML-based classification/segmentation models.
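As an illustration of the kind of bias check such a dashboard performs (the field names and numbers below are hypothetical, not IBM’s implementation), the classic disparate-impact ratio compares approval rates between a protected group and a reference group:

```python
def approval_rate(decisions, group):
    """Fraction of approved decisions within one group."""
    rows = [d for d in decisions if d["group"] == group]
    return sum(d["approved"] for d in rows) / len(rows)

def disparate_impact(decisions, protected, reference):
    """Ratio of approval rates; values well below 1.0 (commonly < 0.8)
    flag potential bias against the protected group."""
    return approval_rate(decisions, protected) / approval_rate(decisions, reference)

# Hypothetical claims data: old policies are approved far less often
claims = ([{"group": "old_policy", "approved": a} for a in [1, 0, 0, 0, 0]]
          + [{"group": "new_policy", "approved": a} for a in [1, 1, 1, 1, 0]])
print(disparate_impact(claims, "old_policy", "new_policy"))  # → 0.25
```

A ratio of 0.25 like this one, far below the common 0.8 rule-of-thumb threshold, is exactly the kind of signal the dashboard surfaces as a “Policy Age” bias.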
9. GANs for synthetic data generation
Synthetic data generation, esp. using Generative Adversarial Networks (GANs), remains an area of active research to address the shortage of training data to train DL models.
10. Reinforcement Learning (RL)
I will keep this short. Let me know if you made it this far and are still looking for more insights ☺ Suffice it to say that, with the current saturation setting into DL methods, there is quite a bit of expectation that RL will be the next big thing in AI.
The really positive development in this space is the growing availability of RL frameworks (below, plus a few others by OpenAI, Facebook, and Microsoft/Bonsai) that will allow non-specialists to leverage RL – having the same effect that TensorFlow, Keras, and Caffe/PyTorch had on DL adoption over the last couple of years.
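To make the topic concrete, here is a toy tabular Q-learning example in plain Python: an agent in a deliberately simple “corridor” environment learns to walk right to reach a reward. This is a from-scratch sketch for intuition, unrelated to the frameworks above, which target far harder problems.

```python
import random

def q_learning(n_states=5, episodes=300, alpha=0.5, gamma=0.9, eps=0.3, seed=0):
    """Tabular Q-learning on a 1-D corridor: start at state 0, reward 1
    for reaching the last state; actions are 0 = move left, 1 = move right."""
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(n_states)]  # Q[state][action]
    for _ in range(episodes):
        s = 0
        while s != n_states - 1:
            # epsilon-greedy action selection
            if rng.random() < eps:
                a = rng.randrange(2)
            else:
                a = max((0, 1), key=lambda act: Q[s][act])
            s2 = max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)
            r = 1.0 if s2 == n_states - 1 else 0.0
            # standard Q-learning update toward the bootstrapped target
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2
    return Q

Q = q_learning()
# After training, the greedy policy is "move right" in every non-terminal state
```

Even this tiny example shows the core RL loop (act, observe reward, bootstrap the value estimate) that the frameworks industrialize for real environments.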