2. Center for Open Source
Data and AI
Technologies (CODAIT)
Code – Build and improve practical frameworks to
enable more developers to realize immediate
value.
Content – Showcase solutions for complex and
real-world AI problems.
Community – Bring developers and data
scientists together to engage with IBM.
Improving the Enterprise AI Lifecycle in
Open Source
• Team contributes to over 10 open source projects
• 17 committers and many contributors in Apache projects
• Over 1100 JIRAs and 66,000 lines of code committed to Apache Spark itself; over 65,000
LoC into SystemML
• Over 25 product lines within IBM leveraging Apache Spark
• Speakers at over 100 conferences, meetups, unconferences and more
CODAIT
codait.org
6. Fabric for Deep Learning
https://github.com/IBM/FfDL
FfDL Github Page
https://github.com/IBM/FfDL
FfDL dwOpen Page
https://developer.ibm.com/code/open/projects/fabric-for-deep-learning-ffdl/
FfDL Announcement Blog
http://developer.ibm.com/code/2018/03/20/fabric-for-deep-learning
FfDL Technical Architecture Blog
http://developer.ibm.com/code/2018/03/20/democratize-ai-with-fabric-for-deep-learning
Deep Learning as a Service within Watson Studio
https://www.ibm.com/cloud/deep-learning
Research paper: “Scalable Multi-Framework
Management of Deep Learning Training Jobs”
http://learningsys.org/nips17/assets/papers/paper_29.pdf
• Fabric for Deep Learning, or FfDL (pronounced ‘fiddle’), aims to
make Deep Learning easily accessible to Data Scientists and AI
developers.
• FfDL provides a consistent way to train and visualize Deep
Learning jobs across multiple frameworks such as TensorFlow,
Caffe, PyTorch, and Keras.
FfDL
Community Partners
FfDL is one of InfoWorld’s 2018 Best of Open Source
Software Award winners for machine learning and deep
learning!
7. AIOps
[Diagram: AI lifecycle — Prepared and Analyzed Data → Initial Model → Trained
Model → Deployed Model. And there are platforms to serve your models, create
model catalogues, etc. Open source building blocks shown: FfDL, kube-batch,
Jupyter Enterprise Gateway, MAX, Istio, OpenWhisk]
9. What does it take to trust a decision made by a machine
(other than that it is 99% accurate)?
Is it fair?
Is it easy to understand?
Did anyone tamper with it?
Is it accountable?
#21, #32, #93
13. IBM Adversarial Robustness
Toolbox
ART
ART is a library dedicated to adversarial
machine learning. Its purpose is to allow rapid
crafting and analysis of attack and defense
methods for machine learning models. The
Adversarial Robustness Toolbox provides
implementations of many state-of-the-art
methods for attacking and defending
classifiers.
https://github.com/IBM/adversarial-robustness-toolbox
The Adversarial Robustness Toolbox contains
implementations of the following attacks:
DeepFool (Moosavi-Dezfooli et al., 2015)
Fast Gradient Method (Goodfellow et al., 2014)
Jacobian Saliency Map (Papernot et al., 2016)
Universal Perturbation (Moosavi-Dezfooli et al., 2016)
Virtual Adversarial Method (Miyato et al., 2015)
C&W Attack (Carlini and Wagner, 2016)
NewtonFool (Jang et al., 2017)
The following defense methods are also supported:
Feature squeezing (Xu et al., 2017)
Spatial smoothing (Xu et al., 2017)
Label smoothing (Warde-Farley and Goodfellow, 2016)
Adversarial training (Szegedy et al., 2013)
Virtual adversarial training (Miyato et al., 2017)
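Several of the attacks listed above share the same core idea; the Fast Gradient Method, the simplest, perturbs each input in the direction of the sign of the loss gradient. Below is a from-scratch numpy sketch of that idea (this is not the ART API; the logistic-regression model, weights, and eps value are illustrative assumptions):

```python
import numpy as np

def fgsm_attack(x, y, w, b, eps):
    """Craft adversarial examples for a binary logistic-regression
    classifier with the Fast Gradient Sign Method:
    x_adv = x + eps * sign(d loss / d x)."""
    # Model: p = sigmoid(w.x + b), trained with cross-entropy loss.
    z = x @ w + b
    p = 1.0 / (1.0 + np.exp(-z))
    # Gradient of the cross-entropy loss w.r.t. the input is (p - y) * w.
    grad_x = (p - y)[:, None] * w[None, :]
    return x + eps * np.sign(grad_x)

# Toy model that predicts class 1 when x0 + x1 > 0.
w = np.array([1.0, 1.0])
b = 0.0
x = np.array([[0.3, 0.2]])   # correctly classified as class 1
y = np.array([1.0])

# A small signed perturbation pushes the point across the boundary.
x_adv = fgsm_attack(x, y, w, b, eps=0.4)
```

Note that eps bounds the per-feature perturbation, so the attack trades off imperceptibility (small eps) against success rate (large eps).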
14. Poisoning detection
• Detection based on
clustering activations
• Proof of attack strategy
Evasion detection
• Detector based on
inputs
• Detector based on
activations
Robustness metrics
• CLEVER
• Empirical robustness
• Loss sensitivity
Unified model API
• Training
• Prediction
• Access to loss and
prediction gradients
Evasion defenses
• Feature squeezing
• Spatial smoothing
• Label smoothing
• Adversarial training
• Virtual adversarial
training
• Thermometer encoding
• Gaussian data
augmentation
Evasion attacks
• FGSM
• JSMA
• BIM
• PGD
• Carlini & Wagner
• DeepFool
• NewtonFool
• Universal perturbation
Implementations of state-of-the-art methods for attacking and defending
classifiers.
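Of the robustness metrics listed, empirical robustness has a particularly simple form: the average relative norm of the smallest successful perturbation found by an attack. A simplified numpy sketch of that idea follows (ART's exact normalization may differ, and all data below is made up):

```python
import numpy as np

def empirical_robustness(x, x_adv, attack_success):
    """Average relative size of the smallest adversarial perturbation
    found, taken over the inputs the attack actually fooled. Larger
    values mean the classifier needs bigger perturbations to be fooled."""
    diffs = np.linalg.norm((x_adv - x)[attack_success], axis=1)
    norms = np.linalg.norm(x[attack_success], axis=1)
    return float(np.mean(diffs / norms))

# Two inputs; suppose the attack only fooled the first one.
x = np.array([[3.0, 4.0], [1.0, 0.0]])
x_adv = np.array([[3.0, 4.5], [1.0, 0.0]])
success = np.array([True, False])

r = empirical_robustness(x, x_adv, success)  # 0.5 / 5.0 = 0.1
```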
20.
Statement Score
I’m a sikh +0.3
I’m a christian +0.1
I’m a jew -0.2
I’m a homosexual -0.5
I’m queer -0.1
I’m straight +0.1
“We dedicated a lot of efforts to making sure the NLP API avoids bias, but we don't always get it right. This is an example of one of those
times, and we are sorry. We take this seriously and are working on improving our models. We will correct this specific case, and, more
broadly, building more inclusive algorithms is crucial to bringing the benefits of machine learning to everyone.”
Google spokesperson
https://motherboard.vice.com/en_us/article/j5jmj8/google-artificial-intelligence-bias
Bias in Sentiment Analysis (Motherboard, Oct 25, 2017)!
“determines the degree to which sentence expressed a negative or positive sentiment, on a scale of -1 to 1”
21.
“designed her to tweet and engage people on other social media”
"Unfortunately, within the first 24 hours of coming online, we became aware of a coordinated effort by some users to abuse Tay's
commenting skills to have Tay respond in inappropriate ways. As a result, we have taken Tay offline and are making adjustments.”
Microsoft spokesperson
https://www.npr.org/2016/03/27/472067221/internet-trolls-turn-a-computer-into-a-nazi
Tay (NPR, March 2016)!
22.
“used to inform decisions about who can be set free at every stage of the criminal justice system”
Bias in Recidivism Assessment (Propublica, May 2016)!
23. https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing
• “The formula was particularly likely to
falsely flag black defendants as future
criminals, wrongly labeling them this way
at almost twice the rate as white
defendants.
• White defendants were mislabeled as
low risk more often than black
defendants.”
“Northpointe does not agree that the results of your analysis, or the claims being made based upon that analysis, are
correct or that they accurately reflect the outcomes from the application of the model.”
32. Unwanted bias and algorithmic fairness
Machine learning, by its very nature, is always a form of statistical discrimination.
Discrimination becomes objectionable when it places certain privileged groups at
systematic advantage and certain unprivileged groups at systematic disadvantage.
Illegal in certain contexts.
34. Defining Bias
There are at least 21 definitions of fairness
- No one definition is applicable in all contexts
- Some definitions even conflict
Bias does not come only from training data - it can also be introduced through
- inappropriate data handling
- inappropriate model selection
- incorrect algorithm design or application
Need a "comprehensive bias pipeline" that fully integrates into the AI Lifecycle
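The claim that fairness definitions can conflict is easy to demonstrate numerically: the toy classifier below satisfies statistical parity exactly while violating equal opportunity completely (all labels and predictions here are hypothetical):

```python
import numpy as np

def selection_rate(y_pred):
    """Fraction of individuals given the positive outcome."""
    return float(np.mean(y_pred))

def true_positive_rate(y_true, y_pred):
    """Fraction of truly positive individuals correctly selected."""
    return float(np.mean(y_pred[y_true == 1]))

# Hypothetical labels/predictions for two demographic groups.
yt_a, yp_a = np.array([1, 1, 0, 0]), np.array([1, 1, 0, 0])
yt_b, yp_b = np.array([1, 0, 0, 0]), np.array([0, 1, 1, 0])

# Statistical parity holds: both groups selected at the same rate (0.5)...
spd = selection_rate(yp_a) - selection_rate(yp_b)                      # 0.0
# ...yet equal opportunity is maximally violated: TPR is 1.0 vs 0.0.
eod = true_positive_rate(yt_a, yp_a) - true_positive_rate(yt_b, yp_b)  # 1.0
```

Which definition matters depends on the application context, which is exactly why no single metric can be applied universally.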
36. AI Fairness 360
https://github.com/IBM/AIF360
The AIF360 toolkit is an open-source library to
help detect and remove bias in machine
learning models.
The AI Fairness 360 Python package includes
a comprehensive set of metrics for datasets
and models to test for biases, explanations for
these metrics, and algorithms to mitigate bias
in datasets and models.
Toolbox
Fairness metrics (70+)
Fairness metric explanations
Bias mitigation algorithms (10)
Supported bias mitigation algorithms
Optimized Preprocessing (Calmon et al., 2017)
Disparate Impact Remover (Feldman et al., 2015)
Equalized Odds Postprocessing (Hardt et al., 2016)
Reweighing (Kamiran and Calders, 2012)
Reject Option Classification (Kamiran et al., 2012)
Prejudice Remover Regularizer (Kamishima et al., 2012)
Calibrated Equalized Odds Postprocessing (Pleiss et al., 2017)
Learning Fair Representations (Zemel et al., 2013)
Adversarial Debiasing (Zhang et al., 2018)
Supported fairness metrics
Comprehensive set of group fairness metrics derived
from selection rates and error rates
Comprehensive set of sample distortion metrics
Generalized Entropy Index (Speicher et al., 2018)
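Two of the group fairness metrics the toolkit provides can be sketched from scratch in a few lines. These are simplified illustrations, not the AIF360 API; group 1 is taken as privileged, and the data is made up:

```python
import numpy as np

def statistical_parity_difference(y_pred, group):
    """P(y_hat=1 | unprivileged) - P(y_hat=1 | privileged).
    Negative values mean the unprivileged group (group == 0)
    is selected less often; 0 is ideal."""
    return float(np.mean(y_pred[group == 0]) - np.mean(y_pred[group == 1]))

def disparate_impact(y_pred, group):
    """Ratio of unprivileged to privileged selection rates;
    the '80% rule' flags values below 0.8 as discriminatory."""
    return float(np.mean(y_pred[group == 0]) / np.mean(y_pred[group == 1]))

# Hypothetical outcomes: privileged group selected at 0.6,
# unprivileged group at 0.2.
group  = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0])
y_pred = np.array([1, 1, 1, 0, 0, 1, 0, 0, 0, 0])

spd = statistical_parity_difference(y_pred, group)  # -0.4
di = disparate_impact(y_pred, group)                # 0.333...
```

The mitigation algorithms listed above then try to move these metrics toward their ideal values (0 for the difference, 1 for the ratio) while preserving accuracy.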
37. (d’Alessandro et al., 2017)
Fairness in building and deploying models throughout AI Lifecycle!
52.
Training Pipe
Model
Validation
Pipe
KNATIVE
AI Pipelines- Logical Architecture!
Data Pipe
Model
Deployment
Pipe
Deployment
Analysis
Pipe
OPENWHISK
Pipeline (Python Definition – Orchestrate and Track)
AI Developer and Data Scientist
AIOps Developer and Operator
Python
Function
Python
Function
Python
Function
Python
Function
Python
Function
53. Open Source AIOps Platform!
AISphere Pipeline: Continued
# Compose simple pipelines into a larger overall pipeline
overallPipe = Pipe('OverallPipeline')
overallPipe.add_jobs([
Job(check_data_fairness),
training_pipe,
model_validation_pipe,
Job(s2i),
model_deployment_pipe,
Job(explain_model_predictions)
])
overallPipe.run()
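The slide does not show the Pipe and Job classes themselves, so the following is a minimal sketch of what such abstractions could look like; the class names mirror the snippet above, but their behavior here is an assumption, not the real AISphere API:

```python
class Job:
    """Wraps a single Python function as one pipeline step."""
    def __init__(self, fn):
        self.fn = fn

    def run(self, state):
        return self.fn(state)


class Pipe:
    """A named sequence of jobs (or nested pipes) run in order,
    threading a shared state dict through every step."""
    def __init__(self, name, jobs=None):
        self.name = name
        self.jobs = list(jobs or [])

    def add_jobs(self, jobs):
        self.jobs.extend(jobs)

    def run(self, state=None):
        state = {} if state is None else state
        for job in self.jobs:
            state = job.run(state)
        return state


# Usage mirrors the slide: nest a simple pipe inside a larger one.
def check_data_fairness(state):
    state["fairness_checked"] = True
    return state

training_pipe = Pipe("TrainingPipe", [Job(lambda s: {**s, "trained": True})])

overall = Pipe("OverallPipeline")
overall.add_jobs([Job(check_data_fairness), training_pipe])
result = overall.run()
```

Because a Pipe exposes the same `run` interface as a Job, pipelines compose recursively, which is what lets the slide's `overallPipe` mix bare jobs with entire sub-pipelines.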