Fair/Social good in AI
Formalizing notions of Fairness/Ethics in deployable systems
Exploring legal/semantic differences between western and Indian notions of Fairness.
Data in the real world often carries long-held racial, gender-based and community-specific biases, and these biases creep into the algorithms/models trained on that data. Furthermore, the biases in an Indian context may be very different from those in the western world. The goal of the project is to develop algorithmic fairness solutions that cater to local needs, keeping local laws and regulations in mind.
AI and Ethics for the Indian Context
An ethical framework in the field of AI should not be a mere appendage to its technological aspects. Rather, ethical understanding should be treated as an important part of reasoning about and rationalizing the impact of AI on society. Ethical violations in AI processes, especially in areas such as health, education, financial markets, the economy, housing and transportation, could disrupt the lives of a large number of people on a hitherto unimagined scale. In the Indian context, AI is steadily entering various sectors and will soon be ubiquitous. Questions regarding the nature and scope of the impact of AI on the lives of ordinary citizens of India abound. While the ethics of AI is an active area of research in Europe, the US and elsewhere, given the unique nature of Indian society, we need to initiate research of our own. Some of the questions that we will address in this project are:
In a country as diverse as India, with mind-boggling variations in language, culture, food, customs and manners, region, society and religion, can AI successfully negotiate the labyrinth without being unfair to any group or individual? While there have been attempts to make AI algorithms fair with respect to a small number of protected attributes, what mechanisms can we adopt to ensure that more complex constraints are met? Typically we cannot expect the data to implicitly capture these notions of fairness, and we would need to operate with an exogenous knowledge source.
The concept of social justice has been India’s singular socio-economic initiative aimed at the disadvantaged sections of society. Considerations of social justice have even been introduced into the Constitution of India to create a level playing field for those who have suffered social discrimination for millennia. Can AI-based systems incorporate principles of social justice in their rationalizations, for example in admission and selection processes? What would it take to build a system that can explain decisions on the basis of principles of social justice?
How much appreciation is there among the end users and general public about the wide-ranging impact of AI? Do they realize how the adoption of AI in certain crucial sectors can fundamentally change how things operate now? To this end, we will do a case study in the health sector. The objective of the case study is to evaluate the exposure of doctors, nurses and other health professionals to the concept of AI in the medical industry. The sample will be collected from one General Government Hospital, one specialized facility in cardio, neuro, or cancer research and a reputed private hospital in Chennai and will cover doctors, healthcare workers, patients/caregivers and administrators.
To address the first question we will adopt the following methodology. We will create synthetic datasets with several “protected” attributes with regard to which we need to be fair. These datasets will be extensions of existing popular datasets from the UCI corpus. We will benchmark various fair ML approaches on these datasets and develop newer methods to address the shortcomings.
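As a concrete starting point, such a benchmark needs a fairness metric that handles several protected attributes at once. The sketch below is an illustration, not the project's actual benchmark code: the attribute names and the biased toy classifier are invented. It computes the demographic-parity gap (difference in positive-prediction rates between groups) separately for each protected attribute:

```python
import numpy as np

def demographic_parity_gaps(y_pred, protected):
    """For each protected attribute, return the largest gap in positive-
    prediction rate between any two of its groups (0 = perfectly fair)."""
    gaps = {}
    for name, values in protected.items():
        rates = [y_pred[values == g].mean() for g in np.unique(values)]
        gaps[name] = max(rates) - min(rates)
    return gaps

# Toy synthetic data with two hypothetical protected attributes.
rng = np.random.default_rng(0)
n = 1000
gender = rng.integers(0, 2, n)
region = rng.integers(0, 3, n)
# A deliberately biased "classifier": favours gender == 1.
y_pred = (rng.random(n) < 0.3 + 0.4 * gender).astype(int)

gaps = demographic_parity_gaps(y_pred, {"gender": gender, "region": region})
```

A benchmark would compute such gaps (and intersectional variants) for every fair-ML method under comparison; here the gender gap is large by construction while the region gap stays near zero.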
For the second problem, we need to create two pieces of data - the first is a representation of the social justice criteria in a machine comprehensible manner. While we need to explore what form this should take, some of the candidates are a rule base, a set of logical predicates, specification in probabilistic logic, etc. The second is a dataset in which the points have attributes that are relevant from the social justice viewpoint. Again, we will create an extension of the UCI repository for this purpose. The challenging part of the work is proposing classification methods that can quickly verify if the external criteria are met, and if not, readjust the classifier parameters efficiently.
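One minimal form such a machine-comprehensible rule base might take, purely as an assumption about the eventual design: each criterion is a named group-selector predicate plus a minimum rate of positive decisions, and verification either passes or triggers a threshold readjustment for that group. All names, groups and numbers below are hypothetical:

```python
import numpy as np

# Hypothetical encoding of a social-justice criterion as a rule:
# (name, group-selector predicate, minimum share of positive decisions).
def verify(rules, X, y_pred):
    """Return the names of rules violated by the classifier's decisions."""
    violated = []
    for name, mask_fn, min_rate in rules:
        mask = mask_fn(X)
        if mask.any() and y_pred[mask].mean() < min_rate:
            violated.append(name)
    return violated

def readjust(scores, mask, min_rate):
    """Lower the decision threshold for one group until its rule holds."""
    k = int(np.ceil(min_rate * mask.sum()))      # positives the group needs
    thresh = np.sort(scores[mask])[-k]           # k-th highest group score
    y = (scores >= 0.5).astype(int)
    y[mask] = (scores[mask] >= thresh).astype(int)
    return y

rng = np.random.default_rng(1)
X = rng.integers(0, 2, size=(500, 1))            # column 0: group membership
scores = np.clip(rng.normal(0.35 + 0.2 * (X[:, 0] == 1), 0.15), 0, 1)
rules = [("min-rate-group0", lambda X: X[:, 0] == 0, 0.4)]

y0 = (scores >= 0.5).astype(int)
y1 = readjust(scores, X[:, 0] == 0, 0.4) if verify(rules, X, y0) else y0
```

The design choice illustrated is that verification is cheap (one pass over predictions per rule), and readjustment touches only the parameters relevant to the violated rule, which matches the efficiency goal stated above.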
For the third problem, the case study research method will be used to explore the application of AI in the health sector. The sample will be one General Government Hospital, one specialized facility in cardio/neuro/cancer research (subject to availability) and a reputed private hospital in Chennai. A cross-sectional survey of the doctors, nurses and other health professionals affiliated with these facilities is planned.
Interpreting black box state of the art Deep Neural Networks
Understanding robustness of Deep Neural Networks to Adversarial attacks
Human in the Loop approach to ML to achieve better explainability
The recent tremendous success of ML/AI is attributed to the advent of deep learning algorithms. While these algorithms have performed extremely well in NLP, vision, speech and other domains, deploying them in several applications is often not a comfortable decision for industry. The fundamental reason is that deep neural models trade off explainability for accuracy. The goal of the project is to contribute to the growing literature on explainable machine learning algorithms.
Paradigms, Interpretable Models and Algorithms for AI-based Human in the Loop Learning
Artificial intelligence/Data science systems with humans-in-the-loop (HIL) are increasing by the day with applications covering a broad spectrum of domains ranging from education to e-commerce. AI for HIL systems involves two kinds of continual learners namely humans and computers. The success of these systems critically depends on the interaction between these two learners. Towards this, we propose to investigate the following fundamental questions in AI-based HIL systems.
The data is typically not stationary because human behaviour changes over time. While some of this non-stationarity can be modelled as a shifting distribution, an alternate yet important aspect is that humans (possibly subconsciously) modify their behaviour in a potentially non-stochastic or strategic (game-theoretic) manner to adapt to their experience of the underlying system. We are interested in investigating the fundamental paradigms of simultaneous learning in dynamic HIL systems.
In many HIL systems, data is either missing or can only be obtained via additional interventions. For instance, an important application we are interested in is MOOCs in the Indian context, where a large part of the data (e.g. video viewing-time data) is available only in aggregate, non-user-specific form. We are interested in principled approaches to handling missing/partial/aggregate data in HIL systems so as to prescribe user-specific interventions.
A major challenge is to develop explainable HIL based continual learning systems where the actions taken by the system should be interpretable. This has been a major bottleneck for the sophisticated state of the art deep learning-based reinforcement learning systems.
For instance, in the application of interest to us, i.e., MOOCs in the Indian context, if the system chooses a specific intervention to improve user retention, then it should back the choice up with easy-to-understand reasoning. We are interested in developing interpretable models for user interventions in AI-based HIL systems. As an application of the above proposal, we are interested in looking at MOOCs in the Indian context (NPTEL). With our general approach, we hope the following points can be addressed:
- Analyse NPTEL datasets, explain the most common reasons for dropouts and draw actionable insights.
- Predict when a student/user would drop out by analysing her viewing activity.
- Incorporate potentially personalized intervention mechanisms (could be as simple as automatically linking relevant prerequisite videos at the right time in a stream, creating an automated list of user-specific FAQ, automatically splitting a video into logical chunks, etc.) and observe retention rates.
- Suggest potential widgets that NPTEL might incorporate for better feedback mechanisms (for instance, a user might be allowed to choose parts of the video which she does not understand well, etc.)
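As an illustration of the dropout-prediction item above, here is a minimal sketch using a from-scratch logistic regression on invented per-user viewing features; the features, labels and data are synthetic assumptions, and NPTEL's real data would of course differ:

```python
import numpy as np

def train_logreg(X, y, lr=0.5, steps=2000):
    """Plain gradient-descent logistic regression (no external libraries)."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w -= lr * X.T @ (p - y) / len(y)
    return w

rng = np.random.default_rng(0)
n = 400
# Invented per-user features: mean fraction of each video watched, and a
# scaled "days since last visit" recency value, plus a bias column.
watched = rng.random(n)
recency = rng.random(n)
X = np.column_stack([np.ones(n), watched, recency])
# Synthetic label: low watch-time plus a long absence tends to mean dropout.
y = (0.5 * recency - watched + 0.1 * rng.normal(size=n) > -0.1).astype(float)

w = train_logreg(X, y)
risk = 1.0 / (1.0 + np.exp(-X @ w))   # per-user dropout risk in (0, 1)
```

Because the model is linear, the fitted weights double as a simple explanation (more watching lowers risk, longer absence raises it), which is in the spirit of the interpretability goal above.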
Several existing studies of user retention in online courses tackle the issue either from a sociological perspective or are too closely aligned to the western context. We wish to develop a first-of-its-kind study that incorporates fundamental data science paradigms for dynamic AI-based HIL systems as building blocks and addresses challenges specific to the Indian context in such systems.
Interpretability of Deep Learning Models in Healthcare
Interpretability of deep learning models is essential for the widespread adoption of these techniques in the medical image diagnosis community. Deep learning models have been phenomenally successful at beating the state of the art in common medical image diagnosis tasks such as segmentation, as well as in screening applications, e.g. classification of diabetic retinopathy and diagnosis of chest X-ray scans, among others. While these successes have created huge interest in adopting these techniques in clinical practice, a major barrier to adoption is the lack of interpretability of these models. Convolutional neural networks with hundreds of layers are the workhorse of medical image diagnosis. While the initial layers are typically edge and shape detectors, as one goes deeper into the network it becomes nearly impossible to explain or interpret the feature maps. In order for clinicians to trust the output of these networks, it is essential that a mechanism for explaining the output be present. Black-box techniques make it hard for clinicians to justify the diagnosis as well as follow-up procedures.
In this proposal we seek to extend activation maximization to problems in the medical image analysis domain by incorporating domain-specific data constraints in the loss function used for generating synthetic inputs. The feasibility of this approach stems from the fact that even though there are variations in human anatomy between individuals, the degree of variation is overall quite low, and in most cases the shape and appearance of the anatomy in the image does not vary much between patients. One can exploit this low variability to constrain the synthesized image to match a shape template obtained from the training data set. Statistical shape and appearance models are often used in medical image analysis for segmentation tasks, and the methodology for estimating these models has been well studied.
We propose to incorporate recently proposed shape and appearance models as a regularization term in the activation maximization objective so as to constrain the synthesized images to be clinically meaningful. The regularization can be effected by directly using the L2 norm of the difference between the synthesized image and the model. Alternatively, an auto-encoder trained on the data can be used to extract a low-dimensional representation of the images, and a statistical measure such as the KL divergence between the model and the synthesized image can act as a regularizer.
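To make the L2-regularized variant concrete, here is a toy numpy sketch that replaces the CNN unit with a single linear unit so that the behaviour is easy to verify; the weight vector, template and regularization strength are all assumptions, not values from the proposal. It runs gradient ascent on f(x) = w·x − (λ/2)·‖x − template‖², so the L2 term pulls the synthesized "image" toward the shape template:

```python
import numpy as np

def activation_maximization(w, template, lam=0.5, lr=0.1, steps=200):
    """Gradient ascent on f(x) = w.x - (lam/2)*||x - template||^2.
    The L2 term keeps the synthesized input close to the shape template."""
    x = np.zeros_like(template)
    for _ in range(steps):
        grad = w - lam * (x - template)   # d/dx of the objective
        x += lr * grad
    return x

rng = np.random.default_rng(0)
d = 64                                    # flattened toy "image"
w = rng.normal(size=d)                    # weights of the unit to maximize
template = rng.normal(size=d)             # stand-in shape/appearance prior

x = activation_maximization(w, template)
```

For this quadratic objective the optimum is template + w/λ in closed form, which makes the trade-off explicit: larger λ keeps the image closer to the prior, smaller λ lets the activation term dominate.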
Initial studies on brain tumor segmentation using naïve activation maximization have yielded promising results, with early layers capturing the overall shape of the brain while successive layers focus on the tumor. However, the results are not consistent and, in some cases, not interpretable. We hypothesize that incorporating shape and appearance priors in the loss function will lead to the synthesis of images that are physically meaningful and will improve the interpretability of the network's predictions. We will incorporate shape and appearance priors for both brain tumor segmentation and cardiac segmentation tasks. The work is expected to establish a technique for interpreting the outputs of deep neural networks that are beating the state of the art on many routine medical image diagnosis tasks.
Efficient algorithms for dealing with drifting data distributions in deployable scenarios.
Dealing with unstable data, under utilized data, feedback loops, etc.
The performance of ML/AI algorithms depends crucially on the distributions from which data arrives. Real-world data often exhibits the phenomenon of ‘data drift’, and AI/ML algorithms must be equipped to handle it. The goal of the project is to develop deployable ML/AI algorithms that learn continuously in the field and adapt to changing data distributions and drifts.
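One simple drift monitor often used in deployed scoring systems is the Population Stability Index (PSI); the sketch below is a generic illustration rather than the project's method, and the quoted thresholds are a folk rule of thumb, not a standard the project commits to:

```python
import numpy as np

def psi(reference, current, bins=10):
    """Population Stability Index between a reference window and a current
    window of a scalar feature. Common rule of thumb (an assumption, not a
    universal standard): < 0.1 stable, > 0.25 significant drift."""
    edges = np.quantile(reference, np.linspace(0, 1, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf          # cover out-of-range values
    ref_frac = np.histogram(reference, edges)[0] / len(reference)
    cur_frac = np.histogram(current, edges)[0] / len(current)
    ref_frac = np.clip(ref_frac, 1e-6, None)       # avoid log(0)
    cur_frac = np.clip(cur_frac, 1e-6, None)
    return float(np.sum((cur_frac - ref_frac) * np.log(cur_frac / ref_frac)))

rng = np.random.default_rng(0)
reference = rng.normal(0, 1, 5000)
same = rng.normal(0, 1, 5000)        # no drift
shifted = rng.normal(1, 1, 5000)     # mean has drifted by one sigma

psi_same, psi_shifted = psi(reference, same), psi(reference, shifted)
```

A deployed system would compute this per feature on a sliding window and trigger retraining or adaptation when the index crosses a threshold.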
Learning with Limited, Partial and Noisy data
An area of significant interest in data science is its use on the extensively large data sets being thrown up by the changing landscape of computational improvements and IoT. While this addresses a new and exciting arena of possibilities, there is a wide range of real-world contexts, particularly in third-world environments, where such an abundance of useful data is not the norm. This research studies the use of machine learning in two such contexts: one with significant inadequacies in data quantity and the other in data quality. On the quantity front, we look at the use of experimentation and information sharing as means of generating data when there is an absence of it. To this end, we intend to study the use of offline experimentation, inspired by traditional statistical design of experiments, and online experimentation, inspired by the bandit framework from reinforcement learning. Specifically, we seek to understand real-world idiosyncrasies that may entail the use of principles from both fields. We also seek to study the effect of information sharing and how it can help agents acquire data more efficiently.

On the quality front, we look at a particularly recurring problem of limited and asymmetric noise. A few of our previous studies have indicated that in many sources of data which carry implications for incentives or assessment, there is a tendency for human intervention or behaviour that instills a bias in the data, so that it is not truly reflective of the underlying phenomena. Broadly, we seek to look at machine learning algorithms that account for this through the modelling of latent variables and other approaches that explicitly address the asymmetric noise in the variable of interest.
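The online-experimentation side can be illustrated with the simplest bandit algorithm, epsilon-greedy; the arm probabilities below are invented and the "intervention variant" framing is only a hypothetical use case:

```python
import random

def epsilon_greedy(arm_probs, steps=3000, eps=0.1, seed=0):
    """Epsilon-greedy bandit: explore a uniformly random arm with
    probability eps, otherwise exploit the arm with the best empirical
    mean reward observed so far."""
    rng = random.Random(seed)
    k = len(arm_probs)
    counts, sums = [0] * k, [0.0] * k
    for _ in range(steps):
        if rng.random() < eps or 0 in counts:
            arm = rng.randrange(k)                 # explore
        else:
            means = [s / c for s, c in zip(sums, counts)]
            arm = means.index(max(means))          # exploit
        reward = 1.0 if rng.random() < arm_probs[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts, sums

# Hypothetical example: three intervention variants with unknown success
# probabilities; the learner must discover the best one while running.
counts, sums = epsilon_greedy([0.2, 0.5, 0.8])
```

The contrast with offline designed experiments is visible in the pull counts: instead of allocating trials evenly in advance, the algorithm concentrates data collection on the best-performing arm as evidence accumulates.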
Incorporating Domain Knowledge into Machine Learning
There has been a tremendous shift in recent years in data analysis applications, where reasoning about systems is moving from the use of domain knowledge together with data to purely data-based analysis. This shift has been largely driven by the availability of tremendous amounts of data in all problem domains, an availability poised to increase even further due to the pervasive nature of the Internet of Things (IoT). In parallel, vigorous research in data science algorithms has generated a suite of apparently “all-weather” algorithms that can be used in conjunction with big data for reasoning and analysis in almost all problem domains. In such a scenario, it is important to step back and assess the relevance of domain knowledge in data analysis. Several fundamental questions beg answers. Is there no need for domain knowledge at all in data analysis - in other words, can everything that can be uncovered be uncovered with data and algorithms alone? Even if it were possible to work purely with data, would the resulting models be interpretable enough to be used in contexts that were not considered at the time of analysis? Alternatively, if domain knowledge is indeed available, how would one incorporate it into data analysis algorithms to derive better and more interpretable models? Answering the last question requires fundamental research in techniques that incorporate domain knowledge into learning algorithms.
We will pursue work in this area of hybridizing learning algorithms with prior knowledge. In order to be domain agnostic and for the algorithms to be “universal”, domain knowledge will be described as abstractions rather than specific discipline-based instances. We will categorize domain knowledge into abstract forms of different levels of granularity and types (for example, functional vs spatial). Some of these abstractions are: functional relationships between variables with either known or unknown parameters, structural relationships between variables represented as graphs, sparsity constraints, spatial relationships between variables in terms of sub-partitions of the data space, and distribution information for either the variables or the underlying stochastic processes. The question then is: how should learning algorithms be modified to optimally include these types of domain knowledge? An interesting follow-up question is how to quantify the improvement that can be achieved by including domain knowledge. A simple example of this improvement can be seen in a class of matrix factorization problems known as network component analysis (NCA), where with structural information (under certain conditions) the factorization is unique up to diagonal scaling, whereas without this information it is unique only up to a rotation matrix. We will focus on integrating domain knowledge in the form of known equations, network structural constraints and input space partitions into machine learning algorithms that identify underlying model structures, such as PCA, KPCA and GPCA.
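The benefit of structural knowledge in factorization can be sketched as a masked alternating-least-squares decomposition, where a known zero pattern on the mixing matrix is enforced at every step. This is only an illustration in the spirit of NCA, not NCA proper, and the data, mask and rank below are synthetic assumptions:

```python
import numpy as np

def masked_als(X, mask, rank, iters=200, seed=0):
    """Factorize X ~= A @ S where mask[i, j] == 0 forces A[i, j] = 0
    (a known structural constraint), via alternating least squares."""
    rng = np.random.default_rng(seed)
    m = X.shape[0]
    A = rng.random((m, rank)) * mask
    S = np.zeros((rank, X.shape[1]))
    for _ in range(iters):
        S = np.linalg.lstsq(A, X, rcond=None)[0]        # update S, A fixed
        for i in range(m):                              # update A row by row
            idx = np.flatnonzero(mask[i])               # allowed entries only
            if idx.size:
                A[i, idx] = np.linalg.lstsq(S[idx].T, X[i], rcond=None)[0]
    return A, S

rng = np.random.default_rng(1)
mask = (rng.random((20, 3)) < 0.7).astype(float)        # known zero pattern
A_true = rng.random((20, 3)) * mask
S_true = rng.random((3, 30))
X = A_true @ S_true

A, S = masked_als(X, mask, rank=3)
rel_err = np.linalg.norm(X - A @ S) / np.linalg.norm(X)
```

The structural constraint shrinks the space of admissible factorizations at no cost to the fit, which is the qualitative point the NCA example above makes about identifiability.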
Private and Safe AI
Cryptographic vs statistical notions of guaranteeing privacy in deployable AI systems
Understanding Accuracy-Privacy-Computation tradeoff
Every prediction of a machine learning algorithm can potentially leak information about the underlying model. Furthermore, even if the end user is reliable, there may be several types of privacy attacks and breaches. It is thus important to make AI systems safe and secure before they are deployed. The goal of the project is to develop private and safe AI algorithms.
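On the statistical side, the canonical building block is the Laplace mechanism of differential privacy; the sketch below releases a privatized count, with the count and epsilon invented purely for illustration:

```python
import numpy as np

def laplace_count(true_count, epsilon, rng):
    """epsilon-differentially-private release of a counting query: add
    Laplace noise of scale sensitivity/epsilon (sensitivity is 1 for a
    count, since adding or removing one record changes it by at most 1)."""
    return true_count + rng.laplace(0.0, 1.0 / epsilon)

rng = np.random.default_rng(0)
true_count = 120                      # e.g. records matching some query
releases = np.array([laplace_count(true_count, 0.5, rng)
                     for _ in range(10000)])
```

The accuracy-privacy trade-off mentioned above is visible directly in the noise scale 1/epsilon: a stronger guarantee (smaller epsilon) means noisier answers, while each release stays unbiased on average.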
AI for Edge
- Incorporating large-scale AI algorithms into edge devices with limited computational resources and other constraints, including bandwidth, latency and power/energy requirements.
Several user-centric applications can benefit if deep learning algorithms can be run on the edge. At the moment, however, this is challenging because of hardware and software restrictions. To achieve truly wide-scale deployment of deep learning algorithms, one needs to make them work seamlessly on the edge as well. The goal of the project is to develop AI algorithms that work well in edge applications.
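A first step such edge-targeted methods commonly take is post-training weight quantization; the sketch below shows symmetric 8-bit quantization of a random weight matrix (the layer size and weight distribution are assumptions, not tied to any particular model):

```python
import numpy as np

def quantize_int8(w):
    """Symmetric 8-bit post-training quantization: map float weights to
    int8 with a single per-tensor scale, cutting storage by 4x."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(0, 0.1, size=(256, 256)).astype(np.float32)  # toy layer
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
max_err = np.abs(w - w_hat).max()     # bounded by half a quantization step
```

The per-weight error is bounded by half the quantization step, which is why 8-bit weights often preserve accuracy while meeting edge memory and bandwidth budgets.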
Professor, Dept of CSE, Dept of Biomedical Informatics, The Ohio State University
Director, Center for Machine Learning, The University of Texas at Dallas
International education programs, conferences/seminar/webinar links
Conference on Deployable AI (2021 Jun 16-18)
Thematic Semester on Explainable AI: 2-day bootcamp, student reading groups, faculty talks etc. (2021 July-Dec)
Invited Talks by Eminent Speakers (2021 July-Dec)
End of Semester Event (Dec-Jan)