Sitemap

A list of all the posts and pages found on the site. For you robots out there, there is an XML version available for digesting as well.

Pages

About me

Posts

Future Blog Post

less than 1 minute read

Published:

This post will show up by default. To disable scheduling of future posts, edit _config.yml and set future: false.

Blog Post number 4

less than 1 minute read

Published:

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

Blog Post number 3

less than 1 minute read

Published:

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

Blog Post number 2

less than 1 minute read

Published:

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

Blog Post number 1

less than 1 minute read

Published:

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

Portfolio

Publications

Optimizing F-Measures by Cost-Sensitive Classification

Published in NIPS, 2014

We present a theoretical analysis of F-measures for binary, multiclass and multilabel classification. These performance measures are non-linear, but in many scenarios they are pseudo-linear functions of the per-class false negative/false positive rate. Based on this observation, we present a general reduction of F-measure maximization to cost-sensitive classification with unknown costs. We then propose an algorithm with provable guarantees to obtain an approximately optimal classifier for the F-measure by solving a series of cost-sensitive classification problems. The strength of our analysis is to be valid on any dataset and any class of classifiers, extending the existing theoretical results on F-measures, which are asymptotic in nature. We present numerical experiments to illustrate the relative importance of cost asymmetry and thresholding when learning linear classifiers on various F-measure optimization tasks.
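The reduction described in this abstract can be illustrated with a minimal sketch for the binary case. It assumes estimated positive-class probabilities are already available, so each cost-sensitive problem reduces to a thresholding rule instead of a retrained classifier; that simplification, the cost grid, and the names `f_beta` and `best_f_by_cost_sweep` are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def f_beta(y_true, y_pred, beta=1.0):
    """F-beta for binary labels, written in terms of TP, FN and FP counts."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = np.sum(y_true & y_pred)
    fn = np.sum(y_true & ~y_pred)
    fp = np.sum(~y_true & y_pred)
    denom = (1 + beta**2) * tp + beta**2 * fn + fp
    return 0.0 if denom == 0 else float((1 + beta**2) * tp / denom)

def best_f_by_cost_sweep(y_true, prob_pos, beta=1.0, n_costs=19):
    """Try several (FN cost, FP cost) pairs and keep the rule with the best F-beta.

    Assumption: prob_pos estimates P(y=1 | x). For costs (c_fn, c_fp) the
    cost-sensitive rule predicts positive when
    prob_pos * c_fn > (1 - prob_pos) * c_fp.
    """
    prob_pos = np.asarray(prob_pos, dtype=float)
    best_score, best_costs = -1.0, None
    for c_fp in np.linspace(0.05, 0.95, n_costs):  # hypothetical cost grid
        c_fn = 1.0 - c_fp
        y_pred = prob_pos * c_fn > (1.0 - prob_pos) * c_fp
        score = f_beta(y_true, y_pred, beta)
        if score > best_score:
            best_score, best_costs = score, (float(c_fn), float(c_fp))
    return best_score, best_costs
```

Calling `best_f_by_cost_sweep(labels, probabilities)` returns the best F-beta found on the grid together with the cost pair that produced it. A finer grid trades extra computation for a closer approximation.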

Download here

On Inferring the Time-Varying Traffic Connectivity Structures of an Urban Environment

Published in Proc. of the 4th International Workshop on Urban Computing (UrbComp 2015) in conjunction with KDD, 2015

Road networks shape traffic mobility in a city. These dynamics are often represented as traffic flows in and out of defined urban travel zones. The functional dynamics of traffic zones can be represented by time-dependent correlations between time series of traffic flows in and out of these zones. In this paper we address the question: given the dense time-varying functional correlations of traffic flow in a city, how can we derive a sparse representation that explains the time-varying structural connectivity of traffic zones in a city? We call this sparse representation the time-varying effective traffic connectivity of the city. We formulate an optimization problem to infer the sparse effective traffic network from dense functional correlations of traffic flow for arbitrary levels of temporal granularity, and demonstrate the results for the city of Doha, Qatar on data collected from several hundred Bluetooth sensors deployed across the city to record vehicular activity through the city’s traffic zones. Preliminary experiments suggest that our framework can be used by urban transportation experts and policy specialists to take a real-time, data-driven approach towards urban planning and real-time traffic planning in the city, especially at the level of administrative zones of a city.
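To make the "dense functional correlations" concrete, here is a small sketch that computes one correlation matrix per time window from a zones-by-time flow array. The function names, the non-overlapping windowing, and especially the `sparsify` step are assumptions for illustration: plain thresholding is only a crude stand-in and is not the optimization problem formulated in the paper.

```python
import numpy as np

def windowed_correlations(flows, window):
    """Pearson correlation matrices over non-overlapping time windows.

    flows: array of shape (T, Z), flow counts for Z zones at T time steps.
    Returns an array of shape (T // window, Z, Z).
    Assumes T >= window and that no zone is constant within a window.
    """
    flows = np.asarray(flows, dtype=float)
    n_windows = flows.shape[0] // window
    mats = []
    for w in range(n_windows):
        block = flows[w * window:(w + 1) * window]
        # rowvar=False: columns are zones, rows are time steps
        mats.append(np.corrcoef(block, rowvar=False))
    return np.stack(mats)

def sparsify(corr, threshold):
    """Placeholder sparsification: keep only strong off-diagonal correlations.

    This is NOT the paper's method, only a simple way to see what a sparse
    connectivity matrix derived from a dense one could look like.
    """
    sparse = np.where(np.abs(corr) >= threshold, corr, 0.0)
    np.fill_diagonal(sparse, 0.0)
    return sparse
```

Shorter windows give a finer temporal granularity at the price of noisier correlation estimates.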

Download here

A coverage-based approach to recommendation diversity on similarity graph

Published in RecSys, 2016

We consider the problem of generating diverse, personalized recommendations such that a small set of recommended items covers a broad range of the user’s interests. We represent items in a similarity graph, and we formulate the relevance/diversity trade-off as finding a small set of unrated items that best covers a subset of items positively rated by the user. In contrast to previous approaches, our method does not rely on an explicit trade-off between a relevance objective and a diversity objective, as the estimations of relevance and diversity are implicit in the coverage criterion. We show on several benchmark datasets that our approach compares favorably to the state-of-the-art diversification methods according to various relevance and diversity measures.
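The coverage idea can be sketched with a standard greedy maximum-coverage heuristic. This is a simplified illustration under stated assumptions, not the paper's algorithm: coverage is unweighted, the similarity graph is given as a plain adjacency dictionary, and the name `greedy_cover` is hypothetical.

```python
def greedy_cover(neighbors, liked, candidates, k):
    """Pick up to k unrated items whose graph neighbours cover many liked items.

    neighbors:  dict mapping each item to the set of items similar to it
    liked:      items the user rated positively
    candidates: unrated items that may be recommended
    """
    liked = set(liked)
    covered = set()
    chosen = []
    remaining = set(candidates)
    for _ in range(k):
        best_item, best_gain = None, 0
        for item in sorted(remaining):  # sorted for deterministic tie-breaking
            # marginal gain: liked items this candidate newly covers
            gain = len((neighbors.get(item, set()) & liked) - covered)
            if gain > best_gain:
                best_item, best_gain = item, gain
        if best_item is None:  # no candidate adds coverage
            break
        chosen.append(best_item)
        covered |= neighbors[best_item] & liked
        remaining.remove(best_item)
    return chosen, covered
```

Because each pick is scored only by the liked items it newly covers, a candidate that duplicates earlier picks gains nothing, so diversity comes out of the coverage criterion without a separate diversity term.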

Download here

Effective urban structure inference from traffic flow dynamics

Published in IEEE Transactions on Big Data, 2017

Mobility in a city is represented as traffic flows in and out of defined urban travel or administrative zones. While the zones and the road networks connecting them are fixed in space, traffic flows between pairs of zones are dynamic through the day. Understanding these dynamics in real time is crucial for real-time traffic planning in the city. In this paper, we use real-time traffic flow data to generate dense functional correlation matrices between zones during different times of the day. Then, we derive optimal sparse representations of these dense functional matrices that not only accurately recover the existing road network connectivity between zones, but also reveal new latent links between zones that do not yet exist but are suggested by traffic flow dynamics. We call this sparse representation the time-varying effective traffic connectivity of the city. A convex optimization problem is formulated and used to infer the sparse effective traffic network from time series data of traffic flow for arbitrary levels of temporal granularity. We demonstrate the results for the city of Doha, Qatar on data collected from several hundred Bluetooth sensors deployed across the city to record vehicular activity through the city’s traffic zones. While the static road network connectivity between zones is accurately inferred, other long-range connections are also predicted that could be useful in planning future road linkages in the city. Further, the proposed model can be applied to socio-economic activity other than traffic, such as new housing, construction, or economic activity captured as functional correlations between zones, and can also be similarly used to predict new traffic linkages that are latently needed but as yet do not exist. Preliminary experiments suggest that our framework can be used by urban transportation experts and policy specialists to take a real-time, data-driven approach towards urban planning and real-time traffic planning in the city, especially at the level of administrative zones of a city.

Download here

SAGA: A Submodular Greedy Algorithm for Group Recommendation

Published in AAAI, 2018

In this paper, we propose a unified framework and an algorithm for the problem of group recommendation where a fixed number of items or alternatives can be recommended to a group of users. The problem of group recommendation arises naturally in many real world contexts, and is closely related to the budgeted social choice problem studied in economics. We frame the group recommendation problem as choosing a subgraph with the largest group consensus score in a completely connected graph defined over the item affinity matrix. We propose a fast greedy algorithm with strong theoretical guarantees, and show that the proposed algorithm compares favorably to the state-of-the-art group recommendation algorithms according to commonly used relevance and coverage performance measures on benchmark datasets.

Download here

Popularity Agnostic Evaluation of Knowledge Graph Embeddings

Published in Conference on Uncertainty in Artificial Intelligence, 2020

In this paper, we show that the distribution of entities and relations in common knowledge graphs is highly skewed, with some entities and relations being much more popular than the rest. We show that while knowledge graph embedding models give state-of-the-art performance in many relational learning tasks such as link prediction, current evaluation metrics like hits@k and mrr are biased towards popular entities and relations. We propose two new evaluation metrics, strat-hits@k and strat-mrr, which are unbiased estimators of the true hits@k and mrr when the items follow a power-law distribution. Our new metrics are generalizations of hits@k and mrr that take into account the popularity of the entities and relations in the data, with a tuning parameter determining how much emphasis the metric places on popular vs. unpopular items. Using our metrics, we run experiments on benchmark datasets to show that the performance of embedding models degrades as the popularity of the entities and relations decreases, and that current reported results overestimate the performance of these models by magnifying their accuracy on popular items.
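For readers unfamiliar with the metrics named above, the following sketch computes standard hits@k and mrr from a list of ranks, plus one popularity-weighted variant. The weighted variant is an assumption for illustration only: the inverse-power weighting and the parameter name `alpha` are hypothetical and are not the paper's definition of strat-hits@k.

```python
def hits_at_k(ranks, k):
    """Fraction of test triples whose correct entity is ranked in the top k."""
    return sum(r <= k for r in ranks) / len(ranks)

def mrr(ranks):
    """Mean reciprocal rank; ranks are 1-based."""
    return sum(1.0 / r for r in ranks) / len(ranks)

def weighted_hits_at_k(ranks, popularity, k, alpha=1.0):
    """Hypothetical popularity-aware hits@k.

    Each test case is weighted by popularity ** (-alpha), so larger alpha
    puts more emphasis on unpopular items; alpha = 0 recovers plain hits@k.
    """
    weights = [p ** (-alpha) for p in popularity]
    total = sum(weights)
    return sum(w for r, w in zip(ranks, weights) if r <= k) / total
```

With ranks `[1, 2, 10, 50]` and popularities `[100, 100, 1, 1]`, plain hits@3 is 0.5, while the weighted score with `alpha=1.0` is much lower because the two well-ranked cases involve popular items.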

Download here

Simple and effective neural-free soft-cluster embeddings for item cold-start recommendations

Published in Data Mining and Knowledge Discovery, 2020

Recommender systems are widely used in online platforms for easy exploration of personalized content. The best available recommendation algorithms are based on using the observed preference information among collaborating entities. A significant challenge in recommender systems continues to be item cold-start recommendation: how to effectively recommend items with no observed or past preference information. Here we propose a two-stage algorithm based on soft clustering to provide an efficient solution to this problem. The crux of our approach lies in representing the items as soft-cluster embeddings in the space spanned by the side-information associated with the items. Though many item embedding approaches have been proposed for item cold-start recommendations in the past (and simple as they might appear), to the best of our knowledge, the approach based on soft-cluster embeddings has not been proposed in the research literature. Our experimental results on four benchmark datasets conclusively demonstrate that the proposed algorithm makes accurate recommendations in item cold-start settings compared to the state-of-the-art algorithms according to commonly used ranking metrics like Normalized Discounted Cumulative Gain (NDCG) and Mean Average Precision (MAP). The performance of our proposed algorithm on the MovieLens 20M dataset clearly demonstrates the scalability aspect of our algorithm compared to other popular algorithms. We also propose the metric Cold Items Precision (CIP) to quantify the ability of a system to recommend cold-start items. CIP can be used in conjunction with relevance ranking metrics like NDCG and MAP to measure the effectiveness of the cold-start recommendation algorithm.
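The two ranking metrics mentioned in this abstract can be sketched as follows. These are the common textbook forms, with the assumption that the ideal ordering for NDCG is computed from the same ranked list; the paper's exact evaluation protocol may differ, and CIP is not shown because its definition is not given here.

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain of the first k items in ranked order."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k):
    """DCG normalised by the DCG of the best possible ordering of the same list."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return 0.0 if ideal == 0 else dcg_at_k(relevances, k) / ideal

def average_precision(relevances):
    """Average precision for one user; relevances are 0/1 in ranked order.

    MAP is the mean of this value over all users.
    """
    hits, total = 0, 0.0
    for position, rel in enumerate(relevances, start=1):
        if rel:
            hits += 1
            total += hits / position  # precision at this relevant position
    return 0.0 if hits == 0 else total / hits
```

Both functions take the relevance labels of one user's recommendation list in ranked order, so a perfect ranking scores 1.0 under either metric.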

Download here

Talks

Teaching

Database Theory & Applications

Database Theory & Applications, University of Glasgow, School of Computing Science, 2021

Co-Instructor for the Database Theory & Applications course. I delivered classes on SQL and applications to recommender systems.