Post Doc researcher, Microsoft Research
Node Sampling from Large Networks with Restricted Access
Networks are a natural representation for data in a variety of settings, ranging from social to biological to information domains. However, many of these real-world networks are massive which makes it computationally infeasible to study the entire network. Thus, network sampling provides the foundation for analysis of such networks. Some networks such as Google+, Facebook, and LinkedIn are not completely visible to the public. Moreover, their restrictive local-neighborhood- only access interfaces make it extremely difficult to crawl all data, as a complete crawl requires as many queries as the number of nodes in the graph. Allowing only local neighborhood queries makes random walk based Monte Carlo Markov Chain (MCMC) methods an ideal fit for the sampling of node from such large networks. A critical problem with the existing random walk techniques, however, is the long burn-in period it requires and, therefore, the significant query cost it incurs for drawing a sample.
In this poster, I represent a general purpose, technique for faster sampling of nodes in a graph. Specifically, unlike traditional random walk, which waits for the convergence of sampling distribution to a predetermined target distribution, I develop WALK-ESTIMATE algorithm, which starts with a much shorter random walk, and then proactively estimate the sampling probability for the node taken before using acceptance-rejection sampling to adjust the sampling probability to the predetermined target distribution. I present a backward random walk technique, which provides provably unbiased estimations for the sampling probability. I demonstrate the superiority of WALK-ESTIMATE over traditional random walks through theoretical analysis and extensive experiments over real world networks.
Azade Nazi recently joined Microsoft research as a Post Doc researcher. She graduated and got her Ph.D. in computer science under the supervision of Dr. Gautam Das from University of Texas, Arlington. Her research interests are in the broad areas of social network analysis, large graph mining, algorithm design, data mining, and machine learning. Her Ph.D. research has focused on novel problems of large graph sampling, query hidden attributes, and personalized user feedback system. She was awarded as “Outstanding Ph.D.” student for her Ph.D works. She was a research intern at Microsoft Research two times during summer 2015 and 2016.