Jonathan Stokes

Research Overview

My PhD research developed estimates of the cost of using various algorithms to find high degree nodes in large graphs. This research is related to the problem of finding influential people in a social network or a particular file in a peer-to-peer network. A network is an example of a graph where the graph's nodes are the intersections of the network's links or edges and a high degree node is a large intersection of edges. A graph is a mathematical object just as a matrix is a mathematical object. In small graphs the problem of finding high degree nodes in a graph has a straightforward solution, count the edges connected to each node. In a graph with billions or trillions of nodes, however, the time required to examine each node may be cost prohibitive.

How should one then approach this problem? One option is to sample or search a small portion of the graph and return the highest degree nodes found. This may not return the highest degree nodes in the graph, but if we cannot search the entire graph it may be the best we can do.

This leads to the question, given a graph sampling algorithm $X$ how much of the graph should $X$ sample? How much of the graph should we sample with $X$ before we might expect to find the highest degree node in the graph? Graphs are complex objects with many properties including, number of nodes, number of edges, degree distribution, degree assortativity, etc. The size of the graph sample required for $X$ to find the highest degree nodes likely depends on these graph properties. But these properties of the graph may themselves be unknowable without first examining the entire graph.

So, we have an intractable problem? If the problem is stated in this form, yes. However, say we add restrictions to the class of graphs we are considering. Say we only consider Erdos Renyi (ER) graphs where given $N$ nodes, edges are placed between each node $n_i$ and every other node $n_j$ for $i$ not equal $j$ with probability $p$. Given an ER graph with $N$ nodes and $p$ edge probability many of the expected properties of the graph can be calculated. For instance, the graph has about $N(N-1)\frac{p}{2}$ edges, a binomial degree distribution, and zero degree assortativity.

My research estimated the average number of steps or samples required for two kinds of algorithms to find the highest degree nodes in an ER graph. The number of samples is another metric for the computational cost or how large a sample of the graph it is expected that algorithm $X$ will have to take to find a graphs highest degree nodes. I looked at two classes of graph search algorithms: random walk, and star sampling, the latter being a form of random sampling. I contributed to the state of the art by developing estimates of the expected number of samples required to find the maximum degree node in an ER graph given only parameters $N$ and $p$ for instances of random walk and star sampling algorithms.

Professional Interests

I have varying levels of expertise in the following, this is more of a list of fields I think would be interesting to work in or learn more about: Artificial Intelligence, Blockchain, Cryptography, Data Science, Graph Sampling, Graph Theory, Information Theory, Machine Learning, Optimization, Signal Processing.

Education

Drexel, PhD. in Electrical Engineering 2018

Drexel, MS in Electrical Engineering 2015

University of Rhode Island, BS in Electrical Engineering 2012

Reed College, BA in Philosophy 2007

Academic Papers

Jonathan Stokes

Research Overview

Professional Interests

Education

Academic Papers

Conference Papers

Workshop Papers

Journal Letters and Papers

PhD. Thesis

Assorted Links