Jonathan Stokes

Research Overview


My PhD research developed estimates of the cost of using various algorithms to find high degree nodes in large graphs. This research is related to the problem of finding influential people in a social network or a particular file in a peer-to-peer network. A network is an example of a graph where the graph's nodes are the intersections of the network's links or edges and a high degree node is a large intersection of edges. A graph is a mathematical object just as a matrix is a mathematical object. In small graphs the problem of finding high degree nodes in a graph has a straightforward solution, count the edges connected to each node. In a graph with billions or trillions of nodes, however, the time required to examine each node may be cost prohibitive.

How should one then approach this problem? One option is to sample or search a small portion of the graph and return the highest degree nodes found. This may not return the highest degree nodes in the graph, but if we cannot search the entire graph it may be the best we can do.

This leads to the question, given a graph sampling algorithm $X$ how much of the graph should $X$ sample? How much of the graph should we sample with $X$ before we might expect to find the highest degree node in the graph? Graphs are complex objects with many properties including, number of nodes, number of edges, degree distribution, degree assortativity, etc. The size of the graph sample required for $X$ to find the highest degree nodes likely depends on these graph properties. But these properties of the graph may themselves be unknowable without first examining the entire graph.

So, we have an intractable problem? If the problem is stated in this form, yes. However, say we add restrictions to the class of graphs we are considering. Say we only consider Erdos Renyi (ER) graphs where given $N$ nodes, edges are placed between each node $n_i$ and every other node $n_j$ for $i$ not equal $j$ with probability $p$. Given an ER graph with $N$ nodes and $p$ edge probability many of the expected properties of the graph can be calculated. For instance, the graph has about $N(N-1)\frac{p}{2}$ edges, a binomial degree distribution, and zero degree assortativity.

My research estimated the average number of steps or samples required for two kinds of algorithms to find the highest degree nodes in an ER graph. The number of samples is another metric for the computational cost or how large a sample of the graph it is expected that algorithm $X$ will have to take to find a graphs highest degree nodes. I looked at two classes of graph search algorithms: random walk, and star sampling, the latter being a form of random sampling. I contributed to the state of the art by developing estimates of the expected number of samples required to find the maximum degree node in an ER graph given only parameters $N$ and $p$ for instances of random walk and star sampling algorithms.

Professional Interests


I have varying levels of expertise in the following, this is more of a list of fields I think would be interesting to work in or learn more about: Artificial Intelligence, Blockchain, Cryptography, Data Science, Graph Sampling, Graph Theory, Information Theory, Machine Learning, Optimization, Signal Processing.

Education


Drexel, PhD. in Electrical Engineering 2018

Drexel, MS in Electrical Engineering 2015

University of Rhode Island, BS in Electrical Engineering 2012

Reed College, BA in Philosophy 2007

Academic Papers


Conference Papers

A Markov chain model for the search time for max degree nodes in a graph using a biased random walk - Stokes and Weber (CISS 2016)

On random walks and random sampling to find max degree nodes in assortative Erdos Renyi graphs - Stokes and Weber (Globecom 2016)

The self-avoiding walk-jump (SAWJ) algorithm for finding maximum degree nodes in large graphs - Stokes and Weber (Big Data 2016)

On the number of star samples to find a vertex or edge with given degree in a graph - Stokes and Weber (CISS 2017)

Online estimation for finding a near-maximum value in a large list of numerical data - Stokes and Weber (CISS 2018)

Workshop Papers

Star sampling with and without replacement - Stokes and Weber (KDD Workshop 2017)

Journal Letters and Papers

Common greedy wiring and rewiring heuristics do not guarantee maximum assortative graphs of given degree - Stokes and Weber (IPL 2018)

Graph search via sampling with and without replacement - Stokes and Weber (Journal of Internet Mathematics 2020)

PhD. Thesis

Performance of random walks and sampling for graph search (2018)

Assorted Links


My code is on Github

My resume

Crypto notes

Maze generation notes

The transition matrix problem

Renters and Homeowners

Pricing Related Goods

Expected Improvement Algorithm

XGBoost Test

LSTM Document Classification