# Node Embedding

## Overview

A graph is defined as $$G = (V, E)$$, where $$V = \{v_i\}$$ is the set of all vertices (nodes) and $$E = \{e_i\}$$ is the set of all edges.

For an undirected graph, the task is often to learn one embedding for a particular node $$v_i$$, namely $$\mathbf{u}_i$$.

For a directed graph, the task is often to learn two embeddings for a particular node $$v_i$$: the out-going (source) embedding $$\mathbf{s}_i$$ and the in-coming (destination) embedding $$\mathbf{t}_i$$.

## Objective for Learning the Embedding

### Learning Embeddings that Represent the Local Connectivity (Homophily) of Nodes

#### Undirected Graph

For an undirected graph, the connectivity from $$v_i$$ to $$v_j$$ can be defined as the probability of predicting $$v_j$$ given $$v_i$$:

$p(v_j|v_i) = \frac{\exp(\mathbf{u}_i \cdot \mathbf{u}_j)}{\sum_{v_n \in V} \exp(\mathbf{u}_i \cdot \mathbf{u}_n)}$

The dot product can be seen as a score indicating the connectivity from $$v_i$$ to $$v_j$$. The training objective is to maximize $$p(v_j|v_i)$$ for every $$v_j$$ among the neighbors of $$v_i$$ (i.e., $$N(v_i)$$) and for all nodes $$v_i \in V$$:

$\mathcal{L} = \mathbb{E}_{v_i \in V} \mathbb{E}_{v_j \in N(v_i)} \left[ \log p(v_j|v_i) \right] = \mathbb{E}_{v_i \in V} \mathbb{E}_{v_j \in N(v_i)} \left[ \log\frac{\exp(\mathbf{u}_i \cdot \mathbf{u}_j)}{\sum_{v_n \in V} \exp(\mathbf{u}_i \cdot \mathbf{u}_n)} \right]$

The partition function of $$p(v_j|v_i)$$ is expensive to evaluate, since it sums over every node in the graph. For learning an embedding, negative sampling can be adopted to train the above objective instead. If $$p(v_i)$$ and $$p(v_j|v_i)$$ are uniform, we have:

$\tilde{\mathcal{L}} = \sum_{v_i \in V} \sum_{v_j \in N(v_i)} \left[ \log \frac{\exp(\mathbf{u}_i \cdot \mathbf{u}_j)}{\exp(\mathbf{u}_i \cdot \mathbf{u}_j) + 1} + \sum_{\substack{\tilde{v}_n \sim q(v) \\ n = 1 \dots k}} \log\frac{1}{\exp(\mathbf{u}_i \cdot \tilde{\mathbf{u}}_n) + 1} \right]$

where $$q(v)$$ is a noise distribution that samples negative nodes for a given pair $$(v_i, v_j)$$, and $$k$$ controls the number of negative samples per pair. The noise distribution is usually also uniform.
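A minimal NumPy sketch of this per-pair negative-sampling loss may make the two terms concrete; everything here (the array `U`, the uniform noise distribution, the helper names) is an illustrative assumption rather than a reference implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
num_nodes, dim, k = 100, 16, 5                     # |V|, embedding size, negatives per pair
U = rng.normal(scale=0.1, size=(num_nodes, dim))   # one embedding u_i per node (undirected case)

def log_sigmoid(x):
    # numerically stable log(1 / (1 + exp(-x)))
    return -np.logaddexp(0.0, -x)

def pair_loss(i, j):
    """Negative-sampling loss for one observed pair (v_i, v_j):
    -[ log sigma(u_i . u_j) + sum_n log sigma(-u_i . u~_n) ],
    with the v~_n drawn from a uniform noise distribution q(v)."""
    pos = log_sigmoid(U[i] @ U[j])
    neg_nodes = rng.integers(0, num_nodes, size=k)  # k negative samples from q(v)
    neg = log_sigmoid(-(U[neg_nodes] @ U[i])).sum()
    return -(pos + neg)                             # negate: maximizing the objective = minimizing the loss

print(pair_loss(0, 1))
```

Note that $$\log\frac{\exp(a)}{\exp(a)+1} = \log\sigma(a)$$ and $$\log\frac{1}{\exp(a)+1} = \log\sigma(-a)$$, which is why the code only needs a stable `log_sigmoid`.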

#### Directed Graph

For a directed graph, the connectivity from $$v_i$$ to $$v_j$$ can be defined as the probability of predicting $$v_j$$ given $$v_i$$, scoring the source embedding of $$v_i$$ against the target embedding of $$v_j$$:

$p(v_j|v_i) = \frac{\exp(\mathbf{s}_i \cdot \mathbf{t}_j)}{\sum_{v_n \in V} \exp(\mathbf{s}_i \cdot \mathbf{t}_n)}$
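For contrast with the negative-sampling approximation above, here is a small sketch of the exact softmax for the directed case (the array names `S` and `T` are assumptions); it is exact but costs $$O(|V|)$$ per query, which is what motivates negative sampling:

```python
import numpy as np

rng = np.random.default_rng(0)
num_nodes, dim = 100, 16
S = rng.normal(scale=0.1, size=(num_nodes, dim))  # out-going (source) embeddings s_i
T = rng.normal(scale=0.1, size=(num_nodes, dim))  # in-coming (target) embeddings t_j

def p_given_i(i):
    """Exact p(. | v_i): softmax of s_i . t_n over all candidate targets."""
    scores = T @ S[i]            # s_i . t_n for every node n
    scores -= scores.max()       # subtract the max for numerical stability
    e = np.exp(scores)
    return e / e.sum()

assert abs(p_given_i(0).sum() - 1.0) < 1e-9
```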

For a graph with a set of relation types $$R$$ (a heterogeneous graph), a relation matrix $$\mathbf{R}_r$$, $$r \in R$$, can replace the plain dot product used in homogeneous graphs, which gives the probability of predicting $$v_j$$ given $$v_i$$ and relation $$r$$:

$p(v_j|v_i,r) = \frac{\exp(\mathbf{s}_i^{\top} \mathbf{R}_r \mathbf{t}_j)}{\sum_{\tilde{r} \in R, v_n \in V} \exp(\mathbf{s}_i^{\top} \mathbf{R}_{\tilde{r}} \mathbf{t}_n)}$

The negative-sampling based training objective then becomes:

$\tilde{\mathcal{L}} = \sum_{v_i \in V} \sum_{r \in R} \sum_{v_j \in N_r(v_i)} \left[ \log \frac{\exp(\mathbf{s}_i^{\top} \mathbf{R}_r \mathbf{t}_j)}{\exp(\mathbf{s}_i^{\top} \mathbf{R}_r \mathbf{t}_j) + 1} + \sum_{\substack{\tilde{r}_n \in R \\ \tilde{v}_n \sim q_{\tilde{r}_n}(v) \\ n = 1 \dots k}} \log\frac{1}{\exp(\mathbf{s}_i^{\top} \mathbf{R}_{\tilde{r}_n} \tilde{\mathbf{t}}_n) + 1} \right]$

where $$N_r(v_i)$$ denotes the neighbors of $$v_i$$ connected through relation $$r$$. Yang et al. (2014) and Schlichtkrull et al. (2017) used a diagonal $$\mathbf{R}_r$$ to reduce the number of parameters.
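As a sketch of the diagonal trick (DistMult-style; the array names are assumptions): with a diagonal $$\mathbf{R}_r$$, the bilinear score $$\mathbf{s}_i^{\top} \mathbf{R}_r \mathbf{t}_j$$ collapses to an elementwise three-way product, so each relation costs $$d$$ parameters instead of $$d^2$$:

```python
import numpy as np

rng = np.random.default_rng(0)
num_nodes, num_rels, dim = 100, 4, 16
S = rng.normal(scale=0.1, size=(num_nodes, dim))      # source embeddings s_i
T = rng.normal(scale=0.1, size=(num_nodes, dim))      # target embeddings t_j
R_diag = rng.normal(scale=0.1, size=(num_rels, dim))  # diagonal entries of each R_r

def score(i, r, j):
    """s_i^T R_r t_j with diagonal R_r: an elementwise product summed over dimensions."""
    return float(np.sum(S[i] * R_diag[r] * T[j]))

print(score(0, 1, 2))
```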

## Random Walk

• Monte-Carlo end-point sampling: estimate node proximity (e.g., personalized PageRank) from the end points of sampled walks.
• Balancing DFS-like and BFS-like exploration, e.g., via the biased walks of node2vec (Grover and Leskovec 2016).
• Path-sharing techniques: for a sampled random-walk path $$v_1, \dots, v_L$$, several sub-paths can be reused as additional valid training paths (see the sketch after this list):
  • Suffixes: $$v_i, \dots, v_L$$ are valid paths.
  • Prefixes: $$v_1, \dots, v_i$$ are valid paths. [PowerWalk: Scalable personalized PageRank via random walks with vertex-centric decomposition.]
  • Sliding windows: $$v_i, \dots, v_{i+W}$$ are valid paths. (Grover and Leskovec 2016)
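Below is a minimal sketch of uniform random-walk sampling plus the sliding-window path sharing from the last bullet (the adjacency-dict representation and helper names are assumptions; node2vec would additionally bias the transition probabilities):

```python
import random

def random_walk(adj, start, length):
    """Sample a uniform random walk of `length` nodes starting from `start`.
    adj: dict mapping each node to a list of its neighbors."""
    walk = [start]
    while len(walk) < length:
        neighbors = adj[walk[-1]]
        if not neighbors:
            break                      # dead end (possible in directed graphs)
        walk.append(random.choice(neighbors))
    return walk

def window_pairs(walk, window):
    """(center, context) training pairs from sliding windows over one walk."""
    pairs = []
    for i, v in enumerate(walk):
        lo, hi = max(0, i - window), min(len(walk), i + window + 1)
        pairs.extend((v, walk[j]) for j in range(lo, hi) if j != i)
    return pairs

adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
walk = random_walk(adj, start=0, length=6)
print(walk)
print(window_pairs(walk, window=2))
```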

# References

Grover, Aditya, and Jure Leskovec. 2016. “Node2vec: Scalable Feature Learning for Networks.” arXiv:1607.00653 [Cs, Stat], July. http://arxiv.org/abs/1607.00653.

Schlichtkrull, Michael, Thomas N. Kipf, Peter Bloem, Rianne van den Berg, Ivan Titov, and Max Welling. 2017. “Modeling Relational Data with Graph Convolutional Networks.” arXiv:1703.06103 [Cs, Stat], October. http://arxiv.org/abs/1703.06103.

Yang, Bishan, Wen-tau Yih, Xiaodong He, Jianfeng Gao, and Li Deng. 2014. “Embedding Entities and Relations for Learning and Inference in Knowledge Bases.” arXiv Preprint arXiv:1412.6575.