Node Embedding

Overview

A graph is defined by \(G = (V, E)\), where \(V = \{v_i\}\) is the set of all vertices (nodes), and \(E = \{e_i\}\) is the set of all edges.

For an undirected graph, the task is often to learn one embedding for each node \(v_i\), namely \(\mathbf{u}_i\).

For a directed graph, the task is often to learn two embeddings for each node \(v_i\): the outgoing (source) embedding \(\mathbf{s}_i\), and the incoming (destination) embedding \(\mathbf{t}_i\).
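A minimal sketch of the two setups, assuming PyTorch; the sizes are hypothetical:

```python
import torch.nn as nn

num_nodes, dim = 10_000, 128  # hypothetical graph size and embedding width

# Undirected graph: a single embedding u_i per node.
u = nn.Embedding(num_nodes, dim)

# Directed graph: separate source and destination embeddings s_i, t_i.
s = nn.Embedding(num_nodes, dim)
t = nn.Embedding(num_nodes, dim)
```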

Objective for Learning the Embedding

Learning Embeddings that Represent the Local Connectivity (Homophily) of Nodes

Undirected Graph

For an undirected graph, the connectivity from \(v_i\) to \(v_j\) can be defined as the probability of predicting \(v_j\) given \(v_i\):

\[ p(v_j|v_i) = \frac{\exp(\mathbf{u}_i \cdot \mathbf{u}_j)}{\sum_{v_n \in G} \exp(\mathbf{u}_i \cdot \mathbf{u}_n)} \]

The dot product can be seen as a score indicating the connectivity from \(v_i\) to \(v_j\). The training objective is to maximize \(p(v_j|v_i)\) for every \(v_j\) among the neighbors \(N(v_i)\) of \(v_i\), over all nodes \(v_i \in V\):

\[ \mathcal{L} = \mathbb{E}_{v_i \in V} \mathbb{E}_{v_j \in N(v_i)} \left[ \log p(v_j|v_i) \right] = \mathbb{E}_{v_i \in V} \mathbb{E}_{v_j \in N(v_i)} \left[ \log\frac{\exp(\mathbf{u}_i \cdot \mathbf{u}_j)}{\sum_{v_n \in G} \exp(\mathbf{u}_i \cdot \mathbf{u}_n)} \right] \]

The partition function of \(p(v_j|v_i)\) is expensive to evaluate, since it sums over every node in the graph. For learning an embedding, negative sampling can be adopted to train the above objective. If \(p(v_i)\) and \(p(v_j|v_i)\) are uniform, we have:

\[ \tilde{\mathcal{L}} = \sum_{v_i \in V} \sum_{v_j \in N(v_i)} \left[ \log \frac{\exp(\mathbf{u}_i \cdot \mathbf{u}_j)}{\exp(\mathbf{u}_i \cdot \mathbf{u}_j) + 1} + \sum_{\substack{\tilde{v}_n \sim q(v) \\ n = 1 \dots k}} \log\frac{1}{\exp(\mathbf{u}_i \cdot \tilde{\mathbf{u}}_n) + 1} \right] \]

where \(q(v)\) is a noise distribution that samples negative nodes for a given pair \((v_i, v_j)\), and \(k\) controls the number of negative samples per pair. The noise distribution is usually also uniform.
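As a concrete illustration, here is a minimal sketch of this negative-sampling objective, assuming PyTorch and a uniform noise distribution \(q(v)\); the function and argument names are illustrative, not from the source:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def neg_sampling_loss(u: nn.Embedding, i, j, k: int = 5):
    """Negative-sampling loss for a batch of positive pairs (v_i, v_j).

    u    : one embedding vector per node
    i, j : LongTensors of node indices, shape (batch,)
    k    : number of negative samples per positive pair
    """
    pos = (u(i) * u(j)).sum(-1)  # u_i . u_j for each positive pair
    # Uniform noise distribution q(v): sample k negatives per pair.
    neg_idx = torch.randint(0, u.num_embeddings, (i.size(0), k))
    neg = torch.bmm(u(neg_idx), u(i).unsqueeze(-1)).squeeze(-1)  # u_i . u_n
    # log(exp(x)/(exp(x)+1)) = log sigmoid(x); log(1/(exp(x)+1)) = log sigmoid(-x).
    # Negate the objective since optimizers minimize.
    return -(F.logsigmoid(pos) + F.logsigmoid(-neg).sum(-1)).mean()
```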

Directed Graph

For a directed graph, the connectivity from \(v_i\) to \(v_j\) can be defined as the probability of predicting \(v_j\) given \(v_i\): \[ p(v_j|v_i) = \frac{\exp(\mathbf{s}_i \cdot \mathbf{t}_j)}{\sum_{v_n \in G} \exp(\mathbf{s}_i \cdot \mathbf{t}_n)} \]

For a heterogeneous (multi-relational) graph, a relation matrix \(\mathbf{R}_r\), \(r \in R\), can replace the dot product used in homogeneous graphs, which gives the probability of predicting \(v_j\) given \(v_i\) and relation \(r\):

\[ p(v_j|v_i,r) = \frac{\exp(\mathbf{s}_i^{\top} \mathbf{R}_r \mathbf{t}_j)}{\sum_{\tilde{r} \in R, v_n \in G} \exp(\mathbf{s}_i^{\top} \mathbf{R}_{\tilde{r}} \mathbf{t}_n)} \]

The negative-sampling based training objective then becomes:

\[ \tilde{\mathcal{L}} = \sum_{v_i \in V} \sum_{r \in R} \sum_{v_j \in N_r(v_i)} \left[ \log \frac{\exp(\mathbf{s}_i^{\top} \mathbf{R}_r \mathbf{t}_j)}{\exp(\mathbf{s}_i^{\top} \mathbf{R}_r \mathbf{t}_j) + 1} + \sum_{\substack{\tilde{r}_n \in R \\ \tilde{v}_n \sim q_{\tilde{r}_n}(v) \\ n = 1 \dots k}} \log\frac{1}{\exp(\mathbf{s}_i^{\top} \mathbf{R}_{\tilde{r}_n} \tilde{\mathbf{t}}_n) + 1} \right] \]

where \(N_r(v_i)\) denotes the neighbors of \(v_i\) under relation \(r\). Yang et al. (2014) and Schlichtkrull et al. (2017) used a diagonal \(\mathbf{R}_r\) to reduce the number of parameters.
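As a sketch of the diagonal parameterization (the DistMult form of Yang et al. 2014), assuming PyTorch and keeping separate source/target tables to match the notation above; class and argument names are illustrative:

```python
import torch.nn as nn

class DiagonalRelScore(nn.Module):
    """Score s_i^T R_r t_j with R_r = diag(rho_r), DistMult-style."""

    def __init__(self, num_nodes: int, num_rels: int, dim: int):
        super().__init__()
        self.s = nn.Embedding(num_nodes, dim)  # source embeddings s_i
        self.t = nn.Embedding(num_nodes, dim)  # destination embeddings t_j
        self.r = nn.Embedding(num_rels, dim)   # diagonal entries of R_r

    def forward(self, i, rel, j):
        # With a diagonal R_r, the bilinear form reduces to an
        # element-wise triple product summed over dimensions.
        return (self.s(i) * self.r(rel) * self.t(j)).sum(-1)
```

The same negative-sampling recipe as above then applies, with negatives \(\tilde{v}_n\) drawn per relation from \(q_{\tilde{r}_n}(v)\).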

Random Walk

  • Monte-Carlo end-point sampling method
  • Balancing DFS and BFS during the walk (the biased walks of node2vec; Grover and Leskovec 2016)
  • Path-sharing techniques: a single sampled random-walk path \(v_1, \dots, v_L\) can be reused to form many training paths (see the sketch after this list):
    • Suffixes: \(v_i, \dots, v_L\) are valid paths.
    • Prefixes: \(v_1, \dots, v_i\) are valid paths. [PowerWalk: Scalable Personalized PageRank via Random Walks with Vertex-Centric Decomposition]
    • Sliding windows: \(v_i, \dots, v_{i+W}\) are valid paths. (Grover and Leskovec 2016)
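Below is a minimal sketch of a uniform random walk plus sliding-window pair extraction, which combines end-point sampling with the path-sharing idea; the adjacency-list format and function names are assumptions, and node2vec's DFS/BFS biasing is omitted for brevity.

```python
import random

def random_walk(adj, start, length):
    """Uniform random walk of `length` nodes; adj maps node -> list of neighbors."""
    path = [start]
    for _ in range(length - 1):
        nbrs = adj[path[-1]]
        if not nbrs:
            break  # dead end (possible in a directed graph)
        path.append(random.choice(nbrs))
    return path

def window_pairs(path, window):
    """Yield (v_i, v_j) training pairs from each sliding window of the path,
    pairing every node with the others at distance <= window (skip-gram style)."""
    for i, v in enumerate(path):
        for j in range(max(0, i - window), min(len(path), i + window + 1)):
            if j != i:
                yield v, path[j]

# Toy usage on a hypothetical triangle graph.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
pairs = list(window_pairs(random_walk(adj, start=0, length=5), window=2))
```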

References

Grover, Aditya, and Jure Leskovec. 2016. “Node2vec: Scalable Feature Learning for Networks.” arXiv:1607.00653 [Cs, Stat], July. http://arxiv.org/abs/1607.00653.

Schlichtkrull, Michael, Thomas N. Kipf, Peter Bloem, Rianne van den Berg, Ivan Titov, and Max Welling. 2017. “Modeling Relational Data with Graph Convolutional Networks.” arXiv:1703.06103 [Cs, Stat], October. http://arxiv.org/abs/1703.06103.

Yang, Bishan, Wen-tau Yih, Xiaodong He, Jianfeng Gao, and Li Deng. 2014. “Embedding Entities and Relations for Learning and Inference in Knowledge Bases.” arXiv:1412.6575 [Cs], December. http://arxiv.org/abs/1412.6575.