Load built-in and ported datasets from TGB

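This tutorial shows how to load datasets from the Temporal Graph Benchmark (TGB), the built-in TGX datasets, and your own custom .csv datasets, and how to subsample a loaded graph.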

In [ ]:

import tgx

Access TGB datasets

To load TGB datasets, first install the TGB package:

pip install py-tgb

Then pass the name of the dataset to the loader:

tgx.tgb_data("name")

The available dataset names are:

tgbl-wiki, tgbl-review, tgbl-coin, tgbl-comment, tgbl-flight

tgbn-trade, tgbn-genre, tgbn-reddit
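A misspelled dataset name only surfaces once the loader runs. As a minimal sketch, the hypothetical helper `check_tgb_name` below (not part of TGX or TGB) validates a name against the two lists above before it is passed to `tgx.tgb_data`, and reports the task family implied by its prefix: `tgbl-` for TGB's link property prediction datasets and `tgbn-` for its node property prediction datasets.

```python
# Dataset names copied from the lists above.
# check_tgb_name is a hypothetical helper, not part of TGX or TGB.
TGB_LINK_DATASETS = {"tgbl-wiki", "tgbl-review", "tgbl-coin",
                     "tgbl-comment", "tgbl-flight"}
TGB_NODE_DATASETS = {"tgbn-trade", "tgbn-genre", "tgbn-reddit"}


def check_tgb_name(name: str) -> str:
    """Return the task family of a TGB dataset name, or raise ValueError."""
    if name in TGB_LINK_DATASETS:
        return "link property prediction"
    if name in TGB_NODE_DATASETS:
        return "node property prediction"
    known = sorted(TGB_LINK_DATASETS | TGB_NODE_DATASETS)
    raise ValueError(f"unknown TGB dataset {name!r}; expected one of {known}")


print(check_tgb_name("tgbl-wiki"))  # link property prediction
```

Failing early with the full list of valid names is cheaper than waiting for a download or file lookup to fail.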

In [2]:

data_name = "tgbl-wiki"
dataset = tgx.tgb_data(data_name)  # TGB dataset
ctdg = tgx.Graph(dataset)

raw file found, skipping download
Dataset directory is  /mnt/f/code/TGB/tgb/datasets/tgbl_wiki
loading processed file
Number of loaded edges: 157474
Number of unique edges:18257
Available timestamps:  152757

Access other datasets

To load the built-in TGX datasets (from Poursafaei et al. 2022), call the loader below, replacing dataset_name with the name of the dataset:

tgx.builtin.dataset_name()

The available dataset names are:

mooc, uci, uslegis, unvote, untrade, flight, wikipedia, reddit, lastfm, contact, canparl, socialevo, enron
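Since each built-in dataset is exposed as a function named after it (as in `tgx.builtin.uci()` in the next cell), a dataset name held in a string can be dispatched with Python's built-in `getattr`. The sketch below is a hypothetical helper, `load_builtin`, that takes the namespace as an argument so it runs without TGX installed; with TGX you would pass `tgx.builtin`. It assumes every name in the list above is a zero-argument callable on that namespace.

```python
from types import SimpleNamespace

# Names copied from the list above.
BUILTIN_NAMES = ["mooc", "uci", "uslegis", "unvote", "untrade", "flight",
                 "wikipedia", "reddit", "lastfm", "contact", "canparl",
                 "socialevo", "enron"]


def load_builtin(name, namespace):
    """Hypothetical helper: call namespace.<name>() for a built-in dataset.

    With TGX installed, `namespace` would be `tgx.builtin` (an assumption).
    """
    if name not in BUILTIN_NAMES:
        raise ValueError(f"unknown built-in dataset {name!r}")
    loader = getattr(namespace, name, None)
    if loader is None:
        raise AttributeError(f"{name!r} is not available in this namespace")
    return loader()


# Stand-in namespace so the sketch runs without TGX.
fake_builtin = SimpleNamespace(uci=lambda: "uci dataset object")
print(load_builtin("uci", fake_builtin))  # uci dataset object
```

This is handy when looping over several datasets, for example to compute the same statistics for each name in the list.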

In [3]:

dataset = tgx.builtin.uci()
ctdg = tgx.Graph(dataset)

Number of loaded edges: 59835
Number of unique edges:20296
Available timestamps:  58911

Custom Datasets

You can also load your own dataset from a .csv file and read it into a tgx.Graph object.

Let's start by loading a toy dataset into pandas and displaying its rows.

In [4]:

import pandas as pd

toy_fname = 'toy_data.csv'
df = pd.read_csv(toy_fname)
df

Out[4]:

	time	source	destination
0	0	1	2
1	0	2	1
2	0	3	1
3	1	2	2
4	1	1	2
5	1	3	1
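The tutorial does not ship toy_data.csv itself. To run these cells, the following sketch recreates it from the six rows displayed above using only Python's standard `csv` module; the header `time,source,destination` is taken from Out[4], and the leading 0 to 5 column is the pandas row index, so it is not written to the file.

```python
import csv

# The six rows shown in Out[4], in (time, source, destination) order.
TOY_ROWS = [
    (0, 1, 2),
    (0, 2, 1),
    (0, 3, 1),
    (1, 2, 2),
    (1, 1, 2),
    (1, 3, 1),
]

with open("toy_data.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["time", "source", "destination"])  # header row
    writer.writerows(TOY_ROWS)
```

Because the file has a header row and no index column, it matches the header=True, index=False options used in the next cell.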

In [5]:

from tgx.io.read import read_csv

# header: whether the file has a header row at the top
# index: whether the first column holds row indices
# t_col: which column holds the timestamps
edgelist = read_csv(toy_fname,
                    header=True,
                    index=False,
                    t_col=0)
tgx.Graph(edgelist=edgelist)

Number of loaded edges: 5
Number of unique edges: 4
Available timestamps:  2

Out[5]:

<tgx.classes.graph.Graph at 0x7fde4755aca0>
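To make the three read_csv options concrete, here is a minimal, hypothetical parser, `parse_edges`, which is not TGX's implementation. It skips a header row, optionally drops a leading index column, reads the timestamp from column `t_col`, and treats the remaining two columns as source and destination. It groups edges by timestamp in a plain dict; TGX's internal edgelist format may differ, and the sketch does not attempt to reproduce the edge counts TGX prints above.

```python
import csv
import io
from collections import defaultdict

# Same rows as toy_data.csv above, inlined so the sketch runs on its own.
TOY_CSV = """time,source,destination
0,1,2
0,2,1
0,3,1
1,2,2
1,1,2
1,3,1
"""


def parse_edges(text, header=True, index=False, t_col=0):
    """Hypothetical parser mirroring the read_csv options described above.

    Returns {timestamp: [(source, destination), ...]}.
    """
    rows = list(csv.reader(io.StringIO(text)))
    if header:
        rows = rows[1:]  # skip the header row
    edges = defaultdict(list)
    for row in rows:
        if index:
            row = row[1:]  # drop the leading row-index column
        t = int(row[t_col])  # timestamp column
        rest = row[:t_col] + row[t_col + 1:]  # source, destination
        edges[t].append((int(rest[0]), int(rest[1])))
    return dict(edges)


edges = parse_edges(TOY_CSV, header=True, index=False, t_col=0)
print(edges)
# {0: [(1, 2), (2, 1), (3, 1)], 1: [(2, 2), (1, 2), (3, 1)]}
```

The two keys of the result correspond to the two distinct timestamps in the file.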

Subsampling graphs

To subsample a graph, follow these steps:

1. Discretize the data.
2. Create a graph object (G) from the data.
3. Subsample the graph with tgx.utils.graph_utils.subsampling.
4. Create a new graph from the subsampled edges.
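Discretizing turns continuous timestamps into a sequence of snapshots. TGX's own discretization routine is not shown in this tutorial, so the following is only a sketch of the idea under an assumed scheme: each timestamp is mapped to a fixed-width bucket measured from the earliest timestamp, and edges in the same bucket form one snapshot. The function name `discretize` and its `interval` parameter are hypothetical.

```python
from collections import defaultdict


def discretize(edges, interval):
    """Hypothetical sketch: bucket (t, u, v) edges into snapshots of width `interval`.

    Snapshot k holds every edge with k * interval <= t - t_min < (k + 1) * interval.
    Returns {snapshot_index: [(u, v), ...]}; buckets with no edges are omitted.
    """
    if interval <= 0:
        raise ValueError("interval must be positive")
    t_min = min(t for t, _, _ in edges)
    snapshots = defaultdict(list)
    for t, u, v in edges:
        snapshots[(t - t_min) // interval].append((u, v))
    return dict(snapshots)


# Continuous timestamps grouped into 10-unit snapshots.
stream = [(0, 1, 2), (4, 2, 3), (12, 1, 3), (25, 3, 4), (29, 4, 1)]
print(discretize(stream, interval=10))
# {0: [(1, 2), (2, 3)], 1: [(1, 3)], 2: [(3, 4), (4, 1)]}
```

Measuring buckets from the earliest timestamp keeps snapshot indices small even when timestamps are large values such as Unix epochs.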

In [6]:

from tgx.utils.graph_utils import subsampling

# N is the number of nodes to be sampled
sub_edges = subsampling(ctdg, selection_strategy="random", N=1000)
subgraph = tgx.Graph(edgelist=sub_edges)

Generate graph subsample...
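The cell above delegates the sampling to `subsampling`, whose internals are not described here. As a rough illustration of what a random node-based strategy can look like, the hypothetical `random_node_subsample` below draws `n_nodes` distinct nodes uniformly at random and keeps only the edges whose endpoints were both drawn. This is one assumed reading of `selection_strategy="random"`, not a description of TGX's implementation; a seeded `random.Random` makes the result reproducible.

```python
import random


def random_node_subsample(edges, n_nodes, seed=None):
    """Hypothetical sketch of random node subsampling.

    edges: list of (t, u, v) tuples. Draws `n_nodes` distinct nodes uniformly
    at random and keeps the edges whose source and destination were both drawn.
    """
    # Sort so that a given seed always yields the same sample.
    nodes = sorted({u for _, u, _ in edges} | {v for _, _, v in edges})
    rng = random.Random(seed)
    sampled = set(rng.sample(nodes, min(n_nodes, len(nodes))))
    return [(t, u, v) for t, u, v in edges if u in sampled and v in sampled]


edges = [(0, 1, 2), (0, 2, 3), (1, 3, 4), (1, 1, 4), (2, 2, 4)]
sub = random_node_subsample(edges, n_nodes=3, seed=0)
print(sub)  # edges among the 3 sampled nodes, in their original order
```

Requiring both endpoints to be sampled yields the induced subgraph on the sampled nodes; a looser rule that keeps any edge touching a sampled node would retain more edges.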