Abstract
creating effective visualizations using expressive grammars

End-to-End Trainable Neural Translation Model
visualization generation === language translation problem
formulation
Vega-Lite
- mapping : data specifications → visualization specifications
- in a declarative language
training
- a multilayered, attention-based encoder-decoder network with long short-term memory (LSTM) units
- trained on a corpus of visualization specifications
learns…
- the vocabulary and syntax for a valid visualization specification
- appropriate transformations
  - e.g. count, bins, mean
- how to use common data selection patterns that occur within data visualizations
Problem Formulation
applies deep learning for translation and synthesis
data visualization problem === sequence to sequence models (seq2seq)
- input sequence : dataset
  - e.g. fields, values in JSON format
- output sequence : valid Vega-Lite visualization specification
sequence translation === encoder-decoder networks
- encoder : reads and encodes a source sequence into a fixed-length vector
- decoder : outputs a translation based on this vector
👉 jointly trained to maximize the probability of outputting a correct translation
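For concreteness, a minimal sketch (in Python) of what one source/target pair could look like; the field names, values, and exact serialization are illustrative assumptions, not the paper's actual training data:

```python
import json

# Hypothetical input record (source sequence): fields and values serialized
# as a flat JSON string fed to the encoder.
source_record = {"name": "Aston Martin", "origin": "Europe", "horsepower": 180}
source_sequence = json.dumps(source_record)

# Target sequence: a valid Vega-Lite specification (data values omitted),
# also serialized as a string the decoder must learn to produce.
target_spec = {
    "mark": "point",
    "encoding": {
        "x": {"field": "horsepower", "type": "quantitative"},
        "y": {"field": "origin", "type": "nominal"},
    },
}
target_sequence = json.dumps(target_spec)

print(source_sequence)
print(target_sequence)
```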
seq2seq
- generates data that is sequential or temporally dependent
  - e.g. language translation
- also finds applications for problems where the input or output is not strictly sequential
  - e.g. text summarization, image captioning
👉 bidirectional RNN units
👉 attention mechanisms
Details
ordinary RNN (unidirectional)
- reads an input sequence x from the first token x_1 to the last x_m
- generates an encoding based only on the preceding tokens it has seen
Bidirectional RNN (BiRNN)
- consists of a forward RNN + a backward RNN
- enables an encoding generation based on both the preceding and following tokens
- forward RNN (→f)
  - reads the input sequence as it is ordered, from x_1 to x_m
  - calculates a sequence of forward hidden states (→h_1, …, →h_m)
- backward RNN (←f)
  - reads the sequence in reverse order, from x_m to x_1
  - calculates a sequence of backward hidden states (←h_1, …, ←h_m)
- generates a hidden state h_j for each token x_j
  - a concatenation of the forward and backward hidden states: h_j = [→h_j^T ; ←h_j^T]^T
  - contains summaries of both the preceding and following tokens
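A minimal sketch of the forward/backward concatenation using PyTorch's bidirectional LSTM; the tensor sizes are illustrative, and PyTorch itself is an assumption rather than the framework used in the paper:

```python
import torch
import torch.nn as nn

# Toy embedded input: a batch of 1 sequence with m=5 tokens, embedding size 32.
x = torch.randn(1, 5, 32)

# A bidirectional LSTM runs a forward RNN over x_1..x_m and a backward RNN over x_m..x_1.
birnn = nn.LSTM(input_size=32, hidden_size=64, batch_first=True, bidirectional=True)
h, _ = birnn(x)  # shape: (1, 5, 2 * 64)

# Each state h_j is the concatenation [→h_j ; ←h_j] of the forward and backward
# hidden states, so it summarizes both preceding and following tokens.
forward_h_j, backward_h_j = h[0, 2, :64], h[0, 2, 64:]
print(h.shape)  # torch.Size([1, 5, 128])
```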
attention mechanism
- focuses on aspects of an input sequence while generating output tokens
- makes translation models robust to performance degradation when generating lengthy sequences
- enables the model to learn mappings between source and target sequences of different lengths
- improves the ability to interpret and debug sequence to sequence models by providing valuable insights on why a given token is generated at each step
e.g. image captioning
- the model focuses on specific parts of objects in an image while generating each word or token in the image caption
👇 enables us to use a sequence translation model which…
- first takes into consideration the entire data input (dataset)
- and then focuses on aspects of the input (fields) in generating a visualization specification
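A minimal sketch of the context-vector idea behind attention, assuming simple dot-product scoring (the paper's exact attention variant is not assumed here):

```python
import torch
import torch.nn.functional as F

# Encoder states h_1..h_m (m=5, dim=128) and the decoder's current state s_i.
h = torch.randn(5, 128)
s_i = torch.randn(128)

# Score each source state against the decoder state, normalize the scores into
# attention weights, then take a weighted average of the source states as c_i.
scores = h @ s_i                           # (5,)
alpha = F.softmax(scores, dim=0)           # attention weights, sum to 1
c_i = (alpha.unsqueeze(1) * h).sum(dim=0)  # (128,) context vector
```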
Model
Encoder-Decoder Architecture with Attention mechanism
2-layer Encoder
- bidirectional recurrent neural network (RNN)
- takes in an input sequence of source tokens x
- outputs a sequence of states h
2-layer Decoder
- RNN
- computes the probability of a target sequence y based on the hidden states h
- probability is generated based on the recurrent state of the decoder RNN, previous tokens in the target sequence, and a context vector c_i
context vector c_i === attention vector
- a weighted average of the source states
- designed to capture the context of the source sequence that helps predict the current target token
each with 512 Long Short-Term Memory (LSTM) units (cells)
- better than Gated Recurrent Unit (GRU)
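A sketch of this configuration in PyTorch; the embedding size, vocabulary size, and the exact wiring of attention between encoder and decoder are assumptions:

```python
import torch.nn as nn

VOCAB = 128    # character vocabulary size (placeholder)
EMBED = 256    # character embedding size (placeholder)
HIDDEN = 512   # LSTM units per layer, per the notes

embed = nn.Embedding(VOCAB, EMBED)

# 2-layer bidirectional LSTM encoder: reads source tokens x and outputs states h
# of size 2 * HIDDEN (forward and backward states concatenated).
encoder = nn.LSTM(input_size=EMBED, hidden_size=HIDDEN,
                  num_layers=2, bidirectional=True, batch_first=True)

# 2-layer LSTM decoder: at each step it consumes the previous target-token
# embedding concatenated with the attention context vector c_i.
decoder = nn.LSTM(input_size=EMBED + 2 * HIDDEN, hidden_size=HIDDEN,
                  num_layers=2, batch_first=True)

# Projects the decoder state to a distribution over the target vocabulary.
output_proj = nn.Linear(HIDDEN, VOCAB)
```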
Data and Preprocessing
Learning Objectives
model should…
select a subset of fields to focus on, when creating visualizations
- most datasets have multiple fields which cannot all be simultaneously visualized
learn differences in data types across the data fields
- e.g. numeric, string, temporal, ordinal, categorical, etc.
learn the appropriate transformations to apply to a field given its data type
- e.g. aggregate transform does not apply to string fields
👇
Vega-Lite Grammar
- view-level transforms (aggregate, bin, calculate, filter, timeUnit)
- field-level transforms (aggregate, bin, sort, timeUnit)
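For example, a hypothetical Vega-Lite spec (written here as a Python dict) combining a field-level bin transform with a count aggregate; the field name is made up:

```python
# Bin the (made-up) "horsepower" field on x and count records on y,
# producing a histogram.
spec = {
    "mark": "bar",
    "encoding": {
        "x": {"field": "horsepower", "type": "quantitative", "bin": True},
        "y": {"aggregate": "count", "type": "quantitative"},
    },
}
```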
Achieving Objectives
character-based sequence model
Challenge : a character tokenization strategy requires
- more units to represent a sequence
- a large number of hidden layers
- a large number of parameters to model long-term dependencies
Solution
- replace string and numeric field names with a short notation
  - e.g. "str", "num" in the source sequence (dataset)
- a similar backward transformation is done in the target sequence
  - to maintain consistency in field names
👇
- scaffolds the learning process by reducing the vocabulary size
- prevents the LSTM from learning field names (which do not need to be memorized)
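A minimal sketch of this forward/backward field-name transformation, assuming indexed short names like "num0"/"str0" and a simple string-replacement restore step (these implementation details are assumptions):

```python
import json

def normalize_fields(record):
    """Replace field names with short type-based tokens, e.g. "num0", "str0"."""
    mapping, renamed = {}, {}
    str_i = num_i = 0
    for name, value in record.items():
        if isinstance(value, (int, float)):
            short = f"num{num_i}"
            num_i += 1
        else:
            short = f"str{str_i}"
            str_i += 1
        mapping[short] = name
        renamed[short] = value
    return renamed, mapping

def restore_fields(spec_json, mapping):
    """Backward transformation: put the original field names back into the generated spec."""
    for short, name in mapping.items():
        spec_json = spec_json.replace(f'"{short}"', f'"{name}"')
    return spec_json

record = {"model": "Aston Martin", "horsepower": 180}
renamed, mapping = normalize_fields(record)
source_sequence = json.dumps(renamed)  # fed to the encoder
generated = '{"mark": "point", "encoding": {"x": {"field": "num0", "type": "quantitative"}}}'
print(restore_fields(generated, mapping))  # "num0" restored to "horsepower"
```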