Abstract
creating effective visualizations using expressive grammars

End-to-End Trainable Neural Translation Model
visualization generation === language translation problem
formulation
Vega-Lite
- mapping : data specifications → visualization specifications
- in a declarative language
training
- a multilayered, attention-based encoder-decoder network with long short-term memory (LSTM) units
- trained on a corpus of visualization specifications
learns…
- the vocabulary and syntax for a valid visualization specification
- appropriate transformations
  - e.g. count, bins, mean
- how to use common data selection patterns that occur within data visualizations
Problem Formulation
applies deep learning for translation and synthesis
data visualization problem === sequence to sequence models (seq2seq)
- input sequence : dataset
  - e.g. fields, values in JSON format
- output sequence : valid Vega-Lite visualization specification
sequence translation === encoder-decoder networks
- encoder : reads and encodes a source sequence into a fixed-length vector
- decoder : outputs a translation based on this vector
👉 jointly trained to maximize the probability of outputting a correct translation
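For concreteness, a minimal sketch (in Python) of what one source/target pair could look like; the field names, values, and exact serialization are illustrative assumptions, not the paper's actual training data:

```python
import json

# Hypothetical input record (source sequence): fields and values serialized
# as a flat JSON string fed to the encoder.
source_record = {"name": "Aston Martin", "origin": "Europe", "horsepower": 180}
source_sequence = json.dumps(source_record)

# Target sequence: a valid Vega-Lite specification (data values omitted),
# also serialized as a string the decoder must learn to produce.
target_spec = {
    "mark": "point",
    "encoding": {
        "x": {"field": "horsepower", "type": "quantitative"},
        "y": {"field": "origin", "type": "nominal"},
    },
}
target_sequence = json.dumps(target_spec)

print(source_sequence)
print(target_sequence)
```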
seq2seq
- generates data that is sequential or temporally dependent
  - e.g. language translation
- also finds applications for problems where the input or output is not strictly sequential
  - e.g. text summarization, image captioning
👉 bidirectional RNN units
👉 attention mechanisms
Details
ordinary RNN (unidirectional)
- reads an input sequence x from the first token x_1 to the last x_m
- generates an encoding based only on the preceding tokens it has seen
Bidirectional RNN (BiRNN)
- consists of a forward RNN + a backward RNN
- enables an encoding generation based on both the preceding and following tokens
- forward RNN (→f)
  - reads the input sequence as it is ordered, from x_1 to x_m
  - calculates a sequence of forward hidden states (→h_1, …, →h_m)
- backward RNN (←f)
  - reads the sequence in reverse order, from x_m to x_1
  - calculates a sequence of backward hidden states (←h_1, …, ←h_m)
- generates a hidden state h_j for each token x_j
  - a concatenation of the forward and backward hidden states: h_j = [→h_j^T ; ←h_j^T]^T
  - contains summaries of both the preceding and following tokens
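A minimal sketch of the forward/backward concatenation using PyTorch's bidirectional LSTM; the tensor sizes are illustrative, and PyTorch itself is an assumption rather than the framework used in the paper:

```python
import torch
import torch.nn as nn

# Toy embedded input: a batch of 1 sequence with m=5 tokens, embedding size 32.
x = torch.randn(1, 5, 32)

# A bidirectional LSTM runs a forward RNN over x_1..x_m and a backward RNN over x_m..x_1.
birnn = nn.LSTM(input_size=32, hidden_size=64, batch_first=True, bidirectional=True)
h, _ = birnn(x)  # shape: (1, 5, 2 * 64)

# Each state h_j is the concatenation [→h_j ; ←h_j] of the forward and backward
# hidden states, so it summarizes both preceding and following tokens.
forward_h_j, backward_h_j = h[0, 2, :64], h[0, 2, 64:]
print(h.shape)  # torch.Size([1, 5, 128])
```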
attention mechanism
- focuses on aspects of an input sequence while generating output tokens
- makes translation models robust to performance degradation when generating lengthy sequences
- enables the model to learn mappings between source and target sequences of different lengths
- improves the ability to interpret and debug sequence to sequence models by providing valuable insights on why a given token is generated at each step
e.g. image captioning
- the model focuses on specific parts of objects in an image while generating each word or token in the image caption
👇 enables us to use a sequence translation model which…
- first takes into consideration the entire data input (dataset)
- and then focuses on aspects of the input (fields) in generating a visualization specification
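A minimal sketch of the context-vector idea behind attention, assuming simple dot-product scoring (the paper's exact attention variant is not assumed here):

```python
import torch
import torch.nn.functional as F

# Encoder states h_1..h_m (m=5, dim=128) and the decoder's current state s_i.
h = torch.randn(5, 128)
s_i = torch.randn(128)

# Score each source state against the decoder state, normalize the scores into
# attention weights, then take a weighted average of the source states as c_i.
scores = h @ s_i                           # (5,)
alpha = F.softmax(scores, dim=0)           # attention weights, sum to 1
c_i = (alpha.unsqueeze(1) * h).sum(dim=0)  # (128,) context vector
```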
Model
Encoder-Decoder Architecture with Attention mechanism
2-layer Encoder
- bidirectional recurrent neural network (RNN)
- takes in an input sequence of source tokens x
- outputs a sequence of states h
2-layer Decoder
- RNN
- computes the probability of a target sequence y based on the hidden states h
- probability is generated based on the recurrent state of the decoder RNN, previous tokens in the target sequence, and a context vector c_i
context vector c_i === attention vector
- a weighted average of the source states
- designed to capture the context of the source sequence that helps predict the current target token
each with 512 Long Short-Term Memory (LSTM) units (cells)
- better than Gated Recurrent Unit (GRU)
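A sketch of this configuration in PyTorch; the embedding size, vocabulary size, and the exact wiring of attention between encoder and decoder are assumptions:

```python
import torch.nn as nn

VOCAB = 128    # character vocabulary size (placeholder)
EMBED = 256    # character embedding size (placeholder)
HIDDEN = 512   # LSTM units per layer, per the notes

embed = nn.Embedding(VOCAB, EMBED)

# 2-layer bidirectional LSTM encoder: reads source tokens x and outputs states h
# of size 2 * HIDDEN (forward and backward states concatenated).
encoder = nn.LSTM(input_size=EMBED, hidden_size=HIDDEN,
                  num_layers=2, bidirectional=True, batch_first=True)

# 2-layer LSTM decoder: at each step it consumes the previous target-token
# embedding concatenated with the attention context vector c_i.
decoder = nn.LSTM(input_size=EMBED + 2 * HIDDEN, hidden_size=HIDDEN,
                  num_layers=2, batch_first=True)

# Projects the decoder state to a distribution over the target vocabulary.
output_proj = nn.Linear(HIDDEN, VOCAB)
```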
Data and Preprocessing
Learning Objectives
model should…
select a subset of fields to focus on, when creating visualizations
- most datasets have multiple fields which cannot all be simultaneously visualized
learn differences in data types across the data fields
- e.g. numeric, string, temporal, ordinal, categorical, etc.
learn the appropriate transformations to apply to a field given its data type
- e.g. aggregate transform does not apply to string fields
👇
Vega-Lite Grammar
- view-level transforms (aggregate, bin, calculate, filter, timeUnit)
- field-level transforms (aggregate, bin, sort, timeUnit)
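For example, a hypothetical Vega-Lite spec (written here as a Python dict) combining a field-level bin transform with a count aggregate; the field name is made up:

```python
# Bin the (made-up) "horsepower" field on x and count records on y,
# producing a histogram.
spec = {
    "mark": "bar",
    "encoding": {
        "x": {"field": "horsepower", "type": "quantitative", "bin": True},
        "y": {"aggregate": "count", "type": "quantitative"},
    },
}
```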
Achieving Objectives
character-based sequence model
Challenge : a character tokenization strategy requires
- more units to represent a sequence
- a large number of hidden layers
- a large number of parameters to model long-term dependencies
Solution
- replace string and numeric field names with a short notation
  - e.g. "str", "num" in the source sequence (dataset)
- a similar backward transformation is done in the target sequence
  - to maintain consistency in field names
👇
- scaffolds the learning process by reducing the vocabulary size
- prevents the LSTM from learning field names (which do not need to be memorized)
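A minimal sketch of this forward/backward field-name transformation, assuming indexed short names like "num0"/"str0" and a simple string-replacement restore step (these implementation details are assumptions):

```python
import json

def normalize_fields(record):
    """Replace field names with short type-based tokens, e.g. "num0", "str0"."""
    mapping, renamed = {}, {}
    str_i = num_i = 0
    for name, value in record.items():
        if isinstance(value, (int, float)):
            short = f"num{num_i}"
            num_i += 1
        else:
            short = f"str{str_i}"
            str_i += 1
        mapping[short] = name
        renamed[short] = value
    return renamed, mapping

def restore_fields(spec_json, mapping):
    """Backward transformation: put the original field names back into the generated spec."""
    for short, name in mapping.items():
        spec_json = spec_json.replace(f'"{short}"', f'"{name}"')
    return spec_json

record = {"model": "Aston Martin", "horsepower": 180}
renamed, mapping = normalize_fields(record)
source_sequence = json.dumps(renamed)  # fed to the encoder
generated = '{"mark": "point", "encoding": {"x": {"field": "num0", "type": "quantitative"}}}'
print(restore_fields(generated, mapping))  # "num0" restored to "horsepower"
```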