
Data2Vis: Automatic Generation of Data Visualizations Using Sequence to Sequence Recurrent Neural Networks (Paper Review)

Abstract

rapidly creating effective visualizations using expressive grammars is challenging for users with limited time and limited skills in statistics and data visualization



End-to-End Trainable Neural Translation Model

  • visualization generation === language translation problem

    formulation

    Vega-Lite

    • mapping : data specifications → visualization specifications
    • in a declarative language

    training

    • a multilayered, attention-based encoder-decoder network

    • with long short-term memory units ( LSTM )

    • on a corpus of visualization specifications


learns…

  • the vocabulary and syntax for a valid visualization specification
  • appropriate transformations
    • e.g. count, bins, mean
  • how to use common data selection patterns that occur within data visualizations

Problem Formulation

applies deep learning for translation and synthesis


  1. data visualization problem === sequence to sequence models ( seq2seq )

    • input sequence : dataset
      • e.g. fields, values in json format
    • output sequence : valid Vega-Lite visualization specification
  2. sequence translation === encoder-decoder networks

    • encoder : reads and encodes a source sequence into a fixed-length vector
    • decoder : outputs a translation based on this vector

    👉 jointly trained to maximize the probability of outputting a correct translation
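
For illustration, here is a minimal sketch of what a single source/target training pair could look like, assuming the dataset is serialized as a JSON record and the target is a Vega-Lite specification string (field names and values are hypothetical, not taken from the paper's corpus):

```python
import json

# Hypothetical single-record dataset serialized as the source sequence.
source_record = {"num0": 31, "str0": "setosa"}
source_sequence = json.dumps(source_record)

# A valid Vega-Lite specification serialized as the target sequence.
target_spec = {
    "mark": "point",
    "encoding": {
        "x": {"field": "num0", "type": "quantitative"},
        "y": {"field": "str0", "type": "nominal"},
    },
}
target_sequence = json.dumps(target_spec)

print(source_sequence)   # the model reads this character by character
print(target_sequence)   # ... and is trained to emit this character by character
```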

seq2seq

  1. generates data that is sequential or temporally dependent

    • e.g. language translation
  2. also finds applications in problems where the input or output is non-sequential

    • e.g. text summarization, image captioning

    👉 bidirectional RNN units

    👉 attention mechanisms


Details

ordinary RNN ( unidirectional )

  1. reads an input sequence x from the first token x_1 to the last x_m

  2. generates an encoding based only on the preceding tokens it has seen


Bidirectional RNN ( BiRNN )

  • consists of a forward RNN + a backward RNN

  • enables an encoding generation based on both the preceding and following tokens

  1. a forward RNN ( →f )

    1. reads the input sequence as it is ordered from x_1 to x_m
    2. calculates a sequence of forward hidden states (→h_1, …, →h_m)
  2. a backward RNN ( ←f )

    1. reads the sequence in the reverse order from x_m to x_1
    2. calculates a sequence of backward hidden states (←h_1, …, ←h_m)
  3. generates a hidden state h_j for each token x_j

    h_j

    • a concatenation of the forward and backward hidden states ( h_j = [→h_j^T ; ←h_j^T]^T )
    • contains summaries of both the preceding and following tokens
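
A minimal numpy sketch of this encoding step, using a plain tanh RNN cell in place of the LSTM cells described later (dimensions and initialization are arbitrary):

```python
import numpy as np

def rnn_step(x_t, h_prev, W_x, W_h, b):
    """One vanilla RNN step: h_t = tanh(W_x x_t + W_h h_prev + b)."""
    return np.tanh(W_x @ x_t + W_h @ h_prev + b)

def birnn_encode(xs, params_fwd, params_bwd, hidden_size):
    """Return h_j = [fwd_h_j ; bwd_h_j] for every position j = 1..m."""
    fwd = [np.zeros(hidden_size)]
    for x in xs:                      # forward RNN reads x_1 .. x_m
        fwd.append(rnn_step(x, fwd[-1], *params_fwd))
    bwd = [np.zeros(hidden_size)]
    for x in reversed(xs):            # backward RNN reads x_m .. x_1
        bwd.append(rnn_step(x, bwd[-1], *params_bwd))
    bwd = list(reversed(bwd[1:]))     # align backward states with positions 1..m
    # each h_j concatenates the forward and backward states for position j
    return [np.concatenate([f, b]) for f, b in zip(fwd[1:], bwd)]

# toy usage: 5 input tokens of dimension 8, hidden size 16 per direction
rng = np.random.default_rng(0)
d, h, m = 8, 16, 5
make_params = lambda: (0.1 * rng.normal(size=(h, d)),
                       0.1 * rng.normal(size=(h, h)),
                       np.zeros(h))
states = birnn_encode([rng.normal(size=d) for _ in range(m)],
                      make_params(), make_params(), h)
print(len(states), states[0].shape)   # 5 states, each of dimension 2 * 16 = 32
```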

attention mechanism

  • focuses on aspects of an input sequence while generating output tokens
  • makes translation models robust to performance degradation when generating lengthy sequences
  • enables the model to learn mappings between source and target sequences of different lengths
  • improves the ability to interpret and debug sequence to sequence models by providing valuable insights into why a given token is generated at each step

  • e.g.

    image captioning

    • the model focuses on specific parts of objects in an image while generating each word or token in the image caption

👇 enables us to use a sequence translation model which…

  1. first takes into consideration the entire data input (dataset)
  2. and then focuses on aspects of the input (fields) in generating a visualization specification
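
A minimal numpy sketch of the attention step above, where the context vector is a weighted average of the encoder states; the dot-product scoring function is an illustrative assumption, not necessarily the one used in the paper:

```python
import numpy as np

def attention_context(decoder_state, encoder_states):
    """Weighted average of encoder states; weights come from a similarity score.

    A dot-product score is used purely for illustration; additive (Bahdanau)
    scoring is another common choice.
    """
    scores = np.array([decoder_state @ h_j for h_j in encoder_states])
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                  # softmax over source positions
    context = (weights[:, None] * np.stack(encoder_states)).sum(axis=0)
    return context, weights                   # weights show where the model "looks"

# toy usage: 5 encoder states and a decoder state of the same dimension
rng = np.random.default_rng(1)
encoder_states = [rng.normal(size=32) for _ in range(5)]
context, weights = attention_context(rng.normal(size=32), encoder_states)
print(context.shape, weights.round(2))
```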

Model

Encoder-Decoder Architecture with Attention mechanism

  1. 2-layer Encoder

    • bidirectional recurrent neural network (RNN)
      • takes in an input sequence of source tokens x
      • and outputs a sequence of states h
  2. 2-layer Decoder

    • RNN

      • computes the probability of a target sequence y based on the hidden state h

      probability

      • generated based on the recurrent state of the decoder RNN, previous tokens in the target sequence and a context vector c_i

      context vector === attention vector

      • a weighted average of the source states
      • and designed to capture the context of the source sequence that helps predict the current target token
  3. each with 512 Long Short-Term Memory (LSTM) units (cells)

    • LSTM cells were chosen over Gated Recurrent Unit (GRU) cells for their better performance
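
A rough PyTorch sketch of this configuration (2-layer bidirectional LSTM encoder, 2-layer LSTM decoder, 512 units each). The vocabulary size, embedding size, and the uniform average standing in for the learned attention weights are assumptions for illustration, not the authors' implementation:

```python
import torch
import torch.nn as nn

VOCAB, EMB, HIDDEN, LAYERS = 128, 128, 512, 2   # vocabulary/embedding sizes assumed

embed = nn.Embedding(VOCAB, EMB)
encoder = nn.LSTM(EMB, HIDDEN, num_layers=LAYERS, bidirectional=True, batch_first=True)
decoder = nn.LSTM(EMB + 2 * HIDDEN, HIDDEN, num_layers=LAYERS, batch_first=True)
project = nn.Linear(HIDDEN, VOCAB)              # decoder state -> output-token logits

# One forward pass on a toy batch (batch = 1, source length = 10).
src = torch.randint(0, VOCAB, (1, 10))
enc_states, _ = encoder(embed(src))             # (1, 10, 2 * HIDDEN): states h to attend over

# At each decoding step the previous target-token embedding is concatenated with
# a context vector; a uniform average stands in for learned attention weights here.
prev = embed(torch.randint(0, VOCAB, (1, 1)))
context = enc_states.mean(dim=1, keepdim=True)
out, _ = decoder(torch.cat([prev, context], dim=-1))
logits = project(out)                           # (1, 1, VOCAB)
print(logits.shape)
```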

Data and Preprocessing

Learning Objectives

model should…

  1. select a subset of fields to focus on, when creating visualizations

    • most datasets have multiple fields which cannot all be simultaneously visualized
  2. learn differences in data types across the data fields

    • e.g. numeric, string, temporal, ordinal, categorical, etc.
  3. learn the appropriate transformations to apply to a field given its data type

    • e.g. aggregate transform does not apply to string fields

    👇

    Vega-Lite Grammar

    • view-level transforms (aggregate, bin, calculate, filter, timeUnit)
    • field-level transforms (aggregate, bin, sort, timeUnit)
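
For example, a (hypothetical) Vega-Lite specification that uses such transforms, binning a numeric field on x and aggregating with count on y:

```python
import json

# Hypothetical Vega-Lite spec with field-level transforms: bin on x, count on y.
spec = {
    "mark": "bar",
    "encoding": {
        "x": {"field": "num0", "type": "quantitative", "bin": True},
        "y": {"aggregate": "count", "type": "quantitative"},
    },
}
print(json.dumps(spec, indent=2))
```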

Achieving Objectives

character based sequence model

  • Challenge : a character tokenization strategy requires

    • more units to represent a sequence
    • a larger number of hidden layers
    • a larger number of parameters to model long-term dependencies
  • Solution

    1. replace string and numeric field names using a short notation

      • e.g. “str”, “num” in the source sequence (dataset)
    2. a similar backward transformation is done in the target sequence

      • to maintain consistency in field names

    👇

    • scaffolds the learning process by reducing the vocabulary size
    • prevents the LSTM from learning field names (which do not need to be memorized)
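
A minimal sketch of this preprocessing; the numbered short notation (num0, str0, …) and the string-replacement backward transformation are assumptions for illustration:

```python
import json

def forward_transform(record):
    """Replace field names with short "str"/"num" tokens; return the mapping so a
    backward transformation can restore the original names after generation."""
    mapping, renamed, counts = {}, {}, {"str": 0, "num": 0}
    for name, value in record.items():
        kind = "num" if isinstance(value, (int, float)) else "str"
        short = f"{kind}{counts[kind]}"
        counts[kind] += 1
        mapping[short] = name
        renamed[short] = value
    return renamed, mapping

def backward_transform(text, mapping):
    """Restore the original field names in a generated specification string."""
    for short, name in mapping.items():
        text = text.replace(short, name)
    return text

record = {"sepal_length": 5.1, "species": "setosa"}
renamed, mapping = forward_transform(record)
print(json.dumps(renamed))                       # {"num0": 5.1, "str0": "setosa"}
generated = '{"mark": "point", "encoding": {"x": {"field": "num0"}}}'
print(backward_transform(generated, mapping))    # field name restored
```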