Difference between revisions of "Pix2code"

From edegan.com

Revision as of 17:41, 5 April 2019


Project
Pix2code
Project Information
Has title Pix2code experimentation
Has owner Hiep Nguyen
Has start date
Has deadline date
Has project status
Copyright © 2019 edegan.com. All Rights Reserved.


Brief Introduction

Pix2code is an AI model that converts GUI screenshots into DSL code, then uses a compiler to translate that DSL code into HTML, Android XML, or iOS Storyboard. More details can be found in the original paper, and instructions for training and using the models are on the original GitHub page. An improved version, pix2code2, uses a Convolutional Neural Network (CNN) as an autoencoder for the GUI images before training. Its authors also include a pre-trained model to experiment with.
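
To make the "DSL code, then compiler" step concrete, here is a minimal sketch of how a nested DSL could be turned into HTML. It is not the pix2code compiler: the token names and the HTML templates are illustrative assumptions, and the fixed placeholder strings only show why page text is not carried through the DSL.

```python
import re

# Hypothetical token-to-HTML mapping, for illustration only.
TEMPLATES = {
    "header": "<nav>{}</nav>",
    "row": '<div class="row">{}</div>',
    "single": '<div class="col">{}</div>',
    "btn-active": "<button>Active</button>",
    "btn-inactive": "<button>Inactive</button>",
    "small-title": "<h4>Title</h4>",
    "text": "<p>Placeholder text</p>",
}


def tokenize(dsl):
    # Braces and commas are their own tokens; anything else is a name.
    return re.findall(r"[{},]|[\w-]+", dsl)


def parse(tokens, pos=0):
    """Return (nodes, next_pos); each node is a (name, children) tuple."""
    nodes = []
    while pos < len(tokens) and tokens[pos] != "}":
        if tokens[pos] == ",":
            pos += 1
            continue
        name = tokens[pos]
        pos += 1
        children = []
        if pos < len(tokens) and tokens[pos] == "{":
            children, pos = parse(tokens, pos + 1)
            pos += 1  # skip the closing brace
        nodes.append((name, children))
    return nodes, pos


def render(nodes):
    # Containers wrap their rendered children; leaves ignore the argument.
    parts = []
    for name, children in nodes:
        template = TEMPLATES.get(name, "<span>{}</span>")
        parts.append(template.format(render(children)))
    return "".join(parts)


def compile_dsl(dsl):
    nodes, _ = parse(tokenize(dsl))
    return render(nodes)


if __name__ == "__main__":
    print(compile_dsl("header { btn-active } row { single { small-title, text } }"))
```

Because every leaf token maps to a fixed snippet, the output reproduces layout but not the wording of the original page.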

Usage on RDP

Currently, the source code and pre-trained model for pix2code are located at

E:/projects/pix2code_test

To generate DSL code from a specific GUI image, first place the image in the pix2code_test directory, then run the following:

cd pix2code_test/model
# The last argument selects the search strategy: "greedy" for greedy search, or an integer k (1, 2, 3, ...) for beam search with beam size k.
./sample.py ../bin pix2code2 ../test_img.png ../code greedy

The generated GUI (DSL) code is written to the pix2code_test/code directory.
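
The difference between the greedy option and beam search with size k can be illustrated with a generic decoding sketch. This is not the code in sample.py: the toy_model function and its probabilities are made up for illustration, and the sketch only shows why a wider beam can find a more probable token sequence than greedy decoding, which behaves like a beam of size 1.

```python
import math


def toy_model(prefix):
    """Hypothetical next-token probabilities given the tokens so far."""
    if not prefix:
        return {"row": 0.6, "header": 0.4}
    if len(prefix) == 1:
        if prefix[0] == "row":
            return {"text": 0.55, "btn": 0.45}
        return {"btn": 0.9, "text": 0.1}
    return {"<END>": 1.0}


def beam_search(next_token_probs, beam_size, max_len, end_token="<END>"):
    """Return the best token sequence found with the given beam size."""
    beams = [([], 0.0)]  # (tokens, summed log-probability)
    for _ in range(max_len):
        candidates = []
        for tokens, score in beams:
            if tokens and tokens[-1] == end_token:
                candidates.append((tokens, score))  # finished, carry over
                continue
            for token, prob in next_token_probs(tokens).items():
                candidates.append((tokens + [token], score + math.log(prob)))
        # Keep only the beam_size most probable partial sequences.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_size]
        if all(t and t[-1] == end_token for t, _ in beams):
            break
    return beams[0][0]


if __name__ == "__main__":
    print("greedy:", beam_search(toy_model, beam_size=1, max_len=5))
    print("beam 2:", beam_search(toy_model, beam_size=2, max_len=5))
```

In this toy setup, greedy decoding commits to "row" because it is the most likely first token, while a beam of size 2 keeps "header" alive long enough to find the sequence with the higher overall probability.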

To compile the generated .gui file to HTML:

cd pix2code_test/compiler
./web_compiler.py ../code/test_img.gui
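
When several images need to be processed, the two steps can be scripted. The helper below is a minimal sketch that only builds the command lines; it assumes the argument order used on this page, that each script is run from its own directory, and that the .gui file lands in pix2code_test/code under the image's base name. The function name build_commands is hypothetical.

```python
from pathlib import Path


def build_commands(image_path, project_dir="pix2code_test",
                   model_name="pix2code2", search="greedy"):
    """Return (working directory, argv) pairs for sampling and compiling.

    search is "greedy" or an integer beam size, as described on this page.
    """
    project = Path(project_dir)
    image = Path(image_path)
    # Assumption: the DSL output is named after the image, with a .gui suffix.
    gui_file = (Path("..") / "code" / (image.stem + ".gui")).as_posix()
    sample = (
        project / "model",
        ["./sample.py", "../bin", model_name, "../" + image.name,
         "../code", str(search)],
    )
    compile_html = (
        project / "compiler",
        ["./web_compiler.py", gui_file],
    )
    return [sample, compile_html]


if __name__ == "__main__":
    for cwd, argv in build_commands("test_img.png", search=3):
        print(cwd, " ".join(argv))
```

Each pair can then be passed to subprocess.run(argv, cwd=cwd, check=True). Depending on the shell used on the RDP machine, the scripts may need to be invoked through the Python interpreter explicitly.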

Discussion

While pix2code preserves the structure of the HTML page quite well, it does not preserve the page's content: most of the text from the original page is distorted in the generated DSL. Moreover, pix2code is extremely expensive to train, and the current model only works for very simple GUIs similar to those in the training set. Hence, the pix2code model is not suited to building an information extractor. However, its source code shows how to input and structure GUI data and how to build LSTM networks that take GUI images as input and output DSL code.