The scripts I wrote by following the pix2code source code live in
E:/projects/embedding
So far, I have been experimenting with only one simple DSL file, '00CDC9A8-3D73-4291-90EF-49178E408797.gui'. To see the current output (not yet one-hot), run
python convert_gui.py
This script opens a DSL file, goes through it line by line, strips some symbols, and stores the resulting tokens in a list. The first entries of the 'tokens' variable look like this:
tokens
['header ',
'btn-inactive, btn-active, btn-inactive, btn-inactive, btn-inactive',
''
]
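convert_gui.py itself is not reproduced here, so the following is only a hypothetical sketch of the tokenizing step, assuming that the stripped symbols are the block braces '{' and '}' and that every DSL line becomes one token. The names tokenize_gui and tokenize_gui_file are illustrative, not taken from pix2code.

```python
def tokenize_gui(text):
    """Hypothetical sketch: turn DSL text into one token per line.

    Assumes the only symbols to strip are the block braces '{' and '}'.
    """
    tokens = []
    for line in text.splitlines():
        # 'header {' becomes 'header ' and a lone '}' becomes ''.
        tokens.append(line.replace("{", "").replace("}", ""))
    return tokens


def tokenize_gui_file(path):
    """Read a .gui file from disk and tokenize its contents."""
    with open(path) as f:
        return tokenize_gui(f.read())


sample = (
    "header {\n"
    "btn-inactive, btn-active, btn-inactive, btn-inactive, btn-inactive\n"
    "}\n"
)
print(tokenize_gui(sample))
# ['header ', 'btn-inactive, btn-active, btn-inactive, btn-inactive, btn-inactive', '']
```

Under these assumptions the trailing space in 'header ' and the empty string '' both fall out naturally: they are what remains of 'header {' and '}' once the braces are removed.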
From this list we can build the vocabulary, i.e. the sorted list of unique tokens:
chars = sorted(list(set(tokens)))
which results in
['',
'btn-inactive, btn-active, btn-inactive, btn-inactive, btn-inactive',
'header ',
'quadruple ',
'row ',
'single ',
'small-title, text, btn-green',
'small-title, text, btn-orange',
'small-title, text, btn-red']
As we can see, there are 9 unique tokens in this example, so each one-hot vector has length 9. Next, we assign each token an integer, which is the index of the 1 in that token's vector.
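The vocabulary step can be checked on a toy token list (invented for illustration, not taken from the GUI file). The sort matters: the iteration order of a set of strings can change between Python runs because of string hash randomization, so sorting keeps each token's index stable from run to run.

```python
# Toy token list, invented for illustration.
tokens = ["header ", "row ", "", "single ", "", "row ", ""]

# Unique tokens in a deterministic (sorted) order; duplicates collapse.
chars = sorted(set(tokens))
print(chars)       # ['', 'header ', 'row ', 'single ']
print(len(chars))  # 4, the length of every one-hot vector
```

Note that sorted() already returns a list, so the extra list() call in sorted(list(set(tokens))) is harmless but unnecessary.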
char_indices = dict((c, i) for i, c in enumerate(chars))
indices_char = dict((i, c) for i, c in enumerate(chars))
This results in
char_indices
{'': 0,
'btn-inactive, btn-active, btn-inactive, btn-inactive, btn-inactive': 1,
'header ': 2,
'quadruple ': 3,
'row ': 4,
'single ': 5,
'small-title, text, btn-green': 6,
'small-title, text, btn-orange': 7,
'small-title, text, btn-red': 8}
Hence, the one-hot representation of a line with the token 'header ' (note the trailing space) is [0,0,1,0,0,0,0,0,0]: the 1 sits at index 2 (counting from 0), because char_indices['header '] is 2.
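A small helper makes this rule concrete. one_hot is a hypothetical name, not something from pix2code; the vocabulary is the sorted list shown above.

```python
import numpy as np

# Vocabulary copied from the sorted token list above.
chars = [
    "",
    "btn-inactive, btn-active, btn-inactive, btn-inactive, btn-inactive",
    "header ",
    "quadruple ",
    "row ",
    "single ",
    "small-title, text, btn-green",
    "small-title, text, btn-orange",
    "small-title, text, btn-red",
]
char_indices = {c: i for i, c in enumerate(chars)}


def one_hot(token, char_indices):
    """Return a vector of zeros with a single 1 at the token's index."""
    vec = np.zeros(len(char_indices))
    vec[char_indices[token]] = 1
    return vec


print(one_hot("header ", char_indices).tolist())
# [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```

The lookup key has to match the token exactly, trailing space included: one_hot("header", char_indices) raises a KeyError.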
Now, let's apply this encoding rule to every line of our GUI file:
import numpy as np

sentences = list(tokens)  # one token per DSL line
one_hot_vector = np.zeros((len(sentences), len(chars)))
for t, char in enumerate(sentences):
    one_hot_vector[t, char_indices[char]] = 1
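The loop can also be replaced by a vectorized lookup: indexing the identity matrix np.eye(len(chars)) with the list of token indices picks one row per token. This is an alternative formulation, not necessarily what the pix2code code does, and the short token list below is made up for illustration.

```python
import numpy as np

chars = ["", "header ", "row ", "single "]
char_indices = {c: i for i, c in enumerate(chars)}
tokens = ["header ", "", "row ", "single ", "", ""]  # toy example

# Loop version, as in the text above.
loop_matrix = np.zeros((len(tokens), len(chars)))
for t, token in enumerate(tokens):
    loop_matrix[t, char_indices[token]] = 1

# Vectorized version: row k of the identity matrix is the one-hot vector for index k.
indices = [char_indices[token] for token in tokens]
eye_matrix = np.eye(len(chars))[indices]

print(eye_matrix.shape)                         # (6, 4)
print(np.array_equal(loop_matrix, eye_matrix))  # True
```

Both versions produce the same matrix; the np.eye form simply avoids the explicit Python loop.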
The resulting matrix has one row per token and one column per vocabulary entry (here 22 x 9):
array([[0., 0., 1., 0., 0., 0., 0., 0., 0.],
[0., 1., 0., 0., 0., 0., 0., 0., 0.],
[1., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 1., 0., 0., 0., 0.],
[0., 0., 0., 1., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 1., 0.],
[1., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 1., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 1.],
[1., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 1., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 1., 0., 0.],
[1., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 1., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 1., 0.],
[1., 0., 0., 0., 0., 0., 0., 0., 0.],
[1., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 1., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 1., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 1., 0., 0.],
[1., 0., 0., 0., 0., 0., 0., 0., 0.],
[1., 0., 0., 0., 0., 0., 0., 0., 0.]])
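indices_char is not used above, but it is what turns the matrix back into tokens: argmax along each row recovers the column that holds the 1. A round-trip check like the following sketch (with a made-up token list) is a cheap way to confirm that the encoding lost nothing.

```python
import numpy as np

chars = ["", "header ", "row ", "single "]
char_indices = {c: i for i, c in enumerate(chars)}
indices_char = {i: c for i, c in enumerate(chars)}
tokens = ["header ", "", "row ", "single ", "", ""]  # toy example

# Encode, using the same loop as in the text.
one_hot_vector = np.zeros((len(tokens), len(chars)))
for t, token in enumerate(tokens):
    one_hot_vector[t, char_indices[token]] = 1

# Decode: argmax gives the position of the 1 in each row; map it back to its token.
decoded = [indices_char[int(i)] for i in one_hot_vector.argmax(axis=1)]
print(decoded == tokens)  # True
```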