So far, I have experimented with only one DSL file, '00CDC9A8-3D73-4291-90EF-49178E408797.gui'. To see the current output (not yet one-hot encoded), run:
python convert_gui.py
==Implementation==
One-hot encoding represents a word or token as a vector whose length equals the number of unique tokens in the DSL file: every entry is zero except for a single 1 at the index assigned to that token. Let's look at a concrete DSL file from pix2code as an example. The process starts by reading the file line by line and stripping the braces:
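# Read the DSL file line by line and collect one cleaned entry per line.
tokens = []
with open('00CDC9A8-3D73-4291-90EF-49178E408797.gui') as gui:
    for line in gui:
        # Drop the trailing newline, then any braces at either end of the line.
        line = line.strip('\n').strip('}').strip('{')
        tokens.append(line)
        print(line)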
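
The loop above still stores whole lines rather than individual tokens, so it is not yet one-hot. As a rough sketch of how the remaining steps might look, the snippet below splits the text into tokens, builds a vocabulary, and encodes each token as a vector. Everything in it is an assumption made for illustration: `SAMPLE_DSL` is made-up text standing in for the real .gui file, and `tokenize`, `build_vocab` and `one_hot` are hypothetical helper names, not part of convert_gui.py or pix2code.

```python
# Made-up sample text standing in for the contents of a .gui file (assumption).
SAMPLE_DSL = """header {
btn-active, btn-inactive, btn-inactive
}
row {
single {
small-title, text, btn-green
}
}
"""


def tokenize(text):
    """Split DSL text into tokens, treating braces and commas as separators."""
    for sep in "{},":
        text = text.replace(sep, " ")
    return text.split()


def build_vocab(tokens):
    """Map each unique token to an index, in order of first appearance."""
    vocab = {}
    for tok in tokens:
        if tok not in vocab:
            vocab[tok] = len(vocab)
    return vocab


def one_hot(token, vocab):
    """Return a list of len(vocab) zeros with a single 1 at the token's index."""
    vec = [0] * len(vocab)
    vec[vocab[token]] = 1
    return vec


tokens = tokenize(SAMPLE_DSL)
vocab = build_vocab(tokens)
encoded = [one_hot(tok, vocab) for tok in tokens]

# Print each token next to its one-hot vector.
for tok, vec in zip(tokens, encoded):
    print(tok, vec)
```

Like the loop above, this sketch throws the braces away. If the nesting structure matters for the model, the braces would probably have to be kept as tokens of their own, which this example does not attempt.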