Features, filters and labels are engineered as in the previous analysis. The final volumes at each price were converted into a matrix (image) shaped to match the price levels. For each day, the ticker with the most data was chosen for modeling.
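The conversion described above could look roughly like the sketch below. It is illustrative only: the 2 x n_levels layout (one row for bids, one for asks), the zero-padding of missing levels, and the function names `volume_image` and `busiest_ticker` are assumptions, and "most data" is assumed to mean the largest number of records for the day.

```python
from collections import Counter


def volume_image(bids, asks, n_levels):
    """Return a 2 x n_levels matrix: row 0 = bid volumes, row 1 = ask volumes.

    bids and asks are lists of (price, volume) pairs, best price first.
    Assumption: levels missing from the book are padded with zero volume.
    """
    def row(side):
        vols = [float(volume) for _, volume in side[:n_levels]]
        return vols + [0.0] * (n_levels - len(vols))

    return [row(bids), row(asks)]


def busiest_ticker(records):
    """Return the ticker with the most records for one day.

    records is an iterable of rows whose first field is the ticker symbol
    (hypothetical layout).
    """
    counts = Counter(record[0] for record in records)
    return counts.most_common(1)[0][0]
```

Keeping the shape fixed at n_levels means every snapshot yields an image of the same size, so snapshots can later be stacked without further resizing.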
Hypothesis
The image on the previous slide suggests that a change in level 1 volume is likely to change the VWAP. The objective is to test whether a stack of images of volume states can capture the signal of an increase or decrease in VWAP. As mentioned above, appropriate filters are applied to the returns to remove noise.
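One way to frame the hypothesis as a supervised problem is sketched below. The VWAP formula is the standard one (sum of price times volume, divided by total volume). Everything else is an assumption made for illustration: the stack depth, the one-step prediction horizon, and the use of a simple absolute-return threshold as the noise filter (the filter actually used is not specified here).

```python
def vwap(trades):
    """Volume-weighted average price of a list of (price, volume) trades."""
    total_volume = sum(volume for _, volume in trades)
    return sum(price * volume for price, volume in trades) / total_volume


def stack_and_label(images, vwaps, depth, threshold):
    """Pair each window of `depth` consecutive images with the next VWAP move.

    images[t] is the volume image at step t and vwaps[t] the VWAP at step t.
    The label is +1 for an increase and -1 for a decrease.
    Assumption: moves with |return| <= threshold are treated as noise and dropped.
    """
    samples = []
    for t in range(depth - 1, len(images) - 1):
        ret = (vwaps[t + 1] - vwaps[t]) / vwaps[t]
        if abs(ret) <= threshold:
            continue  # filtered out as noise
        window = images[t - depth + 1 : t + 1]  # the `depth` most recent images
        samples.append((window, 1 if ret > 0 else -1))
    return samples
```

Each sample is then a (stack of images, direction) pair that an image-based classifier could be trained on.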
Problem
In the previous analysis the data set was quite small: fewer than 100,000 examples for training and testing. This was because the total data used for modeling summed to 1 day (86,400 seconds). Preprocessing took approximately 6 minutes per day of data. In the current setting we have ~220 files, which suggests a total preprocessing time of ~22 hours (220 × 6 minutes). In addition, some of the hyperparameters are tied to the preprocessing, so the preprocessing may have to be repeated for each experiment, which is not acceptable at that cost. Therefore, multiprocessing and MPI (over the local network) were used to reduce the computing time per experiment.
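The estimate and the single-machine part of the remedy can be sketched as follows. This is a minimal illustration, not the pipeline actually used: `preprocess_day` is a hypothetical placeholder for the real per-file preprocessing, the function names are assumptions, and the MPI part is not shown. Only Python's standard `multiprocessing.Pool` is used.

```python
from multiprocessing import Pool

MINUTES_PER_FILE = 6  # approximate figure quoted above
N_FILES = 220         # approximate figure quoted above


def serial_hours(n_files=N_FILES, minutes_per_file=MINUTES_PER_FILE):
    """Estimated wall-clock hours when files are preprocessed one after another."""
    return n_files * minutes_per_file / 60


def preprocess_day(path):
    """Hypothetical placeholder for the real per-file preprocessing step."""
    return path, len(path)


def preprocess_all(paths, workers=1):
    """Preprocess every file, optionally spreading the work over worker processes."""
    if workers <= 1:
        return [preprocess_day(path) for path in paths]
    # Files are independent of each other, so they can be mapped across processes.
    with Pool(processes=workers) as pool:
        return pool.map(preprocess_day, paths)
```

With these figures `serial_hours()` gives 22.0. Because the files are independent, the best case with `w` workers is roughly `22 / w` hours, although I/O and process start-up overheads will reduce the real gain. On platforms that start workers by spawning a fresh interpreter, the call to `preprocess_all(..., workers=w)` should sit under an `if __name__ == "__main__":` guard.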