I work every day as a programmer on a quantitative stock analysis team that manages billions of dollars in assets. That position has given me a lot of insight into how a professional quantitative stock analysis process works.

ZCA is usually used as normalization. Rotation does affect the RF.

By the way, your blog is really informative. Thanks for sharing.

The results for PCA on 0-to-1 normalized data also look encouraging, and I am wondering why they weren't mentioned. It achieves much better dimensionality reduction with accuracy comparable to ZCA+PCA.

It's like blurring the "good" variable. I generated a toy dataset which illustrates that problem (btw, my post was inspired by yours):
http://machine-master.blogspot.de/2012/08/pca-or-polluting-your-clever-analysis.html

an unregularized linear classifier will not be affected by rotations and scaling. however, when using regularization, the same regularization parameter may yield better or worse results. if you cross-validate thoroughly, i think you should be able to get nearly identical performance.
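
The claim about regularization can be checked on synthetic data. The following is a minimal sketch, not the code used in the post: the data, the transform `A`, the helper `fit_scores`, and the penalty `lam=10.0` are all made up for illustration, and the classifier is plain least squares fitted to ±1 labels.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic two-class problem (illustrative data, not from the post).
X = rng.normal(size=(200, 5))
y = np.sign(X @ rng.normal(size=5) + 0.1 * rng.normal(size=200))

# Random rotation Q, and rotation followed by per-axis scaling A.
Q, _ = np.linalg.qr(rng.normal(size=(5, 5)))
A = Q @ np.diag([0.1, 0.5, 1.0, 5.0, 20.0])

def fit_scores(F, y, lam):
    """Least-squares classifier with L2 penalty lam (lam=0 means unregularized).
    Returns the in-sample decision values F @ w."""
    w = np.linalg.solve(F.T @ F + lam * np.eye(F.shape[1]), F.T @ y)
    return F @ w

def max_gap(F1, F2, lam):
    """Largest difference in decision values between two feature representations."""
    return np.max(np.abs(fit_scores(F1, y, lam) - fit_scores(F2, y, lam)))

ols_gap = max_gap(X, X @ A, lam=0.0)     # unregularized, rotation + scaling
rot_gap = max_gap(X, X @ Q, lam=10.0)    # ridge, rotation only
ridge_gap = max_gap(X, X @ A, lam=10.0)  # ridge, rotation + scaling

print(f"unregularized, rotation + scaling: {ols_gap:.2e}")
print(f"ridge, rotation only:              {rot_gap:.2e}")
print(f"ridge, rotation + scaling:         {ridge_gap:.2e}")
```

The first two gaps should sit at floating-point noise, while the third should not. An L2 penalty is itself rotation-invariant, so in this sketch it is the rescaling of the axes that changes what a fixed `lam` means.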

just for fun though, i tried a multi-class regularized ridge regression classifier before and after ZCA (rotation + scaling only). the results are very close, even though i used the same default regularization parameter.

Piotr: On the other hand, PCA might combine several noisy redundant features into a single axis, which could potentially be beneficial. I don't think it's possible to say what effect PCA will have without reference to particular data.

Sergey: It *seems* (very dangerous word) that linear classifiers should be affected by rotations in the feature space differently than axis-aligned thresholders (aka decision trees). Any chance you'll try the same experiments with a linear SVM?

It is well known that PCA can discard the very features that are essential for classification. PCA dimensionality reduction preserves what is common in the data, not what differentiates the classes.

maverick: it's kind of a mess! i can post individual functions or datasets, if there's something in particular you're interested in.
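
The comment above about PCA keeping what is common rather than what differentiates can be illustrated with a toy dataset. This is only a sketch under made-up assumptions: one high-variance feature unrelated to the label, one low-variance feature that determines it, and a hypothetical helper `threshold_accuracy`. It is not the dataset from the linked post.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

labels = rng.integers(0, 2, size=n)
# Feature 0: large variance, unrelated to the label.
noise_feature = rng.normal(scale=20.0, size=n)
# Feature 1: small variance, but it separates the two classes.
signal_feature = (2 * labels - 1) + rng.normal(scale=0.1, size=n)
X = np.column_stack([noise_feature, signal_feature])

# PCA via eigendecomposition of the covariance matrix of the centered data.
Xc = X - X.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
order = np.argsort(eigvals)[::-1]            # sort by decreasing variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

explained = eigvals / eigvals.sum()
pc1 = Xc @ eigvecs[:, 0]   # component a variance cutoff would keep
pc2 = Xc @ eigvecs[:, 1]   # component a variance cutoff would drop

def threshold_accuracy(score, labels):
    """Accuracy of thresholding a 1-D score at 0, allowing either sign."""
    acc = np.mean((score > 0) == (labels == 1))
    return max(acc, 1 - acc)

acc_pc1 = threshold_accuracy(pc1, labels)
acc_pc2 = threshold_accuracy(pc2, labels)
print(f"variance explained by PC1: {explained[0]:.4f}")
print(f"accuracy using PC1 only:   {acc_pc1:.3f}")
print(f"accuracy using PC2 only:   {acc_pc2:.3f}")
```

Here the first component carries nearly all the variance yet classifies at about chance level, while the discarded second component separates the classes almost perfectly. As Piotr notes, real data may behave very differently, so this shows only that the failure is possible, not that it is typical.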

brooks: good question. i'll run some experiments later & make an addendum to the post.

You reduce the dimension using PCA by keeping only as many eigenvectors as needed to explain 99% of the variance. What is the dimensionality of the transformed data, then? How much lower is it than the dimensionality of the greedy forward feature selection in your last post?
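
For reference, the number of components retained under a variance cutoff can be computed as follows. This is a minimal sketch with a hypothetical helper `n_components_for_variance` and synthetic data; it does not reproduce the dimensionality obtained in the post.

```python
import numpy as np

def n_components_for_variance(X, target=0.99):
    """Smallest number of principal components whose cumulative
    explained-variance ratio reaches `target`."""
    Xc = X - X.mean(axis=0)
    # Squared singular values of the centered data are proportional
    # to the variances along the principal axes.
    s = np.linalg.svd(Xc, compute_uv=False)
    ratio = np.cumsum(s**2) / np.sum(s**2)
    k = int(np.searchsorted(ratio, target)) + 1
    return min(k, len(s))

# Synthetic example: 3 dominant directions embedded in 20 dimensions.
rng = np.random.default_rng(0)
basis, _ = np.linalg.qr(rng.normal(size=(20, 3)))        # orthonormal columns
latent = rng.normal(size=(500, 3)) * np.array([3.0, 2.0, 1.0])
X = latent @ basis.T + 0.01 * rng.normal(size=(500, 20))

k = n_components_for_variance(X, target=0.99)
print(f"components needed for 99% of the variance: {k} of {X.shape[1]}")
```

The synthetic data are built with three dominant directions, so the count should come out well below the 20 input dimensions. On real features the result depends heavily on how they were scaled beforehand.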

would you mind posting your code?