Random Indexing
Ross Gayler pointed out two interesting statements in An Introduction to Random Indexing (Magnus Sahlgren):
Both these are very powerful properties of vector space representations, and these are particularly nice statemnts of these properties.
- “Hecht-Nielsen (1994), … demonstrated that there are many more nearly orthogonal than truly orthogonal directions in a high-dimensional space.” (Hecht-Nielsen, R.; Context vectors; general purpose approximate meaning representations self-organized from raw data. In Zurada, J. M.; R. J. Marks II; C. J. Robinson; Computational intelligence: imitating life. IEEE Press, 1994.)
- “the Johnson-Lindenstrauss lemma (Johnson & Lindenstrauss, 1984) – that states that if we project points in a vector space into a randomly selected subspace of sufficiently high dimensionality, the distances between the points are approximately preserved.” (Johnson, W. B. & J. Lindenstrauss; Extensions to Lipshitz mapping into Hilbert space. Contemporary mathematics. 26, 1984.)
Both these are very powerful properties of vector space representations, and these are particularly nice statemnts of these properties.

1 Comments:
The "curse of dimensionality" refers to the difficulty of making infernces from high-dimensional data (because there are so many possible combinatorial features). So, my take on it would be that high dimensionality is a curse when we have few data points, many dimensions, and little prior knowledge about which combinatorial features might be useful.
Isn't it quite easy to run the J-L lemma backwards, just by concatenating projections onto lower dimensional spaces? E.g., project a 10-d space onto 4 5-d spaces, and then concatenate to get a 20-d space. I'd also be suprised if it wasn't quite easy to prove the result more directly.
I think I mention this in my book -- the idea of projecting low-dimensional information into high-dimensional spaces in order to be able to work with HRRs.
By
Tony Plate, at 8:19 AM, December 17, 2005
Post a Comment
<< Home