Vector Symbolic Architectures

Friday, October 14, 2005

Random Indexing

Ross Gayler pointed out two interesting statements in An Introduction to Random Indexing (Magnus Sahlgren):

  • “Hecht-Nielsen (1994), … demonstrated that there are many more nearly orthogonal than truly orthogonal directions in a high-dimensional space.” (Hecht-Nielsen, R.; Context vectors; general purpose approximate meaning representations self-organized from raw data. In Zurada, J. M.; R. J. Marks II; C. J. Robinson; Computational intelligence: imitating life. IEEE Press, 1994.)
  • “the Johnson-Lindenstrauss lemma (Johnson & Lindenstrauss, 1984) – that states that if we project points in a vector space into a randomly selected subspace of sufficiently high dimensionality, the distances between the points are approximately preserved.” (Johnson, W. B. & J. Lindenstrauss; Extensions to Lipshitz mapping into Hilbert space. Contemporary mathematics. 26, 1984.)


Both these are very powerful properties of vector space representations, and these are particularly nice statemnts of these properties.

1 Comments:

  • The "curse of dimensionality" refers to the difficulty of making infernces from high-dimensional data (because there are so many possible combinatorial features). So, my take on it would be that high dimensionality is a curse when we have few data points, many dimensions, and little prior knowledge about which combinatorial features might be useful.

    Isn't it quite easy to run the J-L lemma backwards, just by concatenating projections onto lower dimensional spaces? E.g., project a 10-d space onto 4 5-d spaces, and then concatenate to get a 20-d space. I'd also be suprised if it wasn't quite easy to prove the result more directly.

    I think I mention this in my book -- the idea of projecting low-dimensional information into high-dimensional spaces in order to be able to work with HRRs.

    By Blogger Tony Plate, at 8:19 AM, December 17, 2005  

Post a Comment

<< Home