Some notes on the format of the MATLAB data for NIPS text.

Sam Roweis                   http://www.cs.toronto.edu/~roweis/
-------------------------------------------------------------------

nips12raw_strxxx.mat (main data)

counts   -- main data, holding the sparse raw counts 
            for each word in each paper.
wl       -- wordlist
wc       -- total counts for each word.
ptitles  -- paper titles
plengths -- number of words in each paper
anames   -- author names
apapers  -- which papers each author is on
            (in the form of an index matching the row numbers in "counts")
panum    -- number of authors each paper had
docnames -- raw file corresponding to each paper.
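
As a rough illustration of how some of these variables relate to one
another, here is a small Python sketch on made-up toy data (not the real
files). The list-of-dicts layout, the 0-based indices, the example words
and the helper names word_totals and authors_per_paper are all assumptions
made for illustration; the actual variables are MATLAB arrays.

```python
# Toy stand-in for the sparse "counts" data (assumed layout, not the real one):
# counts[p][w] = number of times word w occurs in paper p.
counts = [
    {0: 3, 2: 1},   # paper 0
    {1: 2, 2: 4},   # paper 1
    {0: 1},         # paper 2
]
wl = ["neuron", "kernel", "network"]   # hypothetical wordlist

# apapers[a] = indices of the papers that author a is on (toy data).
apapers = [[0, 1], [1], [1, 2]]


def word_totals(counts, n_words):
    """Total count of each word over all papers (the role played by wc)."""
    wc = [0] * n_words
    for paper in counts:
        for w, c in paper.items():
            wc[w] += c
    return wc


def authors_per_paper(apapers, n_papers):
    """Number of authors on each paper (the role played by panum)."""
    panum = [0] * n_papers
    for papers in apapers:
        for p in papers:
            panum[p] += 1
    return panum


wc = word_totals(counts, len(wl))
panum = authors_per_paper(apapers, len(counts))
```

In this toy example paper 1 has three authors, so panum for it is 3.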


nips12ac_strxxx.mat (author counts)

acounts  -- co-author weighted total word usage for each author:
            a paper with k authors contributes 1/k of its word counts
            to each of its authors, i.e.
            add up counts(paper)/panum(paper) over the papers they are on
acountsr -- raw total word usage for each author: just add up counts
            for all papers they were on, ignoring co-authorship
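
The two definitions above can be sketched in Python as follows. This is a
minimal sketch on invented toy data, assuming a list-of-dicts layout and
0-based indices; the function name author_counts and its "weighted" flag
are hypothetical and not part of the data files.

```python
# Toy data (assumed layout): counts[p][w] = occurrences of word w in paper p.
counts = [
    {0: 4, 1: 2},   # paper 0, two authors
    {1: 6},         # paper 1, one author
]
apapers = [[0], [0, 1]]   # author 0 is on paper 0; author 1 is on both
panum = [2, 1]            # number of authors on each paper


def author_counts(counts, apapers, panum, weighted=True):
    """Per-author word usage.

    weighted=True  mimics acounts: each paper contributes counts / panum.
    weighted=False mimics acountsr: each paper contributes its full counts.
    """
    result = []
    for papers in apapers:
        totals = {}
        for p in papers:
            share = 1.0 / panum[p] if weighted else 1.0
            for w, c in counts[p].items():
                totals[w] = totals.get(w, 0.0) + c * share
        result.append(totals)
    return result


acounts = author_counts(counts, apapers, panum, weighted=True)
acountsr = author_counts(counts, apapers, panum, weighted=False)
```

Here author 0 receives half of paper 0's counts in the weighted version
and all of them in the raw version.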


nips12pages_strxxx.mat (page/conference info)

Of questionable value, unless you have a real
thing about distinguishing one NIPS volume from another
or a thing about original page numbers:

confpids -- the papers that appeared in each nips 0-12
confpages -- the page numbers at which papers begin in each volume
apages   -- nips volumes and page numbers (as a 2d array)
            for all the papers published by each author
pclass   -- which nips each paper is in
pauthnames -- raw, uncorrected authorlist from contents files
              (not sure why you would need these)


