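```python
>>> from scipy.sparse import csr_matrix
>>> docs = [["hello", "world", "hello"], ["goodbye", "cruel", "world"]]
>>> indptr = [0]
>>> indices = []
>>> data = []
>>> vocabulary = {}
>>> for d in docs:
...     for term in d:
...         # give each previously unseen term the next free column index
...         index = vocabulary.setdefault(term, len(vocabulary))
...         indices.append(index)
...         data.append(1)
...     # row i (document i) owns the entries indices[indptr[i]:indptr[i+1]]
...     indptr.append(len(indices))
...
>>> # repeated (row, column) entries, e.g. "hello" twice in doc 0, are summed
>>> csr_matrix((data, indices, indptr), dtype=int).toarray()
array([[2, 1, 0, 0],
       [0, 1, 1, 1]])
```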
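To make the role of `indptr` concrete, here is a minimal pure-Python sketch that expands the same three CSR arrays into a dense list of lists without SciPy. `csr_to_dense` is a hypothetical helper written for illustration, not a SciPy function; it assumes the column count equals the vocabulary size and mimics how `toarray()` sums duplicate entries within a row.

```python
def csr_to_dense(data, indices, indptr, n_cols):
    """Hypothetical helper (not part of SciPy): expand CSR arrays to a dense
    list of lists, summing duplicate column indices within a row."""
    n_rows = len(indptr) - 1
    dense = [[0] * n_cols for _ in range(n_rows)]
    for row in range(n_rows):
        # The stored entries of this row live in the slice indptr[row]:indptr[row + 1].
        for k in range(indptr[row], indptr[row + 1]):
            dense[row][indices[k]] += data[k]  # duplicates accumulate
    return dense


docs = [["hello", "world", "hello"], ["goodbye", "cruel", "world"]]
indptr = [0]
indices = []
data = []
vocabulary = {}
for d in docs:
    for term in d:
        index = vocabulary.setdefault(term, len(vocabulary))
        indices.append(index)
        data.append(1)
    indptr.append(len(indices))

dense = csr_to_dense(data, indices, indptr, len(vocabulary))
print(indptr)   # [0, 3, 6]
print(indices)  # [0, 1, 0, 2, 3, 1]
print(dense)    # [[2, 1, 0, 0], [0, 1, 1, 1]]
```

The `indptr` list has one more element than there are rows, so consecutive pairs delimit each row's slice of `indices` and `data`; that is why the loop appends `len(indices)` once per document rather than once per term.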
Source: http://www.bubuko.com/infodetail-2025610.html