Move PCA and TSVD from cuml to raft#2952
Move PCA and TSVD from cuml to raft#2952rapids-bot[bot] merged 18 commits intorapidsai:release/26.04from
Conversation
cpp/include/raft/linalg/pca.cuh
Outdated
|
|
||
| template <typename math_t, typename enum_solver = solver> | ||
| void truncCompExpVars(const raft::handle_t& handle, | ||
| math_t* in, |
There was a problem hiding this comment.
Need to use mdspan here- we've deprecated all the pointer APIs.
cpp/include/raft/linalg/pca.cuh
Outdated
| math_t* input, | ||
| math_t* components, | ||
| math_t* explained_var, | ||
| math_t* explained_var_ratio, |
There was a problem hiding this comment.
Input order should match the other (newer APIs). handle, params, input, output, free params. Also "stream" is in the handle now, and we use device_resources not raft::hande.
| /** | ||
| * @brief perform fit operation for the pca. Generates eigenvectors, explained vars, singular vals, | ||
| * etc. | ||
| * @param[in] handle: cuml handle object |
There was a problem hiding this comment.
doc mentioning cuml handle, not raft device_resources! (same for other docs too)
We will still have the same python and cpp apis in cuml too! |
|
@aamijar we will probably expose a preprocessing api through python for purposes of users who need to write scripts (for example Jinsol's new dataset gen requires PCA and it would be a circular dependency if we included cuml in cuVS) or have databases written in python. But- like I mentioned to Simon, the users are very diffeeent between the two. Same thing with kmeans- kmeans clusters is the equivalent of "lexicograph ordering" in the vector world. Pca is another way to reduce footprint of vectors without losing quality. Data science users will continue to use cuml. Vector databases will continue to use cuVS. It's important we don't duplicate code across the two... and since cuml is already using cuVS, it can continue to use the c++ api like you mentioned. |
| void truncCompExpVars(raft::resources const& handle, | ||
| math_t* in, | ||
| math_t* components, | ||
| math_t* explained_var, | ||
| math_t* explained_var_ratio, | ||
| math_t* noise_vars, | ||
| const paramsTSVD& prms, | ||
| cudaStream_t stream) |
There was a problem hiding this comment.
why do we need both the stream and the handle here? can't we use raft::resource::get_cuda_stream(handle)? Same for other functions too!
cpp/include/raft/linalg/pca.cuh
Outdated
| auto stream = resource::get_cuda_stream(handle); | ||
|
|
||
| paramsPCA prms_with_dims = prms; | ||
| prms_with_dims.n_rows = static_cast<std::size_t>(input.extent(0)); | ||
| prms_with_dims.n_cols = static_cast<std::size_t>(input.extent(1)); | ||
|
|
||
| detail::pcaFit(handle, | ||
| input.data_handle(), | ||
| components.data_handle(), | ||
| explained_var.data_handle(), | ||
| explained_var_ratio.data_handle(), | ||
| singular_vals.data_handle(), | ||
| mu.data_handle(), | ||
| noise_vars.data_handle(), | ||
| prms_with_dims, | ||
| stream, | ||
| flip_signs_based_on_U); | ||
| } |
There was a problem hiding this comment.
similar to previous comment. suggesting we don't need to pass the stream separately
| struct paramsTSVD { | ||
| std::size_t n_rows = 0; | ||
| std::size_t n_cols = 0; | ||
| int gpu_id = 0; |
There was a problem hiding this comment.
are we using this gpu_id anywhere?
| void truncCompExpVars(raft::resources const& handle, | ||
| math_t* in, | ||
| math_t* components, | ||
| math_t* explained_var, | ||
| math_t* explained_var_ratio, | ||
| math_t* noise_vars, | ||
| const paramsTSVD& prms, |
There was a problem hiding this comment.
As Corey mentioned, suggesting we change the order to have handle-> params-> other stuff. Same for other functions.
cpp/include/raft/linalg/pca.cuh
Outdated
| paramsPCA prms_with_dims = prms; | ||
| prms_with_dims.n_rows = static_cast<std::size_t>(input.extent(0)); | ||
| prms_with_dims.n_cols = static_cast<std::size_t>(input.extent(1)); |
There was a problem hiding this comment.
thinking if it would be better to make a new params here VS use RAFT_EXPECTS to check if we have the right rows/cols.... what do you think!?
|
Hi @jinsolp, thanks for the review! I guess my original goal was to keep as much of the existing code from cuml as possible, so that's why I am using the old pointer based apis for the detail namespace. The public API has been changed to use the mdspan, no stream usage, and correct ordering of params. That being said, we should probably redo some of the detail implementation to match the modern conventions. I'll take a look at it! |
Co-authored-by: Jinsol Park <jinsolp@nvidia.com>
@jinsolp @aamijar pointer-based APIs are okay in detail namespace. Never okay in public APIs. We do need to make sure the public APIs are ordered appropriately (consistently w/ the other public APIs). |
| namespace raft::linalg { | ||
|
|
||
| /** | ||
| * @defgroup pca PCA operations |
| }; | ||
|
|
||
| /** @brief Parameters for TSVD (and base for PCA). */ | ||
| struct paramsTSVD { |
There was a problem hiding this comment.
When the switch to mdspan happen the n_rows, n_cols and n_components will become useless in this struct.
I think it's worth to switch to mdspan now in order to avoid follow-ups in all other repos
There was a problem hiding this comment.
Okay, working on that now
| namespace raft::linalg { | ||
|
|
||
| /** | ||
| * @defgroup tsvd Truncated SVD operations |
jinsolp
left a comment
There was a problem hiding this comment.
LGTM, Approving with small doc change request!
Co-authored-by: Jinsol Park <jinsolp@nvidia.com>
|
/merge |
c172b3f
into
rapidsai:release/26.04
Resolves #1207. Depends on rapidsai/raft#2952 This PR introduces the `cuvs::preprocessing::pca` with `float` support. The following APIs are supported: `fit`, `transform`, `fit_transform`, `inverse_transform`. Authors: - Anupam (https://github.com/aamijar) - Divye Gala (https://github.com/divyegala) Approvers: - Divye Gala (https://github.com/divyegala) URL: #1808
Resolves rapidsai#1207. Depends on rapidsai/raft#2952 This PR introduces the `cuvs::preprocessing::pca` with `float` support. The following APIs are supported: `fit`, `transform`, `fit_transform`, `inverse_transform`. Authors: - Anupam (https://github.com/aamijar) - Divye Gala (https://github.com/divyegala) Approvers: - Divye Gala (https://github.com/divyegala) URL: rapidsai#1808
Resolves #7801. Depends on rapidsai/raft#2952. Remove code duplication, since PCA and TSVD are in raft now. Authors: - Anupam (https://github.com/aamijar) Approvers: - Jim Crist-Harif (https://github.com/jcrist) URL: #7802
Resolves rapidsai#1207. Depends on rapidsai/raft#2952 This PR introduces the `cuvs::preprocessing::pca` with `float` support. The following APIs are supported: `fit`, `transform`, `fit_transform`, `inverse_transform`. Authors: - Anupam (https://github.com/aamijar) - Divye Gala (https://github.com/divyegala) Approvers: - Divye Gala (https://github.com/divyegala) URL: rapidsai#1808
Resolves rapidsai#1207. Depends on rapidsai/raft#2952 This PR introduces the `cuvs::preprocessing::pca` with `float` support. The following APIs are supported: `fit`, `transform`, `fit_transform`, `inverse_transform`. Authors: - Anupam (https://github.com/aamijar) - Divye Gala (https://github.com/divyegala) Approvers: - Divye Gala (https://github.com/divyegala) URL: rapidsai#1808
Required for rapidsai/cuvs#1207 and rapidsai/cuml#7802.
This PR moves
pca.cuh,tsvd.cuh, and gtests into raft.Resolves #2978