Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
177 changes: 177 additions & 0 deletions IMPLEMENTATION_NOTES.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,177 @@
# Implementation Notes: Replication API Naming Unification

## Objective

Unify the naming scheme for "host" vs "node" replication across WorldContainer and derived classes, add string-to-enum converters, and maintain backward compatibility.

## Design Decisions

### 1. Naming Convention Choice

**Selected:** `node` over `host` terminology

**Rationale:**
- The enum is already `NodeReplicated` (not `HostReplicated`)
- Consistency with existing enum naming
- "Node" is the more common term in HPC literature for compute nodes
- "Host" can be ambiguous (host vs client, host system, etc.)

**Canonical Method Names:**
- `rank_replication()` - Query method for rank-level replication
- `node_replication()` - Query method for node-level replication
- `replicate_on_ranks()` - Action method for rank-level replication
- `replicate_on_nodes()` - Action method for node-level replication

### 2. String Converter Design

**Pattern Used:** Class-specific converter names to avoid overload ambiguity

**Exchange::Algorithm:**
- Primary: `Exchange<T,NDIM>::from_string_algorithm(std::string)`
- Deprecated: `Exchange<T,NDIM>::from_string(std::string)`
- Free function: `algorithm_from_string<T,NDIM>(std::string)`
- Wrapper: `AlgorithmFromString<T,NDIM>`
- User literal: `operator"" _alg` (for double,3D)

**DistributionType:**
- Primary: `distribution_type_from_string(std::string)`
- Wrapper: `DistributionTypeFromString`

**Normalization Strategy:**
All converters apply consistent normalization:
1. Convert to lowercase
2. Replace spaces with underscores
3. Replace dashes with underscores
4. Match against known variants

This allows inputs like:
- "Small-Memory" → `small_memory`
- "rank replicated" → `RankReplicated`
- "NodeReplicated" → `NodeReplicated`

### 3. Backward Compatibility

**Approach:** Deprecated aliases with `[[deprecated]]` attribute

**Advantages:**
- Existing code continues to work
- Compiler warnings guide users to new names
- Clean migration path
- Can be removed in future major version

**Implementation:**
```cpp
// New canonical method
bool rank_replication() const { ... }

// Deprecated alias
[[deprecated("Use rank_replication() instead")]]
bool is_replicated() const { return rank_replication(); }
```

### 4. Internal Implementation

**WorldContainerImpl Methods:** Kept unchanged

**Rationale:**
- Private implementation details
- No API breakage
- Updated comments for consistency
- Public API provides the unified interface

## Testing Strategy

### Unit Tests Created

1. **test_naming_unification.cc**
- Tests DistributionType converter with all variants
- Tests WorldContainer query methods (old and new)
- Tests WorldContainer replication methods
- Validates deprecated aliases work correctly

2. **test_algorithm_converter.cc**
- Tests Exchange::Algorithm converter with all variants
- Tests deprecated from_string wrapper
- Tests free function and wrapper classes
- Tests user-defined literal
- Validates exception on invalid input

### Syntax Validation

Created standalone test programs:
- `/tmp/test_syntax_worlddc.cc` - Validates worlddc.h changes
- `/tmp/test_syntax_exchange.cc` - Validates SCFOperators.h changes

Both compiled successfully with g++ -std=c++17.

## Documentation

### Files Created/Updated

1. **REPLICATION_API_MIGRATION.md** - Comprehensive user guide
- Migration strategy
- Code examples
- Terminology clarification
- Troubleshooting

2. **parallel_runtime.dox** - Developer documentation
- Added note about API changes
- References migration guide

3. **Test files** - Inline documentation
- Usage examples
- Edge case testing

## Known Usages of Deprecated APIs

Found in existing codebase:
- `src/madness/mra/test6.cc` - Uses `replicate()`
- `src/madness/mra/funcimpl.h` - Uses `replicate()`, `replicate_on_hosts()`, `is_replicated()`, `is_host_replicated()`
- `src/madness/mra/macrotaskq.h` - Uses `replicate()`
- `src/madness/world/cloud.h` - Uses `replicate_on_hosts()`

**Impact:** These will generate deprecation warnings but continue to work.

## Security Analysis

**CodeQL Scan:** No issues detected

**Considerations:**
- String converters throw `std::invalid_argument` on invalid input
- No buffer overflows (using std::string)
- No injection vulnerabilities
- Input validation via enum matching

## Future Work

### Phase 1 (Current Release)
- [x] Add canonical APIs
- [x] Add deprecated aliases
- [x] Add string converters
- [x] Add tests
- [x] Add documentation

### Phase 2 (Next Release)
- [ ] Migrate internal usage to canonical names
- [ ] Update examples and tutorials
- [ ] Consider adding similar converters for other enums

### Phase 3 (Future Major Release)
- [ ] Remove deprecated aliases
- [ ] Update ABI version
- [ ] Final cleanup

## Lessons Learned

1. **Class-specific converter names** prevent overload ambiguity issues
2. **Comprehensive normalization** improves user experience
3. **[[deprecated]] attribute** provides clean migration path
4. **Wrapper classes** enable implicit conversion where needed
5. **User-defined literals** offer convenience for common cases

## References

- Problem statement: GitHub issue #XXX
- WorldContainer API: `src/madness/world/worlddc.h`
- Exchange operator: `src/madness/chem/SCFOperators.h`
- Migration guide: `doc/REPLICATION_API_MIGRATION.md`
157 changes: 157 additions & 0 deletions doc/REPLICATION_API_MIGRATION.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,157 @@
# Replication API Naming Migration Guide

## Overview

The replication-related APIs in MADNESS have been unified to use consistent "node" and "rank" terminology instead of the previously mixed "host"/"node" naming. This document provides guidance for migrating existing code.

## Changes Summary

### WorldContainer API Changes

#### New Canonical Method Names

**Query Methods:**
- `rank_replication()` - Check if container is replicated across all ranks (new canonical name)
- `node_replication()` - Check if container is replicated across all nodes/hosts (new canonical name)

**Replication Methods:**
- `replicate_on_ranks()` - Replicate container to all ranks (new canonical name)
- `replicate_on_nodes()` - Replicate container to all nodes/hosts (new canonical name)

#### Deprecated Method Names (Still Supported)

The following methods are deprecated but remain functional with backward compatibility:

- `is_replicated()` → Use `rank_replication()` instead
- `is_host_replicated()` → Use `node_replication()` instead
- `replicate()` → Use `replicate_on_ranks()` instead
- `replicate_on_hosts()` → Use `replicate_on_nodes()` instead

### String-to-Enum Converters

#### DistributionType Converter

New function: `distribution_type_from_string(std::string s)`

Supported string variants (case-insensitive, accepts spaces/dashes/underscores):
- **Distributed**: "distributed"
- **RankReplicated**: "rank_replicated", "rankreplicated", "rank"
- **NodeReplicated**: "node_replicated", "nodereplicated", "node", "host_replicated", "hostreplicated", "host"

Example usage:
```cpp
auto dist_type = distribution_type_from_string("rank-replicated");
// Or using the wrapper for implicit conversion:
DistributionTypeFromString wrapper("node");
DistributionType dt = wrapper;
```

#### Exchange::Algorithm Converter

New method: `Exchange<T,NDIM>::from_string_algorithm(std::string s)`

Supported string variants (case-insensitive, accepts spaces/dashes/underscores):
- **small_memory**: "small_memory", "smallmemory", "small"
- **large_memory**: "large_memory", "largememory", "large"
- **multiworld_efficient**: "multiworld_efficient", "multiworldefficient", "multiworld"
- **multiworld_efficient_row**: "multiworld_efficient_row", "multiworld_efficientrow"
- **fetch_compute**: "fetch_compute", "fetchcompute", "fetch"

Example usage:
```cpp
using AlgType = Exchange<double, 3>::Algorithm;
auto alg = Exchange<double, 3>::from_string_algorithm("small-memory");

// Or using the deprecated wrapper:
#pragma GCC diagnostic ignored "-Wdeprecated-declarations"
auto alg2 = Exchange<double, 3>::from_string("large");

// Free function:
auto alg3 = algorithm_from_string<double, 3>("multiworld");

// User-defined literal (for double,3D):
auto alg4 = "small"_alg;

// Implicit conversion wrapper:
AlgorithmFromString<double, 3> wrapper("fetch");
AlgType alg5 = wrapper;
```

## Migration Strategy

### Phase 1: Update to New Names (Recommended)

1. Replace deprecated method calls with canonical names:
```cpp
// Old code
if (container.is_replicated()) { ... }
container.replicate();

// New code
if (container.rank_replication()) { ... }
container.replicate_on_ranks();
```

2. Update string-based configuration parsing to use new converters:
```cpp
// Old code
std::string mode = get_config("replication_mode");
if (mode == "host") { /* replicate on hosts */ }

// New code
auto dist_type = distribution_type_from_string(get_config("replication_mode"));
if (dist_type == NodeReplicated) { /* replicate on nodes */ }
```

### Phase 2: Continue Using Deprecated APIs (Temporary)

If you need time to migrate, you can suppress deprecation warnings:

```cpp
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wdeprecated-declarations"

// Use deprecated APIs here
container.replicate();
if (container.is_replicated()) { ... }

#pragma GCC diagnostic pop
```

### Phase 3: Clean Up (Future)

In a future major release, the deprecated methods may be removed. Migrate all code to use canonical names before then.

## Terminology Clarification

- **Rank**: A single MPI process (ProcessID)
- **Node/Host**: A physical compute node that may contain multiple ranks
- **Rank Replication**: Data is replicated to every rank in the world
- **Node Replication**: Data is replicated once per node (to the lowest rank on each node)

## Testing

Two new test files have been added to validate the changes:
- `src/madness/world/test_naming_unification.cc` - Tests WorldContainer naming and DistributionType converter
- `src/madness/chem/test_algorithm_converter.cc` - Tests Exchange::Algorithm converter

Run these tests to verify correct behavior:
```bash
cd build
make test_naming_unification test_algorithm_converter
./src/madness/world/test_naming_unification
./src/madness/chem/test_algorithm_converter
```

## Questions or Issues?

If you encounter any problems during migration, please file an issue on the MADNESS GitHub repository with:
1. The deprecated API you're using
2. The context in which it's used
3. Any error messages or unexpected behavior

## See Also

- WorldContainer API documentation: `src/madness/world/worlddc.h`
- Exchange operator documentation: `src/madness/chem/SCFOperators.h`
- DistributionType enum: `src/madness/world/worlddc.h` (lines 81-86)
14 changes: 14 additions & 0 deletions doc/libraries/parallel_runtime.dox
Original file line number Diff line number Diff line change
Expand Up @@ -205,4 +205,18 @@ and other resources necessary to build new distributed capabilities.
The distributed container class (madness::WorldContainer) actually inherits
most of its functionality from madness::WorldObject.

\par Replication API Updates

The WorldContainer replication APIs have been updated to use consistent "node" and "rank"
terminology. The new canonical method names are:
- \c rank_replication() - Check if container is replicated across all ranks
- \c node_replication() - Check if container is replicated across all nodes/hosts
- \c replicate_on_ranks() - Replicate container to all ranks
- \c replicate_on_nodes() - Replicate container to all nodes/hosts

The old method names (\c is_replicated(), \c is_host_replicated(), \c replicate(),
\c replicate_on_hosts()) are deprecated but still supported for backward compatibility.

For a complete migration guide, see doc/REPLICATION_API_MIGRATION.md.

*/
Loading