Commit a179487

Improves lute runtime performance when scheduling many threads (#706)
The lute runtime currently stores scheduled coroutines in a `std::vector`, and we always pop the first thread in the vector to run. For small numbers of coroutines this is fine, but at 10k+ it becomes very inefficient: erasing the first element shifts every remaining element over by one, so each pop is `O(n)`. This PR replaces the `std::vector` with a `Luau::VecDeque`, making the pop operation `O(1)`.

I've also included a profiling script that I used to profile lute, since I keep forgetting the `xctrace` incantation. It can be invoked with:

```
./tools/profile.sh /path/to/.luau
```

The script argument is optional and defaults to `profile.luau`.
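To make the cost difference concrete, here is a minimal sketch that is not taken from the lute source. It assumes `std::deque` as a stand-in for `Luau::VecDeque`, and `Item`, `g_shifts`, `drain`, `drainVector`, and `drainDeque` are hypothetical names introduced only for this illustration. It counts how many element shifts (move assignments) happen while draining a queue from the front.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <utility>
#include <vector>

// Counts move assignments, i.e. how often an element is shifted into another slot.
static std::size_t g_shifts = 0;

// Hypothetical stand-in for a scheduled thread entry.
struct Item
{
    int id = 0;

    explicit Item(int id) : id(id) {}
    Item(const Item&) = default;
    Item(Item&&) noexcept = default;
    Item& operator=(const Item&) = default;
    Item& operator=(Item&& other) noexcept
    {
        id = other.id;
        ++g_shifts;
        return *this;
    }
};

// Fills a queue with n items, drains it front-first, and returns the shift count.
template<typename Queue, typename PopFront>
std::size_t drain(std::size_t n, PopFront popFront)
{
    Queue q;
    for (std::size_t i = 0; i < n; ++i)
        q.emplace_back(static_cast<int>(i));

    g_shifts = 0; // only count shifts caused by popping
    int expected = 0;
    while (!q.empty())
    {
        Item next = std::move(q.front());
        popFront(q);
        assert(next.id == expected); // FIFO order is preserved either way
        ++expected;
    }
    return g_shifts;
}

// Old approach: erase(begin()) shifts every remaining element down by one.
std::size_t drainVector(std::size_t n)
{
    return drain<std::vector<Item>>(n, [](auto& q) { q.erase(q.begin()); });
}

// New approach (std::deque used here in place of Luau::VecDeque): no shifting.
std::size_t drainDeque(std::size_t n)
{
    return drain<std::deque<Item>>(n, [](auto& q) { q.pop_front(); });
}
```

Draining `n` items from the vector performs `n(n-1)/2` shifts in total, while the deque performs none, which is why the difference only becomes noticeable once many threads are queued.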
1 parent 157a116 commit a179487

4 files changed: +39 -2 lines changed

lute/runtime/include/lute/runtime.h

Lines changed: 2 additions & 1 deletion
@@ -3,6 +3,7 @@
 #include "lute/ref.h"
 
 #include "Luau/Variant.h"
+#include "Luau/VecDeque.h"
 
 #include <atomic>
 #include <condition_variable>
@@ -80,7 +81,7 @@ struct Runtime
     std::mutex dataCopyMutex;
     std::unique_ptr<lua_State, void (*)(lua_State*)> dataCopy;
 
-    std::vector<ThreadToContinue> runningThreads;
+    Luau::VecDeque<ThreadToContinue> runningThreads;
 
 private:
     std::mutex continuationMutex;

lute/runtime/src/runtime.cpp

Lines changed: 1 addition & 1 deletion
@@ -67,7 +67,7 @@ RuntimeStep Runtime::runOnce()
         return StepEmpty{};
 
     auto next = std::move(runningThreads.front());
-    runningThreads.erase(runningThreads.begin());
+    runningThreads.pop_front();
 
     next.ref->push(GL);
     lua_State* L = lua_tothread(GL, -1);
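
As a rough illustration of why `pop_front()` can be constant time on a vector-backed deque, the sketch below shows a fixed-capacity ring buffer. This is an assumption about the general technique, not a description of how `Luau::VecDeque` is implemented. `RingQueue` and all of its members are hypothetical names, and the sketch requires `T` to be default-constructible.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical fixed-capacity ring buffer; illustrative only.
template<typename T>
class RingQueue
{
public:
    explicit RingQueue(std::size_t capacity)
        : slots(capacity)
    {
    }

    bool empty() const { return count == 0; }
    std::size_t size() const { return count; }

    // Returns false when the buffer is full (a real deque would grow instead).
    bool push_back(T value)
    {
        if (count == slots.size())
            return false;
        slots[(head + count) % slots.size()] = std::move(value);
        ++count;
        return true;
    }

    T& front()
    {
        assert(!empty());
        return slots[head];
    }

    // O(1): only the head index advances; no other element is touched.
    void pop_front()
    {
        assert(!empty());
        head = (head + 1) % slots.size();
        --count;
    }

private:
    std::vector<T> slots;  // storage, reused as the head wraps around
    std::size_t head = 0;  // index of the oldest element
    std::size_t count = 0; // number of live elements
};
```

The usage pattern matches the diff above: move the element out of `front()`, then call `pop_front()`, which changes two integers instead of shifting the rest of the queue.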

tools/profile.luau

Lines changed: 32 additions & 0 deletions
@@ -0,0 +1,32 @@
+local task = require("@lute/task")
+
+local started = os.clock()
+
+local amount = 100000
+local batches = 5
+local per_batch = amount / batches
+
+for current = 1, batches do
+    local thread = coroutine.running()
+
+    print(`Batch {current} / {batches}`)
+
+    for i = 1, per_batch do
+        --print("Spawning thread #" .. i)
+        task.spawn(function()
+            task.wait(0.1)
+            --_TEST_ASYNC_WORK(0.1)
+            if i == per_batch then
+                print("Last thread in batch #" .. current)
+                assert(coroutine.status(thread) == "suspended", `Thread {i} has status {coroutine.status(thread)}`)
+                task.spawn(thread)
+            end
+        end)
+    end
+
+    coroutine.yield()
+end
+local took = os.clock() - started
+print(`Spawned {amount} sleeping threads in {took}s`)
+
+return -1

tools/profile.sh

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
+#!/usr/bin/env bash
+DEFAULT_SCRIPTNAME=${1:-profile.luau}
+echo "profiling $DEFAULT_SCRIPTNAME"
+xctrace record --template 'Time Profiler' --launch -- ./build/xcode/debug/lute/cli/lute $DEFAULT_SCRIPTNAME
