We use `Connection#copy_data` to stream large volumes of data into a temporary table. We recently observed significant performance degradation and increased memory use for this system. Here's a minimal reproduction:
```ruby
require "securerandom"
require "objspace"
require "bundler/inline"

PG_VERSION = "1.4.2"

gemfile do
  source "https://rubygems.org"
  gem "pg", PG_VERSION
end

puts "PG::VERSION #{PG::VERSION}"

def memory_use
  3.times { GC.start }
  objspace_size_mb = ObjectSpace.memsize_of_all / 1024 / 1024
  rss_mb = `ps -p #{Process.pid} -o rss`.split("\n")[1].to_i / 1024
  "objspace:#{objspace_size_mb}mb; rss:#{rss_mb}mb"
end

puts "Before: #{memory_use}"
start_at = Time.now

connection = PG.connect(dbname: "discourse_development")
table_name = "my_temp_table"
connection.exec("CREATE TEMP TABLE #{table_name}(url text UNIQUE)")

connection.copy_data("COPY #{table_name} FROM STDIN CSV") do
  1_000_000.times do
    connection.put_copy_data("#{SecureRandom.hex(100)}\n")
  end
  puts "After loop, inside copy_data: #{memory_use}"
end

puts "After: #{memory_use}"
puts "Took #{Time.now - start_at}s"
```
With version `1.3.5`, this script takes ~10s on my machine and reports ~47mb RSS at the end. With version `1.4.0` (and `1.4.1`, `1.4.2`), it takes ~80s and reports ~182mb RSS at the end. The RSS appears to scale with the amount of data being copied.
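Pending a fix, one mitigation we are experimenting with is batching rows into larger buffers so `put_copy_data` is called far less often, on the (unconfirmed) assumption that per-call overhead grew in the 1.4.x series. A minimal sketch of the batching logic, with the database call stubbed out so it runs standalone (`each_copy_batch` is a hypothetical helper, not part of the pg API):

```ruby
require "securerandom"

# Hypothetical batching helper: accumulate rows into one string buffer
# and yield it every `batch_size` rows, instead of yielding per row.
def each_copy_batch(row_count, batch_size: 10_000)
  buffer = +""
  row_count.times do |i|
    # One CSV row per line, same shape as the reproduction script.
    buffer << SecureRandom.hex(100) << "\n"
    if (i + 1) % batch_size == 0
      yield buffer # in the real script: connection.put_copy_data(buffer)
      buffer = +""
    end
  end
  yield buffer unless buffer.empty? # flush the final partial batch
end

batches = 0
each_copy_batch(25_000, batch_size: 10_000) { |_buf| batches += 1 }
puts batches # 25,000 rows in batches of 10,000 -> 3 yields (10k, 10k, 5k)
```

In the real reproduction, the block body would call `connection.put_copy_data(buffer)` inside `copy_data`; we have not yet verified how much this helps on 1.4.x.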