copy_data memory bloat in v1.4 #473

@davidtaylorhq


We use `Connection#copy_data` to stream large volumes of data into a temporary table. After upgrading to pg 1.4 we observed significant performance degradation and increased memory use in this system. Here's a minimal reproduction:

require "securerandom"
require "objspace"
require "bundler/inline"

PG_VERSION = "1.4.2"
gemfile do
  source 'https://rubygems.org'
  gem 'pg', PG_VERSION
end

puts "PG::version #{PG::VERSION}"

def memory_use
  3.times { GC.start }
  objspace_size_mb = ObjectSpace.memsize_of_all / 1024 / 1024
  rss_mb = `ps  -p #{Process.pid} -o rss`.split("\n")[1].to_i / 1024
  "objspace:#{objspace_size_mb}mb; rss:#{rss_mb}mb"
end

puts "Before: #{memory_use}"

start_at = Time.now()

connection = PG.connect(dbname: 'discourse_development')
table_name = "my_temp_table"
connection.exec("CREATE TEMP TABLE #{table_name}(url text UNIQUE)")
connection.copy_data("COPY #{table_name} FROM STDIN CSV") do
  1_000_000.times do |i|
    connection.put_copy_data("#{SecureRandom.hex(100)}\n")
  end
  puts "After loop, inside copy_data: #{memory_use}"
end

puts "After: #{memory_use}"
puts "Took #{Time.now - start_at}s"

With version 1.3.5, this script takes ~10s on my machine, and reports ~47mb RSS at the end. With version 1.4.0 (and 1.4.1, 1.4.2), it takes ~80s and reports ~182mb RSS at the end. The RSS appears to scale with the amount of data being copied.
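One mitigation that might be worth trying, assuming the growth comes from COPY data buffered client-side by `put_copy_data` before it reaches the socket, is to drain the connection's outgoing buffer periodically during the copy with `PG::Connection#flush`. This is a speculative sketch, not a confirmed fix; the 10,000-row batch size is arbitrary:

```ruby
# Variation of the repro loop above: flush pending output every
# 10_000 rows so client-side buffered COPY data can't accumulate.
connection.copy_data("COPY #{table_name} FROM STDIN CSV") do
  1_000_000.times do |i|
    connection.put_copy_data("#{SecureRandom.hex(100)}\n")
    # PG::Connection#flush sends any queued output to the server.
    connection.flush if (i % 10_000).zero?
  end
end
```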
