Commit 132e91b

Gargron authored and hiyuki2578 committed
Add a spam check (mastodon#11217)
* Add a spam check
* Use Nilsimsa to generate locality-sensitive hashes and compare using Levenshtein distance
* Add more tests
* Add exemption when the message is a reply to something that mentions the sender
* Use Nilsimsa Compare Value instead of Levenshtein distance
* Use MD5 for messages shorter than 10 characters
* Add message to automated report, do not add non-public statuses to automated report, add trust level to accounts and make unsilencing raise the trust level to prevent repeated spam checks on that account
* Expire spam check data after 3 months
* Add support for local statuses, reduce expiration to 1 week, always create a report
* Add content warnings to the spam check and exempt empty statuses
* Change Nilsimsa threshold to 95 and make sure removed statuses are removed from the spam check
* Add all matched statuses into automatic report
1 parent 23946b8 commit 132e91b
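
The "Nilsimsa Compare Value" named in the commit message can be illustrated roughly as follows. This is a minimal sketch, not the code from this commit: it counts differing bits with plain Ruby instead of the gem's `Nilsimsa::POPC` table, and the helper name `compare_value` and the sample digests are made up for illustration. It assumes 64-character hex digests (256 bits).

```ruby
# Hypothetical helper: compare two Nilsimsa-style hex digests (256 bits each).
# Assumption: both digests are 64 hex characters long.
def compare_value(first_hex, second_hex)
  first  = [first_hex].pack('H*').bytes
  second = [second_hex].pack('H*').bytes

  # Count the bits that differ between the two digests.
  differing_bits = first.zip(second).sum { |a, b| (a ^ b).to_s(2).count('1') }

  128 - differing_bits # from -128 (all bits differ) to 128 (identical)
end

THRESHOLD = 95 # mirrors NILSIMSA_COMPARE_THRESHOLD from the diff

same     = '0f' * 32
inverted = 'f0' * 32
near     = '0e' + '0f' * 31 # differs from `same` in exactly one bit

puts compare_value(same, same)              # 128
puts compare_value(same, inverted)          # -128
puts compare_value(same, near) >= THRESHOLD # true
```

With a threshold of 95, two digests count as a match only when at most 33 of their 256 bits differ.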

10 files changed

Lines changed: 377 additions & 5 deletions

Gemfile

Lines changed: 1 addition & 0 deletions
@@ -58,6 +58,7 @@ gem 'idn-ruby', require: 'idn'
 gem 'kaminari', '~> 1.1'
 gem 'link_header', '~> 0.0'
 gem 'mime-types', '~> 3.2', require: 'mime/types/columnar'
+gem 'nilsimsa', git: 'https://github.com/witgo/nilsimsa', ref: 'fd184883048b922b176939f851338d0a4971a532'
 gem 'nokogiri', '~> 1.10'
 gem 'nsa', '~> 0.2'
 gem 'oj', '~> 3.7'

Gemfile.lock

Lines changed: 8 additions & 0 deletions
@@ -12,6 +12,13 @@ GIT
   specs:
     http_parser.rb (0.6.1)
 
+GIT
+  remote: https://github.com/witgo/nilsimsa
+  revision: fd184883048b922b176939f851338d0a4971a532
+  ref: fd184883048b922b176939f851338d0a4971a532
+  specs:
+    nilsimsa (1.1.2)
+
 GEM
   remote: https://rubygems.org/
   specs:
@@ -704,6 +711,7 @@ DEPENDENCIES
   microformats (~> 4.1)
   mime-types (~> 3.2)
   net-ldap (~> 0.10)
+  nilsimsa!
   nokogiri (~> 1.10)
   nsa (~> 0.2)
   oj (~> 3.7)

app/lib/activitypub/activity/create.rb

Lines changed: 13 additions & 0 deletions
@@ -41,6 +41,7 @@ def process_status
 
     resolve_thread(@status)
     fetch_replies(@status)
+    check_for_spam
     distribute(@status)
     forward_for_reply if @status.distributable?
   end
@@ -406,6 +407,18 @@ def addresses_local_accounts?
     Account.local.where(username: local_usernames).exists?
   end
 
+  def check_for_spam
+    spam_check = SpamCheck.new(@status)
+
+    return if spam_check.skip?
+
+    if spam_check.spam?
+      spam_check.flag!
+    else
+      spam_check.remember!
+    end
+  end
+
   def forward_for_reply
     return unless @json['signature'].present? && reply_to_local?
     ActivityPub::RawDistributionWorker.perform_async(Oj.dump(@json), replied_to_status.account_id, [@account.preferred_inbox_url])

app/lib/spam_check.rb

Lines changed: 169 additions & 0 deletions
@@ -0,0 +1,169 @@
+# frozen_string_literal: true
+
+class SpamCheck
+  include Redisable
+  include ActionView::Helpers::TextHelper
+
+  NILSIMSA_COMPARE_THRESHOLD = 95
+  NILSIMSA_MIN_SIZE          = 10
+  EXPIRE_SET_AFTER           = 1.week.seconds
+
+  def initialize(status)
+    @account = status.account
+    @status  = status
+  end
+
+  def skip?
+    already_flagged? || trusted? || no_unsolicited_mentions? || solicited_reply?
+  end
+
+  def spam?
+    if insufficient_data?
+      false
+    elsif nilsimsa?
+      any_other_digest?('nilsimsa') { |_, other_digest| nilsimsa_compare_value(digest, other_digest) >= NILSIMSA_COMPARE_THRESHOLD }
+    else
+      any_other_digest?('md5') { |_, other_digest| other_digest == digest }
+    end
+  end
+
+  def flag!
+    auto_silence_account!
+    auto_report_status!
+  end
+
+  def remember!
+    # The scores in sorted sets don't actually have enough bits to hold an exact
+    # value of our snowflake IDs, so we use it only for its ordering property. To
+    # get the correct status ID back, we have to save it in the string value
+
+    redis.zadd(redis_key, @status.id, digest_with_algorithm)
+    redis.zremrangebyrank(redis_key, '0', '-10')
+    redis.expire(redis_key, EXPIRE_SET_AFTER)
+  end
+
+  def reset!
+    redis.del(redis_key)
+  end
+
+  def hashable_text
+    return @hashable_text if defined?(@hashable_text)
+
+    @hashable_text = @status.text
+    @hashable_text = remove_mentions(@hashable_text)
+    @hashable_text = strip_tags(@hashable_text) unless @status.local?
+    @hashable_text = normalize_unicode(@status.spoiler_text + ' ' + @hashable_text)
+    @hashable_text = remove_whitespace(@hashable_text)
+  end
+
+  def insufficient_data?
+    hashable_text.blank?
+  end
+
+  def digest
+    @digest ||= begin
+      if nilsimsa?
+        Nilsimsa.new(hashable_text).hexdigest
+      else
+        Digest::MD5.hexdigest(hashable_text)
+      end
+    end
+  end
+
+  def digest_with_algorithm
+    if nilsimsa?
+      ['nilsimsa', digest, @status.id].join(':')
+    else
+      ['md5', digest, @status.id].join(':')
+    end
+  end
+
+  private
+
+  def remove_mentions(text)
+    return text.gsub(Account::MENTION_RE, '') if @status.local?
+
+    Nokogiri::HTML.fragment(text).tap do |html|
+      mentions = @status.mentions.map { |mention| ActivityPub::TagManager.instance.url_for(mention.account) }
+
+      html.traverse do |element|
+        element.unlink if element.name == 'a' && mentions.include?(element['href'])
+      end
+    end.to_s
+  end
+
+  def normalize_unicode(text)
+    text.unicode_normalize(:nfkc).downcase
+  end
+
+  def remove_whitespace(text)
+    text.gsub(/\s+/, ' ').strip
+  end
+
+  def auto_silence_account!
+    @account.silence!
+  end
+
+  def auto_report_status!
+    status_ids = Status.where(visibility: %i(public unlisted)).where(id: matching_status_ids).pluck(:id) + [@status.id] if @status.distributable?
+    ReportService.new.call(Account.representative, @account, status_ids: status_ids, comment: I18n.t('spam_check.spam_detected_and_silenced'))
+  end
+
+  def already_flagged?
+    @account.silenced?
+  end
+
+  def trusted?
+    @account.trust_level > Account::TRUST_LEVELS[:untrusted]
+  end
+
+  def no_unsolicited_mentions?
+    @status.mentions.all? { |mention| mention.silent? || (!@account.local? && !mention.account.local?) || mention.account.following?(@account) }
+  end
+
+  def solicited_reply?
+    !@status.thread.nil? && @status.thread.mentions.where(account: @account).exists?
+  end
+
+  def nilsimsa_compare_value(first, second)
+    first  = [first].pack('H*')
+    second = [second].pack('H*')
+    bits   = 0
+
+    0.upto(31) do |i|
+      bits += Nilsimsa::POPC[255 & (first[i].ord ^ second[i].ord)].ord
+    end
+
+    128 - bits # -128 <= Nilsimsa Compare Value <= 128
+  end
+
+  def nilsimsa?
+    hashable_text.size > NILSIMSA_MIN_SIZE
+  end
+
+  def other_digests
+    redis.zrange(redis_key, 0, -1)
+  end
+
+  def any_other_digest?(filter_algorithm)
+    other_digests.any? do |record|
+      algorithm, other_digest, status_id = record.split(':')
+
+      next unless algorithm == filter_algorithm
+
+      yield algorithm, other_digest, status_id
+    end
+  end
+
+  def matching_status_ids
+    if nilsimsa?
+      other_digests.select { |record| record.start_with?('nilsimsa') && nilsimsa_compare_value(digest, record.split(':')[1]) >= NILSIMSA_COMPARE_THRESHOLD }.map { |record| record.split(':')[2] }.compact
+    else
+      other_digests.select { |record| record.start_with?('md5') && record.split(':')[1] == digest }.map { |record| record.split(':')[2] }.compact
+    end
+  end
+
+  def redis_key
+    @redis_key ||= "spam_check:#{@account.id}"
+  end
+end
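
The comment in `remember!` about sorted-set scores can be illustrated without Redis. Redis stores sorted-set scores as double-precision floats, which represent every integer exactly only up to 2^53; a larger ID can round to a neighbouring value. The following is a minimal sketch, assuming a made-up ID just past that limit (not a real status ID) and a placeholder digest string:

```ruby
# Up to this value, a 64-bit float represents every integer exactly.
MAX_EXACT = 2**53

# Hypothetical ID chosen only to sit just past that limit.
status_id = MAX_EXACT + 1

# Round-trip through a float, as a sorted-set score would.
score = status_id.to_f

puts score.to_i == status_id # false: the lowest bit was lost
puts score.to_i == MAX_EXACT # true: rounded to the neighbouring value

# Keeping the ID inside the member string preserves it exactly,
# in the same "algorithm:digest:status_id" layout the class uses.
member = ['md5', 'abc123', status_id].join(':')
puts member.split(':').last.to_i == status_id # true
```

This is why the score is used only for ordering, while the exact ID is read back from the third field of the member string.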

app/models/account.rb

Lines changed: 13 additions & 5 deletions
@@ -46,6 +46,7 @@
 # also_known_as :string is an Array
 # silenced_at :datetime
 # suspended_at :datetime
+# trust_level :integer
 #
 
 class Account < ApplicationRecord
@@ -63,6 +64,11 @@ class Account < ApplicationRecord
   include AccountCounters
   include DomainNormalizable
 
+  TRUST_LEVELS = {
+    untrusted: 0,
+    trusted: 1,
+  }.freeze
+
   enum protocol: [:ostatus, :activitypub]
 
   validates :username, presence: true
@@ -164,6 +170,10 @@ def possibly_stale?
     last_webfingered_at.nil? || last_webfingered_at <= 1.day.ago
   end
 
+  def trust_level
+    self[:trust_level] || 0
+  end
+
   def refresh!
     ResolveAccountService.new.call(acct) unless local?
   end
@@ -172,21 +182,19 @@ def silenced?
     silenced_at.present?
   end
 
-  def silence!(date = nil)
-    date ||= Time.now.utc
+  def silence!(date = Time.now.utc)
     update!(silenced_at: date)
   end
 
   def unsilence!
-    update!(silenced_at: nil)
+    update!(silenced_at: nil, trust_level: trust_level == TRUST_LEVELS[:untrusted] ? TRUST_LEVELS[:trusted] : trust_level)
   end
 
   def suspended?
     suspended_at.present?
   end
 
-  def suspend!(date = nil)
-    date ||= Time.now.utc
+  def suspend!(date = Time.now.utc)
     transaction do
       user&.disable! if local?
       update!(suspended_at: date)
app/services/remove_status_service.rb

Lines changed: 5 additions & 0 deletions
@@ -23,6 +23,7 @@ def call(status, **options)
         remove_from_hashtags
         remove_from_public
         remove_from_media if status.media_attachments.any?
+        remove_from_spam_check
 
         @status.destroy!
       else
@@ -142,6 +143,10 @@ def remove_from_media
     redis.publish('timeline:public:local:media', @payload) if @status.local?
   end
 
+  def remove_from_spam_check
+    redis.zremrangebyscore("spam_check:#{@status.account_id}", @status.id, @status.id)
+  end
+
   def lock_options
     { redis: Redis.current, key: "distribute:#{@status.id}" }
   end

config/locales/en.yml

Lines changed: 2 additions & 0 deletions
@@ -876,6 +876,8 @@ en:
     profile: Profile
     relationships: Follows and followers
     two_factor_authentication: Two-factor Auth
+  spam_check:
+    spam_detected_and_silenced: This is an automated report. Spam has been detected and the sender has been silenced automatically. If this is a mistake, please unsilence the account.
   statuses:
     attached:
       description: 'Attached: %{attached}'
Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+class AddTrustLevelToAccounts < ActiveRecord::Migration[5.2]
+  def change
+    add_column :accounts, :trust_level, :integer
+  end
+end

db/schema.rb

Lines changed: 1 addition & 0 deletions
@@ -149,6 +149,7 @@
     t.string "also_known_as", array: true
     t.datetime "silenced_at"
     t.datetime "suspended_at"
+    t.integer "trust_level"
     t.index "(((setweight(to_tsvector('simple'::regconfig, (display_name)::text), 'A'::\"char\") || setweight(to_tsvector('simple'::regconfig, (username)::text), 'B'::\"char\")) || setweight(to_tsvector('simple'::regconfig, (COALESCE(domain, ''::character varying))::text), 'C'::\"char\")))", name: "search_index", using: :gin
     t.index "lower((username)::text), lower((domain)::text)", name: "index_accounts_on_username_and_domain_lower", unique: true
     t.index ["moved_to_account_id"], name: "index_accounts_on_moved_to_account_id"
