Discussion:
[maildropl] .mailfilter file stopped working with 2.7.1
Nerijus Baliunas
2014-01-15 23:12:06 UTC
Hello,

My .mailfilter file works with 2.6.0-2.fc19, but does not with 2.7.1-1.fc19 Fedora package.
More info at https://bugzilla.redhat.com/show_bug.cgi?id=1053313.
I suspect it happens because the message has MIME unencoded Subject,
and with mixed encodings. Is it true?

Regards,
Nerijus
Sam Varshavchik
2014-01-16 00:56:09 UTC
Post by Nerijus Baliunas
Hello,
My .mailfilter file works with 2.6.0-2.fc19, but does not with 2.7.1-1.fc19 Fedora package.
More info at https://bugzilla.redhat.com/show_bug.cgi?id=1053313.
I suspect it happens because the message has MIME unencoded Subject,
and with mixed encodings. Is it true?
Correct. maildrop is now MIME-aware. The search patterns should simply be
specified in UTF-8 (noted in the maildropfilter man page), and MIME-encoded
content will be properly decoded, transcoded to UTF-8, and compared. Mixed
encoding should be fine, they'll be transcoded to UTF-8, as long as it's an
encoding that's known to iconv. However, it does mean that messages must be
properly MIME-encoded.

The log files show that the sample message appears to have
raw ISO-8859-1 content in the headers, which is invalid.

Also, it looks like there's a tiny logging bug that needs to be squashed,
which garbles logged patterns. That's just a formatting bug, unrelated to
searching.
Nerijus Baliunas
2014-01-16 01:10:52 UTC
Post by Sam Varshavchik
Correct. maildrop is now MIME-aware. The search patterns should simply be
specified in UTF-8 (noted in the maildropfilter man page), and MIME-encoded
content will be properly decoded, transcoded to UTF-8, and compared. Mixed
encoding should be fine, they'll be transcoded to UTF-8, as long as it's an
encoding that's known to iconv. However, it does mean that messages must be
properly MIME-encoded.
The log files show that the sample message appears to have
raw ISO-8859-1 content in the headers, which is invalid.
Yes, even worse - subject has both ISO-8859-13 and UTF-8 raw characters without
any MIME encoding.

How can I filter such badly formatted messages? Note that I filter by the "From" header,
which does not have illegal characters. I can send a sample message if needed.

Regards,
Nerijus
Sam Varshavchik
2014-01-16 01:43:49 UTC
Post by Nerijus Baliunas
Post by Sam Varshavchik
Correct. maildrop is now MIME-aware. The search patterns should simply be
specified in UTF-8 (noted in the maildropfilter man page), and MIME-encoded
content will be properly decoded, transcoded to UTF-8, and compared. Mixed
encoding should be fine, they'll be transcoded to UTF-8, as long as it's an
encoding that's known to iconv. However, it does mean that messages must be
properly MIME-encoded.
The log files show that the sample message appears to have
raw ISO-8859-1 content in the headers, which is invalid.
Yes, even worse - subject has both ISO-8859-13 and UTF-8 raw characters without
any MIME encoding.
How can I filter such badly formatted messages? Note that I filter by the "From" header,
which does not have illegal characters. I can send a sample message if needed.
If you're filtering by the From: header, which is properly encoded, you just
specify your search pattern directly in UTF-8.

So, if the message has:

From: =?iso-8859-1?Q?H=F3la!?= <***@example.com>

This should be matched by maildrop's search pattern:

if (/From:.*Hóla/)

The fact that Subject might be broken and not encoded properly doesn't
matter. You're matching on the From: header.

Note that the search pattern is always in UTF-8.
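
The decode-then-match idea described above can be illustrated outside maildrop. The following is a minimal sketch in Python's standard library, not maildrop's own code: the address user@example.com is a placeholder, and the helper name decode_mime_header is hypothetical. It shows that a UTF-8 pattern fails against the raw encoded-word but succeeds once the header is decoded.

```python
import re
from email.header import decode_header, make_header

# Sample header modeled on the one above; user@example.com is a placeholder.
RAW_HEADER = "From: =?iso-8859-1?Q?H=F3la!?= <user@example.com>"

# The search pattern is written directly in UTF-8 (Unicode in Python).
PATTERN = re.compile(r"From:.*Hóla")


def decode_mime_header(value: str) -> str:
    """Decode RFC 2047 encoded-words into a plain Unicode string."""
    return str(make_header(decode_header(value)))


decoded = decode_mime_header(RAW_HEADER)

# The raw header still contains "H=F3la", so the pattern cannot match it.
raw_match = PATTERN.search(RAW_HEADER) is not None
# After decoding, the header contains "Hóla!" and the pattern matches.
decoded_match = PATTERN.search(decoded) is not None

print(raw_match, decoded_match)
```

This only mirrors the behavior described in the post; the exact text maildrop searches is whatever its own MIME decoding produces.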
Nerijus Baliunas
2014-01-16 02:35:15 UTC
Post by Sam Varshavchik
The fact that Subject might be broken and not encoded properly doesn't
matter. You're matching on the From: header.
The rule which does not work is:
if (/^From: *"System Anti-Virus Administrator" <***@xxx.lt>/)
to "./Maildir/.avirus"

As you can see, it does not have any 8 bit chars. It works with 2.6.0, and
does not work with 2.7.1.

Regards,
Nerijus
Sam Varshavchik
2014-01-16 03:40:14 UTC
Post by Nerijus Baliunas
Post by Sam Varshavchik
The fact that Subject might be broken and not encoded properly doesn't
matter. You're matching on the From: header.
The rule which does not work is:
if (/^From: *"System Anti-Virus Administrator" <***@xxx.lt>/)
to "./Maildir/.avirus"
As you can see, it does not have any 8 bit chars. It works with 2.6.0, and
does not work with 2.7.1.
Narrow it down. Start by verifying that the /From:.*Administrator/ pattern
matches. Then, expand the pattern until you find exactly what doesn't match,
and go from there.
Nerijus Baliunas
2014-01-16 11:42:11 UTC
Post by Sam Varshavchik
Post by Nerijus Baliunas
The rule which does not work is:
if (/^From: *"System Anti-Virus Administrator" <***@xxx.lt>/)
to "./Maildir/.avirus"
As you can see, it does not have any 8 bit chars. It works with 2.6.0, and
does not work with 2.7.1.
Narrow it down. Start by verifying that the /From:.*Administrator/ pattern
matches. Then, expand the pattern until you find exactly what doesn't match,
and go from there.
Does not work:
if (/^From: *"System Anti-Virus Administrator" <***@xxx.lt>/)
to "./Maildir/.avirus"

Works:
if (/^From: *System Anti-Virus Administrator <***@xxx.lt>/)
to "./Maildir/.avirus"

Regards,
Nerijus
Nerijus Baliunas
2014-01-16 14:05:38 UTC
Post by Nerijus Baliunas
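Does not work:
if (/^From: *"System Anti-Virus Administrator" <***@xxx.lt>/)
to "./Maildir/.avirus"
Works:
if (/^From: *System Anti-Virus Administrator <***@xxx.lt>/)
to "./Maildir/.avirus"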
And with 2.6.0 it's vice versa - works with quotes and does not work without.
The raw header line in the message is:

From: "System Anti-Virus Administrator" <***@xxx.lt>

Regards,
Nerijus
Doug Barton
2014-01-16 20:06:24 UTC
Post by Nerijus Baliunas
Post by Nerijus Baliunas
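Does not work:
if (/^From: *"System Anti-Virus Administrator" <***@xxx.lt>/)
to "./Maildir/.avirus"
Works:
if (/^From: *System Anti-Virus Administrator <***@xxx.lt>/)
to "./Maildir/.avirus"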
And with 2.6.0 it's vice versa - works with quotes and does not work without.
Did you try:
if (/^From: \"System Anti-Virus Administrator\" <***@xxx.lt>/)
to "./Maildir/.avirus"
Nerijus Baliunas
2014-01-16 21:40:18 UTC
Post by Doug Barton
Post by Nerijus Baliunas
And with 2.6.0 it's vice versa - works with quotes and does not work without.
Did you try
"./Maildir/.avirus"
Yes - it's the same as with unescaped quotes - 2.6.0 works, 2.7.1 does not.

Regards,
Nerijus
Sam Varshavchik
2014-01-16 23:02:06 UTC
On Thu, 16 Jan 2014 13:42:11 +0200 Nerijus Baliunas
Post by Nerijus Baliunas
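Does not work:
if (/^From: *"System Anti-Virus Administrator" <***@xxx.lt>/)
to "./Maildir/.avirus"
Works:
if (/^From: *System Anti-Virus Administrator <***@xxx.lt>/)
to "./Maildir/.avirus"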
And with 2.6.0 it's vice versa - works with quotes and does not work without.
Ok. This is a result of the switch to canonical UTF-8 pattern matching. The
message gets transcoded to a canonical UTF-8 format, internally, before it
gets searched. This involves converting all headers to a canonical format.
As part of this process, all headers get reparsed and reformatted. Message
text encoded with quoted-printable or base64 gets decoded. Before the switch
to UTF-8, it was not really possible to search base64-encoded content, so
this was a pretty big deal.

In the case here, the quotes are redundant, so they're removed before the
actual search takes place. So now, the same search pattern will match a
From: header with or without quotes around the names. It should also match
a From: header that uses obsolete syntax, like "From: ***@xxx.lt (System
Anti-Virus Administrator)", using the same pattern.

You can see what exactly gets searched by executing "reformime -u" with the
message on standard input.

I agree that this is confusing, but this is the right thing to do. Being
able to correctly match non-Latin search patterns is more important. So the
only result from this is that better documentation is needed. Generally,
the patterns should be written against the output from reformime -u.
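
The effect of canonicalizing headers before matching can be sketched with Python's standard email.utils module. This is an analogy under stated assumptions, not maildrop's implementation, and the output of reformime -u may differ in detail. The address av@example.lt is a placeholder and canonical_from is a hypothetical helper. The point is that the quoted, unquoted, and obsolete comment-style forms all reduce to one canonical line, so a single pattern without quotes covers all three.

```python
from email.utils import formataddr, parseaddr

# Three spellings of the same sender; av@example.lt is a placeholder address.
HEADERS = [
    '"System Anti-Virus Administrator" <av@example.lt>',  # quoted name
    'System Anti-Virus Administrator <av@example.lt>',    # unquoted name
    'av@example.lt (System Anti-Virus Administrator)',    # obsolete comment form
]


def canonical_from(value: str) -> str:
    """Reparse an address and reformat it, dropping redundant quotes."""
    name, address = parseaddr(value)
    # formataddr only adds quotes when the name contains special characters.
    return "From: " + formataddr((name, address))


# A set collapses identical results, so one element means one canonical form.
canonical = {canonical_from(h) for h in HEADERS}

for line in sorted(canonical):
    print(line)
```

A pattern that includes literal quotes, as in the rule that stopped working, cannot match the canonical line because the quotes are no longer there.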
Dimitri Maziuk
2014-01-16 23:20:33 UTC
Post by Sam Varshavchik
In the case here, the quotes are redundant, so they're removed before
the actual search takes place. So now, the same search pattern will
match a From: header with or without quotes around the names.
And if I wanted to

if( /^From:.*good guys/ )
{
to "Maildir"
}

if( /^From:.*"good" guys/ )
{
to "/dev/null"
}

???
--
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
Sam Varshavchik
2014-01-17 00:16:40 UTC
Post by Dimitri Maziuk
Post by Sam Varshavchik
In the case here, the quotes are redundant, so they're removed before
the actual search takes place. So now, the same search pattern will
match a From: header with or without quotes around the names.
And if I wanted to
if( /^From:.*good guys/ )
{
to "Maildir"
}
if( /^From:.*"good" guys/ )
{
to "/dev/null"
}
Why would you want to do that?

If you want to filter messages from a specific source, use reformime -u to
see how the headers look in canonical form, then write a filter for that.
Doug Barton
2014-01-17 00:34:41 UTC
Post by Sam Varshavchik
Why would you want to do that?
All questions of this form are out of scope by definition. :) The thing
about tools is that the people who use them always put them to odd
purposes that the creator of the tools never intended.

Doug
