--- doc:appunti:linux:sa:sanitizer [2023/01/19 10:16] – created niccolo
+++ doc:appunti:linux:sa:sanitizer [2023/01/19 12:11] (current) – [Perl Unescaped left brace warning] niccolo
@@ Line 6: / Line 6: @@
 I use it as a personal mail filter in GNU/Linux mail servers, because it can be activated on a per-user basis, by the **Local Delivery Agent** called by **Postfix**. The LDA can be as simple as **procmail** or the more complex **Dovecot LDA with Pigeonhole Sieve Interpreter**.
+===== Perl unescaped left brace warning =====
+The Sanitizer version shipped with Debian Bullseye uses a deprecated regex syntax in its Perl code, which triggers this warning message:
+<code>
+Unescaped left brace in regex is passed through in regex;
+</code>
+The offending code is in the file **/usr/share/perl5/Anomy/Sanitizer/MacroScanner.pm**, at lines 120 and 127. Here is the fix, which escapes the literal braces with a backslash:
+<code perl>
+$score +=  4 while ($buff =~ s/\000(ID="\{[-0-9A-F]+)$/x$1/i);
+</code>
+<code perl>
+$score +=  1 while ($buff =~ s/\000(ID="\{[-0-9A-F]+\}"|ThisWorkbook\000|PrivateProfileString)/x$1/i);
+</code>
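+As a quick check, the fixed expression can be run outside Sanitizer. The following is only a sketch: the sample buffer is a made-up string, not data from a real macro document, and the substitution is copied from the second fix above.
```perl
use strict;
use warnings;

# Hypothetical sample buffer: a NUL byte followed by a macro-like ID string.
my $buff  = "\000ID=\"{0123-ABCD}\" trailing data";
my $score = 0;

# Same substitution as the fixed line: the braces are escaped, so Perl
# treats them as literal characters and emits no warning.
$score += 1 while ($buff =~ s/\000(ID="\{[-0-9A-F]+\}"|ThisWorkbook\000|PrivateProfileString)/x$1/i);

print "score=$score\n";
print "buff=$buff\n";
```
+Running it with warnings enabled should print no "Unescaped left brace" message; the NUL byte in the buffer is replaced by an ''x'', so the loop ends after one match.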
+===== The HTML MIME multipart problem =====
+Several mail user agents nowadays compose email messages in HTML format, sometimes without including a text-only copy of the same message. Some agents include the HTML as one part of a multipart [[wp>MIME]] message, correctly marked as text/html. Other agents compose the message body directly in HTML, without using the MIME multipart system.
+In some circumstances Sanitizer defangs the HTML message or the HTML part (changing its content type), so a modern email reader does not display it correctly. In the best case an **anonymous attachment** is shown; in the worst case, **an empty message**.
+The Anomy Sanitizer uses several methods to detect the HTML parts in a message, relying on the **Content-Type: text/html** header or the **filename** of the MIME part (if specified). Once it detects an HTML part, it performs some operations on it, one of which is a match against a **regular expression** to confirm that the part actually is HTML text. If that regex test fails, the Sanitizer neutralizes (defangs) the part by changing its content type from **text/html** to something like **application/DEFANGED-14789** (the type name is composed using the **msg_defanged** configuration option).
+That behaviour is triggered by the **feat_files = 1** configuration option (enable filename-based policy decisions).
+Unfortunately the regex used by Sanitizer to detect an HTML part is very naive: the part must simply match this expression:
+<file>
+<html|<body|<p>|<b>|<i>|<br>|</a>
+</file>
+Notably, as of January 2023 the **Gmail** application composes mail messages using only **%%<div>%%** tags, thus fooling Sanitizer into //defanging// that part.
+I fixed the Perl code in **/usr/share/perl5/Anomy/Sanitizer/FileTypes.pm**, changing the regular expression in this way:
+<code perl>
+my $HTML = {
+    id         => "html",
+    risk       => $low,
+    name       => "HTML text file",
+    extensions => [ "html", "htm", "shtml" ],
+    mime_types => [ 'text/html' ],
+    magic      => [ ],
+    regexp     => '<html|<body|<div|<span|<p>|<b>|<i>|<br>|</a>',
+};
+</code>
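+The effect of the change can be sketched with a small standalone script. The message body is a made-up example of a div-only message, ''looks_like_html'' is an illustrative helper rather than a Sanitizer function, and the case-insensitive match is an assumption of this sketch, not something taken from the Sanitizer code.
```perl
use strict;
use warnings;

# The original expression and the extended one, as plain pattern strings.
my $original = '<html|<body|<p>|<b>|<i>|<br>|</a>';
my $extended = '<html|<body|<div|<span|<p>|<b>|<i>|<br>|</a>';

# Hypothetical message body made only of <div> tags.
my $body = '<div dir="ltr">Hello,<div>see you tomorrow.</div></div>';

# Illustrative helper: returns 1 if the text matches the pattern, else 0.
sub looks_like_html {
    my ($pattern, $text) = @_;
    return $text =~ /$pattern/i ? 1 : 0;
}

printf "original: %d\n", looks_like_html($original, $body);
printf "extended: %d\n", looks_like_html($extended, $body);
```
+The original expression finds none of its tags in the sample body, while the extended one matches on the first ''%%<div%%''.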
+It is also possible to remove the ''regexp'' element from the hash; in that case Sanitizer will recognize an HTML part only by its content type or filename.
+The customized Perl module can be installed into **/etc/perl/Anomy/Sanitizer/FileTypes.pm**, without changing the file installed by the Debian package.
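+The override presumably works because **/etc/perl** comes before **/usr/share/perl5** in Perl's module search path (''@INC'') on Debian. A minimal sketch to check which copy would be picked up on a given system follows; ''first_in_path'' is an illustrative helper, not part of Sanitizer or of any Perl module.
```perl
use strict;
use warnings;
use File::Spec;

# Illustrative helper: return the first directory in the search list
# that contains the given relative module path, or undef if none does.
sub first_in_path {
    my ($relative, @dirs) = @_;
    for my $dir (@dirs) {
        next if ref $dir;    # skip code-reference hooks that may appear in @INC
        return $dir if -f File::Spec->catfile($dir, $relative);
    }
    return;
}

my $module = 'Anomy/Sanitizer/FileTypes.pm';
my $dir    = first_in_path($module, @INC);
if (defined $dir) {
    print "Perl would load $module from $dir\n";
} else {
    print "$module not found in \@INC\n";
}
```
+If the customized copy is in place, the reported directory should be **/etc/perl**.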