doc:appunti:linux:sa:sanitizer

Differences

This shows you the differences between two versions of the page.

doc:appunti:linux:sa:sanitizer [2023/01/19 11:03] – [The HTML MIME multipart problem] niccolo
doc:appunti:linux:sa:sanitizer [2023/01/19 12:11] (current) – [Perl Unescaped left brace warning] niccolo
Line 7: Line 7:
 I use it as a personal mail filter in GNU/Linux mail servers, because it can be activated on a per-user basis, by the **Local Delivery Agent** called by **Postfix**. The LDA can be as simple as **procmail** or the more complex **Dovecot LDA with Pigeonhole Sieve Interpreter**.
  
-===== Perl Syntax Warning =====
+===== Perl unescaped left brace warning =====
  
-The version included in Debian Bullseye contains a bug into the Perl code, which triggers the warning message:
+The Sanitizer version included in Debian Bullseye uses deprecated syntax in its Perl code, which triggers the warning message:
  
 <code>
 +Unescaped left brace in regex is passed through in regex;
 +</code>
  
 +The offending code is in the file **/usr/share/perl5/Anomy/Sanitizer/MacroScanner.pm**, at lines 120 and 127. Here is the fix:
 +
 +<code perl>
 +$score +=  4 while ($buff =~ s/\000(ID="\{[-0-9A-F]+)$/x$1/i);
 </code>
 +
 +<code perl>
 +$score +=  1 while ($buff =~ s/\000(ID="\{[-0-9A-F]+\}"|ThisWorkbook\000|PrivateProfileString)/x$1/i);
 +</code>
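The second fixed line can be tried in isolation with a small script. This is only a sketch: the sample buffer below is hypothetical (it is not taken from a real macro stream), and the script simply shows that, with both braces escaped, the substitution matches the literal ''{'' and ''}'' characters.

```perl
use strict;
use warnings;

# Hypothetical sample buffer: a NUL byte followed by an ID="{...}" token.
my $buff  = "\000ID=\"{0A1B-22FF}\"";
my $score = 0;

# Same substitution as the second fixed line: both braces are escaped,
# so they are matched as literal characters.
$score += 1 while ($buff =~ s/\000(ID="\{[-0-9A-F]+\}"|ThisWorkbook\000|PrivateProfileString)/x$1/i);

print "score=$score\n";
```

The loop ends after one pass because the substitution replaces the NUL byte with ''x'', so the pattern cannot match the same token again.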
 +
  
 ===== The HTML MIME multipart problem =====
Line 19: Line 30:
 Several mail user agents nowadays compose email messages in HTML format, sometimes without including a text-only copy of the same message. Some agents include the HTML as a part of a multipart [[wp>MIME]] message, correctly marked as text/html. Other agents compose the message body directly in HTML, without using the MIME multipart system.
  
-The Anomy Sanitizer uses several methods to detect the HTML parts into a message, relaying on the **Content-Type: text/html** or the **filename** of the MIME part (if specified). Once it detects an HTML part, it performs some operations on it, one of them is the match with a **regular expression** to confirm that it is actually an HTML text. If that regex test fails, the Sanitizer neutralizes (defang) such part changing its content type from **text/html** to something like **application/ANTIVIRUS- 14789** (the type name is composed using the **msg_defanged** configuration option).
+In some circumstances Sanitizer defangs the HTML message or the HTML part (changing its content type), so a modern email reader does not display it correctly. In the best case an **anonymous attachment** is shown, in the worst case **an empty message** is shown.
 +
 +The Anomy Sanitizer uses several methods to detect the HTML parts of a message, relying on the **Content-Type: text/html** or the **filename** of the MIME part (if specified). Once it detects an HTML part, it performs several operations on it; one of them is a match against a **regular expression** to confirm that the part is actually HTML text. If that regex test fails, the Sanitizer neutralizes (defangs) the part, changing its content type from **text/html** to something like **application/DEFANGED-14789** (the type name is composed using the **msg_defanged** configuration option).
  
 That behaviour is triggered by the **feat_files = 1** configuration option (enable filename-based policy decisions).
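The decision described above can be summarized in a few lines of Perl. This is a simplified sketch, not the Sanitizer's actual code: the function name ''resulting_type'', the filename test and the regular expression are all hypothetical stand-ins, and the real pattern is the one defined in FileTypes.pm.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the HTML check; the real pattern lives in FileTypes.pm.
my $html_regexp = qr/<\s*(?:html|body|head|p|div|br)\b/i;

# Returns the content type a part would end up with under the
# behaviour described above (illustration only).
sub resulting_type {
    my ($content_type, $filename, $body) = @_;

    # A part is treated as HTML because of its content type or its filename.
    my $looks_html = $content_type =~ m{^text/html}i
        || (defined $filename && $filename =~ /\.html?$/i);
    return $content_type unless $looks_html;

    # The regex test confirms that the body really is HTML.
    return $content_type if $body =~ $html_regexp;

    # Otherwise the part is defanged (sample type name taken from the text above).
    return 'application/DEFANGED-14789';
}

print resulting_type('text/html', undef, '<html><body>Hi</body></html>'), "\n";
print resulting_type('text/html', undef, 'Just plain words'), "\n";
```

The second call shows the problematic case: a part declared as text/html whose body does not satisfy the pattern ends up with the defanged content type.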
Line 45: Line 58:
 </code>
  
 +It is also possible to remove the ''regexp'' element of the dictionary; in that case Sanitizer will recognize an HTML part only by the content type or the filename.
 +
 +The customized Perl module can be installed as **/etc/perl/Anomy/Sanitizer/FileTypes.pm**, without changing the file installed by the Debian package.
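A copy under /etc/perl takes effect only if that directory is searched before the packaged location. The sketch below assumes that /etc/perl comes earlier in ''@INC'' than /usr/share/perl5 (believed to be the case on a stock Debian Perl, but worth verifying locally). The helper ''locate_module'' is hypothetical; it only mimics the order in which ''require'' scans the directories.

```perl
use strict;
use warnings;
use File::Spec;

# Return the first existing path for a module file, scanning the
# directories in order, which mirrors how require walks @INC.
sub locate_module {
    my ($relpath, @dirs) = @_;
    for my $dir (@dirs) {
        next if ref $dir;    # skip @INC hooks (code refs or objects)
        my $candidate = File::Spec->catfile($dir, $relpath);
        return $candidate if -f $candidate;
    }
    return;
}

my $found = locate_module('Anomy/Sanitizer/FileTypes.pm', @INC);
print defined $found ? "Would load: $found\n" : "Module not found in \@INC\n";
```

If the printed path is the one under /etc/perl, the customized copy is the one Perl will load.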
  
doc/appunti/linux/sa/sanitizer.1674122629.txt.gz · Last modified: 2023/01/19 11:03 by niccolo