c++.chat - interesting spam trap
- roland (2/2) Jun 02 2003 http://www.unclebobsuncle.com/antispam.html
- Greg Peet (4/6) Jun 02 2003 Wow thanks for bringing that to our attention. I can't wait to put that ...
- Jan Knepper (63/65) Jun 02 2003 Interesting indeed, but it does not work. Besides most of the
- KarL (3/14) Jun 02 2003 And are you run sendmail or qmail or postfix?
- Jan Knepper (3/17) Jun 03 2003 Definitely not sendmail...
- Walter (4/13) Jun 02 2003 I use a javascript generated mailto: on the digitalmars web pages. Are t...
- Jan Knepper (3/16) Jun 03 2003 Yes! My crawler will pick those up with out ANY problem.
- Walter (4/8) Jun 03 2003 the
- Jan Knepper (2/10) Jun 03 2003 No, I can provide you with that, if you want...
- roland (5/89) Jun 03 2003 hello
- roland (24/108) Jun 04 2003 hi
- Jan Knepper (70/85) Jun 05 2003 500 websites (pages) in a webring would take a decent crawler no more th...
- roland (6/112) Jun 05 2003 ok
- Jan Knepper (5/116) Jun 05 2003 ;-)
- roland (3/7) Jun 06 2003 oops 8-(
- Greg Peet (4/6) Jun 06 2003 a) Logic, b) Didn't Nostradamus say something about
- roland (6/16) Jun 06 2003 lets talk something else .. spam are not so bad after all
- Scott Dale Robison (21/31) Jun 07 2003 I agree with 99.99% of what you wrote, this being the one part I
- gf (4/27) Jun 07 2003 You sure fooled me! :)))))
- Jan Knepper (10/40) Jun 07 2003 I know... I have experienced that as well.
- Scott Dale Robison (17/23) Jun 07 2003 I've never heard complaints from a user of SpamCop, to be fair. Only a
- Jan Knepper (13/29) Jun 09 2003 Oh, I have seen those complaints MANY times. People that actually opted-...
- Scott Dale Robison (4/6) Jun 09 2003 I think I was running Xmail at the time, though I'm not 100% certain.
http://www.unclebobsuncle.com/antispam.html roland :-)
Jun 02 2003
Wow, thanks for bringing that to our attention. I can't wait to put that on my site. Quite funny too.

"roland" <nancyetroland@free.fr> wrote in message news:bbgc75$2339$1@digitaldaemon.com... http://www.unclebobsuncle.com/antispam.html

roland :-)
Jun 02 2003
Interesting indeed, but it does not work. Besides, most of the statements on the page have no ground.

First of all, any decent spider or crawler would keep track of the URL's it has processed. I mean, think about it: every decent website probably has circular references in the form of x.html -> y.html -> z.html -> x.html. I know for a fact that quite a few of my sites have many of these. Obviously this is something anyone developing a spider or crawler, which I have done ;-), will run into. So the idea is cute, but I don't think it really works.

Second, quite a bit of the page is generated through JavaScript. Many spiders or crawlers do NOT run JavaScript. I know for a fact that JavaScript is a serious challenge for many of the search engines on the internet.

Third, some more advanced spiders or crawlers do not just look at mailto: tags, but recognize a '@' and check the prefix and suffix: run the complete string through an email syntax checker to make sure the address only contains legal email address characters, actually ends with an existing Top Level Domain (TLD) such as .com, .net, .org, etc., and later check the domain through DNS and/or Whois.

Fourth, the invalid email addresses have no effect on spammers. They will burn some more bandwidth, but as they usually use non-existent From: and Return-Path: headers in their messages, anyone but the spammer will receive the bounces.

Fifth, if the spammer actually had some form of decency and bulk mailed to a list honoring a removal mechanism, that mechanism usually is intelligent enough to keep track of bounces, probe them, and then remove them from the list automagically. Check here for instance: http://www.ezmlm.org/ which works with MySQL http://www.mysql.com/ through which it is rather easy to maintain a database with millions of email addresses.

To actually *fight* SPAM, what would make sense is to report SPAM ASAP at http://www.spamcop.net/ as that results in more than just reporting. One of the great features is that once a lot of people start reporting a certain SPAM, spamcop will add the originating IP address to bl.spamcop.net, which can be used by email-receiving servers (SMTP servers) to block incoming email if it comes from one of the many blocked IP addresses. Unfortunately, most people just seem to delete SPAM, and most email providers do not seem to use bl.spamcop.net for email blocking.

Of course, not publishing your email address ANYWHERE on the internet would help the most! ;-) However, I have noticed that quite a few companies that collect email addresses with online sales or other forms of subscription also sell those email addresses to others...

Just my 2 cents...

Oh, in case there is any doubt... ;-) I have written a couple of crawlers, including crawlers that do handle JavaScript very well. I have been hosting Internet services for 3 years. I do report almost all spam at http://www.spamcop.net/ and yes, the mail servers here do check bl.spamcop.net (and a few others) before they actually receive the email; well, that is, if the domain owners want it. Check http://www.digitaldaemon.com/Internet%20Services/rblsmtpd.shtml for some statistics on SPAM blocking... Recently I patched the SMTP server again so it blocks all non-existent email addresses on local domains.

roland wrote: http://www.unclebobsuncle.com/antispam.html

--
ManiaC++
Jan Knepper
Jun 02 2003
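The syntax-plus-TLD check described above can be sketched in a few lines. This is a minimal illustration of the heuristic, not the actual code of any crawler mentioned in the thread; the regex and the TLD list are assumptions for the sketch.

```python
import re

# Keep a candidate only if it parses as local@domain.tld and the TLD is
# one a harvester would know exists; DNS/Whois checks would follow later.
# Both the pattern and the TLD list below are illustrative assumptions.
KNOWN_TLDS = {"com", "net", "org", "edu", "gov", "mil", "biz", "info", "fr", "us", "de"}

ADDRESS_RE = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.([A-Za-z]{2,6})$")

def looks_like_real_address(candidate: str) -> bool:
    """Syntax check plus TLD check, as in point Third above."""
    m = ADDRESS_RE.match(candidate)
    return bool(m) and m.group(1).lower() in KNOWN_TLDS

print(looks_like_real_address("jan@smartsoft.us"))      # True
print(looks_like_real_address("not-an-address@x.qqq"))  # False: unknown TLD
print(looks_like_real_address("spaces in@here.com"))    # False: bad syntax
```

A harvester running candidates through a filter like this discards most randomly generated decoy addresses before ever sending a message.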
And are you running sendmail, or qmail, or postfix?

"Jan Knepper" <jan@smartsoft.us> wrote in message news:3EDBFCDB.9D953019@smartsoft.us... [...] the mail servers here do check bl.spamcop.net (and a few others) before they actually receive the email [...]
Jun 02 2003
Definitely not sendmail... Patched qmail...

KarL wrote: And are you running sendmail, or qmail, or postfix?
Jun 03 2003
"Jan Knepper" <jan@smartsoft.us> wrote in message news:3EDBFCDB.9D953019@smartsoft.us... [...] I have written a couple of crawlers and actually also crawlers that do handle JavaScripts very well. [...]

I use a javascript generated mailto: on the digitalmars web pages. Are the javascript aware scrapers able to figure those out?
Jun 02 2003
Walter wrote: I use a javascript generated mailto: on the digitalmars web pages. Are the javascript aware scrapers able to figure those out?

Yes! My crawler will pick those up without ANY problem.

Jan
Jun 03 2003
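Why a JavaScript-generated mailto: is easy pickings can be shown with a sketch. The obfuscation pattern below (assembling the address from string fragments in a document.write) is an assumed common scheme, not necessarily what the digitalmars pages use; a scraper only needs to concatenate the string literals to defeat it.

```python
import re

# Hypothetical page fragment: the address never appears literally in the
# HTML, but all its pieces sit in plain string literals.
page = """<script type="text/javascript">
document.write('<a href="mai' + 'lto:' + 'someone' + '@' + 'example' + '.com' + '">contact</a>');
</script>"""

# Join every single-quoted literal in source order, then scan the result
# for a mailto: address. No JavaScript engine needed.
joined = "".join(re.findall(r"'([^']*)'", page))
found = re.findall(r"mailto:([\w.+-]+@[\w.-]+)", joined)
print(found)  # ['someone@example.com']
```

Even without a full JavaScript interpreter, this two-regex pass is enough for simple concatenation tricks, which is consistent with Jan's claim above.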
"Jan Knepper" <jan@smartsoft.us> wrote in message news:3EDC8F17.9616A80A@smartsoft.us... Yes! My crawler will pick those up without ANY problem.

Does that mean I have to write a cgi program to do it? <g>
Jun 03 2003
Walter wrote: Does that mean I have to write a cgi program to do it? <g>

No, I can provide you with that, if you want...
Jun 03 2003
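Jan's offer refers to some unspecified program of his own; the general idea of a CGI replacement for a mailto: link can be sketched as below. Everything here is hypothetical (the recipient address, field names, and subject line are placeholders), and the actual CGI request handling and SMTP delivery are deliberately left out.

```python
from email.message import EmailMessage

# Server-side form-mail sketch: the recipient address lives only in this
# script, never in the page's HTML, so address harvesters see nothing.
# A real deployment would read the fields from the POST request and hand
# the message to the local MTA via smtplib; both are omitted here.
RECIPIENT = "webmaster@example.com"  # placeholder, not a real address

def build_message(form: dict) -> EmailMessage:
    msg = EmailMessage()
    msg["To"] = RECIPIENT
    msg["From"] = form["email"]
    msg["Subject"] = "Website feedback"
    msg.set_content(form["body"])
    return msg

msg = build_message({"email": "visitor@example.org", "body": "Nice compiler!"})
print(msg["To"])  # webmaster@example.com
```

The design point is simply that the address moves from the client-visible page into server-side code.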
hello

thanks for the interesting information

cheers

roland

Jan Knepper wrote: Interesting indeed, but it does not work. [...]
Jun 03 2003
Jan Knepper wrote: [...] Third, some more advanced spiders or crawlers do not just look at mailto: tags [...]

hi jan, an opinion on that?

yep, that's the reason why i suggested a webring of spamtraps would do better, and the addresses be generated from a wide list of word combinations. just imagine how many combinations could be done with this set of data:

rule: [ a | a+b | a+b+c | a+c | ... | b+a ] + @ + [domain].[level]

where:
a, b, c, ...: this, that, free, sun, ram, bot, mail, fish, stick, 33, big, flower
domain: big, stick, homer, biz, temp, duch, pleht
level: com, biz, net, org, mil

the list could be customized per each website. i don't see how the crawler could take all those words into consideration. they can remove the invalid mails when they bounce, but i think the one we are discussing right now will guarantee that they will have an adequate supply for a very long time. imagine a webring of 500 sites linking one another.

ciao!
_________________
You have read a post from a newbie. Take everything with a grain of salt. The user formerly known as ramfree17 (oh, im still ramfree17?!?!)
Jun 04 2003
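roland's generation rule can be sketched literally with itertools. The word, domain, and level lists are his from the post above; the joining details (local parts as ordered picks of 1 to 3 distinct words, concatenated with no separator) are an assumption about how the rule is meant to be read.

```python
from itertools import permutations

words = ["this", "that", "free", "sun", "ram", "bot", "mail",
         "fish", "stick", "33", "big", "flower"]
domains = ["big", "stick", "homer", "biz", "temp", "duch", "pleht"]
levels = ["com", "biz", "net", "org", "mil"]

# [ a | a+b | a+b+c | ... ] + @ + [domain].[level]
addresses = [
    "".join(combo) + "@" + d + "." + lvl
    for n in (1, 2, 3)
    for combo in permutations(words, n)
    for d in domains
    for lvl in levels
]

# 12 + 12*11 + 12*11*10 = 1464 local parts, times 7 domains times 5 levels
print(len(addresses))  # 51240
```

So even this small word list yields tens of thousands of decoys per site, which is the "adequate supply" roland is counting on; whether that actually burdens a harvester is what Jan disputes next.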
roland wrote: [...] imagine a webring of 500 sites linking one another.

500 websites (pages) in a webring would take a decent crawler no more than 2 hours to process. Believe me, they are NOT using DSL or Cable!!! Serial processing of 500 web pages at 10 seconds per page (boy is that long!) is 5000 seconds; that's not more than 2 hours! Then they match whatever they found against a local DNS server with enough cache. Try, if you have a Unix/BSD/Linux box, one line ('dig mx free.fr') somewhere:

; <<>> DiG 8.3 <<>> mx free.fr
;; res options: init recurs defnam dnsrch
;; got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 2
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 10, AUTHORITY: 2, ADDITIONAL: 12
;; QUERY SECTION:
;;      free.fr, type = MX, class = IN

;; ANSWER SECTION:
free.fr.              1D IN MX      10 mx.free.fr.
free.fr.              1D IN MX      20 mrelay2-1.free.fr.
free.fr.              1D IN MX      20 mrelay2-2.free.fr.
free.fr.              1D IN MX      20 mx1-1.free.fr.
free.fr.              1D IN MX      50 mrelay3-2.free.fr.
free.fr.              1D IN MX      50 mrelay4-2.free.fr.
free.fr.              1D IN MX      50 mrelay1-1.free.fr.
free.fr.              1D IN MX      50 mrelay1-2.free.fr.
free.fr.              1D IN MX      60 mrelay3-1.free.fr.
free.fr.              1D IN MX      90 ns1.proxad.net.

;; AUTHORITY SECTION:
free.fr.              1D IN NS      ns0.proxad.net.
free.fr.              1D IN NS      ns1.proxad.net.

;; ADDITIONAL SECTION:
mx.free.fr.           15M IN A      213.228.0.1
mx.free.fr.           15M IN A      213.228.0.129
mx.free.fr.           15M IN A      213.228.0.13
mx.free.fr.           15M IN A      213.228.0.131
mx.free.fr.           15M IN A      213.228.0.166
mx.free.fr.           15M IN A      213.228.0.175
mx.free.fr.           15M IN A      213.228.0.65
mrelay2-1.free.fr.    1D IN A       213.228.0.13
mrelay2-2.free.fr.    1D IN A       213.228.0.131
mx1-1.free.fr.        1D IN A       213.228.0.65
mrelay3-2.free.fr.    1D IN A       213.228.0.166
mrelay4-2.free.fr.    1D IN A       213.228.0.175

;; Total query time: 127 msec
;; FROM: digitaldaemon.com to SERVER: default -- 63.105.9.35
;; WHEN: Thu Jun 5 10:04:24 2003
;; MSG SIZE sent: 25 rcvd: 502

This is done with the 'dig' program; total query time is 127 msecs!!!! Now they know whether or not the found domain actually has an MX record... If not, they can just drop the address from the list. Also, crawlers do *not* browse the web like we do. They just process 'text' oriented files and run several (read: hundreds or thousands of) threads/processes at the same time.

So, the only thing you would actually be able to make a difference with is using existing domain names. Not a good idea, as owners of those domains might have a catch-all and then receive the same SPAM over and over again. Soon the providers will all change their systems so their SMTP servers only accept email to addresses that actually do exist and *deny* receipt of anything else with the usual 550 error.

So, in the end, what are we actually creating with stuff like this??? Nothing. We just have crawlers/spiders consume more bandwidth to read all the pages, the crawlers' DNS matcher consume more bandwidth to check DNS, the bulk mailer consume more bandwidth to send all the email, and the internet consume more bandwidth to deal with all the bounces, double bounces, etc.

Last, 500 pages with 1,000 email addresses each is 500,000 email addresses. I hate to tell you, but that's only 1.5% of the total email addresses I have...

<sigh>

Would you honestly think that anyone would process the bounces for numbers like that manually???

ManiaC++
Jan Knepper
Jun 05 2003
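The MX-filtering step Jan describes can be sketched without any network traffic: parse the answer section a resolver returned (the record text below mimics dig's format, trimmed to two lines) and drop every harvested address whose domain shows no MX record. This is an illustration of the filtering idea, not anyone's actual harvester code.

```python
# Dig-style MX answers, as a local cache of resolver output (assumed input).
mx_answers = """\
free.fr.   1D IN MX 10 mx.free.fr.
free.fr.   1D IN MX 20 mrelay2-1.free.fr.
"""

def domains_with_mx(answers: str) -> set:
    """Collect owner names of MX records from dig-style answer lines."""
    found = set()
    for line in answers.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[3] == "MX":
            found.add(fields[0].rstrip("."))
    return found

# Keep only addresses whose domain can actually receive mail.
harvested = ["nancyetroland@free.fr", "bogus@no-mx-here.invalid"]
ok = domains_with_mx(mx_answers)
kept = [a for a in harvested if a.split("@")[1] in ok]
print(kept)  # ['nancyetroland@free.fr']
```

With a caching resolver answering in ~100 ms per domain, this filter makes generated decoy addresses on non-mail domains essentially free to discard, which is Jan's point.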
Jan Knepper wrote: [...] Would you honestly think that anyone would process the bounces for numbers like that manually???

ok

i'm afraid i'm consuming _your_ bandwidth .. ;-)

a last question: what happens a) to the crawlers, b) to the internet, if 100000 sites have 10000 e-mail addresses each (= 10^9 addresses)?

roland
Jun 05 2003
roland wrote: i'm afraid i'm consuming _your_ bandwidth .. ;-)

Don't worry.

roland wrote: a last question: what happens a) to the crawlers, b) to the internet, if 100000 sites have 10000 e-mail addresses each?

;-) Internet Meltdown...

Jan
Jun 05 2003
Jan Knepper wrote: Internet Meltdown...

oops 8-(

roland
Jun 06 2003
"roland" wrote: a last question: what happens a) to the crawlers, b) to the internet, if 100000 sites have 10000 e-mail addresses each?

a) Logic, b) Didn't Nostradamus say something about this... hmm... armageddon... bill gates... something along those lines i think =P
Jun 06 2003
Greg Peet wrote: a) Logic, b) Didn't Nostradamus say something about this... hmm... armageddon... bill gates... something along those lines i think =P

let's talk something else .. spam is not so bad after all. you can buy a master's degree without studying, improve sexual satisfaction, earn thousands of cash without working ... ;-)

by roland
Jun 06 2003
Jan Knepper wrote: To actually *fight* SPAM, what would make sense is to report SPAM ASAP at http://www.spamcop.net/ [...] Unfortunately, most people just seem to delete SPAM and most email providers do not seem to use bl.spamcop.net for email blocking.

I agree with 99.99% of what you wrote, this being the one part I (partially) disagree with. Sure, SpamCop (and other similar services) can prove valuable, but they have some serious potential downfalls. The single biggest one, IMO, is that many spam-blocking services don't care about the source of an email. If it is reported as spam, they have no obligation to confirm it. I personally know of cases where actual *documentation* of a person's opt-in was completely and utterly ignored. The person in question didn't bother trying to opt out (note: after having opted in); they just reported the 'spam' to SpamCop and the 'offending' mail server was black-holed.

Note: I realize this is just my word against theirs, and I don't expect anyone to just assume I'm right. I'm just sharing a personal experience, and it's worth exactly what you're paying for it.

I guess the point I'm trying to make is, if you want to use SpamCop or any other similar service, feel free. Just realize that these entities are no more regulated than the spammers they claim to want to stop, and sometimes an agenda may slip through. After all, their value is in blocking email. So what if sometimes legitimate email gets blocked?

No, I'm not a spammer. Just a person with opinions. :)

Scott Dale Robison
Jun 07 2003
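[Editor's aside] The bl.spamcop.net list Jan describes is a standard DNS blocklist (DNSBL): a receiving SMTP server reverses the octets of the connecting client's IPv4 address, appends the blocklist zone, and does an ordinary DNS A-record lookup; if the name resolves, the IP is listed and the mail can be rejected. A minimal sketch in Python (the zone name comes from the post; the function names are mine):

```python
import socket

def dnsbl_query_name(ip: str, zone: str = "bl.spamcop.net") -> str:
    """Build the DNSBL lookup name: reverse the IPv4 octets, append the zone."""
    octets = ip.split(".")
    if len(octets) != 4:
        raise ValueError("expected a dotted-quad IPv4 address")
    return ".".join(reversed(octets)) + "." + zone

def is_listed(ip: str, zone: str = "bl.spamcop.net") -> bool:
    """True if the blocklist returns an A record for this IP (i.e. it is blocked)."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:  # NXDOMAIN: the IP is not listed
        return False
```

For example, to check the sender IP 192.0.2.1 a server would resolve 1.2.0.192.bl.spamcop.net; DNSBLs conventionally answer with an address in 127.0.0.0/8 for listed entries.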
Scott Dale Robison <scott-news.digitalmars.com isdr.net> wrote in news:bbsff6$1nvl$1 digitaldaemon.com:I agree with 99.99% of what you wrote, this being the one part I (partially) disagree with. Sure, SpamCop (and other similar services) can prove valuable, but they have some serious potential downfalls. The single biggest one, IMO, is that many spam-blocking services don't care about the source of an email. If it is reported as spam, they have no obligation to confirm it. I personally know of cases where actual *documentation* of a person's opt-in was completely and utterly ignored. The person in question didn't bother trying to opt out (note: after having opted in), they just reported the 'spam' to SpamCop and the 'offending' mail server was black-holed. Note: I realize this is just my word against theirs, and I don't expect anyone to just assume I'm right. I'm just sharing a personal experience, and it's worth exactly what you're paying for it. I guess the point I'm trying to make is, if you want to use SpamCop or any other similar service, feel free. Just realize that these entities are no more regulated than the spammers they claim to want to stop, and sometimes an agenda may slip through. After all, their value is in blocking email. So what if sometimes legitimate email gets blocked? No, I'm not a spammer. Just a person with opinions. :) Scott Dale RobisonYou sure fooled me! :))))) /gf
Jun 07 2003
Scott Dale Robison wrote:Jan Knepper wrote:To actually *fight* SPAM, what would make sense is to report SPAM ASAP at http://www.spamcop.net/ as that results in more than just reporting. One of the great features is that once a lot of people start reporting a certain SPAM, spamcop will add the originating IP address to bl.spamcop.net, which can be used by email-receiving servers (SMTP servers) to block incoming email if it comes from one of the many blocked IP addresses. Unfortunately, most people just seem to delete SPAM, and most email providers do not seem to use bl.spamcop.net for email blocking.I agree with 99.99% of what you wrote, this being the one part I (partially) disagree with. Sure, SpamCop (and other similar services) can prove valuable, but they have some serious potential downfalls. The single biggest one, IMO, is that many spam-blocking services don't care about the source of an email. If it is reported as spam, they have no obligation to confirm it. I personally know of cases where actual *documentation* of a person's opt-in was completely and utterly ignored. The person in question didn't bother trying to opt out (note: after having opted in), they just reported the 'spam' to SpamCop and the 'offending' mail server was black-holed. Note: I realize this is just my word against theirs, and I don't expect anyone to just assume I'm right. I'm just sharing a personal experience, and it's worth exactly what you're paying for it.I know... I have experienced that as well. That is indeed one of the unfortunate sides of spamcop.netI guess the point I'm trying to make is, if you want to use SpamCop or any other similar service, feel free. Just realize that these entities are no more regulated than the spammers they claim to want to stop, and sometimes an agenda may slip through. After all, their value is in blocking email. So what if sometimes legitimate email gets blocked?I *only* use spamcop for those emails that are *SPAM*. Legitimate email never got blocked, as spamcop only begins blocking after a certain threshold has been reached; at least, I have never heard complaints about it...No, I'm not a spammer. Just a person with opinions. :)I agree. I just stated that spamcop provides a service, not that it is perfect ;-) Jan
Jun 07 2003
Jan Knepper wrote:I *only* use spamcop for those emails that are *SPAM*. Legitimate email never got blocked, as spamcop only begins blocking after a certain threshold has been reached; at least, I have never heard complaints about it...I've never heard complaints from a user of SpamCop, to be fair. Only from a person whose newsletter stopped going to an entire domain because of SpamCop. Yet another note: I will admit that it is *possible* that the email in question was technically spam ... unfortunately, we can never know, as no one would even *try* to opt out or follow up on the original opt-in. In any case, I'm convinced that *if* the offender was guilty, it was unintentional. Also, to be fair, I was once guilty of unintentionally running an open relay, but the 'good samaritan' that caught me at it was nice enough to remove me from their open relay database once I closed it. I do recognize that most of these people are good guys ... I'm just concerned when so many people on the net don't think through the potential problems of taking someone else's word on what is or is not a spamming IP.I agree. I just stated that spamcop provides a service, not that it is perfect ;-)Fair enough. Apologies if I was offensive in any way. :) </soapbox> SDR
Jun 07 2003
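[Editor's aside] The "open relay" Scott mentions is an SMTP server that agrees to forward mail between two addresses in domains it is responsible for neither of; spammers abuse such servers to launder their traffic, which is why relay databases list them. A hedged sketch of how a relay check works (function names are mine; only probe servers you operate yourself): issue MAIL FROM and RCPT TO with two foreign addresses and see whether the server accepts the recipient with a 2xx reply, then abort without sending anything.

```python
import smtplib

def relay_accepted(code: int) -> bool:
    # A 2xx reply to RCPT TO for a recipient the server should refuse
    # to relay to means the server agreed to forward the message.
    return 200 <= code < 300

def probe_open_relay(host: str, mail_from: str, rcpt_to: str) -> bool:
    """Return True if HOST looks like an open relay. Test only your own servers."""
    with smtplib.SMTP(host, 25, timeout=10) as smtp:
        smtp.ehlo()
        smtp.mail(mail_from)              # foreign sender
        code, _ = smtp.rcpt(rcpt_to)      # foreign recipient
        smtp.rset()                       # abort: never actually send mail
        return relay_accepted(code)
```

A properly configured server answers the foreign RCPT TO with a 5xx rejection (commonly 550 or 554 "relaying denied"), so the probe returns False.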
Scott Dale Robison wrote:I've never heard complaints from a user of SpamCop, to be fair. Only from a person whose newsletter stopped going to an entire domain because of SpamCop. Yet another note: I will admit that it is *possible* that the email in question was technically spam ... unfortunately, we can never know, as no one would even *try* to opt out or follow up on the original opt-in. In any case, I'm convinced that *if* the offender was guilty, it was unintentional.Oh, I have seen those complaints MANY times. People that actually opted in themselves and then in time get sick of SPAM, find spamcop, and start reporting everything that comes into their mailbox, not remembering whether or not they subscribed to it. Spamcop is very aware of this as well.Also, to be fair, I was once guilty of unintentionally running an open relay, but the 'good samaritan' that caught me at it was nice enough to remove me from their open relay database once I closed it. I do recognize that most of these people are good guys ... I'm just concerned when so many people on the net don't think through the potential problems of taking someone else's word on what is or is not a spamming IP.The internet professionals are usually very tolerant and helpful, at least, that's my experience. What did you use? sendmail??? Well, that's exactly the problem with the Internet at this moment. It's like trying to drive your car on the highway with people around you that do not have a license... <sigh>Nag! ManiaC++ Jan KnepperI agree. I just stated that spamcop provides a service, not that it is perfect ;-)Fair enough. Apologies if I was offensive in any way. :)
Jun 09 2003
Jan Knepper wrote:The internet professionals are usually very tolerant and helpful, at least, that's my experience. What did you use? sendmail???I think I was running Xmail at the time, though I'm not 100% certain. It's been a long time... SDR
Jun 09 2003