ICANN authorized a study, conducted by the National Physical Laboratory (NPL) in the United Kingdom, that analyzes gTLD domain names to measure whether the percentage of privacy/proxy use among domains engaged in illegal or harmful Internet activities is significantly greater than the percentage among domain names used for lawful Internet activities.
Furthermore, this study compares these privacy/proxy percentages to other methods used to obscure identity – notably, Whois phone numbers that are invalid.
These findings will help the community understand the role that privacy and proxy service abuse plays in obscuring the identities of parties engaged in illegal or harmful activities, including phishing, cybersquatting, hosting child sexual abuse images, advance fee fraud, online sale of counterfeit pharmaceuticals, and more.
NPL will consider all comments submitted to this Public Comment forum during the comment period, incorporate any needed clarifications, and then publish a final version of this Whois Privacy and Proxy Service Abuse study report. It is expected that this report will inform future GNSO policy development in relation to the Whois system.
Here are the conclusions of the study:
The people who maliciously registered domains for phishing chose privacy and proxy services somewhat more often than people who registered domains for legitimate purposes. However, when a privacy or proxy service was not chosen for a malicious registration, a workable contact phone number was seldom given – and even if the number was apparently valid, we almost never managed to make contact with the registrant for our survey.
Conversely, even entirely legitimate ‘third party’ businesses that provide services to the law-abiding public – services that are occasionally used for malicious purposes – use privacy and proxy services to a certain extent, and for almost half of the domains these businesses use, the domain registrant cannot be reached by phone. Of course, there are many other ways of making contact with such businesses, and they would doubtless prefer people to use the contact information on their websites rather than consulting Whois.
The compromised website category falls between the two extremes – these domain registrants use privacy and proxy services a quarter of the time (a higher proportion than the NORC study measured). Nearly two thirds of these registrants are impossible to contact by phone, and we reached only a quarter of them for our survey.
19.2 Other categories of criminal or harmful activity
In WP2, we looked at domains registered for advance fee fraud and other scams, using data collated by the aa419.org project, and found a result similar to that for the maliciously registered phishing domains in WP1: 88.9% of domain registrants were not contactable by phone, albeit 46.5% of them chose to use privacy or proxy services to achieve this.
In WP3 we examined the domains used for unlicensed pharmacies, finding that 91.8% of the domain registrants were not contactable by phone with 54.8% of them choosing to use privacy or proxy services.
In WP5 we looked at the Whois data for domains used for websites containing child sexual abuse images – 29.5% of these use privacy or proxy services, and it is widely believed that where contact phone numbers are given for the registrant, all of this information is false. That is, 100% of these domain registrants cannot be contacted by phone.
19.3 Lawful and harmless activity
Within WP6, we also looked at the domains used for a number of different types of lawful and harmless activity. We found quite large variations in the use of privacy or proxy services, with legal pharmacies (documented on the LegitScript website) at 8.8% and websites listed in the Yahoo! directory as hosting adult material at 44.2% – the latter percentage being somewhat higher than for several types of criminal activity.
However, the WP6 domain registrants were, at least to some extent, contactable by phone. Our success rate was highest for law firms (WP6.3) at 33.4% and banks (WP6.1) at 29.0%, but many calls went unanswered, went to voicemail, or reached colleagues of the registrant who were unable to assist us with our survey. Had all of these inconclusive call attempts (those that neither failed outright nor fully succeeded) worked out, our success rate would have doubled.
The lowest success rate in WP6 was for calls to the registrants of domains used for adult websites (WP6.5), where only 5.7% of registrants were reached and 55.1% of the domain registrants were impossible to reach by phone.
The data from the other work packages is a little harder to interpret. When we look at the results from WP7 (domains listed by SURBL to assist in spam blocking) and WP8 (domains listed by StopBadware as hosting various kinds of malware), we find that WP7 domains make heavy use of privacy and proxy services (44.1%), whereas WP8 domains use these services less often (20.4%) than the compromised websites from WP1.
Conversely, WP8 domain registrants can be reached by phone 32.1% of the time, whereas the figure for WP7 is 1.0%. However, when we look at the “impossible to reach by phone” measure, WP7 and WP8 have similar figures (58.5% and 51.4% respectively), suggesting that we’re seeing similar levels of criminality – both lists are a mixture of maliciously registered domains and legitimately registered domains where a website has been compromised and used to host malicious content.
Considerable caution is needed before reading too much into the WP7 data, since the WP7 figures have very wide error bounds. The WP7 data contains a number of groups of domains that share the same contact phone number – there are 19 groups of more than 100 domains, and the largest group contains 947 domains. These groupings mean that the responses of a handful of registrants can substantially affect the results of our survey, and the error bounds reflect this uncertainty.
We suspect that some “report inflation” is occurring in the SURBL data (as we discussed in the detailed account of processing the WP1 data): in order to best protect the people who use its data, SURBL lists all the domains that could be used to mount an attack rather than just the one that is currently in use.
Unfortunately, because the datasets we received for WP7 and WP8 contained only domain names and not full URLs, it was not possible to remove the excess domain names we believe are present in the WP7 data. This is also why we were not able to split these lists to distinguish between maliciously registered domains and legitimate domains. Had we been able to do this, we would expect to have seen the same sort of differences in the results that we saw in WP1.
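To illustrate why groups of domains sharing one contact phone number widen the error bounds, the Python sketch below computes Kish's effective sample size under a simplifying assumption: every domain in a group shares a single outcome (its registrant is either reached or not), and different groups are independent. This is not necessarily how the error bounds in this report were calculated. Apart from the 947-domain group mentioned above, the group sizes and the contact rate are hypothetical.

```python
import math


def effective_sample_size(group_sizes):
    """Kish effective sample size for groups whose members all share one outcome.

    Assumes every domain in a group (e.g. domains sharing a contact phone number)
    is reached or not reached together, and that groups are independent.
    """
    total = sum(group_sizes)
    return total * total / sum(m * m for m in group_sizes)


def standard_error(p, n):
    """Standard error of a proportion p estimated from n independent observations."""
    return math.sqrt(p * (1 - p) / n)


# Hypothetical dataset: one group of 947 domains sharing a phone number,
# plus 999 domains that each have their own phone number.
group_sizes = [947] + [1] * 999
n_domains = sum(group_sizes)
n_eff = effective_sample_size(group_sizes)
contact_rate = 0.10  # hypothetical proportion of registrants reached

print(f"domains: {n_domains}, effective sample size: {n_eff:.1f}")
print(f"SE assuming independent domains: {standard_error(contact_rate, n_domains):.3f}")
print(f"SE allowing for shared phone numbers: {standard_error(contact_rate, n_eff):.3f}")
```

In this hypothetical case, 1,946 domains behave like only about four independent observations, so the standard error is roughly 21 times larger than a naive per-domain calculation would suggest.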
Typosquatting – mixed results
We conclude our review of the work package results by considering WP4, the typosquatting work package, and WP9, the domains involved in UDRP disputes. Almost every dispute in WP9 concerned the type of activity that the WP4 domains are engaged in – with the exception of a handful of cases where brand owners were trying to wrest control of domains away from firms with which they once had a close commercial relationship.
As we have noted, typosquatting is a civil rather than a criminal matter, so it might be expected that domain registrants would be less cautious about revealing their identity, and conversely that it would matter less anyway, since the UDRP process also works with domains that use privacy and proxy services. However, the registrant’s incentive to obscure their identity here appears to be to prevent a brand owner from discerning that a single action could deal with a large number of domains. That is, it is not exactly anonymity that these registrants seek but unlinkability.
The figures here show that privacy and proxy services are used rather more than average (WP4: 48.2%, WP9: 39.7%), but that where domain registrants did provide contact details, we reached the domain registrant in WP4 (we made no phone calls in WP9) for 10.6% of the domains – distinctly more often than the 1%–2% that we measured for domains associated with criminal activities.
However, once again (as in WP7) the data for WP4 has very wide error ranges, because many of the domains share the same contact details. Indeed, the original academic paper by Moore and Edelman found that 63% of typosquatting domains displaying Google ads used just five advert IDs; that is, only a handful of people are responsible for a great deal of this activity.
A final note of caution applies to all of the data we have presented: we have looked only at domains within .biz, .com, .info, .net and .org, and for many work packages there is substantial activity in other TLDs. We suspect that our results are widely applicable, but we have not demonstrated this.
To summarise the whole project and to return at the end to our original hypotheses – we DID find clear evidence that:
“A significant percentage of the domain names used to conduct illegal or harmful Internet activities are registered via privacy or proxy services to obscure the perpetrator’s identity.”
But, although we did find that it was often true, we DID NOT find that in all cases:
“The percentage of domain names used to conduct illegal or harmful Internet activities that are registered via privacy or proxy services is significantly greater than the percentage of domain names used for lawful Internet activities that employ privacy or proxy services.”
Additionally, we learnt (sic) that these statements ARE correct:
“When domain names are registered with the intent of conducting illegal or harmful Internet activities then a range of different methods are used to avoid providing viable contact information – with a consistent outcome no matter which method is used.
However, although many more domains registered for entirely lawful Internet activities have viable telephone contact information recorded within the Whois system, a large percentage of them do not.”