Anonymizing the logs

March 30, 2006 - 3:34pm

Tom Owad

Hi,
I'm a grad student working with David Patterson at the RAD Lab at the CS
department at UC Berkeley. Our lab is interested in building more reliable
internet services. You can read more about our lab here:

* our wiki page: http://radlab.cs.berkeley.edu
* slashdot article: http://slashdot.org/articles/05/12/15/1426223.shtml

To do interesting research, it's extremely useful to have traces/logs from
real-world internet sites that we can use in our experiments. For example,
here's a paper where we used data from Ebates.com to evaluate new algorithms
for detecting and localizing failures in Internet services:
http://www.cs.berkeley.edu/~bodikp/publications/icac05.pdf

I'm wondering whether you have any logs from the Applefritter website that
you could make available to our research group. For example, web server
access logs are very useful for understanding the traffic to a site.

Of course, we could help you to anonymize the data in case it contains any
confidential information.

If you cannot provide such data, could you recommend other people or
websites that could make their logs available or simply forward this email
to them?

Thank you!
Peter

I'd like to help them out, but obviously we first need to ensure that we can anonymize the logs. So far, we're looking at replacing IP addresses with a hash, as well as user IDs in /user accesses and message IDs in /privatemessage. Anything else you'd like to see anonymized? BDub and I will be taking a closer look at the logs for other potentially identifying information, but we'd also like to get a few more opinions.
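To make the ID part of that plan concrete, here is a minimal sketch of swapping the numeric IDs in /user and /privatemessage URLs for a keyed-hash token. The URL shapes (`/user/<digits>`, `/privatemessage/<words>/<digits>`), the `SECRET_KEY` value, and the helper names `pseudonym` and `scrub_ids` are all assumptions for illustration, not taken from the actual site or logs.

```python
import hashlib
import hmac
import re

# Hypothetical secret; in practice generate it randomly and never publish it.
SECRET_KEY = b"replace-with-a-random-secret"

# Assumed URL shapes: /user/<digits> and /privatemessage/<words>/<digits>.
ID_IN_PATH = re.compile(r"(/user/|/privatemessage/(?:[a-z]+/)*)(\d+)")


def pseudonym(value: str, key: bytes = SECRET_KEY) -> str:
    """Return a short keyed-hash token that stands in for `value`."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:12]


def scrub_ids(line: str) -> str:
    """Replace numeric user and message IDs anywhere in a log line."""
    return ID_IN_PATH.sub(lambda m: m.group(1) + pseudonym(m.group(2)), line)
```

Running `scrub_ids` over the whole line rather than only the request field also catches IDs that show up in the referer field. The same ID always maps to the same token, so repeated visits to one profile remain visible as such.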

March 30, 2006 - 4:21pm

moosemanmoo

I don't know what kind of logs you're talking about, but the logs my Apache server produces don't have a whole lot of information in them. What I think would be better than hashing the IPs is to resolve each one to a hostname, then omit everything but the last part of the hostname. For example, 24.14.2.91 might resolve to 24-14-2-91.hr.hr.cox.net, but you would only put cox.net. That is a fair amount of work for the nameservers, though, and it's probably not feasible.
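A rough sketch of that idea, assuming a reverse DNS lookup through Python's standard `socket.gethostbyaddr` and a naive "keep the last two labels" rule. The names `parent_domain` and `isp_domain` are made up for this example, and the two-label rule gives the wrong answer for domains like `example.co.uk`. Caching the result per address cuts down on repeated lookups, which is the nameserver load mentioned above.

```python
import socket
from functools import lru_cache


def parent_domain(hostname: str, labels: int = 2) -> str:
    """Keep only the last `labels` dot-separated parts of a hostname."""
    parts = hostname.rstrip(".").lower().split(".")
    return ".".join(parts[-labels:])


@lru_cache(maxsize=None)
def isp_domain(ip: str) -> str:
    """Reverse-resolve `ip` and reduce the result to its parent domain.

    Each distinct address is looked up only once thanks to the cache.
    """
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(ip)
    except OSError:
        # No PTR record, timeout, or malformed address.
        return "unresolved"
    return parent_domain(hostname)
```

With the example from the post, `parent_domain("24-14-2-91.hr.hr.cox.net")` gives `cox.net`.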

March 30, 2006 - 4:34pm

(Reply to #2) #3

Tom Owad

Yes, Apache logs. Could you explain your reservations about hashing the IPs?

March 30, 2006 - 4:47pm

(Reply to #3) #4

moosemanmoo

MD5 hashes can be easily cracked: with the proper tools, someone could brute-force an MD5 hash in about a minute. That, and it would probably make formatting the data a little harder. Replacing the IPs with some repeated letter like x would work better from a security standpoint.

March 30, 2006 - 5:20pm

(Reply to #4) #5

Dr. Webster

I wouldn't give them the IPs in any form, but I don't think there would be a problem with giving them the domains. You could resolve them to comcast.net, cox.net, etc., or just truncate the IPs to xxx.xxx.---.--- (truncating the IP at the subnet level).
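For the truncation option, a small sketch using Python's standard `ipaddress` module. The default prefix length of 16 is an assumption chosen to match the xxx.xxx.---.--- pattern (keep the first two octets); `truncate_ip` is a name invented for this example.

```python
import ipaddress


def truncate_ip(ip: str, prefix_len: int = 16) -> str:
    """Zero out every bit of `ip` beyond the first `prefix_len` bits."""
    # strict=False lets us pass a host address and get its enclosing network.
    network = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
    return str(network.network_address)
```

With the default prefix, `truncate_ip("24.14.2.91")` gives `24.14.0.0`, so every client in the same /16 becomes indistinguishable in the published logs.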

March 30, 2006 - 5:33pm

BDub

If a salted hash is insufficient

I'm against xxx.xxx'ing out the IPs because they likely want to trace how users navigate around a website.

Alternatively, we could give each IP that shows up its own unique identifier (the first IP in the logs is replaced with "1", the second with "2"), with repeat appearances of the same IP reusing the same identifier. This would allow them to trace a user around the site without giving any clues to the user's identity.
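A minimal sketch of the numbering scheme, assuming each log line starts with the client address followed by a space (as in Apache's common log format). The function name `renumber_clients` is invented for this example.

```python
from typing import Dict, Iterable, Iterator


def renumber_clients(lines: Iterable[str]) -> Iterator[str]:
    """Replace the leading client address on each line with a counter.

    The first distinct address becomes 1, the second 2, and so on;
    a repeated address reuses the number it was given earlier.
    """
    ids: Dict[str, int] = {}
    for line in lines:
        # Assumes the client address is the first space-separated field.
        client, sep, rest = line.partition(" ")
        if client not in ids:
            ids[client] = len(ids) + 1
        yield f"{ids[client]}{sep}{rest}"
```

The `ids` table is the only link back to the real addresses, so it would have to stay private (or be thrown away) once the logs are converted.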

March 30, 2006 - 11:17pm

(Reply to #6) #7

BDub

Re: MD5 hashes can be easily crac

"MD5 hashes can be easily cracked: with the proper tools, someone could brute-force an MD5 hash in about a minute. That, and it would probably make formatting the data a little harder. Replacing the IPs with some repeated letter like x would work better from a security standpoint."

First off, we'd obviously salt the hash. That's not even a question. Brute forcing becomes quite a bit harder given that situation. Could you quote a source for the 'in about a minute' comment?

We could also use a hash scheme other than MD5.
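To illustrate both points, here is a sketch under stated assumptions. The first half shows why an unsalted hash of an IP is weak regardless of the hash function: IPv4 has only 2^32 possible addresses, so an attacker can simply hash candidates until one matches (the demo searches a single /16 to keep it quick). The second half uses HMAC-SHA256 with a secret random key as one possible "salted, non-MD5" scheme. That choice and the names `unsalted_md5`, `recover_ip`, and `keyed_hash` are mine, not something settled in this thread.

```python
import hashlib
import hmac
import ipaddress
import secrets
from typing import Optional


def unsalted_md5(ip: str) -> str:
    """Plain MD5 of the dotted address, with no salt or key."""
    return hashlib.md5(ip.encode("ascii")).hexdigest()


def recover_ip(target_hash: str, cidr: str) -> Optional[str]:
    """Try every address in `cidr` until one hashes to `target_hash`."""
    for addr in ipaddress.ip_network(cidr):
        if unsalted_md5(str(addr)) == target_hash:
            return str(addr)
    return None


def keyed_hash(ip: str, key: bytes) -> str:
    """HMAC-SHA256 of the address, truncated to 16 hex characters."""
    return hmac.new(key, ip.encode("ascii"), hashlib.sha256).hexdigest()[:16]


# The key plays the role of the salt. It must stay secret, otherwise the
# same enumeration attack works again with the key plugged in.
key = secrets.token_bytes(32)
```

The same IP always maps to the same token under one key, so a visitor can still be followed through the logs. Discarding the key after conversion means nobody, including us, can repeat the mapping.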
