Hash

Guava's hash package (com.google.common.hash) provides:

More flexible hash functions
An implementation of the BloomFilter algorithm

Hash

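Using one of Guava's hash functions takes the following three steps. Because the API is fluent, the steps can be chained into a single statement.

Create a HashFunction, choosing among the available algorithms: HashFunction hashFunction = Hashing.md5();
Put the input values into a Hasher and finish with hash(): HashCode hashCode = hashFunction.newHasher().putXxx(...).hash();
Read the result from the HashCode: hashCode.hashCode(), hashCode.asInt() or hashCode.toString()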

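If the value to be hashed is an object, you must state explicitly which of its fields take part in the hash. This is done by implementing a Funnel. A Funnel works like a juicer: an apple goes in and juice comes out. Here the object goes in and its fields come out as primitive values written to a PrimitiveSink.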

Example:

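import com.google.common.base.Charsets;
import com.google.common.hash.Funnel;
import com.google.common.hash.HashCode;
import com.google.common.hash.HashFunction;
import com.google.common.hash.Hashing;
import com.google.common.hash.PrimitiveSink;
import java.nio.charset.Charset;
import org.junit.Before;
import org.junit.Test;

public class HashTest {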

    class Person {
        int id;
        String firstName;
        String lastName;
        int birthYear;
        public Person(int birthYear, String firstName, int id, String lastName) {
            this.birthYear = birthYear;
            this.firstName = firstName;
            this.id = id;
            this.lastName = lastName;
        }
    }

    private Funnel<Person> personFunnel;

    @Before
    public void before() {
        personFunnel = new Funnel<Person>() {
            @Override
            public void funnel(Person from, PrimitiveSink into) {
                into.putString(from.firstName, Charsets.UTF_8);
                into.putString(from.lastName, Charsets.UTF_8);
                into.putInt(from.id);
                into.putInt(from.birthYear);
            }
        };
    }

    @Test
    public void test1() {
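        // String.hashCode() is computed from the characters, so it is the same on every run.
        System.out.println("a".hashCode());
        // StringBuffer does not override hashCode(), so this value is identity-based.
        StringBuffer sb = new StringBuffer("a");
        System.out.println(sb.hashCode());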
    }

    @Test
    public void test2() {
        HashFunction hashFunction = Hashing.md5();
        // Chain several inputs into one Hasher, then finish with hash().
        HashCode hashCode = hashFunction.newHasher()
                .putLong(1)
                .putString("zhangsan", Charset.forName("utf-8"))
                .putObject(new Person(1983, "zhang", 1, "san"), personFunnel)
                .hash();
        System.out.println(hashCode.hashCode());
        System.out.println(hashCode.toString()); // cff805d850adcf9e936d76019502153a
        System.out.println(hashCode.asInt());
        // d805f8cf: exactly the first 4 bytes of the hash, read from last to first (little-endian)
        System.out.println(Integer.toHexString(-670697265));
    }
}

Output:

-670697265
cff805d850adcf9e936d76019502153a
-670697265
d805f8cf
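
The relationship between the hex string and the int value in the output above can be checked with the JDK alone. The following is a minimal sketch, not part of the original test class: the class and method names are made up for illustration, and it assumes that asInt() returns the first four bytes of the hash in little-endian order, which is consistent with the output shown.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class AsIntDemo {
    /** Decodes a hex string into bytes, two hex digits per byte. */
    static byte[] fromHex(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    /** Reads the first four bytes of the array as a little-endian int. */
    static int firstFourBytesLittleEndian(byte[] bytes) {
        return ByteBuffer.wrap(bytes, 0, 4).order(ByteOrder.LITTLE_ENDIAN).getInt();
    }

    public static void main(String[] args) {
        // The MD5 hex string printed by hashCode.toString() in test2.
        byte[] digest = fromHex("cff805d850adcf9e936d76019502153a");
        int value = firstFourBytesLittleEndian(digest);
        System.out.println(value);                      // -670697265
        System.out.println(Integer.toHexString(value)); // d805f8cf
    }
}
```

The bytes cf f8 05 d8 read in reverse give d8 05 f8 cf, which is why the hex form of the int looks like the start of the hash string read from last to first.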

BloomFilter

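The BloomFilter algorithm quickly determines whether an item is present in a very large existing data set. Its characteristics:

It uses very little space in exchange for fast lookups

It tolerates a certain probability of error: it may report an item as present when it is not (a false positive), but it never reports a stored item as absent

For the underlying principle, see: http://www.cnblogs.com/heaad/archive/2011/01/02/1924195.html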
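
To make the principle concrete, here is a toy Bloom filter written with the JDK only. It is a sketch for illustration and is not how Guava implements BloomFilter: the class name, the bit-array size, and the choice of hash functions (String.hashCode() combined with an FNV-1a hash through double hashing) are all assumptions made for this example.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

class TinyBloomFilter {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    TinyBloomFilter(int numBits, int numHashes) {
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    /** Second base hash: 32-bit FNV-1a over the UTF-8 bytes. */
    private static int fnv1a(String value) {
        int h = 0x811c9dc5;
        for (byte b : value.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xff);
            h *= 16777619;
        }
        return h;
    }

    /** Derives the i-th bit index from two base hashes (double hashing). */
    private int index(String value, int i) {
        return Math.floorMod(value.hashCode() + i * fnv1a(value), numBits);
    }

    /** Sets one bit per hash function. */
    void put(String value) {
        for (int i = 0; i < numHashes; i++) {
            bits.set(index(value, i));
        }
    }

    /** Returns false if any of the bits is unset; true means "possibly present". */
    boolean mightContain(String value) {
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(index(value, i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        TinyBloomFilter filter = new TinyBloomFilter(1024, 3);
        filter.put("zhang");
        filter.put("li");
        System.out.println(filter.mightContain("li")); // true: stored items are always found
    }
}
```

A stored item always sets all of its bits, so a lookup for it can never fail. A false positive occurs when the bits of an absent item happen to have been set by other items, which becomes more likely as the bit array fills up.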

/**
 * Demonstrates BloomFilter
 */
@Test
public void test3() {
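    // The second argument is the number of objects the filter is expected to hold.
    // Always overestimate it: once the actual number of stored objects exceeds this value,
    // the error rate rises quickly. The third argument is the acceptable false positive rate.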
    BloomFilter<Person> personBloomFilter = BloomFilter.create(personFunnel, 10000000, 0.00001);
    personBloomFilter.put(new Person(24, "zhang", 22, "lisi"));
    personBloomFilter.put(new Person(25, "li", 21, "a"));
    personBloomFilter.put(new Person(21, "dd", 22, "lisi"));
    personBloomFilter.put(new Person(23, "fdsa", 21, "fdse"));
    personBloomFilter.put(new Person(26, "s", 22, "ac"));
    personBloomFilter.put(new Person(24, "yy", 12, "oi"));
    if (personBloomFilter.mightContain(new Person(25, "li", 21, "a"))) {
        System.out.println("contains");
    }
    if (personBloomFilter.mightContain(new Person(25, "li", 20, "a"))) {
        System.out.println("contains too");
    } else {
        System.out.println("not contain");
    }
}

Output:

contains
not contain

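Guava's BloomFilter is very simple to use: you only specify the expected number of items and the acceptable error rate, and Guava automatically chooses a suitable number of hash functions.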
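
To get a feel for what such an automatic choice involves, the sketch below applies the standard textbook sizing formulas for a Bloom filter: m = -n ln(p) / (ln 2)^2 bits and k = (m / n) ln 2 hash functions, for n expected items and false positive rate p. Whether Guava uses exactly these formulas internally is an assumption; the class and method names are hypothetical.

```java
class BloomSizing {
    /** Textbook optimal number of bits for n expected items and false positive rate p. */
    static long optimalNumOfBits(long n, double p) {
        return (long) (-n * Math.log(p) / (Math.log(2) * Math.log(2)));
    }

    /** Textbook optimal number of hash functions for n items stored in m bits. */
    static int optimalNumOfHashFunctions(long n, long m) {
        return Math.max(1, (int) Math.round((double) m / n * Math.log(2)));
    }

    public static void main(String[] args) {
        // Same parameters as the BloomFilter.create call in test3.
        long n = 10_000_000L;
        double p = 0.00001;
        long m = optimalNumOfBits(n, p);
        int k = optimalNumOfHashFunctions(n, m);
        System.out.println("bits = " + m + ", hash functions = " + k);
    }
}
```

With the parameters from test3, these formulas give roughly 240 million bits (about 30 MB) and 17 hash functions, which shows that a very low error rate for ten million items still fits comfortably in memory.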
