HBase Basic Operations 03

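Continuing from the previous post: one big difference between HBase and a traditional database is that HBase can keep multiple versions of a value, not just the latest one. Versions are told apart by their timestamp attribute.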

Let's look at this concretely. First we add a new row, visit100, and set the column personinfo:name several times, querying the latest value after each put:

hbase(main):011:0> put 'patientvisit','visit100','personinfo:name','A'
0 row(s) in 0.0890 seconds

hbase(main):012:0> get 'patientvisit','visit100','personinfo:name'
COLUMN                                        CELL                                                                                                                               
 personinfo:name                              timestamp=1454342758182, value=A                                                                                                   
1 row(s) in 0.0310 seconds

hbase(main):013:0> put 'patientvisit','visit100','personinfo:name','B'
0 row(s) in 0.0100 seconds

hbase(main):014:0> get 'patientvisit','visit100','personinfo:name'
COLUMN                                        CELL                                                                                                                               
 personinfo:name                              timestamp=1454342796166, value=B                                                                                                   
1 row(s) in 0.0230 seconds

hbase(main):015:0> put 'patientvisit','visit100','personinfo:name','B'
0 row(s) in 0.0140 seconds

hbase(main):016:0> get 'patientvisit','visit100','personinfo:name'
COLUMN                                        CELL                                                                                                                               
 personinfo:name                              timestamp=1454342802829, value=B                                                                                                   
1 row(s) in 0.0120 seconds

As we can see, the timestamp changes every time the column is set:

timestamp value
1454342758182 A
1454342796166 B
1454342802829 B
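
One way to picture what these three puts did is the toy Python model below. It is only a sketch and not an HBase client: the names `cell`, `put`, and `get_latest` are hypothetical. It treats a single cell as a mapping from timestamp to value, where each put adds an entry and a plain get returns the entry with the largest timestamp.

```python
# Toy in-memory model of one HBase cell (row + column); not the HBase API.
cell = {}  # timestamp in milliseconds -> value


def put(timestamp, value):
    """Record a new version of the cell under the given timestamp."""
    cell[timestamp] = value


def get_latest():
    """Return (timestamp, value) for the newest version, or None if empty."""
    if not cell:
        return None
    newest = max(cell)
    return newest, cell[newest]


# Replay the three puts from the shell session above.
put(1454342758182, "A")
put(1454342796166, "B")
put(1454342802829, "B")

print(get_latest())  # (1454342802829, 'B')
```

Writing the same value "B" twice still produces two distinct entries, because the timestamp, not the value, identifies a version.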

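By default a query returns the most recently written value, but we can also ask for the data at a specific version.
Note: if no version has exactly that timestamp, no data is returned.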

hbase(main):004:0> get 'patientvisit','visit100',COLUMN=>'personinfo:name',TIMESTAMP=>1454342758182
COLUMN                                        CELL                                                                                                                               
 personinfo:name                              timestamp=1454342758182, value=A                                                                                                   
1 row(s) in 0.0110 seconds

hbase(main):005:0> get 'patientvisit','visit100',COLUMN=>'personinfo:name',TIMESTAMP=>1454342758183
COLUMN                                        CELL                                                                                                                               
0 row(s) in 0.0130 seconds
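
The exact-match behavior of TIMESTAMP can be sketched with a small Python model. This is an illustration under assumptions, not HBase code: `versions` and `get_at` are hypothetical names, and the dictionary simply holds the three timestamps seen earlier.

```python
# Toy model: versions of personinfo:name keyed by timestamp; not the HBase API.
versions = {
    1454342758182: "A",
    1454342796166: "B",
    1454342802829: "B",
}


def get_at(timestamp):
    """Return the value stored at exactly this timestamp, or None."""
    return versions.get(timestamp)


print(get_at(1454342758182))  # A
print(get_at(1454342758183))  # None: no version has this exact timestamp
```

The lookup does not fall back to the nearest older version, which matches the empty result for 1454342758183 in the session above.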

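We can also use TIMERANGE to query the latest value within a time range. The range is half-open: the start timestamp is included and the end timestamp is excluded, which is why the first query below returns A and the second returns B: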

hbase(main):002:0> get 'patientvisit','visit100',COLUMN=>'personinfo:name',TIMERANGE=>[1454342758182,1454342796166]
COLUMN                                        CELL                                                                                                                               
 personinfo:name                              timestamp=1454342758182, value=A                                                                                                   
1 row(s) in 0.0200 seconds

hbase(main):003:0> get 'patientvisit','visit100',COLUMN=>'personinfo:name',TIMERANGE=>[1454342758182,1454342796167]
COLUMN                                        CELL                                                                                                                               
 personinfo:name                              timestamp=1454342796166, value=B                                                                                                   
1 row(s) in 0.0240 seconds
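
The two TIMERANGE results can be reproduced with a minimal Python sketch that assumes a half-open interval, start included and end excluded. The names `versions` and `get_in_range` are hypothetical and this is a model of the behavior, not the HBase implementation.

```python
# Toy model: versions of personinfo:name keyed by timestamp; not the HBase API.
versions = {
    1454342758182: "A",
    1454342796166: "B",
    1454342802829: "B",
}


def get_in_range(min_ts, max_ts):
    """Newest (timestamp, value) with min_ts <= timestamp < max_ts, or None."""
    hits = [ts for ts in versions if min_ts <= ts < max_ts]
    if not hits:
        return None
    newest = max(hits)
    return newest, versions[newest]


print(get_in_range(1454342758182, 1454342796166))  # (1454342758182, 'A')
print(get_in_range(1454342758182, 1454342796167))  # (1454342796166, 'B')
```

Raising the upper bound by one millisecond is enough to bring the version written at 1454342796166 into the range.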

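Next, a word about versions. As the describe output below shows, every column family currently has VERSIONS => '1', so to demonstrate we first change the table's schema and raise VERSIONS to 5 for personinfoex: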

hbase(main):013:0> describe 'patientvisit'
Table patientvisit is ENABLED                                                                                                                                
patientvisit                                                                                                                                                 
COLUMN FAMILIES DESCRIPTION                                                                                                                                  
{NAME => 'personinfo', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}
{NAME => 'personinfoex', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}
{NAME => 'visitinfo', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}
{NAME => 'visitinfoex', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}
4 row(s) in 0.0510 seconds

hbase(main):020:0> alter 'patientvisit',{NAME=>'personinfoex',VERSIONS=>5}
Updating all regions with the new schema...
0/1 regions updated.
1/1 regions updated.
Done.
0 row(s) in 3.2850 seconds

Then we assign personinfoex:money a new value several times in a row:

hbase(main):022:0> put 'patientvisit','visit100','personinfoex:money','100'
0 row(s) in 0.2510 seconds

hbase(main):023:0> put 'patientvisit','visit100','personinfoex:money','1000'
0 row(s) in 0.0140 seconds

hbase(main):024:0> put 'patientvisit','visit100','personinfoex:money','10000'
0 row(s) in 0.0090 seconds

hbase(main):025:0> put 'patientvisit','visit100','personinfoex:money','100000'
0 row(s) in 0.0110 seconds

hbase(main):026:0> put 'patientvisit','visit100','personinfoex:money','1000000'
0 row(s) in 0.0110 seconds

hbase(main):027:0> put 'patientvisit','visit100','personinfoex:money','10000000'
0 row(s) in 0.0160 seconds

Then, when calling get, we pass the VERSIONS parameter to retrieve that many of the most recent versions:

hbase(main):029:0> get 'patientvisit','visit100',COLUMN=>'personinfoex:money'
COLUMN                                   CELL                                                                                                                
 personinfoex:money                      timestamp=1454466774726, value=10000000                                                                             
1 row(s) in 0.0690 seconds

hbase(main):031:0> get 'patientvisit','visit100',COLUMN=>'personinfoex:money',VERSIONS=>1
COLUMN                                   CELL                                                                                                                
 personinfoex:money                      timestamp=1454466774726, value=10000000                                                                             
1 row(s) in 0.0160 seconds

hbase(main):032:0> get 'patientvisit','visit100',COLUMN=>'personinfoex:money',VERSIONS=>2
COLUMN                                   CELL                                                                                                                
 personinfoex:money                      timestamp=1454466774726, value=10000000                                                                             
 personinfoex:money                      timestamp=1454466769620, value=1000000                                                                              
2 row(s) in 0.0100 seconds

hbase(main):030:0> get 'patientvisit','visit100',COLUMN=>'personinfoex:money',VERSIONS=>3
COLUMN                                   CELL                                                                                                                
 personinfoex:money                      timestamp=1454466774726, value=10000000                                                                             
 personinfoex:money                      timestamp=1454466769620, value=1000000                                                                              
 personinfoex:money                      timestamp=1454466766192, value=100000                                                                               
3 row(s) in 0.0270 seconds

hbase(main):033:0> get 'patientvisit','visit100',COLUMN=>'personinfoex:money',VERSIONS=>4
COLUMN                                   CELL                                                                                                                
 personinfoex:money                      timestamp=1454466774726, value=10000000                                                                             
 personinfoex:money                      timestamp=1454466769620, value=1000000                                                                              
 personinfoex:money                      timestamp=1454466766192, value=100000                                                                               
 personinfoex:money                      timestamp=1454466763838, value=10000                                                                                
4 row(s) in 0.0410 seconds
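
The effect of VERSIONS on get can be sketched in Python as "return the newest n versions, newest first". This is a toy model with hypothetical names (`MAX_VERSIONS`, `newest_versions`); the integer timestamps 1 to 6 are placeholders for the six puts, since the session does not show the real timestamps of the first two. The cap at five for larger requests is an assumption based on the VERSIONS=>5 setting and is not demonstrated in the session.

```python
# Toy model of get ... VERSIONS=>n on one cell; not the HBase API.
MAX_VERSIONS = 5  # mirrors VERSIONS=>5 set on the personinfoex family


def newest_versions(cell, n=1):
    """Return up to n (timestamp, value) pairs, newest first.

    The result is capped at MAX_VERSIONS (an assumption, see lead-in).
    """
    limit = min(n, MAX_VERSIONS)
    ordered = sorted(cell.items(), reverse=True)  # largest timestamp first
    return ordered[:limit]


# Placeholder timestamps 1..6 stand in for the six puts, in write order.
values = ["100", "1000", "10000", "100000", "1000000", "10000000"]
cell = {i + 1: value for i, value in enumerate(values)}

print(newest_versions(cell))     # [(6, '10000000')]
print(newest_versions(cell, 4))  # newest four, ending with (3, '10000')
print(len(newest_versions(cell, 10)))  # 5
```

Omitting n behaves like VERSIONS=>1, which matches the first two get calls in the session returning the same single row.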

OK, to summarize: HBase is a three-level KV database with timestamps and version management.
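
One possible reading of "three-level KV", sketched in Python, is a nested mapping of row key to column to timestamp to value. This is an interpretation for illustration only, with hypothetical names `table` and `lookup`; it says nothing about how HBase actually stores data.

```python
# Toy three-level key-value view: row -> column -> timestamp -> value.
table = {
    "visit100": {
        "personinfo:name": {
            1454342758182: "A",
            1454342796166: "B",
            1454342802829: "B",
        },
    },
}


def lookup(row, column, timestamp=None):
    """Return the value at the given timestamp, or the newest one if omitted."""
    versions = table.get(row, {}).get(column, {})
    if not versions:
        return None
    if timestamp is None:
        timestamp = max(versions)  # default to the newest version
    return versions.get(timestamp)


print(lookup("visit100", "personinfo:name"))                 # B
print(lookup("visit100", "personinfo:name", 1454342758182))  # A
print(lookup("visit100", "personinfo:age"))                  # None
```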
