Errors in Text index with FlexTokenizer

I'm using the distributed HPE Vertica 7.2.1 Virtual Machine.

For demo purposes I'm playing with the text search feature.

 

I built a flex table "actors" by loading the following JSON file:

---- actors.json --------

{"firstName":"Stefano", "lastName":"Accorsi", "age": 44}
{"firstName":"George", "lastName":"Clooney", "age": 55}
{"firstName":"Robert", "lastName":"Redford", "age": 79}
{"firstName":"Jennifer", "lastName":"Lawrence", "age": 26}
{"firstName":"Nicole", "lastName":"Kidman", "age": 49}
{"firstName":"Cate", "lastName":"Blanchett", "age": 48}
{"firstName":"Marion", "lastName":"Cotillard", "age": 41}

----------

create flex table actors();
copy actors from 'actors.json' parser fjsonparser();

select compute_flextable_keys('actors');
select build_flextable_view('actors');
update actors_keys SET data_type_guess = 'integer' where key_name = 'age';
commit;
select build_flextable_view('actors');

 

The data in the table (and in the view) is correct; a select returns:

 

select * from actors_view;
 age | firstname | lastname 
-----+-----------+-----------
  44 | Stefano   | Accorsi
  55 | George    | Clooney
  79 | Robert    | Redford
  26 | Jennifer  | Lawrence
  49 | Nicole    | Kidman
  48 | Cate      | Blanchett
  41 | Marion    | Cotillard
(7 rows)

 

When I create a text index on this flex table with the FlexTokenizer tokenizer, some tokens are truncated or modified (for example cloonei, georg, jennif, lawrenc and nicol in the output below):

 

ALTER TABLE actors ADD PRIMARY KEY (__identity__);
CREATE TEXT INDEX actors_index ON actors(__identity__, __raw__) TOKENIZER public.FlexTokenizer(long varbinary);

SELECT * FROM actors_index;
   token   | doc_id
-----------+--------
 26        |      4
 41        |      7
 44        |      1
 48        |      6
 49        |      5
 55        |      2
 79        |      3
 accorsi   |      1
 blanchett |      6
 cate      |      6
 cloonei   |      2
 cotillard |      7
 georg     |      2
 jennif    |      4
 kidman    |      5
 lawrenc   |      4
 marion    |      7
 nicol     |      5
 redford   |      3
 robert    |      3
 stefano   |      1
(21 rows)
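
To isolate the tokens that differ from the source values, a query along these lines may help. It is only a sketch: it assumes the view columns are named exactly as in the actors_view output above, and that lowercasing is the only change expected between a value and its token.

```sql
-- List index tokens that match no lowercased value in the view.
-- Assumes actors_view exposes age, firstname and lastname as shown above.
SELECT token, doc_id
FROM actors_index
WHERE token NOT IN (
    SELECT LOWER(firstname) FROM actors_view
    UNION ALL
    SELECT LOWER(lastname) FROM actors_view
    UNION ALL
    SELECT CAST(age AS VARCHAR) FROM actors_view
)
ORDER BY token;
```

Judging from the index output above, this should return only cloonei, georg, jennif, lawrenc and nicol.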

 

Have I found a bug, or did I make a mistake in one of the steps?

 

Thanks,

 

Chiara
