Prepare a benchmark dataset

Load a benchmark dataset, for example MovieLens 1M.

This is a tabular benchmark dataset that contains about 1 million ratings from roughly 6,000 users on roughly 4,000 movies.

[1]:
import rsdiv as rs

loader = rs.MovieLens1MDownLoader()

Get the user-item interactions (ratings):

[2]:
ratings = loader.read_ratings()
[3]:
ratings
[3]:
         userId  movieId  rating            timestamp
0             1     1193       5  2000-12-31 22:12:40
1             1      661       3  2000-12-31 22:35:09
2             1      914       3  2000-12-31 22:32:48
3             1     3408       4  2000-12-31 22:04:35
4             1     2355       5  2001-01-06 23:38:11
...         ...      ...     ...                  ...
1000204    6040     1091       1  2000-04-26 02:35:41
1000205    6040     1094       5  2000-04-25 23:21:27
1000206    6040      562       5  2000-04-25 23:19:06
1000207    6040     1096       4  2000-04-26 02:20:48
1000208    6040     1097       4  2000-04-26 02:19:29

1000209 rows × 4 columns
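
A quick sanity check on the ratings frame can catch loading problems early. The sketch below does not call rsdiv; it rebuilds the first five rows shown above by hand with plain pandas and inspects them. The name `sample` and the derived variables are illustrative only.

```python
import pandas as pd

# First five rows of the ratings output above, rebuilt by hand for illustration.
sample = pd.DataFrame(
    {
        "userId": [1, 1, 1, 1, 1],
        "movieId": [1193, 661, 914, 3408, 2355],
        "rating": [5, 3, 3, 4, 5],
        "timestamp": pd.to_datetime(
            [
                "2000-12-31 22:12:40",
                "2000-12-31 22:35:09",
                "2000-12-31 22:32:48",
                "2000-12-31 22:04:35",
                "2001-01-06 23:38:11",
            ]
        ),
    }
)

# MovieLens 1M ratings are whole stars from 1 to 5.
assert sample["rating"].between(1, 5).all()

n_users = sample["userId"].nunique()    # distinct users in the sample
n_movies = sample["movieId"].nunique()  # distinct movies in the sample
mean_rating = sample["rating"].mean()   # average rating in the sample
earliest = sample["timestamp"].min()    # oldest rating in the sample
```

The same checks should apply unchanged to the full `ratings` frame, assuming it keeps the column names shown in the output.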

Get the users’ information:

[4]:
users = loader.read_users()
[5]:
users
[5]:
      userId  gender  age  occupation  zipcode
0          1       F    1          10    48067
1          2       M   56          16    70072
2          3       M   25          15    55117
3          4       M   45           7    02460
4          5       M   25          20    55455
...      ...     ...  ...         ...      ...
6035    6036       F   25          15    32603
6036    6037       F   45           1    76006
6037    6038       F   56           1    14706
6038    6039       F   45           0    01060
6039    6040       M   25           6    11106

6040 rows × 5 columns

Get the items’ information:

[6]:
items = loader.read_items()
[7]:
items
[7]:
      itemId  title                        genres                            release_date
0          1  Toy Story                    [Animation, Children's, Comedy]           1995
1          2  Jumanji                      [Adventure, Children's, Fantasy]          1995
2          3  Grumpier Old Men             [Comedy, Romance]                         1995
3          4  Waiting to Exhale            [Comedy, Drama]                           1995
4          5  Father of the Bride Part II  [Comedy]                                  1995
...      ...  ...                          ...                                        ...
3878    3948  Meet the Parents             [Comedy]                                  2000
3879    3949  Requiem for a Dream          [Drama]                                   2000
3880    3950  Tigerland                    [Drama]                                   2000
3881    3951  Two Family House             [Drama]                                   2000
3882    3952  Contender                    [Drama, Thriller]                         2000

3883 rows × 4 columns
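
Note that the ratings frame identifies movies by `movieId`, while the items frame uses `itemId`. One possible way to combine the three tables is sketched below with plain pandas. The miniature frames (`ratings_small`, `users_small`, `items_small`) are made-up stand-ins that only reuse the column names from the outputs above; they are not produced by rsdiv.

```python
import pandas as pd

# Hypothetical miniature frames that mirror the column names shown above.
ratings_small = pd.DataFrame(
    {"userId": [1, 1, 2], "movieId": [1, 2, 1], "rating": [5, 3, 4]}
)
users_small = pd.DataFrame(
    {"userId": [1, 2], "gender": ["F", "M"], "age": [1, 56]}
)
items_small = pd.DataFrame(
    {
        "itemId": [1, 2],
        "title": ["Toy Story", "Jumanji"],
        "release_date": [1995, 1995],
    }
)

merged = (
    ratings_small
    # Attach user attributes; both frames share the "userId" key.
    .merge(users_small, on="userId", how="left")
    # Attach item attributes; the key is named differently on each side.
    .merge(items_small, left_on="movieId", right_on="itemId", how="left")
    # "itemId" now duplicates "movieId", so drop it.
    .drop(columns="itemId")
)
```

Left joins keep every rating even when a user or item record is missing, which makes gaps visible as `NaN` values instead of silently dropping rows.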