Prepare a benchmark dataset

Load a benchmark dataset, for example the MovieLens 1M dataset.

This is a tabular benchmark dataset containing 1,000,209 ratings from 6,040 users on about 3,900 movies.

[1]:

import rsdiv as rs

loader = rs.MovieLens1MDownLoader()

Get the user-item interactions (ratings):

[2]:

ratings = loader.read_ratings()

[3]:

ratings

[3]:

	userId	movieId	rating	timestamp
0	1	1193	5	2000-12-31 22:12:40
1	1	661	3	2000-12-31 22:35:09
2	1	914	3	2000-12-31 22:32:48
3	1	3408	4	2000-12-31 22:04:35
4	1	2355	5	2001-01-06 23:38:11
...	...	...	...	...
1000204	6040	1091	1	2000-04-26 02:35:41
1000205	6040	1094	5	2000-04-25 23:21:27
1000206	6040	562	5	2000-04-25 23:19:06
1000207	6040	1096	4	2000-04-26 02:20:48
1000208	6040	1097	4	2000-04-26 02:19:29

1000209 rows × 4 columns
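Before modeling, it can help to sanity-check the interaction table. The sketch below assumes `read_ratings()` returns a pandas DataFrame with the `userId`, `movieId` and `rating` columns shown above. To stay runnable without downloading anything, it builds a small stand-in from the ten rows printed in that output. The helper name `summarize_ratings` is hypothetical and is not part of rsdiv.

```python
import pandas as pd


def summarize_ratings(ratings: pd.DataFrame) -> dict:
    """Hypothetical helper (not part of rsdiv): basic statistics of a rating table."""
    n_ratings = len(ratings)
    n_users = ratings["userId"].nunique()
    n_items = ratings["movieId"].nunique()
    return {
        "n_ratings": n_ratings,
        "n_users": n_users,
        "n_items": n_items,
        # Share of the user x item matrix that has an observed rating.
        "density": n_ratings / (n_users * n_items),
        # How many ratings each star value received, in ascending star order.
        "rating_counts": ratings["rating"].value_counts().sort_index().to_dict(),
    }


# Stand-in for `loader.read_ratings()`: the ten rows shown above (timestamp omitted).
sample = pd.DataFrame(
    {
        "userId": [1, 1, 1, 1, 1, 6040, 6040, 6040, 6040, 6040],
        "movieId": [1193, 661, 914, 3408, 2355, 1091, 1094, 562, 1096, 1097],
        "rating": [5, 3, 3, 4, 5, 1, 5, 5, 4, 4],
    }
)

stats = summarize_ratings(sample)
print(stats)
```

On the full table, `n_users` should match the number of rows in the users table. `n_items`, however, counts only movies that received at least one rating, so it can be smaller than the number of rows in the items table.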

Get the user information (gender, age group, occupation, and ZIP code):

[4]:

users = loader.read_users()

[5]:

users

[5]:

	userId	gender	age	occupation	zipcode
0	1	F	1	10	48067
1	2	M	56	16	70072
2	3	M	25	15	55117
3	4	M	45	7	02460
4	5	M	25	20	55455
...	...	...	...	...	...
6035	6036	F	25	15	32603
6036	6037	F	45	1	76006
6037	6038	F	56	1	14706
6038	6039	F	45	0	01060
6039	6040	M	25	6	11106

6040 rows × 5 columns
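In MovieLens 1M, `age` is an age-group code rather than an exact age, and `occupation` is an integer code. The sketch below assumes `read_users()` and `read_ratings()` return pandas DataFrames with the columns shown above. It maps the age codes to the ranges documented in the MovieLens 1M README and attaches each rater's demographics to their ratings with a merge on `userId`. `AGE_GROUPS` is a name introduced here for illustration. The stand-in frames reuse a few rows from the outputs above. Because `02460` is printed with its leading zero, `zipcode` appears to be stored as text, and the stand-in keeps it that way.

```python
import pandas as pd

# Illustrative name: age-group codes as documented in the MovieLens 1M README.
AGE_GROUPS = {
    1: "Under 18",
    18: "18-24",
    25: "25-34",
    35: "35-44",
    45: "45-49",
    50: "50-55",
    56: "56+",
}

# Stand-ins for `loader.read_users()` and `loader.read_ratings()`, using rows shown above.
users = pd.DataFrame(
    {
        "userId": [1, 4, 6040],
        "gender": ["F", "M", "M"],
        "age": [1, 45, 25],
        "occupation": [10, 7, 6],
        # Strings, so that ZIP codes such as "02460" keep their leading zero.
        "zipcode": ["48067", "02460", "11106"],
    }
)
ratings = pd.DataFrame(
    {
        "userId": [1, 1, 6040],
        "movieId": [1193, 661, 1097],
        "rating": [5, 3, 4],
    }
)

# Decode the age-group code into a readable label.
users["age_group"] = users["age"].map(AGE_GROUPS)

# validate= checks that userId is unique in `users`, so the merge cannot duplicate ratings.
enriched = ratings.merge(users, on="userId", how="left", validate="many_to_one")
print(enriched[["userId", "movieId", "rating", "gender", "age_group"]])
```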

Get the item (movie) information (title, genres, and release year):

[6]:

items = loader.read_items()

[7]:

items

[7]:

	itemId	title	genres	release_date
0	1	Toy Story	[Animation, Children's, Comedy]	1995
1	2	Jumanji	[Adventure, Children's, Fantasy]	1995
2	3	Grumpier Old Men	[Comedy, Romance]	1995
3	4	Waiting to Exhale	[Comedy, Drama]	1995
4	5	Father of the Bride Part II	[Comedy]	1995
...	...	...	...	...
3878	3948	Meet the Parents	[Comedy]	2000
3879	3949	Requiem for a Dream	[Drama]	2000
3880	3950	Tigerland	[Drama]	2000
3881	3951	Two Family House	[Drama]	2000
3882	3952	Contender	[Drama, Thriller]	2000

3883 rows × 4 columns
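The items table names its key `itemId`, while the ratings table uses `movieId` for the same MovieLens movie ID. The sketch below assumes both tables are pandas DataFrames shaped like the outputs above and that `genres` holds Python lists. It joins the two tables with `left_on`/`right_on` and then uses `DataFrame.explode` to count how many ratings each genre received. The item rows are copied from the output above. The ratings are made up purely for illustration, because the rating rows printed earlier refer to movies that do not appear in the truncated item output.

```python
import pandas as pd

# Stand-in for `loader.read_items()`: rows copied from the output above.
items = pd.DataFrame(
    {
        "itemId": [1, 2, 3, 3952],
        "title": ["Toy Story", "Jumanji", "Grumpier Old Men", "Contender"],
        "genres": [
            ["Animation", "Children's", "Comedy"],
            ["Adventure", "Children's", "Fantasy"],
            ["Comedy", "Romance"],
            ["Drama", "Thriller"],
        ],
        "release_date": [1995, 1995, 1995, 2000],
    }
)

# Hypothetical ratings, made up for illustration (not taken from the dataset).
ratings = pd.DataFrame(
    {
        "userId": [1, 1, 2],
        "movieId": [1, 3, 1],
        "rating": [5, 4, 3],
    }
)

# The ratings key is `movieId` but the items key is `itemId`, so name both explicitly.
rated = ratings.merge(
    items, left_on="movieId", right_on="itemId", how="left", validate="many_to_one"
)

# One row per (rating, genre) pair, then count how many ratings each genre received.
ratings_per_genre = rated.explode("genres")["genres"].value_counts()
print(ratings_per_genre)
```

A movie can belong to several genres, so each rating is counted once per genre. As a result, these counts can add up to more than the number of ratings.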