Prepare a benchmark dataset
Load a benchmark dataset, for example MovieLens 1M.
This tabular benchmark dataset contains about 1 million ratings from roughly 6,000 users on roughly 4,000 movies.
[1]:
import rsdiv as rs
loader = rs.MovieLens1MDownLoader()
Get the user-item interactions (ratings):
[2]:
ratings = loader.read_ratings()
[3]:
ratings
[3]:
| | userId | movieId | rating | timestamp |
|---|---|---|---|---|
| 0 | 1 | 1193 | 5 | 2000-12-31 22:12:40 |
| 1 | 1 | 661 | 3 | 2000-12-31 22:35:09 |
| 2 | 1 | 914 | 3 | 2000-12-31 22:32:48 |
| 3 | 1 | 3408 | 4 | 2000-12-31 22:04:35 |
| 4 | 1 | 2355 | 5 | 2001-01-06 23:38:11 |
| ... | ... | ... | ... | ... |
| 1000204 | 6040 | 1091 | 1 | 2000-04-26 02:35:41 |
| 1000205 | 6040 | 1094 | 5 | 2000-04-25 23:21:27 |
| 1000206 | 6040 | 562 | 5 | 2000-04-25 23:19:06 |
| 1000207 | 6040 | 1096 | 4 | 2000-04-26 02:20:48 |
| 1000208 | 6040 | 1097 | 4 | 2000-04-26 02:19:29 |
1000209 rows × 4 columns
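A quick sanity check on an interaction table like this one is to count distinct users and items and see how densely the user-item matrix is filled. The sketch below is only an illustration: `ratings_sample` is a hypothetical stand-in built by hand from the ten rows displayed above, so it runs without downloading anything. The same expressions should apply to the full `ratings` frame, assuming it is a pandas DataFrame with these column names.

```python
import pandas as pd

# Hypothetical stand-in for `ratings`, copied from the rows displayed above.
ratings_sample = pd.DataFrame(
    {
        "userId": [1, 1, 1, 1, 1, 6040, 6040, 6040, 6040, 6040],
        "movieId": [1193, 661, 914, 3408, 2355, 1091, 1094, 562, 1096, 1097],
        "rating": [5, 3, 3, 4, 5, 1, 5, 5, 4, 4],
    }
)

# Number of distinct users and items that appear in the interactions.
n_users = ratings_sample["userId"].nunique()
n_items = ratings_sample["movieId"].nunique()

# Fraction of the user-item matrix that has an observed rating.
density = len(ratings_sample) / (n_users * n_items)

# How often each rating value occurs, ordered by rating.
rating_counts = ratings_sample["rating"].value_counts().sort_index()

print(n_users, n_items, density)
print(rating_counts)
```

On the full dataset the density is expected to be far lower than in this toy sample, since most users rate only a small share of the catalogue.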
Get the users’ information:
[4]:
users = loader.read_users()
[5]:
users
[5]:
| | userId | gender | age | occupation | zipcode |
|---|---|---|---|---|---|
| 0 | 1 | F | 1 | 10 | 48067 |
| 1 | 2 | M | 56 | 16 | 70072 |
| 2 | 3 | M | 25 | 15 | 55117 |
| 3 | 4 | M | 45 | 7 | 02460 |
| 4 | 5 | M | 25 | 20 | 55455 |
| ... | ... | ... | ... | ... | ... |
| 6035 | 6036 | F | 25 | 15 | 32603 |
| 6036 | 6037 | F | 45 | 1 | 76006 |
| 6037 | 6038 | F | 56 | 1 | 14706 |
| 6038 | 6039 | F | 45 | 0 | 01060 |
| 6039 | 6040 | M | 25 | 6 | 11106 |
6040 rows × 5 columns
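User attributes become useful once they are joined to the interactions on `userId`. The following is a minimal sketch using small hand-made frames (`ratings_sample` and `users_sample` are hypothetical names, with values copied from the rows displayed above). It assumes both tables are pandas DataFrames sharing a `userId` column, and it keeps `zipcode` as a string so that leading zeros such as `02460` survive.

```python
import pandas as pd

# Hypothetical stand-ins built from rows displayed above.
ratings_sample = pd.DataFrame(
    {
        "userId": [1, 1, 6040, 6040],
        "movieId": [1193, 661, 1091, 1094],
        "rating": [5, 3, 1, 5],
    }
)
users_sample = pd.DataFrame(
    {
        "userId": [1, 6040],
        "gender": ["F", "M"],
        "age": [1, 25],
        "occupation": [10, 6],
        "zipcode": ["48067", "11106"],  # strings, to preserve leading zeros
    }
)

# Left join: keep every rating and attach the matching user's attributes.
merged = ratings_sample.merge(users_sample, on="userId", how="left")

# Example aggregate: average rating per gender in the toy sample.
mean_by_gender = merged.groupby("gender")["rating"].mean()
print(mean_by_gender)
```

A left join is used here so that no rating is dropped if a user record happens to be missing; an inner join would work equally well when every `userId` is known to be present in both tables.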
Get the items’ information:
[6]:
items = loader.read_items()
[7]:
items
[7]:
| | itemId | title | genres | release_date |
|---|---|---|---|---|
| 0 | 1 | Toy Story | [Animation, Children's, Comedy] | 1995 |
| 1 | 2 | Jumanji | [Adventure, Children's, Fantasy] | 1995 |
| 2 | 3 | Grumpier Old Men | [Comedy, Romance] | 1995 |
| 3 | 4 | Waiting to Exhale | [Comedy, Drama] | 1995 |
| 4 | 5 | Father of the Bride Part II | [Comedy] | 1995 |
| ... | ... | ... | ... | ... |
| 3878 | 3948 | Meet the Parents | [Comedy] | 2000 |
| 3879 | 3949 | Requiem for a Dream | [Drama] | 2000 |
| 3880 | 3950 | Tigerland | [Drama] | 2000 |
| 3881 | 3951 | Two Family House | [Drama] | 2000 |
| 3882 | 3952 | Contender | [Drama, Thriller] | 2000 |
3883 rows × 4 columns
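Because each movie can carry several genres, counting genres requires flattening the list-valued column first. The sketch below shows one way to do that with pandas `explode`, on a hypothetical `items_sample` frame copied from the first five rows displayed above. It assumes the `genres` column holds Python lists, as the displayed output suggests; this is not confirmed by the loader's documentation here.

```python
import pandas as pd

# Hypothetical stand-in for `items`, copied from the first rows displayed above.
items_sample = pd.DataFrame(
    {
        "itemId": [1, 2, 3, 4, 5],
        "title": [
            "Toy Story",
            "Jumanji",
            "Grumpier Old Men",
            "Waiting to Exhale",
            "Father of the Bride Part II",
        ],
        "genres": [
            ["Animation", "Children's", "Comedy"],
            ["Adventure", "Children's", "Fantasy"],
            ["Comedy", "Romance"],
            ["Comedy", "Drama"],
            ["Comedy"],
        ],
        "release_date": [1995, 1995, 1995, 1995, 1995],
    }
)

# One row per (movie, genre) pair, then count how often each genre occurs.
genre_counts = items_sample["genres"].explode().value_counts()
print(genre_counts)
```

Genre frequencies of this kind are a common starting point for describing how skewed a catalogue is before measuring the diversity of recommendations.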